linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-01 02:33:01 +09:00

Author	SHA1	Message	Date
Paul E. McKenney	7e7bc11a75	FROMGIT: rcu: Don't deboost before reporting expedited quiescent state Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx before reporting the expedited quiescent state. Under heavy real-time load, this can result in this function being preempted before the quiescent state is reported, which can in turn prevent the expedited grace period from completing. Tim Murray reports that the resulting expedited grace periods can take hundreds of milliseconds and even more than one second, when they should normally complete in less than a millisecond. This was fine given that there were no particular response-time constraints for synchronize_rcu_expedited(), as it was designed for throughput rather than latency. However, some users now need sub-100-millisecond response-time constratints. This patch therefore follows Neeraj's suggestion (seconded by Tim and by Uladzislau Rezki) of simply reversing the two operations. Reported-by: Tim Murray <timmurray@google.com> Reported-by: Joel Fernandes <joelaf@google.com> Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Tim Murray <timmurray@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Bug: 217236054 Bug: 224756824 (cherry picked from commit `10c5357874` git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next) Change-Id: Iac442f4cb0648c967ec65d4df0f74c8c25940393 Signed-off-by: Kyle Lin <kylelin@google.com>	2022-03-17 15:22:42 +00:00
Greg Kroah-Hartman	b2a024ac7f	Merge `d04937ae94` ("x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT") into android13-5.10 Steps on the way to 5.10.105 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie2249241cd53eea62126d9ee140a3d4e5a9012d8	2022-03-16 13:24:43 +01:00
Chris Goldsworthy	239dde6763	ANDROID: dma-direct: Make DMA32 disablement work for CONFIG_NUMA zone_dma32_is_empty() currently lacks the proper validation to ensure that the NUMA node ID it receives as an argument is valid. This has no effect on kernels with CONFIG_NUMA=n as NODE_DATA() will return the same pglist_data on these devices, but on kernels with CONFIG_NUMA=y, this is not the case, and the node passed to NODE_DATA must be validated. Rather than trying to find the node containing ZONE_DMA32, replace calls of zone_dma32_is_empty() with zone_dma32_are_empty() (which iterates over all nodes and returns false if one of the nodes holds DMA32 and it is non-empty). Bug: 199917449 Fixes: `c3c2bb34ac` ("ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty") Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com> Change-Id: I850fb9213b71a1ef29106728bfda0cc6de46fdbb (cherry picked from commit `bf96382fb9`)	2022-03-16 01:27:07 +00:00
Chris Goldsworthy	19507e098b	ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty ZONE_DMA32 is enabled by default on android12-5.10, yet it is not needed for all devices, nor is it desirable to have if not needed. For instance, if a partner in GKI 1.0 did not use ZONE_DMA32, memory can be lower for ZONE_NORMAL relative to older targets, such that memory would run out more quickly in ZONE_NORMAL leading kswapd to be invoked unnecessarily. Correspondingly, provide a means of making ZONE_DMA32 empty via the kernel command line when it is compiled in via CONFIG_ZONE_DMA32. Bug: 199917449 Change-Id: I70ec76914b92e518d61a61072f0b3cb41cb28646 Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com> (cherry picked from commit `c3c2bb34ac`)	2022-03-16 00:04:01 +00:00
Greg Kroah-Hartman	66c9212cdf	Merge 5.10.104 into android13-5.10 Changes in 5.10.104 mac80211_hwsim: report NOACK frames in tx_status mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work i2c: bcm2835: Avoid clock stretching timeouts ASoC: rt5668: do not block workqueue if card is unbound ASoC: rt5682: do not block workqueue if card is unbound regulator: core: fix false positive in regulator_late_cleanup() Input: clear BTN_RIGHT/MIDDLE on buttonpads KVM: arm64: vgic: Read HW interrupt pending state from the HW tipc: fix a bit overflow in tipc_crypto_key_rcv() cifs: fix double free race when mount fails in cifs_get_root() selftests/seccomp: Fix seccomp failure by adding missing headers dmaengine: shdma: Fix runtime PM imbalance on error i2c: cadence: allow COMPILE_TEST i2c: qup: allow COMPILE_TEST net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 usb: gadget: don't release an existing dev->buf usb: gadget: clear related members when goto fail exfat: reuse exfat_inode_info variable instead of calling EXFAT_I() exfat: fix i_blocks for files truncated over 4 GiB tracing: Add test for user space strings when filtering on string pointers serial: stm32: prevent TDR register overwrite when sending x_char ata: pata_hpt37x: fix PCI clock detection drm/amdgpu: check vm ready by amdgpu_vm->evicting flag tracing: Add ustring operation to filtering string pointers ALSA: intel_hdmi: Fix reference to PCM buffer address riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP riscv: Fix config KASAN && DEBUG_VIRTUAL ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min iommu/amd: Recover from event log overflow drm/i915: s/JSP2/ICP2/ PCH xen/netfront: destroy queues before real_num_tx_queues is zeroed thermal: core: Fix TZ_GET_TRIP NULL pointer dereference ntb: intel: fix port config status offset for SPR mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls xfrm: fix MTU regression netfilter: fix use-after-free in __nf_register_net_hook() bpf, sockmap: Do not ignore orig_len parameter xfrm: fix the if_id check in changelink xfrm: enforce validity of offload input flags e1000e: Correct NVM checksum verification flow net: fix up skbs delta_truesize in UDP GRO frag_list netfilter: nf_queue: don't assume sk is full socket netfilter: nf_queue: fix possible use-after-free netfilter: nf_queue: handle socket prefetch batman-adv: Request iflink once in batadv-on-batadv check batman-adv: Request iflink once in batadv_get_real_netdevice batman-adv: Don't expect inter-netns unique iflink indices net: ipv6: ensure we call ipv6_mc_down() at most once net: dcb: flush lingering app table entries for unregistered devices net/smc: fix connection leak net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server rcu/nocb: Fix missed nocb_timer requeue ice: Fix race conditions between virtchnl handling and VF ndo ops ice: fix concurrent reset and removal of VFs sched/topology: Make sched_init_numa() use a set for the deduplicating sort sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa() ia64: ensure proper NUMA distance and possible map initialization mac80211: fix forwarded mesh frames AC & queue selection net: stmmac: fix return value of __setup handler mac80211: treat some SAE auth steps as final iavf: Fix missing check for running netdev net: sxgbe: fix return value of __setup handler ibmvnic: register netdev after init of adapter net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe() ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc() efivars: Respect "block" flag in efivar_entry_set_safe() firmware: arm_scmi: Remove space in MODULE_ALIAS name ASoC: cs4265: Fix the duplicated control name can: gs_usb: change active_channels's type from atomic_t to u8 arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output igc: igc_read_phy_reg_gpy: drop premature return ARM: Fix kgdb breakpoint for Thumb2 ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions selftests: mlxsw: tc_police_scale: Make test more robust pinctrl: sunxi: Use unique lockdep classes for IRQs igc: igc_write_phy_reg_gpy: drop premature return ibmvnic: free reset-work-item when flushing memfd: fix F_SEAL_WRITE after shmem huge page allocated s390/extable: fix exception table sorting ARM: dts: switch timer config to common devkit8000 devicetree ARM: dts: Use 32KiHz oscillator on devkit8000 soc: fsl: guts: Revert commit `3c0d64e867` soc: fsl: guts: Add a missing memory allocation failure check soc: fsl: qe: Check of ioremap return value ARM: tegra: Move panels to AUX bus ibmvnic: complete init_done on transport events net: chelsio: cxgb3: check the return value of pci_find_capability() iavf: Refactor iavf state machine tracking nl80211: Handle nla_memdup failures in handle_nan_filter drm/amdgpu: fix suspend/resume hang regression net: dcb: disable softirqs in dcbnl_flush_dev() Input: elan_i2c - move regulator_[en\|dis]able() out of elan_[en\|dis]able_power() Input: elan_i2c - fix regulator enable count imbalance after suspend/resume Input: samsung-keypad - properly state IOMEM dependency HID: add mapping for KEY_DICTATE HID: add mapping for KEY_ALL_APPLICATIONS tracing/histogram: Fix sorting on old "cpu" value tracing: Fix return value of __setup handlers btrfs: fix lost prealloc extents beyond eof after full fsync btrfs: qgroup: fix deadlock between rescan worker and remove qgroup btrfs: add missing run of delayed items after unlink during log replay Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6" hamradio: fix macro redefine warning Linux 5.10.104 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6db85dae2ee6420dfab7fc72fe79acdb74560637	2022-03-15 13:57:31 +01:00
Josh Poimboeuf	afc2d635b5	x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting commit `44a3918c82` upstream. With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable to Spectre v2 BHB-based attacks. When both are enabled, print a warning message and report it in the 'spectre_v2' sysfs vulnerabilities file. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [fllinden@amazon.com: backported to 5.10] Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-11 12:11:49 +01:00
Barry Song	66e59d2b41	UPSTREAM: genirq: Add IRQF_NO_AUTOEN for request_irq/nmi() Many drivers don't want interrupts enabled automatically via request_irq(). So they are handling this issue by either way of the below two: (1) irq_set_status_flags(irq, IRQ_NOAUTOEN); request_irq(dev, irq...); (2) request_irq(dev, irq...); disable_irq(irq); The code in the second way is silly and unsafe. In the small time gap between request_irq() and disable_irq(), interrupts can still come. The code in the first way is safe though it's subobtimal. Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to request_irq() and request_nmi(). It prevents the automatic enabling of the requested interrupt/nmi in the same safe way as #1 above. With that the various usage sites of #1 and #2 above can be simplified and corrected. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: dmitry.torokhov@gmail.com Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com (cherry picked from commit `cbe16f35be`) Bug: 196772804 Signed-off-by: Keir Fraser <keirf@google.com> Change-Id: Ia3fa1c3c583c1562f2029250ac315ac2346def18	2022-03-09 08:57:45 -08:00
Randy Dunlap	827172ffa9	tracing: Fix return value of __setup handlers commit `1d02b444b8` upstream. __setup() handlers should generally return 1 to indicate that the boot options have been handled. Using invalid option values causes the entire kernel boot option string to be reported as Unknown and added to init's environment strings, polluting it. Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6 kprobe_event=p,syscall_any,$arg1 trace_options=quiet trace_clock=jiffies", will be passed to user space. Run /sbin/init as init process with arguments: /sbin/init with environment: HOME=/ TERM=linux BOOT_IMAGE=/boot/bzImage-517rc6 kprobe_event=p,syscall_any,$arg1 trace_options=quiet trace_clock=jiffies Return 1 from the __setup() handlers so that init's environment is not polluted with kernel boot options. Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org Cc: stable@vger.kernel.org Fixes: `7bcfaf54f5` ("tracing: Add trace_options kernel command line parameter") Fixes: `e1e232ca6b` ("tracing: Add trace_clock=<clock> kernel parameter") Fixes: `970988e19e` ("tracing/kprobe: Add kprobe_event= boot parameter") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:09:38 +01:00
Steven Rostedt (Google)	78059b1cfc	tracing/histogram: Fix sorting on old "cpu" value commit `1d1898f656` upstream. When trying to add a histogram against an event with the "cpu" field, it was impossible due to "cpu" being a keyword to key off of the running CPU. So to fix this, it was changed to "common_cpu" to match the other generic fields (like "common_pid"). But since some scripts used "cpu" for keying off of the CPU (for events that did not have "cpu" as a field, which is most of them), a backward compatibility trick was added such that if "cpu" was used as a key, and the event did not have "cpu" as a field name, then it would fallback and switch over to "common_cpu". This fix has a couple of subtle bugs. One was that when switching over to "common_cpu", it did not change the field name, it just set a flag. But the code still found a "cpu" field. The "cpu" field is used for filtering and is returned when the event does not have a "cpu" field. This was found by: # cd /sys/kernel/tracing # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger # cat events/sched/sched_wakeup/hist Which showed the histogram unsorted: { cpu: 19, pid: 1175 } hitcount: 1 { cpu: 6, pid: 239 } hitcount: 2 { cpu: 23, pid: 1186 } hitcount: 14 { cpu: 12, pid: 249 } hitcount: 2 { cpu: 3, pid: 994 } hitcount: 5 Instead of hard coding the "cpu" checks, take advantage of the fact that trace_event_field_field() returns a special field for "cpu" and "CPU" if the event does not have "cpu" as a field. This special field has the "filter_type" of "FILTER_CPU". Check that to test if the returned field is of the CPU type instead of doing the string compare. Also, fix the sorting bug by testing for the hist_field flag of HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use the special CPU field to know what compare routine to use, and since that special field does not have a size, it returns tracing_map_cmp_none. Cc: stable@vger.kernel.org Fixes: `1e3bac71c5` ("tracing/histogram: Rename "cpu" to "common_cpu"") Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:09:38 +01:00
Dietmar Eggemann	1312ef5ad0	sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa() commit `71e5f6644f` upstream. Commit "sched/topology: Make sched_init_numa() use a set for the deduplicating sort" allocates 'i + nr_levels (level)' instead of 'i + nr_levels + 1' sched_domain_topology_level. This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG): sched_init_domains build_sched_domains() __free_domain_allocs() __sdt_free() { ... for_each_sd_topology(tl) ... sd = *per_cpu_ptr(sdd->sd, j); <-- ... } Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Barry Song <song.bao.hua@hisilicon.com> Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a9bf@arm.com Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:09:34 +01:00
Valentin Schneider	d753aecb3d	sched/topology: Make sched_init_numa() use a set for the deduplicating sort commit `620a6dc407` upstream. The deduplicating sort in sched_init_numa() assumes that the first line in the distance table contains all unique values in the entire table. I've been trying to pen what this exactly means for the topology, but it's not straightforward. For instance, topology.c uses this example: node 0 1 2 3 0: 10 20 20 30 1: 20 10 20 20 2: 20 20 10 20 3: 30 20 20 10 0 ----- 1 \| / \| \| / \| \| / \| 2 ----- 3 Which works out just fine. However, if we swap nodes 0 and 1: 1 ----- 0 \| / \| \| / \| \| / \| 2 ----- 3 we get this distance table: node 0 1 2 3 0: 10 20 20 20 1: 20 10 20 30 2: 20 20 10 20 3: 20 30 20 10 Which breaks the deduplicating sort (non-representative first line). In this case this would just be a renumbering exercise, but it so happens that we can have a deduplicating sort that goes through the whole table in O(n²) at the extra cost of a temporary memory allocation (i.e. any form of set). The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following this, implement the set as a 256-bits bitmap. Should this not be satisfactory (i.e. we want to support 32-bit values), then we'll have to go for some other sparse set implementation. This has the added benefit of letting us allocate just the right amount of memory for sched_domains_numa_distance[], rather than an arbitrary (nr_node_ids + 1). Note: DT binding equivalent (distance-map) decodes distances as 32-bit values. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:09:34 +01:00
Frederic Weisbecker	0c145262ac	rcu/nocb: Fix missed nocb_timer requeue commit `b2fcf21020` upstream. This sequence of events can lead to a failure to requeue a CPU's ->nocb_timer: 1. There are no callbacks queued for any CPU covered by CPU 0-2's ->nocb_gp_kthread. Note that ->nocb_gp_kthread is associated with CPU 0. 2. CPU 1 enqueues its first callback with interrupts disabled, and thus must defer awakening its ->nocb_gp_kthread. It therefore queues its rcu_data structure's ->nocb_timer. At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE. 3. CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a callback, but with interrupts enabled, allowing it to directly awaken the ->nocb_gp_kthread. 4. The newly awakened ->nocb_gp_kthread associates both CPU 1's and CPU 2's callbacks with a future grace period and arranges for that grace period to be started. 5. This ->nocb_gp_kthread goes to sleep waiting for the end of this future grace period. 6. This grace period elapses before the CPU 1's timer fires. This is normally improbably given that the timer is set for only one jiffy, but timers can be delayed. Besides, it is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. 7. The grace period ends, so rcu_gp_kthread awakens the ->nocb_gp_kthread, which in turn awakens both CPU 1's and CPU 2's ->nocb_cb_kthread. Then ->nocb_gb_kthread sleeps waiting for more newly queued callbacks. 8. CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps waiting for more invocable callbacks. 9. Note that neither kthread updated any ->nocb_timer state, so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE. 10. CPU 1 enqueues its second callback, this time with interrupts enabled so it can wake directly ->nocb_gp_kthread. It does so with calling wake_nocb_gp() which also cancels the pending timer that got queued in step 2. But that doesn't reset CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE. So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now desynchronized. 11. ->nocb_gp_kthread associates the callback queued in 10 with a new grace period, arranges for that grace period to start and sleeps waiting for it to complete. 12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread, which in turn wakes up CPU 1's ->nocb_cb_kthread which then invokes the callback queued in 10. 13. CPU 1 enqueues its third callback, this time with interrupts disabled so it must queue a timer for a deferred wakeup. However the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which incorrectly indicates that a timer is already queued. Instead, CPU 1's ->nocb_timer was cancelled in 10. CPU 1 therefore fails to queue the ->nocb_timer. 14. CPU 1 has its pending callback and it may go unnoticed until some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever calls an explicit deferred wakeup, for example, during idle entry. This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime we delete the ->nocb_timer. It is quite possible that there is a similar scenario involving ->nocb_bypass_timer and ->nocb_defer_wakeup. However, despite some effort from several people, a failure scenario has not yet been located. However, that by no means guarantees that no such scenario exists. Finding a failure scenario is left as an exercise for the reader, and the "Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer. Fixes: `d1b222c6be` (rcu/nocb: Add bypass callback queueing) Cc: <stable@vger.kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:09:34 +01:00
Steven Rostedt	e57dfaf66f	tracing: Add ustring operation to filtering string pointers [ Upstream commit `f37c3bbc63` ] Since referencing user space pointers is special, if the user wants to filter on a field that is a pointer to user space, then they need to specify it. Add a ".ustring" attribute to the field name for filters to state that the field is pointing to user space such that the kernel can take the appropriate action to read that pointer. Link: https://lore.kernel.org/all/yt9d8rvmt2jq.fsf@linux.ibm.com/ Fixes: `77360f9bbc` ("tracing: Add test for user space strings when filtering on string pointers") Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:09:31 +01:00
Steven Rostedt	c999c5927e	tracing: Add test for user space strings when filtering on string pointers [ Upstream commit `77360f9bbc` ] Pingfan reported that the following causes a fault: echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter echo 1 > events/syscalls/sys_enter_at/enable The reason is that trace event filter treats the user space pointer defined by "filename" as a normal pointer to compare against the "cpu" string. The following bug happened: kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60 #PF: supervisor read access in kernel mode #PF: error_code(0x0001) - permissions violation PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867 Oops: 0001 [#1] PREEMPT SMP PTI CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:strlen+0x0/0x20 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246 RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000 RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60 RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000 R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380 R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0 FS: 00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: filter_pred_pchar+0x18/0x40 filter_match_preds+0x31/0x70 ftrace_syscall_enter+0x27a/0x2c0 syscall_trace_enter.constprop.0+0x1aa/0x1d0 do_syscall_64+0x16/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fa861d88664 The above happened because the kernel tried to access user space directly and triggered a "supervisor read access in kernel mode" fault. Worse yet, the memory could not even be loaded yet, and a SEGFAULT could happen as well. This could be true for kernel space accessing as well. To be even more robust, test both kernel and user space strings. If the string fails to read, then simply have the filter fail. Note, TASK_SIZE is used to determine if the pointer is user or kernel space and the appropriate strncpy_from_kernel/user_nofault() function is used to copy the memory. For some architectures, the compare to TASK_SIZE may always pick user space or kernel space. If it gets it wrong, the only thing is that the filter will fail to match. In the future, this needs to be fixed to have the event denote which should be used. But failing a filter is much better than panicing the machine, and that can be solved later. Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/ Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Reported-by: Pingfan Liu <kernelfans@gmail.com> Tested-by: Pingfan Liu <kernelfans@gmail.com> Fixes: `87a342f5db` ("tracing/filters: Support filtering for char * strings") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:09:30 +01:00
Greg Kroah-Hartman	89c97134d0	Merge 5.10.103 into android13-5.10 Changes in 5.10.103 cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug btrfs: tree-checker: check item_size for inode_item btrfs: tree-checker: check item_size for dev_item clk: jz4725b: fix mmc0 clock gating vhost/vsock: don't check owner in vhost_vsock_stop() while releasing parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel parisc/unaligned: Fix ldw() and stw() unalignment handlers KVM: x86/mmu: make apf token non-zero to fix bug drm/amdgpu: disable MMHUB PG for Picasso drm/i915: Correctly populate use_sagv_wm for all pipes sr9700: sanity check for packet length USB: zaurus: support another broken Zaurus CDC-NCM: avoid overflow in sanity checking netfilter: nf_tables_offload: incorrect flow offload action array size x86/fpu: Correct pkru/xstate inconsistency tee: export teedev_open() and teedev_close_context() optee: use driver internal tee_context for some rpc ping: remove pr_err from ping_lookup perf data: Fix double free in perf_session__delete() bnx2x: fix driver load from initrd bnxt_en: Fix active FEC reporting to ethtool hwmon: Handle failure to register sensor with thermal zone correctly bpf: Do not try bpf_msg_push_data with len 0 selftests: bpf: Check bpf_msg_push_data return value bpf: Add schedule points in batch ops io_uring: add a schedule point in io_add_buffers() net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends tipc: Fix end of loop tests for list_for_each_entry() gso: do not skip outer ip header in case of ipip and net_failover openvswitch: Fix setting ipv6 fields causing hw csum failure drm/edid: Always set RGB444 net/mlx5e: Fix wrong return value on ioctl EEPROM query failure net/sched: act_ct: Fix flow table lookup after ct clear or switching zones net: ll_temac: check the return value of devm_kmalloc() net: Force inlining of checksum functions in net/checksum.h nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac() netfilter: nf_tables: fix memory leak during stateful obj update net/smc: Use a mutex for locking "struct smc_pnettable" surface: surface3_power: Fix battery readings on batteries without a serial number udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister() net/mlx5: Fix possible deadlock on rule deletion net/mlx5: Fix wrong limitation of metadata match on ecpf net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets spi: spi-zynq-qspi: Fix a NULL pointer dereference in zynq_qspi_exec_mem_op() regmap-irq: Update interrupt clear register for proper reset RDMA/rtrs-clt: Fix possible double free in error case RDMA/rtrs-clt: Kill wait_for_inflight_permits RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close configfs: fix a race in configfs_{,un}register_subsystem() RDMA/ib_srp: Fix a deadlock tracing: Have traceon and traceoff trigger honor the instance iio: adc: men_z188_adc: Fix a resource leak in an error handling path iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot iio: Fix error handling for PM sc16is7xx: Fix for incorrect data being transmitted ata: pata_hpt37x: disable primary channel on HPT371 Revert "USB: serial: ch341: add new Product ID for CH341A" usb: gadget: rndis: add spinlock for rndis response list USB: gadget: validate endpoint index for xilinx udc tracefs: Set the group ownership in apply_options() not parse_options() USB: serial: option: add support for DW5829e USB: serial: option: add Telit LE910R1 compositions usb: dwc2: drd: fix soft connect when gadget is unconfigured usb: dwc3: pci: Fix Bay Trail phy GPIO mappings usb: dwc3: gadget: Let the interrupt handler disable bottom halves. xhci: re-initialize the HC during resume if HCE was set xhci: Prevent futile URB re-submissions due to incorrect return value. driver core: Free DMA range map when device is released RDMA/cma: Do not change route.addr.src_addr outside state checks thermal: int340x: fix memory leak in int3400_notify() riscv: fix oops caused by irqsoff latency tracer tty: n_gsm: fix encoding of control signal octet bit DV tty: n_gsm: fix proper link termination after failed open tty: n_gsm: fix NULL pointer access due to DLCI release tty: n_gsm: fix wrong tty control line for flow control tty: n_gsm: fix deadlock in gsmtty_open() gpio: tegra186: Fix chip_data type confusion memblock: use kfree() to release kmalloced memblock regions Linux 5.10.103 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I574f0ca699653770d2a7d4418b3bd76c2669b138	2022-03-02 15:44:44 +01:00
Steven Rostedt (Google)	afbeee13be	tracing: Have traceon and traceoff trigger honor the instance commit `302e9edd54` upstream. If a trigger is set on an event to disable or enable tracing within an instance, then tracing should be disabled or enabled in the instance and not at the top level, which is confusing to users. Link: https://lkml.kernel.org/r/20220223223837.14f94ec3@rorschach.local.home Cc: stable@vger.kernel.org Fixes: `ae63b31e4d` ("tracing: Separate out trace events from global variables") Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-02 11:42:53 +01:00
Eric Dumazet	7ef94bfb08	bpf: Add schedule points in batch ops commit `75134f16e7` upstream. syzbot reported various soft lockups caused by bpf batch operations. INFO: task kworker/1:1:27 blocked for more than 140 seconds. INFO: task hung in rcu_barrier Nothing prevents batch ops to process huge amount of data, we need to add schedule points in them. Note that maybe_wait_bpf_programs(map) calls from generic_map_delete_batch() can be factorized by moving the call after the loop. This will be done later in -next tree once we get this fix merged, unless there is strong opinion doing this optimization sooner. Fixes: `aa2e93b8e5` ("bpf: Add generic support for update and delete batch ops") Fixes: `cb4d03ab49` ("bpf: Add generic support for lookup batch op") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: Brian Vazquez <brianvv@google.com> Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-02 11:42:49 +01:00
Zhang Qiao	fcec42dd28	cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug commit `05c7b7a92c` upstream. As previously discussed(https://lkml.org/lkml/2022/1/20/51), cpuset_attach() is affected with similar cpu hotplug race, as follow scenario: cpuset_attach() cpu hotplug --------------------------- ---------------------- down_write(cpuset_rwsem) guarantee_online_cpus() // (load cpus_attach) sched_cpu_deactivate set_cpu_active() // will change cpu_active_mask set_cpus_allowed_ptr(cpus_attach) __set_cpus_allowed_ptr_locked() // (if the intersection of cpus_attach and cpu_active_mask is empty, will return -EINVAL) up_write(cpuset_rwsem) To avoid races such as described above, protect cpuset_attach() call with cpu_hotplug_lock. Fixes: `be367d0992` ("cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time") Cc: stable@vger.kernel.org # v2.6.32+ Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-02 11:42:45 +01:00
Greg Kroah-Hartman	768ef3a611	Merge 5.10.102 into android13-5.10 Changes in 5.10.102 drm/nouveau/pmu/gm200-: use alternate falcon reset sequence mm: memcg: synchronize objcg lists with a dedicated spinlock rcu: Do not report strict GPs for outgoing CPUs fget: clarify and improve __fget_files() implementation fs/proc: task_mmu.c: don't read mapcount for migration entry can: isotp: prevent race between isotp_bind() and isotp_setsockopt() can: isotp: add SF_BROADCAST support for functional addressing scsi: lpfc: Fix mailbox command failure during driver initialization HID:Add support for UGTABLET WP5540 Revert "svm: Add warning message for AVIC IPI invalid target" serial: parisc: GSC: fix build when IOSAPIC is not set parisc: Drop __init from map_pages declaration parisc: Fix data TLB miss in sba_unmap_sg parisc: Fix sglist access in ccio-dma.c mmc: block: fix read single on recovery logic mm: don't try to NUMA-migrate COW pages that have other uses PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology parisc: Add ioread64_lo_hi() and iowrite64_lo_hi() btrfs: send: in case of IO error log it platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1 platform/x86: ISST: Fix possible circular locking dependency detected selftests: rtc: Increase test timeout so that all tests run kselftest: signal all child processes net: ieee802154: at86rf230: Stop leaking skb's selftests/zram: Skip max_comp_streams interface on newer kernel selftests/zram01.sh: Fix compression ratio calculation selftests/zram: Adapt the situation that /dev/zram0 is being used selftests: openat2: Print also errno in failure messages selftests: openat2: Add missing dependency in Makefile selftests: openat2: Skip testcases that fail with EOPNOTSUPP selftests: skip mincore.check_file_mmap when fs lacks needed support ax25: improve the incomplete fix to avoid UAF and NPD bugs vfs: make freeze_super abort when sync_filesystem returns error quota: make dquot_quota_sync return errors from ->sync_fs scsi: pm8001: Fix use-after-free for aborted TMF sas_task scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task nvme: fix a possible use-after-free in controller reset during load nvme-tcp: fix possible use-after-free in transport error_recovery work nvme-rdma: fix possible use-after-free in transport error_recovery work drm/amdgpu: fix logic inversion in check x86/Xen: streamline (and fix) PV CPU enumeration Revert "module, async: async_synchronize_full() on module init iff async is used" gcc-plugins/stackleak: Use noinstr in favor of notrace random: wake up /dev/random writers after zap kbuild: lto: merge module sections kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled iwlwifi: fix use-after-free drm/radeon: Fix backlight control on iMac 12,1 drm/i915/opregion: check port number bounds for SWSCI display power state vsock: remove vsock from connected table when connect is interrupted by a signal drm/i915/gvt: Make DRM_I915_GVT depend on X86 iwlwifi: pcie: fix locking when "HW not ready" iwlwifi: pcie: gen2: fix locking when "HW not ready" selftests: netfilter: fix exit value for nft_concat_range netfilter: nft_synproxy: unregister hooks on init error path ipv6: per-netns exclusive flowlabel checks net: dsa: lan9303: fix reset on probe net: dsa: lantiq_gswip: fix use after free in gswip_remove() net: ieee802154: ca8210: Fix lifs/sifs periods ping: fix the dif and sdif check in ping_lookup bonding: force carrier update when releasing slave drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit net_sched: add __rcu annotation to netdev->qdisc bonding: fix data-races around agg_select_timer libsubcmd: Fix use-after-free for realloc(..., 0) dpaa2-eth: Initialize mutex used in one step timestamping path perf bpf: Defer freeing string after possible strlen() on it selftests/exec: Add non-regular to TEST_GEN_PROGS ALSA: hda/realtek: Add quirk for Legion Y9000X 2019 ALSA: hda/realtek: Fix deadlock by COEF mutex ALSA: hda: Fix regression on forced probe mask option ALSA: hda: Fix missing codec probe on Shenker Dock 15 ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw() ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range() powerpc/lib/sstep: fix 'ptesync' build error mtd: rawnand: gpmi: don't leak PM reference in error path KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests ASoC: tas2770: Insert post reset delay block/wbt: fix negative inflight counter when remove scsi device NFS: LOOKUP_DIRECTORY is also ok with symlinks NFS: Do not report writeback errors in nfs_getattr() tty: n_tty: do not look ahead for EOL character past the end of the buffer mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe() mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache ARM: OMAP2+: hwmod: Add of_node_put() before break ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of phy: usb: Leave some clocks running during suspend irqchip/sifive-plic: Add missing thead,c900-plic match string netfilter: conntrack: don't refresh sctp entries in closed state arm64: dts: meson-gx: add ATF BL32 reserved-memory region arm64: dts: meson-g12: add ATF BL32 reserved-memory region arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610 pidfd: fix test failure due to stack overflow on some arches selftests: fixup build warnings in pidfd / clone3 tests kconfig: let 'shell' return enough output for deep path names lib/iov_iter: initialize "flags" in new pipe_buffer ata: libata-core: Disable TRIM on M88V29 soc: aspeed: lpc-ctrl: Block error printing on probe defer cases xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create drm/rockchip: dw_hdmi: Do not leave clock enabled in error case tracing: Fix tp_printk option related with tp_printk_stop_on_boot net: usb: qmi_wwan: Add support for Dell DW5829e net: macb: Align the dma and coherent dma masks kconfig: fix failing to generate auto.conf scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop EDAC: Fix calculation of returned address and next offset in edac_align_ptr() net: sched: limit TC_ACT_REPEAT loops dmaengine: sh: rcar-dmac: Check for error num after setting mask dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size i2c: qcom-cci: don't delete an unregistered adapter i2c: qcom-cci: don't put a device tree node before i2c_add_adapter() copy_process(): Move fd_install() out of sighand->siglock critical section i2c: brcmstb: fix support for DSL and CM variants lockdep: Correct lock_classes index mapping Linux 5.10.102 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ief4c91d35938898a0cfebc0c746444f8a40c7ec4	2022-02-23 12:26:12 +01:00
Cheng Jui Wang	6062d1267f	lockdep: Correct lock_classes index mapping commit `28df029d53` upstream. A kernel exception was hit when trying to dump /proc/lockdep_chains after lockdep report "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!": Unable to handle kernel paging request at virtual address 00054005450e05c3 ... 00054005450e05c3] address between user and kernel address ranges ... pc : [0xffffffece769b3a8] string+0x50/0x10c lr : [0xffffffece769ac88] vsnprintf+0x468/0x69c ... Call trace: string+0x50/0x10c vsnprintf+0x468/0x69c seq_printf+0x8c/0xd8 print_name+0x64/0xf4 lc_show+0xb8/0x128 seq_read_iter+0x3cc/0x5fc proc_reg_read_iter+0xdc/0x1d4 The cause of the problem is the function lock_chain_get_class() will shift lock_classes index by 1, but the index don't need to be shifted anymore since commit `01bb6f0af9` ("locking/lockdep: Change the range of class_idx in held_lock struct") already change the index to start from 0. The lock_classes[-1] located at chain_hlocks array. When printing lock_classes[-1] after the chain_hlocks entries are modified, the exception happened. The output of lockdep_chains are incorrect due to this problem too. Fixes: `f611e8cf98` ("lockdep: Take read/write status in consideration when generate chainkey") Signed-off-by: Cheng Jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20220210105011.21712-1-cheng-jui.wang@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-23 12:01:08 +01:00
Waiman Long	9fee985f9a	copy_process(): Move fd_install() out of sighand->siglock critical section commit `ddc204b517` upstream. I was made aware of the following lockdep splat: [ 2516.308763] ===================================================== [ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected [ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted [ 2516.309703] ----------------------------------------------------- [ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0 [ 2516.310944] and this task is already holding: [ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80 [ 2516.311804] which would create a new lock dependency: [ 2516.312066] (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2} [ 2516.312446] but this new dependency connects a HARDIRQ-irq-safe lock: [ 2516.312983] (&sighand->siglock){-.-.}-{2:2} : [ 2516.330700] Possible interrupt unsafe locking scenario: [ 2516.331075] CPU0 CPU1 [ 2516.331328] ---- ---- [ 2516.331580] lock(&newf->file_lock); [ 2516.331790] local_irq_disable(); [ 2516.332231] lock(&sighand->siglock); [ 2516.332579] lock(&newf->file_lock); [ 2516.332922] <Interrupt> [ 2516.333069] lock(&sighand->siglock); [ 2516.333291] * DEADLOCK * [ 2516.389845] stack backtrace: [ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1 [ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 2516.391155] Call trace: [ 2516.391302] dump_backtrace+0x0/0x3e0 [ 2516.391518] show_stack+0x24/0x30 [ 2516.391717] dump_stack_lvl+0x9c/0xd8 [ 2516.391938] dump_stack+0x1c/0x38 [ 2516.392247] print_bad_irq_dependency+0x620/0x710 [ 2516.392525] check_irq_usage+0x4fc/0x86c [ 2516.392756] check_prev_add+0x180/0x1d90 [ 2516.392988] validate_chain+0x8e0/0xee0 [ 2516.393215] __lock_acquire+0x97c/0x1e40 [ 2516.393449] lock_acquire.part.0+0x240/0x570 [ 2516.393814] lock_acquire+0x90/0xb4 [ 2516.394021] _raw_spin_lock+0xe8/0x154 [ 2516.394244] fd_install+0x368/0x4f0 [ 2516.394451] copy_process+0x1f5c/0x3e80 [ 2516.394678] kernel_clone+0x134/0x660 [ 2516.394895] __do_sys_clone3+0x130/0x1f4 [ 2516.395128] __arm64_sys_clone3+0x5c/0x7c [ 2516.395478] invoke_syscall.constprop.0+0x78/0x1f0 [ 2516.395762] el0_svc_common.constprop.0+0x22c/0x2c4 [ 2516.396050] do_el0_svc+0xb0/0x10c [ 2516.396252] el0_svc+0x24/0x34 [ 2516.396436] el0t_64_sync_handler+0xa4/0x12c [ 2516.396688] el0t_64_sync+0x198/0x19c [ 2517.491197] NET: Registered PF_ATMPVC protocol family [ 2517.491524] NET: Registered PF_ATMSVC protocol family [ 2591.991877] sched: RT throttling activated One way to solve this problem is to move the fd_install() call out of the sighand->siglock critical section. Before commit `6fd2fe494b` ("copy_process(): don't use ksys_close() on cleanups"), the pidfd installation was done without holding both the task_list lock and the sighand->siglock. Obviously, holding these two locks are not really needed to protect the fd_install() call. So move the fd_install() call down to after the releases of both locks. Link: https://lore.kernel.org/r/20220208163912.1084752-1-longman@redhat.com Fixes: `6fd2fe494b` ("copy_process(): don't use ksys_close() on cleanups") Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-23 12:01:08 +01:00
JaeSang Yoo	15616ba17d	tracing: Fix tp_printk option related with tp_printk_stop_on_boot [ Upstream commit `3203ce39ac` ] The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk" which is the same as another kernel parameter "tp_printk". If "tp_printk" setup is called before the "tp_printk_stop_on_boot", it will override the latter and keep it from being set. This is similar to other kernel parameter issues, such as: Commit `745a600cf1` ("um: console: Ignore console= option") or init/do_mounts.c:45 (setup function of "ro" kernel param) Fix it by checking for a "_" right after the "tp_printk" and if that exists do not process the parameter. Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com> [ Fixed up change log and added space after if condition ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-23 12:01:06 +01:00
Kees Cook	143aaf79ba	gcc-plugins/stackleak: Use noinstr in favor of notrace [ Upstream commit `dcb85f85fa` ] While the stackleak plugin was already using notrace, objtool is now a bit more picky. Update the notrace uses to noinstr. Silences the following objtool warnings when building with: CONFIG_DEBUG_ENTRY=y CONFIG_STACK_VALIDATION=y CONFIG_VMLINUX_VALIDATION=y CONFIG_GCC_PLUGIN_STACKLEAK=y vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section Note that the plugin's addition of calls to stackleak_track_stack() from noinstr functions is expected to be safe, as it isn't runtime instrumentation and is self-contained. Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-23 12:01:00 +01:00
Igor Pylypiv	de55891e16	Revert "module, async: async_synchronize_full() on module init iff async is used" [ Upstream commit `67d6212afd` ] This reverts commit `774a1221e8`. We need to finish all async code before the module init sequence is done. In the reverted commit the PF_USED_ASYNC flag was added to mark a thread that called async_schedule(). Then the PF_USED_ASYNC flag was used to determine whether or not async_synchronize_full() needs to be invoked. This works when modprobe thread is calling async_schedule(), but it does not work if module dispatches init code to a worker thread which then calls async_schedule(). For example, PCI driver probing is invoked from a worker thread based on a node where device is attached: if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); We end up in a situation where a worker thread gets the PF_USED_ASYNC flag set instead of the modprobe thread. As a result, async_synchronize_full() is not invoked and modprobe completes without waiting for the async code to finish. The issue was discovered while loading the pm80xx driver: (scsi_mod.scan=async) modprobe pm80xx worker ... do_init_module() ... pci_call_probe() work_on_cpu(local_pci_probe) local_pci_probe() pm8001_pci_probe() scsi_scan_host() async_schedule() worker->flags \|= PF_USED_ASYNC; ... < return from worker > ... if (current->flags & PF_USED_ASYNC) <--- false async_synchronize_full(); Commit `21c3c5d280` ("block: don't request module during elevator init") fixed the deadlock issue which the reverted commit `774a1221e8` ("module, async: async_synchronize_full() on module init iff async is used") tried to fix. Since commit `0fdff3ec6d` ("async, kmod: warn on synchronous request_module() from async workers") synchronous module loading from async is not allowed. Given that the original deadlock issue is fixed and it is no longer allowed to call synchronous request_module() from async we can remove PF_USED_ASYNC flag to make module init consistently invoke async_synchronize_full() unless async module probe is requested. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Changyuan Lyu <changyuanl@google.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-23 12:01:00 +01:00
Paul E. McKenney	657991fb06	rcu: Do not report strict GPs for outgoing CPUs commit `bfb3aa735f` upstream. An outgoing CPU is marked offline in a stop-machine handler and most of that CPU's services stop at that point, including IRQ work queues. However, that CPU must take another pass through the scheduler and through a number of CPU-hotplug notifiers, many of which contain RCU readers. In the past, these readers were not a problem because the outgoing CPU has interrupts disabled, so that rcu_read_unlock_special() would not be invoked, and thus RCU would never attempt to queue IRQ work on the outgoing CPU. This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option, in which rcu_read_unlock_special() is invoked upon exit from almost all RCU read-side critical sections. Worse yet, because interrupts are disabled, rcu_read_unlock_special() cannot immediately report a quiescent state and will therefore attempt to defer this reporting, for example, by queueing IRQ work. Which fails with a splat because the CPU is already marked as being offline. But it turns out that there is no need to report this quiescent state because rcu_report_dead() will do this job shortly after the outgoing CPU makes its final dive into the idle loop. This commit therefore makes rcu_read_unlock_special() refrain from queuing IRQ work onto outgoing CPUs. Fixes: `44bad5b3cc` ("rcu: Do full report for .need_qs for strict GPs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-23 12:00:56 +01:00
Greg Kroah-Hartman	f0a557399c	Merge 5.10.101 into android13-5.10 Changes in 5.10.101 integrity: check the return value of audit_log_start() ima: Remove ima_policy file before directory ima: Allow template selection with ima_template[_fmt]= after ima_hash= ima: Do not print policy rule with inactive LSM labels mmc: sdhci-of-esdhc: Check for error num after setting mask can: isotp: fix potential CAN frame reception race in isotp_rcv() net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYs net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs NFS: Fix initialisation of nfs_client cl_flags field NFSD: Clamp WRITE offsets NFSD: Fix offset type in I/O trace points drm/amdgpu: Set a suitable dev_info.gart_page_size tracing: Propagate is_signed to expression NFS: change nfs_access_get_cached to only report the mask NFSv4 only print the label when its queried nfs: nfs4clinet: check the return value of kstrdup() NFSv4.1: Fix uninitialised variable in devicenotify NFSv4 remove zero number of fs_locations entries error check NFSv4 expose nfs_parse_server_name function NFSv4 handle port presence in fs_location server string x86/perf: Avoid warning for Arch LBR without XSAVE drm: panel-orientation-quirks: Add quirk for the 1Netbook OneXPlayer net: sched: Clarify error message when qdisc kind is unknown powerpc/fixmap: Fix VM debug warning on unmap scsi: target: iscsi: Make sure the np under each tpg is unique scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup() scsi: qedf: Add stag_work to all the vports scsi: qedf: Fix refcount issue when LOGO is received during TMF scsi: pm8001: Fix bogus FW crash for maxcpus=1 scsi: ufs: Treat link loss as fatal error scsi: myrs: Fix crash in error case PM: hibernate: Remove register_nosave_region_late() usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend perf: Always wake the parent event nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout() KVM: eventfd: Fix false positive RCU usage warning KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow riscv: fix build with binutils 2.38 ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group ARM: dts: Fix boot regression on Skomer ARM: socfpga: fix missing RESET_CONTROLLER nvme-tcp: fix bogus request completion when failing to send AER ACPI/IORT: Check node revision for PMCG resources PM: s2idle: ACPI: Fix wakeup interrupts handling drm/rockchip: vop: Correct RK3399 VOP register fields ARM: dts: Fix timer regression for beagleboard revision c ARM: dts: meson: Fix the UART compatible strings ARM: dts: meson8: Fix the UART device-tree schema validation ARM: dts: meson8b: Fix the UART device-tree schema validation staging: fbtft: Fix error path in fbtft_driver_module_init() ARM: dts: imx6qdl-udoo: Properly describe the SD card detect phy: xilinx: zynqmp: Fix bus width setting for SGMII ARM: dts: imx7ulp: Fix 'assigned-clocks-parents' typo usb: f_fs: Fix use-after-free for epfile gpio: aggregator: Fix calling into sleeping GPIO controllers drm/vc4: hdmi: Allow DBLCLK modes even if horz timing is odd. misc: fastrpc: avoid double fput() on failed usercopy netfilter: ctnetlink: disable helper autoassign arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133' ixgbevf: Require large buffers for build_skb on 82599VF drm/panel: simple: Assign data from panel_dpi_probe() correctly ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE gpio: sifive: use the correct register to read output values bonding: pair enable_port with slave_arr_updates net: dsa: mv88e6xxx: don't use devres for mdiobus net: dsa: ar9331: register the mdiobus under devres net: dsa: bcm_sf2: don't use devres for mdiobus net: dsa: felix: don't use devres for mdiobus net: dsa: lantiq_gswip: don't use devres for mdiobus ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path nfp: flower: fix ida_idx not being released net: do not keep the dst cache when uncloning an skb dst and its metadata net: fix a memleak when uncloning an skb dst and its metadata veth: fix races around rq->rx_notify_masked net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE tipc: rate limit warning for received illegal binding update net: amd-xgbe: disable interrupts during pci removal dpaa2-eth: unregister the netdev before disconnecting from the PHY ice: fix an error code in ice_cfg_phy_fec() ice: fix IPIP and SIT TSO offload net: mscc: ocelot: fix mutex lock error during ethtool stats read net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister vt_ioctl: fix array_index_nospec in vt_setactivate vt_ioctl: add array_index_nospec to VT_ACTIVATE n_tty: wake up poll(POLLRDNORM) on receiving data eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX usb: dwc2: drd: fix soft connect when gadget is unconfigured Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured" net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup usb: ulpi: Move of_node_put to ulpi_dev_release usb: ulpi: Call of_node_put correctly usb: dwc3: gadget: Prevent core from processing stale TRBs usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition USB: gadget: validate interface OS descriptor requests usb: gadget: rndis: check size of RNDIS_MSG_SET command usb: gadget: f_uac2: Define specific wTerminalType usb: raw-gadget: fix handling of dual-direction-capable endpoints USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320 USB: serial: option: add ZTE MF286D modem USB: serial: ch341: add support for GW Instek USB2.0-Serial devices USB: serial: cp210x: add NCR Retail IO box id USB: serial: cp210x: add CPI Bulk Coin Recycler id speakup-dectlk: Restore pitch setting phy: ti: Fix missing sentinel for clk_div_table hwmon: (dell-smm) Speed up setting of fan speed Makefile.extrawarn: Move -Wunaligned-access to W=1 can: isotp: fix error path in isotp_sendmsg() to unlock wait queue scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled scsi: lpfc: Reduce log messages seen after firmware download arm64: dts: imx8mq: fix lcdif port node perf: Fix list corruption in perf_cgroup_switch() iommu: Fix potential use-after-free during probe Linux 5.10.101 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6105dcbfc0c7f1373020d378d2048e692dc502ab	2022-02-16 15:14:37 +01:00
Song Liu	f6b5d51976	perf: Fix list corruption in perf_cgroup_switch() commit `5f4e5ce638` upstream. There's list corruption on cgrp_cpuctx_list. This happens on the following path: perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list) cpu_ctx_sched_in ctx_sched_in ctx_pinned_sched_in merge_sched_in perf_cgroup_event_disable: remove the event from the list Use list_for_each_entry_safe() to allow removing an entry during iteration. Fixes: `058fe1c044` ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events") Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-16 12:54:31 +01:00
Rafael J. Wysocki	a941384fba	PM: s2idle: ACPI: Fix wakeup interrupts handling commit `cb1f65c1e1` upstream. After commit `e3728b50cd` ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") wakeup interrupts occurring immediately after the one discarded by acpi_s2idle_wake() may be missed. Moreover, if the SCI triggers again immediately after the rearming in acpi_s2idle_wake(), that wakeup may be missed too. The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup() when pm_wakeup_irq is 0, but that's not the case any more after the interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is cleared by the pm_wakeup_clear() call in s2idle_loop(). However, there may be wakeup interrupts occurring in that time frame and if that happens, they will be missed. To address that issue first move the clearing of pm_wakeup_irq to the point at which it is known that the interrupt causing acpi_s2idle_wake() to tun will be discarded, before rearming the SCI for wakeup. Moreover, because that only reduces the size of the time window in which the issue may manifest itself, allow pm_system_irq_wakeup() to register two second wakeup interrupts in a row and, when discarding the first one, replace it with the second one. [Of course, this assumes that only one wakeup interrupt can be discarded in one go, but currently that is the case and I am not aware of any plans to change that.] Fixes: `e3728b50cd` ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-16 12:54:22 +01:00
James Clark	d0774cf730	perf: Always wake the parent event [ Upstream commit `961c391217` ] When using per-process mode and event inheritance is set to true, forked processes will create a new perf events via inherit_event() -> perf_event_alloc(). But these events will not have ring buffers assigned to them. Any call to wakeup will be dropped if it's called on an event with no ring buffer assigned because that's the object that holds the wakeup list. If the child event is disabled due to a call to perf_aux_output_begin() or perf_aux_output_end(), the wakeup is dropped leaving userspace hanging forever on the poll. Normally the event is explicitly re-enabled by userspace after it wakes up to read the aux data, but in this case it does not get woken up so the event remains disabled. This can be reproduced when using Arm SPE and 'stress' which forks once before running the workload. By looking at the list of aux buffers read, it's apparent that they stop after the fork: perf record -e arm_spe// -vvv -- stress -c 1 With this patch applied they continue to be printed. This behaviour doesn't happen when using systemwide or per-cpu mode. Reported-by: Ruben Ayrapetyan <Ruben.Ayrapetyan@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211206113840.130802-2-james.clark@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-16 12:54:20 +01:00
Amadeusz Sławiński	4607218fde	PM: hibernate: Remove register_nosave_region_late() [ Upstream commit `33569ef3c7` ] It is an unused wrapper forcing kmalloc allocation for registering nosave regions. Also, rename __register_nosave_region() to register_nosave_region() now that there is no need for disambiguation. Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-16 12:54:20 +01:00
Tom Zanussi	b4e0c9bcf1	tracing: Propagate is_signed to expression commit `097f1eefed` upstream. During expression parsing, a new expression field is created which should inherit the properties of the operands, such as size and is_signed. is_signed propagation was missing, causing spurious errors with signed operands. Add it in parse_expr() and parse_unary() to fix the problem. Link: https://lkml.kernel.org/r/f4dac08742fd7a0920bf80a73c6c44042f5eaa40.1643319703.git.zanussi@kernel.org Cc: stable@vger.kernel.org Fixes: `100719dcef` ("tracing: Add simple expression support to hist triggers") Reported-by: Yordan Karadzhov <ykaradzhov@vmware.com> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215513 Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-16 12:54:17 +01:00
Andrey Konovalov	01047c8c75	BACKPORT: FROMGIT: kasan, vmalloc: add vmalloc tagging for HW_TAGS (Backport: drop find_vmap_area_exceed_addr changes, as the function is not present in 5.10.) Add vmalloc tagging support to HW_TAGS KASAN. The key difference between HW_TAGS and the other two KASAN modes when it comes to vmalloc: HW_TAGS KASAN can only assign tags to physical memory. The other two modes have shadow memory covering every mapped virtual memory region. Make __kasan_unpoison_vmalloc() for HW_TAGS KASAN: - Skip non-VM_ALLOC mappings as HW_TAGS KASAN can only tag a single mapping of normal physical memory; see the comment in the function. - Generate a random tag, tag the returned pointer and the allocation, and initialize the allocation at the same time. - Propagate the tag into the page stucts to allow accesses through page_address(vmalloc_to_page()). The rest of vmalloc-related KASAN hooks are not needed: - The shadow-related ones are fully skipped. - __kasan_poison_vmalloc() is kept as a no-op with a comment. Poisoning and zeroing of physical pages that are backing vmalloc() allocations are skipped via __GFP_SKIP_KASAN_UNPOISON and __GFP_SKIP_ZERO: __kasan_unpoison_vmalloc() does that instead. Enabling CONFIG_KASAN_VMALLOC with HW_TAGS is not yet allowed. Link: https://lkml.kernel.org/r/d19b2e9e59a9abc59d05b72dea8429dcaea739c6.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit c9a950bcf1d67298187050bc3179096e4ef248c1 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Bug: 217222520 Change-Id: I446b0ae074938389ade70bf503784d4d32b5d09b Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-02-15 17:55:27 +01:00
Andrey Konovalov	092c06519c	FROMLIST: kasan, fork: reset pointer tags of vmapped stacks [Combines a FROMGIT patch and FROMLIST fix for it.] Once tag-based KASAN modes start tagging vmalloc() allocations, kernel stacks start getting tagged if CONFIG_VMAP_STACK is enabled. Reset the tag of kernel stack pointers after allocation in alloc_thread_stack_node(). For SW_TAGS KASAN, when CONFIG_KASAN_STACK is enabled, the instrumentation can't handle the SP register being tagged. For HW_TAGS KASAN, there's no instrumentation-related issues. However, the impact of having a tagged SP register needs to be properly evaluated, so keep it non-tagged for now. Note, that the memory for the stack allocation still gets tagged to catch vmalloc-into-stack out-of-bounds accesses. Link: https://lkml.kernel.org/r/c6c96f012371ecd80e1936509ebcd3b07a5956f7.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 9d2dae85d689202c56068ce62e20821ad91c3606 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/linux-mm/f50c5f96ef896d7936192c888b0c0a7674e33184.1644943792.git.andreyknvl@google.com/ Bug: 217222520 Change-Id: Ie723b03f1b857bc841cffc9a424b2791c97044a6 Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-02-15 17:54:25 +01:00
Jun Miao	e7c11d3363	BACKPORT: rcu: Avoid alloc_pages() when recording stack (Backport: adjacent lines changes.) The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...). In general, however, it is not even possible to use either GFP_ATOMIC nor GFP_NOWAIT in certain non-preemptive contexts/RT kernel including raw_spin_locks (see gfp.h and `ab00db216c`). Fix it by instructing stackdepot to not expand stack storage via alloc_pages() in case it runs out by using kasan_record_aux_stack_noalloc(). Jianwei Hu reported: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590 softirqs last enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 6 PID: 15319 Comm: python3 Tainted: G W O 5.15-rc7-preempt-rt #1 Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018 Call Trace: show_stack+0x52/0x58 dump_stack+0xa1/0xd6 ___might_sleep.cold+0x11c/0x12d rt_spin_lock+0x3f/0xc0 rmqueue+0x100/0x1460 rmqueue+0x100/0x1460 mark_usage+0x1a0/0x1a0 ftrace_graph_ret_addr+0x2a/0xb0 rmqueue_pcplist.constprop.0+0x6a0/0x6a0 __kasan_check_read+0x11/0x20 __zone_watermark_ok+0x114/0x270 get_page_from_freelist+0x148/0x630 is_module_text_address+0x32/0xa0 __alloc_pages_nodemask+0x2f6/0x790 __alloc_pages_slowpath.constprop.0+0x12d0/0x12d0 create_prof_cpu_mask+0x30/0x30 alloc_pages_current+0xb1/0x150 stack_depot_save+0x39f/0x490 kasan_save_stack+0x42/0x50 kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xa9/0xc0 __call_rcu+0xff/0x9c0 call_rcu+0xe/0x10 put_object+0x53/0x70 __delete_object+0x7b/0x90 kmemleak_free+0x46/0x70 slab_free_freelist_hook+0xb4/0x160 kfree+0xe5/0x420 kfree_const+0x17/0x30 kobject_cleanup+0xaa/0x230 kobject_put+0x76/0x90 netdev_queue_update_kobjects+0x17d/0x1f0 ... ... ksys_write+0xd9/0x180 __x64_sys_write+0x42/0x50 do_syscall_64+0x38/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642 Fixes: `84109ab585` ("rcu: Record kvfree_call_rcu() call stack for KASAN") Fixes: `26e760c9a7` ("rcu: kasan: record and print call_rcu() call stack") Reported-by: Jianwei Hu <jianwei.hu@windriver.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Jun Miao <jun.miao@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `300c0c5e72`) Bug: 217222520 Change-Id: I5b7d4f9dfe5da290627599d8d6de3278debb2a13 Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-02-15 17:54:14 +01:00
Marco Elver	8f225ea876	UPSTREAM: workqueue, kasan: avoid alloc_pages() when recording stack Shuah Khan reported: \| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled, \| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when \| it tries to allocate memory attempting to acquire spinlock in page \| allocation code while holding workqueue pool raw_spinlock. \| \| There are several instances of this problem when block layer tries \| to __queue_work(). Call trace from one of these instances is below: \| \| kblockd_mod_delayed_work_on() \| mod_delayed_work_on() \| __queue_delayed_work() \| __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held) \| insert_work() \| kasan_record_aux_stack() \| kasan_save_stack() \| stack_depot_save() \| alloc_pages() \| __alloc_pages() \| get_page_from_freelist() \| rm_queue() \| rm_queue_pcplist() \| local_lock_irqsave(&pagesets.lock, flags); \| [ BUG: Invalid wait context triggered ] The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...). In general, however, it is not even possible to use either GFP_ATOMIC nor GFP_NOWAIT in certain non-preemptive contexts, including raw_spin_locks (see gfp.h and commmit `ab00db216c`). Fix it by instructing stackdepot to not expand stack storage via alloc_pages() in case it runs out by using kasan_record_aux_stack_noalloc(). While there is an increased risk of failing to insert the stack trace, this is typically unlikely, especially if the same insertion had already succeeded previously (stack depot hit). For frequent calls from the same location, it therefore becomes extremely unlikely that kasan_record_aux_stack_noalloc() fails. Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org Link: https://lkml.kernel.org/r/20210913112609.2651084-7-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reported-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Taras Madan <tarasmadan@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `f70da745be`) Bug: 217222520 Change-Id: I81052dba582dbcb492fd7efcef8d56bac4766503 Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-02-15 17:54:14 +01:00
Zqiang	adebe26902	UPSTREAM: rcu: Record kvfree_call_rcu() call stack for KASAN This commit adds a call to kasan_record_aux_stack() in kvfree_call_rcu() in order to record the call stack of the code that caused the object to be freed. Please note that this function does not update the allocated/freed state, which is important because RCU readers might still be referencing this object. Acked-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `84109ab585`) Bug: 217222520 Change-Id: Ia7c27babe4b2318ab116b508578c17e8967e70b1 Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-02-15 17:54:11 +01:00
Yee Lee	5e5294d0e5	UPSTREAM: scs: Release kasan vmalloc poison in scs_free process Since scs allocation is moved to vmalloc region, the shadow stack is protected by kasan_posion_vmalloc. However, the vfree_atomic operation needs to access its context for scs_free process and causes kasan error as the dump info below. This patch Adds kasan_unpoison_vmalloc() before vfree_atomic, which aligns to the prior flow as using kmem_cache. The vmalloc region will go back posioned in the following vumap() operations. ================================================================== BUG: KASAN: vmalloc-out-of-bounds in llist_add_batch+0x60/0xd4 Write of size 8 at addr ffff8000100b9000 by task kthreadd/2 CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.15.0-rc2-11681-g92477dd1faa6-dirty #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x43c show_stack+0x1c/0x2c dump_stack_lvl+0x68/0x84 print_address_description+0x80/0x394 kasan_report+0x180/0x1dc __asan_report_store8_noabort+0x48/0x58 llist_add_batch+0x60/0xd4 vfree_atomic+0x60/0xe0 scs_free+0x1dc/0x1fc scs_release+0xa4/0xd4 free_task+0x30/0xe4 __put_task_struct+0x1ec/0x2e0 delayed_put_task_struct+0x5c/0xa0 rcu_do_batch+0x62c/0x8a0 rcu_core+0x60c/0xc14 rcu_core_si+0x14/0x24 __do_softirq+0x19c/0x68c irq_exit+0x118/0x2dc handle_domain_irq+0xcc/0x134 gic_handle_irq+0x7c/0x1bc call_on_irq_stack+0x40/0x70 do_interrupt_handler+0x78/0x9c el1_interrupt+0x34/0x60 el1h_64_irq_handler+0x1c/0x2c el1h_64_irq+0x78/0x7c _raw_spin_unlock_irqrestore+0x40/0xcc sched_fork+0x4f0/0xb00 copy_process+0xacc/0x3648 kernel_clone+0x168/0x534 kernel_thread+0x13c/0x1b0 kthreadd+0x2bc/0x400 ret_from_fork+0x10/0x20 Memory state around the buggy address: ffff8000100b8f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ffff8000100b8f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 >ffff8000100b9000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ^ ffff8000100b9080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ffff8000100b9100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ================================================================== Suggested-by: Kuan-Ying Lee <kuan-ying.lee@mediatek.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Yee Lee <yee.lee@mediatek.com> Fixes: `a2abe7cbd8` ("scs: switch to vmapped shadow stacks") Link: https://lore.kernel.org/r/20210930081619.30091-1-yee.lee@mediatek.com Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit `528a4ab453`) Bug: 187129171 Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: Idc934fd9b43cbec74a9c9e085094ebcbcbc88344	2022-02-14 20:09:32 -08:00
Masami Hiramatsu	2e7174e822	UPSTREAM: tracing/boot: Fix to loop on only subkeys Since the commit `e5efaeb8a8` ("bootconfig: Support mixing a value and subkeys under a key") allows to co-exist a value node and key nodes under a node, xbc_node_for_each_child() is not only returning key node but also a value node. In the boot-time tracing using xbc_node_for_each_child() to iterate the events, groups and instances, but those must be key nodes. Thus it must use xbc_node_for_each_subkey(). Link: https://lkml.kernel.org/r/163112988361.74896.2267026262061819145.stgit@devnote2 Fixes: `e5efaeb8a8` ("bootconfig: Support mixing a value and subkeys under a key") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (cherry picked from commit `cfd799837d`) Bug: 187129171 Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: I7cf6b7fbf334edc44e6da1f519f30d467ece6fec	2022-02-14 20:09:31 -08:00
Will Deacon	48879e2416	ANDROID: sched: Don't allow frozen asymmetric tasks to remain on the rq If a task with a restricted possible CPU mask and PF_FROZEN or PF_FREEZER_SKIP set blocks, then we must not put it back on the runqueue to handle a signal because this could lead to migration failures later on if the suspending CPU is not capable of running it. Return such a task to the runqueue only if a fatal signal is pending, and otherwise allow the task to block. Bug: 202918514 Signed-off-by: Will Deacon <willdeacon@google.com> Change-Id: I04cc9e65751f2bffc556c4da9ef02fe386764324	2022-02-10 10:49:26 +00:00
Greg Kroah-Hartman	193970ba72	Merge 5.10.99 into android13-5.10 Changes in 5.10.99 selinux: fix double free of cond_list on error paths audit: improve audit queue handling when "audit=1" on cmdline ASoC: ops: Reject out of bounds values in snd_soc_put_volsw() ASoC: ops: Reject out of bounds values in snd_soc_put_volsw_sx() ASoC: ops: Reject out of bounds values in snd_soc_put_xr_sx() ALSA: usb-audio: Correct quirk for VF0770 ALSA: hda: Fix UAF of leds class devs at unbinding ALSA: hda: realtek: Fix race at concurrent COEF updates ALSA: hda/realtek: Add quirk for ASUS GU603 ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset) ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows btrfs: fix deadlock between quota disable and qgroup rescan worker drm/nouveau: fix off by one in BIOS boundary checking drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts() mm/debug_vm_pgtable: remove pte entry from the page table mm/pgtable: define pte_index so that preprocessor could recognize it mm/kmemleak: avoid scanning potential huge holes block: bio-integrity: Advance seed correctly for larger interval sizes dma-buf: heaps: Fix potential spectre v1 gadget IB/hfi1: Fix AIP early init panic Revert "ASoC: mediatek: Check for error clk pointer" memcg: charge fs_context and legacy_fs_context RDMA/cma: Use correct address when leaving multicast group RDMA/ucma: Protect mc during concurrent multicast leaves IB/rdmavt: Validate remote_addr during loopback atomic tests RDMA/siw: Fix broken RDMA Read Fence/Resume logic. RDMA/mlx4: Don't continue event handler after memory allocation failure iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping() iommu/amd: Fix loop timeout issue in iommu_ga_log_enable() spi: bcm-qspi: check for valid cs before applying chip select spi: mediatek: Avoid NULL pointer crash in interrupt spi: meson-spicc: add IRQ check in meson_spicc_probe spi: uniphier: fix reference count leak in uniphier_spi_probe() net: ieee802154: hwsim: Ensure proper channel selection at probe time net: ieee802154: mcr20a: Fix lifs/sifs periods net: ieee802154: ca8210: Stop leaking skb's net: ieee802154: Return meaningful error codes from the netlink helpers net: macsec: Fix offload support for NETDEV_UNREGISTER event net: macsec: Verify that send_sci is on when setting Tx sci explicitly net: stmmac: dump gmac4 DMA registers correctly net: stmmac: ensure PTP time register reads are consistent drm/i915/overlay: Prevent divide by zero bugs in scaling ASoC: fsl: Add missing error handling in pcm030_fabric_probe ASoC: xilinx: xlnx_formatter_pcm: Make buffer bytes multiple of period bytes ASoC: cpcap: Check for NULL pointer after calling of_get_child_by_name ASoC: max9759: fix underflow in speaker_gain_control_put() pinctrl: intel: Fix a glitch when updating IRQ flags on a preconfigured line pinctrl: intel: fix unexpected interrupt pinctrl: bcm2835: Fix a few error paths scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client. gve: fix the wrong AdminQ buffer queue index check bpf: Use VM_MAP instead of VM_ALLOC for ringbuf selftests/exec: Remove pipe from TEST_GEN_FILES selftests: futex: Use variable MAKE instead of make tools/resolve_btfids: Do not print any commands when building silently rtc: cmos: Evaluate century appropriate Revert "fbcon: Disable accelerated scrolling" fbcon: Add option to enable legacy hardware acceleration perf stat: Fix display of grouped aliased events perf/x86/intel/pt: Fix crash with stop filters in single-range mode x86/perf: Default set FREEZE_ON_SMI for all EDAC/altera: Fix deferred probing EDAC/xgene: Fix deferred probing ext4: prevent used blocks from being allocated during fast commit replay ext4: modify the logic of ext4_mb_new_blocks_simple ext4: fix error handling in ext4_restore_inline_data() ext4: fix error handling in ext4_fc_record_modified_inode() ext4: fix incorrect type issue during replay_del_range net: dsa: mt7530: make NET_DSA_MT7530 select MEDIATEK_GE_PHY cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning selftests: nft_concat_range: add test for reload with no element add/del Linux 5.10.99 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I3571b1ff035a83999daafa35157f96ebfd65ef0a	2022-02-09 12:26:22 +01:00
Waiman Long	5577273135	cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning commit `2bdfd2825c` upstream. It was found that a "suspicious RCU usage" lockdep warning was issued with the rcu_read_lock() call in update_sibling_cpumasks(). It is because the update_cpumasks_hier() function may sleep. So we have to release the RCU lock, call update_cpumasks_hier() and reacquire it afterward. Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks() instead of stating that in the comment. Fixes: `4716909cc5` ("cpuset: Track cpusets that use parent's effective_cpus") Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-08 18:30:41 +01:00
Hou Tao	6304a613a9	bpf: Use VM_MAP instead of VM_ALLOC for ringbuf commit `b293dcc473` upstream. After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages after mapping"), non-VM_ALLOC mappings will be marked as accessible in __get_vm_area_node() when KASAN is enabled. But now the flag for ringbuf area is VM_ALLOC, so KASAN will complain out-of-bound access after vmap() returns. Because the ringbuf area is created by mapping allocated pages, so use VM_MAP instead. After the change, info in /proc/vmallocinfo also changes from [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmalloc user to [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmap user Fixes: `457f44363a` ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-08 18:30:39 +01:00
Paul Moore	0ff6b80506	audit: improve audit queue handling when "audit=1" on cmdline commit `f26d043313` upstream. When an admin enables audit at early boot via the "audit=1" kernel command line the audit queue behavior is slightly different; the audit subsystem goes to greater lengths to avoid dropping records, which unfortunately can result in problems when the audit daemon is forcibly stopped for an extended period of time. This patch makes a number of changes designed to improve the audit queuing behavior so that leaving the audit daemon in a stopped state for an extended period does not cause a significant impact to the system. - kauditd_send_queue() is now limited to looping through the passed queue only once per call. This not only prevents the function from looping indefinitely when records are returned to the current queue, it also allows any recovery handling in kauditd_thread() to take place when kauditd_send_queue() returns. - Transient netlink send errors seen as -EAGAIN now cause the record to be returned to the retry queue instead of going to the hold queue. The intention of the hold queue is to store, perhaps for an extended period of time, the events which led up to the audit daemon going offline. The retry queue remains a temporary queue intended to protect against transient issues between the kernel and the audit daemon. - The retry queue is now limited by the audit_backlog_limit setting, the same as the other queues. This allows admins to bound the size of all of the audit queues on the system. - kauditd_rehold_skb() now returns records to the end of the hold queue to ensure ordering is preserved in the face of recent changes to kauditd_send_queue(). Cc: stable@vger.kernel.org Fixes: `5b52330bbf` ("audit: fix auditd/kernel connection state tracking") Fixes: `f4b3ee3c85` ("audit: improve robustness of the audit queue handling") Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com> Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-08 18:30:34 +01:00
Greg Kroah-Hartman	a23e639396	Merge 5.10.97 into android13-5.10 Changes in 5.10.97 PCI: pciehp: Fix infinite loop in IRQ handler upon power fault net: ipa: fix atomic update in ipa_endpoint_replenish() net: ipa: use a bitmap for endpoint replenish_enabled net: ipa: prevent concurrent replenish Revert "drivers: bus: simple-pm-bus: Add support for probing simple bus only devices" KVM: x86: Forcibly leave nested virt when SMM state is toggled psi: Fix uaf issue when psi trigger is destroyed while being polled x86/mce: Add Xeon Sapphire Rapids to list of CPUs that support PPIN x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN drm/vc4: hdmi: Make sure the device is powered with CEC cgroup-v1: Require capabilities to set release_agent net/mlx5e: Fix handling of wrong devices during bond netevent net/mlx5: Use del_timer_sync in fw reset flow of halting poll net/mlx5: E-Switch, Fix uninitialized variable modact ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback net: amd-xgbe: ensure to reset the tx_timer_active flag net: amd-xgbe: Fix skb data length underflow fanotify: Fix stale file descriptor in copy_event_to_user() net: sched: fix use-after-free in tc_new_tfilter() rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink() cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask() af_packet: fix data-race in packet_setsockopt / packet_setsockopt tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() Linux 5.10.97 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5fedd9e1819539df7b733cdfc7ff619bace1ece8	2022-02-05 13:23:26 +01:00
Tianchen Ding	aa9e96db31	cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask() commit `c80d401c52` upstream. subparts_cpus should be limited as a subset of cpus_allowed, but it is updated wrongly by using cpumask_andnot(). Use cpumask_and() instead to fix it. Fixes: `ee8dde0cd2` ("cpuset: Add new v2 cpuset.sched.partition flag") Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-05 12:37:56 +01:00
Eric W. Biederman	1fc3444cda	cgroup-v1: Require capabilities to set release_agent commit `24f6008564` upstream. The cgroup release_agent is called with call_usermodehelper. The function call_usermodehelper starts the release_agent with a full set fo capabilities. Therefore require capabilities when setting the release_agaent. Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com> Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com> Fixes: `81a6a5cdd2` ("Task Control Groups: automatic userspace notification of idle cgroups") Cc: stable@vger.kernel.org # v2.6.24+ Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-05 12:37:55 +01:00
Suren Baghdasaryan	d4e4e61d4a	psi: Fix uaf issue when psi trigger is destroyed while being polled commit `a06247c680` upstream. With write operation on psi files replacing old trigger with a new one, the lifetime of its waitqueue is totally arbitrary. Overwriting an existing trigger causes its waitqueue to be freed and pending poll() will stumble on trigger->event_wait which was destroyed. Fix this by disallowing to redefine an existing psi trigger. If a write operation is used on a file descriptor with an already existing psi trigger, the operation will fail with EBUSY error. Also bypass a check for psi_disabled in the psi_trigger_destroy as the flag can be flipped after the trigger is created, leading to a memory leak. Fixes: `0e94682b73` ("psi: introduce psi monitor") Reported-by: syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Analyzed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com [surenb: backported to 5.10 kernel] CC: stable@vger.kernel.org # 5.10 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-05 12:37:55 +01:00
Greg Kroah-Hartman	7859e08c9c	Merge 5.10.96 into android13-5.10 Changes in 5.10.96 Bluetooth: refactor malicious adv data check media: venus: core: Drop second v4l2 device unregister net: sfp: ignore disabled SFP node net: stmmac: skip only stmmac_ptp_register when resume from suspend s390/module: fix loading modules with a lot of relocations s390/hypfs: include z/VM guests with access control group set bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack() scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices udf: Restore i_lenAlloc when inode expansion fails udf: Fix NULL ptr deref when converting from inline format efi: runtime: avoid EFIv2 runtime services on Apple x86 machines PM: wakeup: simplify the output logic of pm_show_wakelocks() tracing/histogram: Fix a potential memory leak for kstrdup() tracing: Don't inc err_log entry count if entry allocation fails ceph: properly put ceph_string reference after async create attempt ceph: set pool_ns in new inode layout for async creates fsnotify: fix fsnotify hooks in pseudo filesystems Revert "KVM: SVM: avoid infinite loop on NPF from bad address" perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX drm/etnaviv: relax submit size limits KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS arm64: errata: Fix exec handling in erratum `1418040` workaround netfilter: nft_payload: do not update layer 4 checksum when mangling fragments serial: 8250: of: Fix mapped region size when using reg-offset property serial: stm32: fix software flow control transfer tty: n_gsm: fix SW flow control encoding/handling tty: Add support for Brainboxes UC cards. usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge usb: xhci-plat: fix crash when suspend if remote wake enable usb: common: ulpi: Fix crash in ulpi_match() usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS USB: core: Fix hang in usb_kill_urb by adding memory barriers usb: typec: tcpm: Do not disconnect while receiving VBUS off ucsi_ccg: Check DEV_INT bit only when starting CCG4 jbd2: export jbd2_journal_[grab\|put]_journal_head ocfs2: fix a deadlock when commit trans sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask x86/MCE/AMD: Allow thresholding interface updates after init powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs powerpc/32s: Fix kasan_init_region() for KASAN powerpc/32: Fix boot failure with GCC latent entropy plugin i40e: Increase delay to 1 s after global EMP reset i40e: Fix issue when maximum queues is exceeded i40e: Fix queues reservation for XDP i40e: Fix for failed to init adminq while VF reset i40e: fix unsigned stat widths usb: roles: fix include/linux/usb/role.h compile issue rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put() ipv6_tunnel: Rate limit warning messages net: fix information leakage in /proc/net/ptype hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649 hwmon: (lm90) Mark alert as broken for MAX6680 ping: fix the sk_bound_dev_if match in ping_lookup ipv4: avoid using shared IP generator for connected sockets hwmon: (lm90) Reduce maximum conversion rate for G781 NFSv4: Handle case where the lookup of a directory fails NFSv4: nfs_atomic_open() can race when looking up a non-regular file net-procfs: show net devices bound packet types drm/msm: Fix wrong size calculation drm/msm/dsi: Fix missing put_device() call in dsi_get_phy drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable ipv6: annotate accesses to fn->fn_sernum NFS: Ensure the server has an up to date ctime before hardlinking NFS: Ensure the server has an up to date ctime before renaming powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06 netfilter: conntrack: don't increment invalid counter on NF_REPEAT kernel: delete repeated words in comments perf: Fix perf_event_read_local() time sched/pelt: Relax the sync of util_sum with util_avg net: phy: broadcom: hook up soft_reset for BCM54616S phylib: fix potential use-after-free octeontx2-pf: Forward error codes to VF rxrpc: Adjust retransmission backoff efi/libstub: arm64: Fix image check alignment at entry hwmon: (lm90) Mark alert as broken for MAX6654 powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending net: ipv4: Move ip_options_fragment() out of loop net: ipv4: Fix the warning for dereference ipv4: fix ip option filtering for locally generated fragments ibmvnic: init ->running_cap_crqs early ibmvnic: don't spin in tasklet video: hyperv_fb: Fix validation of screen resolution drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc yam: fix a memory leak in yam_siocdevprivate() net: cpsw: Properly initialise struct page_pool_params net: hns3: handle empty unknown interrupt for VF Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values" net: bridge: vlan: fix single net device option dumping ipv4: raw: lock the socket in raw_bind() ipv4: tcp: send zero IPID in SYNACK messages ipv4: remove sparse error in ip_neigh_gw4() net: bridge: vlan: fix memory leak in __allowed_ingress dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config usr/include/Makefile: add linux/nfc.h to the compile-test coverage fsnotify: invalidate dcache before IN_DELETE event block: Fix wrong offset in bio_truncate() mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip() Linux 5.10.96 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I3bb702bbb2f32488f2ba15d40ed00982e028d7cb	2022-02-02 09:32:52 +01:00
Vincent Guittot	57b2f3632b	sched/pelt: Relax the sync of util_sum with util_avg [ Upstream commit `98b0d89022` ] Rick reported performance regressions in bugzilla because of cpu frequency being lower than before: https://bugzilla.kernel.org/show_bug.cgi?id=215045 He bisected the problem to: commit `1c35b07e6d` ("sched/fair: Ensure _sum and _avg values stay consistent") This commit forces util_sum to be synced with the new util_avg after removing the contribution of a task and before the next periodic sync. By doing so util_sum is rounded to its lower bound and might lost up to LOAD_AVG_MAX-1 of accumulated contribution which has not yet been reflected in util_avg. Instead of always setting util_sum to the low bound of util_avg, which can significantly lower the utilization of root cfs_rq after propagating the change down into the hierarchy, we revert the change of util_sum and propagate the difference. In addition, we also check that cfs's util_sum always stays above the lower bound for a given util_avg as it has been observed that sched_entity's util_sum is sometimes above cfs one. Fixes: `1c35b07e6d` ("sched/fair: Ensure _sum and _avg values stay consistent") Reported-by: Rick Yiu <rickyiu@google.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-01 17:25:45 +01:00
Peter Zijlstra	91b04e83c7	perf: Fix perf_event_read_local() time [ Upstream commit `09f5e7dc7a` ] Time readers that cannot take locks (due to NMI etc..) currently make use of perf_event::shadow_ctx_time, which, for that event gives: time' = now + (time - timestamp) or, alternatively arranged: time' = time + (now - timestamp) IOW, the progression of time since the last time the shadow_ctx_time was updated. There's problems with this: A) the shadow_ctx_time is per-event, even though the ctx_time it reflects is obviously per context. The direct concequence of this is that the context needs to iterate all events all the time to keep the shadow_ctx_time in sync. B) even with the prior point, the context itself might not be active meaning its time should not advance to begin with. C) shadow_ctx_time isn't consistently updated when ctx_time is There are 3 users of this stuff, that suffer differently from this: - calc_timer_values() - perf_output_read() - perf_event_update_userpage() /* A / - perf_event_read_local() / A,B / In particular, perf_output_read() doesn't suffer at all, because it's sample driven and hence only relevant when the event is actually running. This same was supposed to be true for perf_event_update_userpage(), after all self-monitoring implies the context is active HOWEVER*, as per commit `f792565326` ("perf/core: fix userpage->time_enabled of inactive events") this goes wrong when combined with counter overcommit, in that case those events that do not get scheduled when the context becomes active (task events typically) miss out on the EVENT_TIME update and ENABLED time is inflated (for a little while) with the time the context was inactive. Once the event gets rotated in, this gets corrected, leading to a non-monotonic timeflow. perf_event_read_local() made things even worse, it can request time at any point, suffering all the problems perf_event_update_userpage() does and more. Because while perf_event_update_userpage() is limited by the context being active, perf_event_read_local() users have no such constraint. Therefore, completely overhaul things and do away with perf_event::shadow_ctx_time. Instead have regular context time updates keep track of this offset directly and provide perf_event_time_now() to complement perf_event_time(). perf_event_time_now() will, in adition to being context wide, also take into account if the context is active. For inactive context, it will not advance time. This latter property means the cgroup perf_cgroup_info context needs to grow addition state to track this. Additionally, since all this is strictly per-cpu, we can use barrier() to order context activity vs context time. Fixes: `7d9285e82d` ("perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Song Liu <song@kernel.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/YcB06DasOBtU0b00@hirez.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-01 17:25:45 +01:00

1 2 3 4 5 ...

36020 Commits