Commit Graph

36020 Commits

Author SHA1 Message Date
Paul E. McKenney
7e7bc11a75 FROMGIT: rcu: Don't deboost before reporting expedited quiescent state
Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx
before reporting the expedited quiescent state.  Under heavy real-time
load, this can result in this function being preempted before the
quiescent state is reported, which can in turn prevent the expedited grace
period from completing.  Tim Murray reports that the resulting expedited
grace periods can take hundreds of milliseconds and even more than one
second, when they should normally complete in less than a millisecond.

This was fine given that there were no particular response-time
constraints for synchronize_rcu_expedited(), as it was designed
for throughput rather than latency.  However, some users now need
sub-100-millisecond response-time constratints.

This patch therefore follows Neeraj's suggestion (seconded by Tim and
by Uladzislau Rezki) of simply reversing the two operations.

Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Joel Fernandes <joelaf@google.com>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Tim Murray <timmurray@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Bug: 217236054
Bug: 224756824
(cherry picked from commit 10c5357874
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
rcu/next)

Change-Id: Iac442f4cb0648c967ec65d4df0f74c8c25940393
Signed-off-by: Kyle Lin <kylelin@google.com>
2022-03-17 15:22:42 +00:00
Greg Kroah-Hartman
b2a024ac7f Merge d04937ae94 ("x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT") into android13-5.10
Steps on the way to 5.10.105

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie2249241cd53eea62126d9ee140a3d4e5a9012d8
2022-03-16 13:24:43 +01:00
Chris Goldsworthy
239dde6763 ANDROID: dma-direct: Make DMA32 disablement work for CONFIG_NUMA
zone_dma32_is_empty() currently lacks the proper validation to ensure
that the NUMA node ID it receives as an argument is valid. This has no
effect on kernels with CONFIG_NUMA=n as NODE_DATA() will return the
same pglist_data on these devices, but on kernels with CONFIG_NUMA=y,
this is not the case, and the node passed to NODE_DATA must be
validated.

Rather than trying to find the node containing ZONE_DMA32, replace
calls of zone_dma32_is_empty() with zone_dma32_are_empty() (which
iterates over all nodes and returns false if one of the nodes holds
DMA32 and it is non-empty).

Bug: 199917449
Fixes: c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty")
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I850fb9213b71a1ef29106728bfda0cc6de46fdbb
(cherry picked from commit bf96382fb9)
2022-03-16 01:27:07 +00:00
Chris Goldsworthy
19507e098b ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty
ZONE_DMA32 is enabled by default on android12-5.10, yet it is not
needed for all devices, nor is it desirable to have if not needed. For
instance, if a partner in GKI 1.0 did not use ZONE_DMA32, memory can
be lower for ZONE_NORMAL relative to older targets, such that memory
would run out more quickly in ZONE_NORMAL leading kswapd to be invoked
unnecessarily.

Correspondingly, provide a means of making ZONE_DMA32 empty via the
kernel command line when it is compiled in via CONFIG_ZONE_DMA32.

Bug: 199917449
Change-Id: I70ec76914b92e518d61a61072f0b3cb41cb28646
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
(cherry picked from commit c3c2bb34ac)
2022-03-16 00:04:01 +00:00
Greg Kroah-Hartman
66c9212cdf Merge 5.10.104 into android13-5.10
Changes in 5.10.104
	mac80211_hwsim: report NOACK frames in tx_status
	mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
	i2c: bcm2835: Avoid clock stretching timeouts
	ASoC: rt5668: do not block workqueue if card is unbound
	ASoC: rt5682: do not block workqueue if card is unbound
	regulator: core: fix false positive in regulator_late_cleanup()
	Input: clear BTN_RIGHT/MIDDLE on buttonpads
	KVM: arm64: vgic: Read HW interrupt pending state from the HW
	tipc: fix a bit overflow in tipc_crypto_key_rcv()
	cifs: fix double free race when mount fails in cifs_get_root()
	selftests/seccomp: Fix seccomp failure by adding missing headers
	dmaengine: shdma: Fix runtime PM imbalance on error
	i2c: cadence: allow COMPILE_TEST
	i2c: qup: allow COMPILE_TEST
	net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
	usb: gadget: don't release an existing dev->buf
	usb: gadget: clear related members when goto fail
	exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()
	exfat: fix i_blocks for files truncated over 4 GiB
	tracing: Add test for user space strings when filtering on string pointers
	serial: stm32: prevent TDR register overwrite when sending x_char
	ata: pata_hpt37x: fix PCI clock detection
	drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
	tracing: Add ustring operation to filtering string pointers
	ALSA: intel_hdmi: Fix reference to PCM buffer address
	riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
	riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
	riscv: Fix config KASAN && DEBUG_VIRTUAL
	ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
	iommu/amd: Recover from event log overflow
	drm/i915: s/JSP2/ICP2/ PCH
	xen/netfront: destroy queues before real_num_tx_queues is zeroed
	thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
	ntb: intel: fix port config status offset for SPR
	mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
	xfrm: fix MTU regression
	netfilter: fix use-after-free in __nf_register_net_hook()
	bpf, sockmap: Do not ignore orig_len parameter
	xfrm: fix the if_id check in changelink
	xfrm: enforce validity of offload input flags
	e1000e: Correct NVM checksum verification flow
	net: fix up skbs delta_truesize in UDP GRO frag_list
	netfilter: nf_queue: don't assume sk is full socket
	netfilter: nf_queue: fix possible use-after-free
	netfilter: nf_queue: handle socket prefetch
	batman-adv: Request iflink once in batadv-on-batadv check
	batman-adv: Request iflink once in batadv_get_real_netdevice
	batman-adv: Don't expect inter-netns unique iflink indices
	net: ipv6: ensure we call ipv6_mc_down() at most once
	net: dcb: flush lingering app table entries for unregistered devices
	net/smc: fix connection leak
	net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
	net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
	rcu/nocb: Fix missed nocb_timer requeue
	ice: Fix race conditions between virtchnl handling and VF ndo ops
	ice: fix concurrent reset and removal of VFs
	sched/topology: Make sched_init_numa() use a set for the deduplicating sort
	sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
	ia64: ensure proper NUMA distance and possible map initialization
	mac80211: fix forwarded mesh frames AC & queue selection
	net: stmmac: fix return value of __setup handler
	mac80211: treat some SAE auth steps as final
	iavf: Fix missing check for running netdev
	net: sxgbe: fix return value of __setup handler
	ibmvnic: register netdev after init of adapter
	net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
	ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
	efivars: Respect "block" flag in efivar_entry_set_safe()
	firmware: arm_scmi: Remove space in MODULE_ALIAS name
	ASoC: cs4265: Fix the duplicated control name
	can: gs_usb: change active_channels's type from atomic_t to u8
	arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
	igc: igc_read_phy_reg_gpy: drop premature return
	ARM: Fix kgdb breakpoint for Thumb2
	ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
	selftests: mlxsw: tc_police_scale: Make test more robust
	pinctrl: sunxi: Use unique lockdep classes for IRQs
	igc: igc_write_phy_reg_gpy: drop premature return
	ibmvnic: free reset-work-item when flushing
	memfd: fix F_SEAL_WRITE after shmem huge page allocated
	s390/extable: fix exception table sorting
	ARM: dts: switch timer config to common devkit8000 devicetree
	ARM: dts: Use 32KiHz oscillator on devkit8000
	soc: fsl: guts: Revert commit 3c0d64e867
	soc: fsl: guts: Add a missing memory allocation failure check
	soc: fsl: qe: Check of ioremap return value
	ARM: tegra: Move panels to AUX bus
	ibmvnic: complete init_done on transport events
	net: chelsio: cxgb3: check the return value of pci_find_capability()
	iavf: Refactor iavf state machine tracking
	nl80211: Handle nla_memdup failures in handle_nan_filter
	drm/amdgpu: fix suspend/resume hang regression
	net: dcb: disable softirqs in dcbnl_flush_dev()
	Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
	Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
	Input: samsung-keypad - properly state IOMEM dependency
	HID: add mapping for KEY_DICTATE
	HID: add mapping for KEY_ALL_APPLICATIONS
	tracing/histogram: Fix sorting on old "cpu" value
	tracing: Fix return value of __setup handlers
	btrfs: fix lost prealloc extents beyond eof after full fsync
	btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
	btrfs: add missing run of delayed items after unlink during log replay
	Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
	hamradio: fix macro redefine warning
	Linux 5.10.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6db85dae2ee6420dfab7fc72fe79acdb74560637
2022-03-15 13:57:31 +01:00
Josh Poimboeuf
afc2d635b5 x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
commit 44a3918c82 upstream.

With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable
to Spectre v2 BHB-based attacks.

When both are enabled, print a warning message and report it in the
'spectre_v2' sysfs vulnerabilities file.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[fllinden@amazon.com: backported to 5.10]
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11 12:11:49 +01:00
Barry Song
66e59d2b41 UPSTREAM: genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()
Many drivers don't want interrupts enabled automatically via request_irq().
So they are handling this issue by either way of the below two:

(1)
  irq_set_status_flags(irq, IRQ_NOAUTOEN);
  request_irq(dev, irq...);

(2)
  request_irq(dev, irq...);
  disable_irq(irq);

The code in the second way is silly and unsafe. In the small time gap
between request_irq() and disable_irq(), interrupts can still come.

The code in the first way is safe though it's subobtimal.

Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to
request_irq() and request_nmi(). It prevents the automatic enabling of the
requested interrupt/nmi in the same safe way as #1 above. With that the
various usage sites of #1 and #2 above can be simplified and corrected.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: dmitry.torokhov@gmail.com
Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com
(cherry picked from commit cbe16f35be)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: Ia3fa1c3c583c1562f2029250ac315ac2346def18
2022-03-09 08:57:45 -08:00
Randy Dunlap
827172ffa9 tracing: Fix return value of __setup handlers
commit 1d02b444b8 upstream.

__setup() handlers should generally return 1 to indicate that the
boot options have been handled.

Using invalid option values causes the entire kernel boot option
string to be reported as Unknown and added to init's environment
strings, polluting it.

  Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
    kprobe_event=p,syscall_any,$arg1 trace_options=quiet
    trace_clock=jiffies", will be passed to user space.

 Run /sbin/init as init process
   with arguments:
     /sbin/init
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc6
     kprobe_event=p,syscall_any,$arg1
     trace_options=quiet
     trace_clock=jiffies

Return 1 from the __setup() handlers so that init's environment is not
polluted with kernel boot options.

Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org

Cc: stable@vger.kernel.org
Fixes: 7bcfaf54f5 ("tracing: Add trace_options kernel command line parameter")
Fixes: e1e232ca6b ("tracing: Add trace_clock=<clock> kernel parameter")
Fixes: 970988e19e ("tracing/kprobe: Add kprobe_event= boot parameter")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:38 +01:00
Steven Rostedt (Google)
78059b1cfc tracing/histogram: Fix sorting on old "cpu" value
commit 1d1898f656 upstream.

When trying to add a histogram against an event with the "cpu" field, it
was impossible due to "cpu" being a keyword to key off of the running CPU.
So to fix this, it was changed to "common_cpu" to match the other generic
fields (like "common_pid"). But since some scripts used "cpu" for keying
off of the CPU (for events that did not have "cpu" as a field, which is
most of them), a backward compatibility trick was added such that if "cpu"
was used as a key, and the event did not have "cpu" as a field name, then
it would fallback and switch over to "common_cpu".

This fix has a couple of subtle bugs. One was that when switching over to
"common_cpu", it did not change the field name, it just set a flag. But
the code still found a "cpu" field. The "cpu" field is used for filtering
and is returned when the event does not have a "cpu" field.

This was found by:

  # cd /sys/kernel/tracing
  # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
  # cat events/sched/sched_wakeup/hist

Which showed the histogram unsorted:

{ cpu:         19, pid:       1175 } hitcount:          1
{ cpu:          6, pid:        239 } hitcount:          2
{ cpu:         23, pid:       1186 } hitcount:         14
{ cpu:         12, pid:        249 } hitcount:          2
{ cpu:          3, pid:        994 } hitcount:          5

Instead of hard coding the "cpu" checks, take advantage of the fact that
trace_event_field_field() returns a special field for "cpu" and "CPU" if
the event does not have "cpu" as a field. This special field has the
"filter_type" of "FILTER_CPU". Check that to test if the returned field is
of the CPU type instead of doing the string compare.

Also, fix the sorting bug by testing for the hist_field flag of
HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
the special CPU field to know what compare routine to use, and since that
special field does not have a size, it returns tracing_map_cmp_none.

Cc: stable@vger.kernel.org
Fixes: 1e3bac71c5 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:38 +01:00
Dietmar Eggemann
1312ef5ad0 sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
commit 71e5f6644f upstream.

Commit "sched/topology: Make sched_init_numa() use a set for the
deduplicating sort" allocates 'i + nr_levels (level)' instead of
'i + nr_levels + 1' sched_domain_topology_level.

This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG):

sched_init_domains
  build_sched_domains()
    __free_domain_allocs()
      __sdt_free() {
	...
        for_each_sd_topology(tl)
	  ...
          sd = *per_cpu_ptr(sdd->sd, j); <--
	  ...
      }

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a9bf@arm.com
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:34 +01:00
Valentin Schneider
d753aecb3d sched/topology: Make sched_init_numa() use a set for the deduplicating sort
commit 620a6dc407 upstream.

The deduplicating sort in sched_init_numa() assumes that the first line in
the distance table contains all unique values in the entire table. I've
been trying to pen what this exactly means for the topology, but it's not
straightforward. For instance, topology.c uses this example:

  node   0   1   2   3
    0:  10  20  20  30
    1:  20  10  20  20
    2:  20  20  10  20
    3:  30  20  20  10

  0 ----- 1
  |     / |
  |   /   |
  | /     |
  2 ----- 3

Which works out just fine. However, if we swap nodes 0 and 1:

  1 ----- 0
  |     / |
  |   /   |
  | /     |
  2 ----- 3

we get this distance table:

  node   0  1  2  3
    0:  10 20 20 20
    1:  20 10 20 30
    2:  20 20 10 20
    3:  20 30 20 10

Which breaks the deduplicating sort (non-representative first line). In
this case this would just be a renumbering exercise, but it so happens that
we can have a deduplicating sort that goes through the whole table in O(n²)
at the extra cost of a temporary memory allocation (i.e. any form of set).

The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following
this, implement the set as a 256-bits bitmap. Should this not be
satisfactory (i.e. we want to support 32-bit values), then we'll have to go
for some other sparse set implementation.

This has the added benefit of letting us allocate just the right amount of
memory for sched_domains_numa_distance[], rather than an arbitrary
(nr_node_ids + 1).

Note: DT binding equivalent (distance-map) decodes distances as 32-bit
values.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:34 +01:00
Frederic Weisbecker
0c145262ac rcu/nocb: Fix missed nocb_timer requeue
commit b2fcf21020 upstream.

This sequence of events can lead to a failure to requeue a CPU's
->nocb_timer:

1.	There are no callbacks queued for any CPU covered by CPU 0-2's
	->nocb_gp_kthread.  Note that ->nocb_gp_kthread is associated
	with CPU 0.

2.	CPU 1 enqueues its first callback with interrupts disabled, and
	thus must defer awakening its ->nocb_gp_kthread.  It therefore
	queues its rcu_data structure's ->nocb_timer.  At this point,
	CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.

3.	CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
	callback, but with interrupts enabled, allowing it to directly
	awaken the ->nocb_gp_kthread.

4.	The newly awakened ->nocb_gp_kthread associates both CPU 1's
	and CPU 2's callbacks with a future grace period and arranges
	for that grace period to be started.

5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
	future grace period.

6.	This grace period elapses before the CPU 1's timer fires.
	This is normally improbably given that the timer is set for only
	one jiffy, but timers can be delayed.  Besides, it is possible
	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

7.	The grace period ends, so rcu_gp_kthread awakens the
	->nocb_gp_kthread, which in turn awakens both CPU 1's and
	CPU 2's ->nocb_cb_kthread.  Then ->nocb_gb_kthread sleeps
	waiting for more newly queued callbacks.

8.	CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
	waiting for more invocable callbacks.

9.	Note that neither kthread updated any ->nocb_timer state,
	so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.

10.	CPU 1 enqueues its second callback, this time with interrupts
 	enabled so it can wake directly	->nocb_gp_kthread.
	It does so with calling wake_nocb_gp() which also cancels the
	pending timer that got queued in step 2. But that doesn't reset
	CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
	So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
	desynchronized.

11.	->nocb_gp_kthread associates the callback queued in 10 with a new
	grace period, arranges for that grace period to start and sleeps
	waiting for it to complete.

12.	The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
	which in turn wakes up CPU 1's ->nocb_cb_kthread which then
	invokes the callback queued in 10.

13.	CPU 1 enqueues its third callback, this time with interrupts
	disabled so it must queue a timer for a deferred wakeup. However
	the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
	incorrectly indicates that a timer is already queued.  Instead,
	CPU 1's ->nocb_timer was cancelled in 10.  CPU 1 therefore fails
	to queue the ->nocb_timer.

14.	CPU 1 has its pending callback and it may go unnoticed until
	some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever
	calls an explicit deferred wakeup, for example, during idle entry.

This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime
we delete the ->nocb_timer.

It is quite possible that there is a similar scenario involving
->nocb_bypass_timer and ->nocb_defer_wakeup.  However, despite some
effort from several people, a failure scenario has not yet been located.
However, that by no means guarantees that no such scenario exists.
Finding a failure scenario is left as an exercise for the reader, and the
"Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer.

Fixes: d1b222c6be (rcu/nocb: Add bypass callback queueing)
Cc: <stable@vger.kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:34 +01:00
Steven Rostedt
e57dfaf66f tracing: Add ustring operation to filtering string pointers
[ Upstream commit f37c3bbc63 ]

Since referencing user space pointers is special, if the user wants to
filter on a field that is a pointer to user space, then they need to
specify it.

Add a ".ustring" attribute to the field name for filters to state that the
field is pointing to user space such that the kernel can take the
appropriate action to read that pointer.

Link: https://lore.kernel.org/all/yt9d8rvmt2jq.fsf@linux.ibm.com/

Fixes: 77360f9bbc ("tracing: Add test for user space strings when filtering on string pointers")
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08 19:09:31 +01:00
Steven Rostedt
c999c5927e tracing: Add test for user space strings when filtering on string pointers
[ Upstream commit 77360f9bbc ]

Pingfan reported that the following causes a fault:

  echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
  echo 1 > events/syscalls/sys_enter_at/enable

The reason is that trace event filter treats the user space pointer
defined by "filename" as a normal pointer to compare against the "cpu"
string. The following bug happened:

 kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0001) - permissions violation
 PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
 Oops: 0001 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:strlen+0x0/0x20
 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
       48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
       48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
 RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
 RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
 RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
 RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
 R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
 R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
 FS:  00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  filter_pred_pchar+0x18/0x40
  filter_match_preds+0x31/0x70
  ftrace_syscall_enter+0x27a/0x2c0
  syscall_trace_enter.constprop.0+0x1aa/0x1d0
  do_syscall_64+0x16/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fa861d88664

The above happened because the kernel tried to access user space directly
and triggered a "supervisor read access in kernel mode" fault. Worse yet,
the memory could not even be loaded yet, and a SEGFAULT could happen as
well. This could be true for kernel space accessing as well.

To be even more robust, test both kernel and user space strings. If the
string fails to read, then simply have the filter fail.

Note, TASK_SIZE is used to determine if the pointer is user or kernel space
and the appropriate strncpy_from_kernel/user_nofault() function is used to
copy the memory. For some architectures, the compare to TASK_SIZE may always
pick user space or kernel space. If it gets it wrong, the only thing is that
the filter will fail to match. In the future, this needs to be fixed to have
the event denote which should be used. But failing a filter is much better
than panicing the machine, and that can be solved later.

Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/
Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Fixes: 87a342f5db ("tracing/filters: Support filtering for char * strings")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08 19:09:30 +01:00
Greg Kroah-Hartman
89c97134d0 Merge 5.10.103 into android13-5.10
Changes in 5.10.103
	cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
	btrfs: tree-checker: check item_size for inode_item
	btrfs: tree-checker: check item_size for dev_item
	clk: jz4725b: fix mmc0 clock gating
	vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
	parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
	parisc/unaligned: Fix ldw() and stw() unalignment handlers
	KVM: x86/mmu: make apf token non-zero to fix bug
	drm/amdgpu: disable MMHUB PG for Picasso
	drm/i915: Correctly populate use_sagv_wm for all pipes
	sr9700: sanity check for packet length
	USB: zaurus: support another broken Zaurus
	CDC-NCM: avoid overflow in sanity checking
	netfilter: nf_tables_offload: incorrect flow offload action array size
	x86/fpu: Correct pkru/xstate inconsistency
	tee: export teedev_open() and teedev_close_context()
	optee: use driver internal tee_context for some rpc
	ping: remove pr_err from ping_lookup
	perf data: Fix double free in perf_session__delete()
	bnx2x: fix driver load from initrd
	bnxt_en: Fix active FEC reporting to ethtool
	hwmon: Handle failure to register sensor with thermal zone correctly
	bpf: Do not try bpf_msg_push_data with len 0
	selftests: bpf: Check bpf_msg_push_data return value
	bpf: Add schedule points in batch ops
	io_uring: add a schedule point in io_add_buffers()
	net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
	tipc: Fix end of loop tests for list_for_each_entry()
	gso: do not skip outer ip header in case of ipip and net_failover
	openvswitch: Fix setting ipv6 fields causing hw csum failure
	drm/edid: Always set RGB444
	net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
	net/sched: act_ct: Fix flow table lookup after ct clear or switching zones
	net: ll_temac: check the return value of devm_kmalloc()
	net: Force inlining of checksum functions in net/checksum.h
	nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac()
	netfilter: nf_tables: fix memory leak during stateful obj update
	net/smc: Use a mutex for locking "struct smc_pnettable"
	surface: surface3_power: Fix battery readings on batteries without a serial number
	udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister()
	net/mlx5: Fix possible deadlock on rule deletion
	net/mlx5: Fix wrong limitation of metadata match on ecpf
	net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
	spi: spi-zynq-qspi: Fix a NULL pointer dereference in zynq_qspi_exec_mem_op()
	regmap-irq: Update interrupt clear register for proper reset
	RDMA/rtrs-clt: Fix possible double free in error case
	RDMA/rtrs-clt: Kill wait_for_inflight_permits
	RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close
	configfs: fix a race in configfs_{,un}register_subsystem()
	RDMA/ib_srp: Fix a deadlock
	tracing: Have traceon and traceoff trigger honor the instance
	iio: adc: men_z188_adc: Fix a resource leak in an error handling path
	iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
	iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
	iio: Fix error handling for PM
	sc16is7xx: Fix for incorrect data being transmitted
	ata: pata_hpt37x: disable primary channel on HPT371
	Revert "USB: serial: ch341: add new Product ID for CH341A"
	usb: gadget: rndis: add spinlock for rndis response list
	USB: gadget: validate endpoint index for xilinx udc
	tracefs: Set the group ownership in apply_options() not parse_options()
	USB: serial: option: add support for DW5829e
	USB: serial: option: add Telit LE910R1 compositions
	usb: dwc2: drd: fix soft connect when gadget is unconfigured
	usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
	usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
	xhci: re-initialize the HC during resume if HCE was set
	xhci: Prevent futile URB re-submissions due to incorrect return value.
	driver core: Free DMA range map when device is released
	RDMA/cma: Do not change route.addr.src_addr outside state checks
	thermal: int340x: fix memory leak in int3400_notify()
	riscv: fix oops caused by irqsoff latency tracer
	tty: n_gsm: fix encoding of control signal octet bit DV
	tty: n_gsm: fix proper link termination after failed open
	tty: n_gsm: fix NULL pointer access due to DLCI release
	tty: n_gsm: fix wrong tty control line for flow control
	tty: n_gsm: fix deadlock in gsmtty_open()
	gpio: tegra186: Fix chip_data type confusion
	memblock: use kfree() to release kmalloced memblock regions
	Linux 5.10.103

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I574f0ca699653770d2a7d4418b3bd76c2669b138
2022-03-02 15:44:44 +01:00
Steven Rostedt (Google)
afbeee13be tracing: Have traceon and traceoff trigger honor the instance
commit 302e9edd54 upstream.

If a trigger is set on an event to disable or enable tracing within an
instance, then tracing should be disabled or enabled in the instance and
not at the top level, which is confusing to users.

Link: https://lkml.kernel.org/r/20220223223837.14f94ec3@rorschach.local.home

Cc: stable@vger.kernel.org
Fixes: ae63b31e4d ("tracing: Separate out trace events from global variables")
Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02 11:42:53 +01:00
Eric Dumazet
7ef94bfb08 bpf: Add schedule points in batch ops
commit 75134f16e7 upstream.

syzbot reported various soft lockups caused by bpf batch operations.

 INFO: task kworker/1:1:27 blocked for more than 140 seconds.
 INFO: task hung in rcu_barrier

Nothing prevents batch ops to process huge amount of data,
we need to add schedule points in them.

Note that maybe_wait_bpf_programs(map) calls from
generic_map_delete_batch() can be factorized by moving
the call after the loop.

This will be done later in -next tree once we get this fix merged,
unless there is strong opinion doing this optimization sooner.

Fixes: aa2e93b8e5 ("bpf: Add generic support for update and delete batch ops")
Fixes: cb4d03ab49 ("bpf: Add generic support for lookup batch op")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Brian Vazquez <brianvv@google.com>
Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02 11:42:49 +01:00
Zhang Qiao
fcec42dd28 cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
commit 05c7b7a92c upstream.

As previously discussed(https://lkml.org/lkml/2022/1/20/51),
cpuset_attach() is affected with similar cpu hotplug race,
as follow scenario:

     cpuset_attach()				cpu hotplug
    ---------------------------            ----------------------
    down_write(cpuset_rwsem)
    guarantee_online_cpus() // (load cpus_attach)
					sched_cpu_deactivate
					  set_cpu_active()
					  // will change cpu_active_mask
    set_cpus_allowed_ptr(cpus_attach)
      __set_cpus_allowed_ptr_locked()
       // (if the intersection of cpus_attach and
         cpu_active_mask is empty, will return -EINVAL)
    up_write(cpuset_rwsem)

To avoid races such as described above, protect cpuset_attach() call
with cpu_hotplug_lock.

Fixes: be367d0992 ("cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time")
Cc: stable@vger.kernel.org # v2.6.32+
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Acked-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02 11:42:45 +01:00
Greg Kroah-Hartman
768ef3a611 Merge 5.10.102 into android13-5.10
Changes in 5.10.102
	drm/nouveau/pmu/gm200-: use alternate falcon reset sequence
	mm: memcg: synchronize objcg lists with a dedicated spinlock
	rcu: Do not report strict GPs for outgoing CPUs
	fget: clarify and improve __fget_files() implementation
	fs/proc: task_mmu.c: don't read mapcount for migration entry
	can: isotp: prevent race between isotp_bind() and isotp_setsockopt()
	can: isotp: add SF_BROADCAST support for functional addressing
	scsi: lpfc: Fix mailbox command failure during driver initialization
	HID:Add support for UGTABLET WP5540
	Revert "svm: Add warning message for AVIC IPI invalid target"
	serial: parisc: GSC: fix build when IOSAPIC is not set
	parisc: Drop __init from map_pages declaration
	parisc: Fix data TLB miss in sba_unmap_sg
	parisc: Fix sglist access in ccio-dma.c
	mmc: block: fix read single on recovery logic
	mm: don't try to NUMA-migrate COW pages that have other uses
	PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
	parisc: Add ioread64_lo_hi() and iowrite64_lo_hi()
	btrfs: send: in case of IO error log it
	platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1
	platform/x86: ISST: Fix possible circular locking dependency detected
	selftests: rtc: Increase test timeout so that all tests run
	kselftest: signal all child processes
	net: ieee802154: at86rf230: Stop leaking skb's
	selftests/zram: Skip max_comp_streams interface on newer kernel
	selftests/zram01.sh: Fix compression ratio calculation
	selftests/zram: Adapt the situation that /dev/zram0 is being used
	selftests: openat2: Print also errno in failure messages
	selftests: openat2: Add missing dependency in Makefile
	selftests: openat2: Skip testcases that fail with EOPNOTSUPP
	selftests: skip mincore.check_file_mmap when fs lacks needed support
	ax25: improve the incomplete fix to avoid UAF and NPD bugs
	vfs: make freeze_super abort when sync_filesystem returns error
	quota: make dquot_quota_sync return errors from ->sync_fs
	scsi: pm8001: Fix use-after-free for aborted TMF sas_task
	scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
	nvme: fix a possible use-after-free in controller reset during load
	nvme-tcp: fix possible use-after-free in transport error_recovery work
	nvme-rdma: fix possible use-after-free in transport error_recovery work
	drm/amdgpu: fix logic inversion in check
	x86/Xen: streamline (and fix) PV CPU enumeration
	Revert "module, async: async_synchronize_full() on module init iff async is used"
	gcc-plugins/stackleak: Use noinstr in favor of notrace
	random: wake up /dev/random writers after zap
	kbuild: lto: merge module sections
	kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled
	iwlwifi: fix use-after-free
	drm/radeon: Fix backlight control on iMac 12,1
	drm/i915/opregion: check port number bounds for SWSCI display power state
	vsock: remove vsock from connected table when connect is interrupted by a signal
	drm/i915/gvt: Make DRM_I915_GVT depend on X86
	iwlwifi: pcie: fix locking when "HW not ready"
	iwlwifi: pcie: gen2: fix locking when "HW not ready"
	selftests: netfilter: fix exit value for nft_concat_range
	netfilter: nft_synproxy: unregister hooks on init error path
	ipv6: per-netns exclusive flowlabel checks
	net: dsa: lan9303: fix reset on probe
	net: dsa: lantiq_gswip: fix use after free in gswip_remove()
	net: ieee802154: ca8210: Fix lifs/sifs periods
	ping: fix the dif and sdif check in ping_lookup
	bonding: force carrier update when releasing slave
	drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
	net_sched: add __rcu annotation to netdev->qdisc
	bonding: fix data-races around agg_select_timer
	libsubcmd: Fix use-after-free for realloc(..., 0)
	dpaa2-eth: Initialize mutex used in one step timestamping path
	perf bpf: Defer freeing string after possible strlen() on it
	selftests/exec: Add non-regular to TEST_GEN_PROGS
	ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
	ALSA: hda/realtek: Fix deadlock by COEF mutex
	ALSA: hda: Fix regression on forced probe mask option
	ALSA: hda: Fix missing codec probe on Shenker Dock 15
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range()
	powerpc/lib/sstep: fix 'ptesync' build error
	mtd: rawnand: gpmi: don't leak PM reference in error path
	KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests
	ASoC: tas2770: Insert post reset delay
	block/wbt: fix negative inflight counter when remove scsi device
	NFS: LOOKUP_DIRECTORY is also ok with symlinks
	NFS: Do not report writeback errors in nfs_getattr()
	tty: n_tty: do not look ahead for EOL character past the end of the buffer
	mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
	mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
	Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
	KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
	KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
	KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
	NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache
	ARM: OMAP2+: hwmod: Add of_node_put() before break
	ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of
	phy: usb: Leave some clocks running during suspend
	irqchip/sifive-plic: Add missing thead,c900-plic match string
	netfilter: conntrack: don't refresh sctp entries in closed state
	arm64: dts: meson-gx: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
	pidfd: fix test failure due to stack overflow on some arches
	selftests: fixup build warnings in pidfd / clone3 tests
	kconfig: let 'shell' return enough output for deep path names
	lib/iov_iter: initialize "flags" in new pipe_buffer
	ata: libata-core: Disable TRIM on M88V29
	soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
	xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
	drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
	tracing: Fix tp_printk option related with tp_printk_stop_on_boot
	net: usb: qmi_wwan: Add support for Dell DW5829e
	net: macb: Align the dma and coherent dma masks
	kconfig: fix failing to generate auto.conf
	scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
	EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
	net: sched: limit TC_ACT_REPEAT loops
	dmaengine: sh: rcar-dmac: Check for error num after setting mask
	dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
	dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
	i2c: qcom-cci: don't delete an unregistered adapter
	i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
	copy_process(): Move fd_install() out of sighand->siglock critical section
	i2c: brcmstb: fix support for DSL and CM variants
	lockdep: Correct lock_classes index mapping
	Linux 5.10.102

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ief4c91d35938898a0cfebc0c746444f8a40c7ec4
2022-02-23 12:26:12 +01:00
Cheng Jui Wang
6062d1267f lockdep: Correct lock_classes index mapping
commit 28df029d53 upstream.

A kernel exception was hit when trying to dump /proc/lockdep_chains after
lockdep report "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!":

Unable to handle kernel paging request at virtual address 00054005450e05c3
...
00054005450e05c3] address between user and kernel address ranges
...
pc : [0xffffffece769b3a8] string+0x50/0x10c
lr : [0xffffffece769ac88] vsnprintf+0x468/0x69c
...
 Call trace:
  string+0x50/0x10c
  vsnprintf+0x468/0x69c
  seq_printf+0x8c/0xd8
  print_name+0x64/0xf4
  lc_show+0xb8/0x128
  seq_read_iter+0x3cc/0x5fc
  proc_reg_read_iter+0xdc/0x1d4

The cause of the problem is the function lock_chain_get_class() will
shift lock_classes index by 1, but the index don't need to be shifted
anymore since commit 01bb6f0af9 ("locking/lockdep: Change the range
of class_idx in held_lock struct") already change the index to start
from 0.

The lock_classes[-1] located at chain_hlocks array. When printing
lock_classes[-1] after the chain_hlocks entries are modified, the
exception happened.

The output of lockdep_chains are incorrect due to this problem too.

Fixes: f611e8cf98 ("lockdep: Take read/write status in consideration when generate chainkey")
Signed-off-by: Cheng Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20220210105011.21712-1-cheng-jui.wang@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:01:08 +01:00
Waiman Long
9fee985f9a copy_process(): Move fd_install() out of sighand->siglock critical section
commit ddc204b517 upstream.

I was made aware of the following lockdep splat:

[ 2516.308763] =====================================================
[ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted
[ 2516.309703] -----------------------------------------------------
[ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0
[ 2516.310944]
               and this task is already holding:
[ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80
[ 2516.311804] which would create a new lock dependency:
[ 2516.312066]  (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2}
[ 2516.312446]
               but this new dependency connects a HARDIRQ-irq-safe lock:
[ 2516.312983]  (&sighand->siglock){-.-.}-{2:2}
   :
[ 2516.330700]  Possible interrupt unsafe locking scenario:

[ 2516.331075]        CPU0                    CPU1
[ 2516.331328]        ----                    ----
[ 2516.331580]   lock(&newf->file_lock);
[ 2516.331790]                                local_irq_disable();
[ 2516.332231]                                lock(&sighand->siglock);
[ 2516.332579]                                lock(&newf->file_lock);
[ 2516.332922]   <Interrupt>
[ 2516.333069]     lock(&sighand->siglock);
[ 2516.333291]
                *** DEADLOCK ***
[ 2516.389845]
               stack backtrace:
[ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1
[ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2516.391155] Call trace:
[ 2516.391302]  dump_backtrace+0x0/0x3e0
[ 2516.391518]  show_stack+0x24/0x30
[ 2516.391717]  dump_stack_lvl+0x9c/0xd8
[ 2516.391938]  dump_stack+0x1c/0x38
[ 2516.392247]  print_bad_irq_dependency+0x620/0x710
[ 2516.392525]  check_irq_usage+0x4fc/0x86c
[ 2516.392756]  check_prev_add+0x180/0x1d90
[ 2516.392988]  validate_chain+0x8e0/0xee0
[ 2516.393215]  __lock_acquire+0x97c/0x1e40
[ 2516.393449]  lock_acquire.part.0+0x240/0x570
[ 2516.393814]  lock_acquire+0x90/0xb4
[ 2516.394021]  _raw_spin_lock+0xe8/0x154
[ 2516.394244]  fd_install+0x368/0x4f0
[ 2516.394451]  copy_process+0x1f5c/0x3e80
[ 2516.394678]  kernel_clone+0x134/0x660
[ 2516.394895]  __do_sys_clone3+0x130/0x1f4
[ 2516.395128]  __arm64_sys_clone3+0x5c/0x7c
[ 2516.395478]  invoke_syscall.constprop.0+0x78/0x1f0
[ 2516.395762]  el0_svc_common.constprop.0+0x22c/0x2c4
[ 2516.396050]  do_el0_svc+0xb0/0x10c
[ 2516.396252]  el0_svc+0x24/0x34
[ 2516.396436]  el0t_64_sync_handler+0xa4/0x12c
[ 2516.396688]  el0t_64_sync+0x198/0x19c
[ 2517.491197] NET: Registered PF_ATMPVC protocol family
[ 2517.491524] NET: Registered PF_ATMSVC protocol family
[ 2591.991877] sched: RT throttling activated

One way to solve this problem is to move the fd_install() call out of
the sighand->siglock critical section.

Before commit 6fd2fe494b ("copy_process(): don't use ksys_close()
on cleanups"), the pidfd installation was done without holding both
the task_list lock and the sighand->siglock. Obviously, holding these
two locks are not really needed to protect the fd_install() call.
So move the fd_install() call down to after the releases of both locks.

Link: https://lore.kernel.org/r/20220208163912.1084752-1-longman@redhat.com
Fixes: 6fd2fe494b ("copy_process(): don't use ksys_close() on cleanups")
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:01:08 +01:00
JaeSang Yoo
15616ba17d tracing: Fix tp_printk option related with tp_printk_stop_on_boot
[ Upstream commit 3203ce39ac ]

The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk" which is
the same as another kernel parameter "tp_printk". If "tp_printk" setup is
called before the "tp_printk_stop_on_boot", it will override the latter
and keep it from being set.

This is similar to other kernel parameter issues, such as:
  Commit 745a600cf1 ("um: console: Ignore console= option")
or init/do_mounts.c:45 (setup function of "ro" kernel param)

Fix it by checking for a "_" right after the "tp_printk" and if that
exists do not process the parameter.

Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com

Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com>
[ Fixed up change log and added space after if condition ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23 12:01:06 +01:00
Kees Cook
143aaf79ba gcc-plugins/stackleak: Use noinstr in favor of notrace
[ Upstream commit dcb85f85fa ]

While the stackleak plugin was already using notrace, objtool is now a
bit more picky.  Update the notrace uses to noinstr.  Silences the
following objtool warnings when building with:

CONFIG_DEBUG_ENTRY=y
CONFIG_STACK_VALIDATION=y
CONFIG_VMLINUX_VALIDATION=y
CONFIG_GCC_PLUGIN_STACKLEAK=y

  vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section

Note that the plugin's addition of calls to stackleak_track_stack() from
noinstr functions is expected to be safe, as it isn't runtime
instrumentation and is self-contained.

Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23 12:01:00 +01:00
Igor Pylypiv
de55891e16 Revert "module, async: async_synchronize_full() on module init iff async is used"
[ Upstream commit 67d6212afd ]

This reverts commit 774a1221e8.

We need to finish all async code before the module init sequence is
done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule().  Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked.  This works when modprobe thread is calling async_schedule(),
but it does not work if module dispatches init code to a worker thread
which then calls async_schedule().

For example, PCI driver probing is invoked from a worker thread based on
a node where device is attached:

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);

We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread.  As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.

The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)

modprobe pm80xx                      worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                     local_pci_probe()
                                       pm8001_pci_probe()
                                         scsi_scan_host()
                                           async_schedule()
                                           worker->flags |= PF_USED_ASYNC;
                                     ...
      < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
  	async_synchronize_full();

Commit 21c3c5d280 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e8
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.

Since commit 0fdff3ec6d ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.

Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23 12:01:00 +01:00
Paul E. McKenney
657991fb06 rcu: Do not report strict GPs for outgoing CPUs
commit bfb3aa735f upstream.

An outgoing CPU is marked offline in a stop-machine handler and most
of that CPU's services stop at that point, including IRQ work queues.
However, that CPU must take another pass through the scheduler and through
a number of CPU-hotplug notifiers, many of which contain RCU readers.
In the past, these readers were not a problem because the outgoing CPU
has interrupts disabled, so that rcu_read_unlock_special() would not
be invoked, and thus RCU would never attempt to queue IRQ work on the
outgoing CPU.

This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD
Kconfig option, in which rcu_read_unlock_special() is invoked upon exit
from almost all RCU read-side critical sections.  Worse yet, because
interrupts are disabled, rcu_read_unlock_special() cannot immediately
report a quiescent state and will therefore attempt to defer this
reporting, for example, by queueing IRQ work.  Which fails with a splat
because the CPU is already marked as being offline.

But it turns out that there is no need to report this quiescent state
because rcu_report_dead() will do this job shortly after the outgoing
CPU makes its final dive into the idle loop.  This commit therefore
makes rcu_read_unlock_special() refrain from queuing IRQ work onto
outgoing CPUs.

Fixes: 44bad5b3cc ("rcu: Do full report for .need_qs for strict GPs")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:00:56 +01:00
Greg Kroah-Hartman
f0a557399c Merge 5.10.101 into android13-5.10
Changes in 5.10.101
	integrity: check the return value of audit_log_start()
	ima: Remove ima_policy file before directory
	ima: Allow template selection with ima_template[_fmt]= after ima_hash=
	ima: Do not print policy rule with inactive LSM labels
	mmc: sdhci-of-esdhc: Check for error num after setting mask
	can: isotp: fix potential CAN frame reception race in isotp_rcv()
	net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYs
	net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs
	NFS: Fix initialisation of nfs_client cl_flags field
	NFSD: Clamp WRITE offsets
	NFSD: Fix offset type in I/O trace points
	drm/amdgpu: Set a suitable dev_info.gart_page_size
	tracing: Propagate is_signed to expression
	NFS: change nfs_access_get_cached to only report the mask
	NFSv4 only print the label when its queried
	nfs: nfs4clinet: check the return value of kstrdup()
	NFSv4.1: Fix uninitialised variable in devicenotify
	NFSv4 remove zero number of fs_locations entries error check
	NFSv4 expose nfs_parse_server_name function
	NFSv4 handle port presence in fs_location server string
	x86/perf: Avoid warning for Arch LBR without XSAVE
	drm: panel-orientation-quirks: Add quirk for the 1Netbook OneXPlayer
	net: sched: Clarify error message when qdisc kind is unknown
	powerpc/fixmap: Fix VM debug warning on unmap
	scsi: target: iscsi: Make sure the np under each tpg is unique
	scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup()
	scsi: qedf: Add stag_work to all the vports
	scsi: qedf: Fix refcount issue when LOGO is received during TMF
	scsi: pm8001: Fix bogus FW crash for maxcpus=1
	scsi: ufs: Treat link loss as fatal error
	scsi: myrs: Fix crash in error case
	PM: hibernate: Remove register_nosave_region_late()
	usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
	perf: Always wake the parent event
	nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
	net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
	KVM: eventfd: Fix false positive RCU usage warning
	KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
	KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
	KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode
	KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
	riscv: fix build with binutils 2.38
	ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group
	ARM: dts: Fix boot regression on Skomer
	ARM: socfpga: fix missing RESET_CONTROLLER
	nvme-tcp: fix bogus request completion when failing to send AER
	ACPI/IORT: Check node revision for PMCG resources
	PM: s2idle: ACPI: Fix wakeup interrupts handling
	drm/rockchip: vop: Correct RK3399 VOP register fields
	ARM: dts: Fix timer regression for beagleboard revision c
	ARM: dts: meson: Fix the UART compatible strings
	ARM: dts: meson8: Fix the UART device-tree schema validation
	ARM: dts: meson8b: Fix the UART device-tree schema validation
	staging: fbtft: Fix error path in fbtft_driver_module_init()
	ARM: dts: imx6qdl-udoo: Properly describe the SD card detect
	phy: xilinx: zynqmp: Fix bus width setting for SGMII
	ARM: dts: imx7ulp: Fix 'assigned-clocks-parents' typo
	usb: f_fs: Fix use-after-free for epfile
	gpio: aggregator: Fix calling into sleeping GPIO controllers
	drm/vc4: hdmi: Allow DBLCLK modes even if horz timing is odd.
	misc: fastrpc: avoid double fput() on failed usercopy
	netfilter: ctnetlink: disable helper autoassign
	arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
	ixgbevf: Require large buffers for build_skb on 82599VF
	drm/panel: simple: Assign data from panel_dpi_probe() correctly
	ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
	gpio: sifive: use the correct register to read output values
	bonding: pair enable_port with slave_arr_updates
	net: dsa: mv88e6xxx: don't use devres for mdiobus
	net: dsa: ar9331: register the mdiobus under devres
	net: dsa: bcm_sf2: don't use devres for mdiobus
	net: dsa: felix: don't use devres for mdiobus
	net: dsa: lantiq_gswip: don't use devres for mdiobus
	ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path
	nfp: flower: fix ida_idx not being released
	net: do not keep the dst cache when uncloning an skb dst and its metadata
	net: fix a memleak when uncloning an skb dst and its metadata
	veth: fix races around rq->rx_notify_masked
	net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
	tipc: rate limit warning for received illegal binding update
	net: amd-xgbe: disable interrupts during pci removal
	dpaa2-eth: unregister the netdev before disconnecting from the PHY
	ice: fix an error code in ice_cfg_phy_fec()
	ice: fix IPIP and SIT TSO offload
	net: mscc: ocelot: fix mutex lock error during ethtool stats read
	net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
	vt_ioctl: fix array_index_nospec in vt_setactivate
	vt_ioctl: add array_index_nospec to VT_ACTIVATE
	n_tty: wake up poll(POLLRDNORM) on receiving data
	eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX
	usb: dwc2: drd: fix soft connect when gadget is unconfigured
	Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured"
	net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
	usb: ulpi: Move of_node_put to ulpi_dev_release
	usb: ulpi: Call of_node_put correctly
	usb: dwc3: gadget: Prevent core from processing stale TRBs
	usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
	USB: gadget: validate interface OS descriptor requests
	usb: gadget: rndis: check size of RNDIS_MSG_SET command
	usb: gadget: f_uac2: Define specific wTerminalType
	usb: raw-gadget: fix handling of dual-direction-capable endpoints
	USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
	USB: serial: option: add ZTE MF286D modem
	USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
	USB: serial: cp210x: add NCR Retail IO box id
	USB: serial: cp210x: add CPI Bulk Coin Recycler id
	speakup-dectlk: Restore pitch setting
	phy: ti: Fix missing sentinel for clk_div_table
	hwmon: (dell-smm) Speed up setting of fan speed
	Makefile.extrawarn: Move -Wunaligned-access to W=1
	can: isotp: fix error path in isotp_sendmsg() to unlock wait queue
	scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled
	scsi: lpfc: Reduce log messages seen after firmware download
	arm64: dts: imx8mq: fix lcdif port node
	perf: Fix list corruption in perf_cgroup_switch()
	iommu: Fix potential use-after-free during probe
	Linux 5.10.101

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6105dcbfc0c7f1373020d378d2048e692dc502ab
2022-02-16 15:14:37 +01:00
Song Liu
f6b5d51976 perf: Fix list corruption in perf_cgroup_switch()
commit 5f4e5ce638 upstream.

There's list corruption on cgrp_cpuctx_list. This happens on the
following path:

  perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
      cpu_ctx_sched_in
         ctx_sched_in
            ctx_pinned_sched_in
              merge_sched_in
                  perf_cgroup_event_disable: remove the event from the list

Use list_for_each_entry_safe() to allow removing an entry during
iteration.

Fixes: 058fe1c044 ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events")
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16 12:54:31 +01:00
Rafael J. Wysocki
a941384fba PM: s2idle: ACPI: Fix wakeup interrupts handling
commit cb1f65c1e1 upstream.

After commit e3728b50cd ("ACPI: PM: s2idle: Avoid possible race
related to the EC GPE") wakeup interrupts occurring immediately after
the one discarded by acpi_s2idle_wake() may be missed.  Moreover, if
the SCI triggers again immediately after the rearming in
acpi_s2idle_wake(), that wakeup may be missed too.

The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
when pm_wakeup_irq is 0, but that's not the case any more after the
interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is
cleared by the pm_wakeup_clear() call in s2idle_loop().  However,
there may be wakeup interrupts occurring in that time frame and if
that happens, they will be missed.

To address that issue first move the clearing of pm_wakeup_irq to
the point at which it is known that the interrupt causing
acpi_s2idle_wake() to tun will be discarded, before rearming the SCI
for wakeup.  Moreover, because that only reduces the size of the
time window in which the issue may manifest itself, allow
pm_system_irq_wakeup() to register two second wakeup interrupts in
a row and, when discarding the first one, replace it with the second
one.  [Of course, this assumes that only one wakeup interrupt can be
discarded in one go, but currently that is the case and I am not
aware of any plans to change that.]

Fixes: e3728b50cd ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16 12:54:22 +01:00
James Clark
d0774cf730 perf: Always wake the parent event
[ Upstream commit 961c391217 ]

When using per-process mode and event inheritance is set to true,
forked processes will create a new perf events via inherit_event() ->
perf_event_alloc(). But these events will not have ring buffers
assigned to them. Any call to wakeup will be dropped if it's called on
an event with no ring buffer assigned because that's the object that
holds the wakeup list.

If the child event is disabled due to a call to
perf_aux_output_begin() or perf_aux_output_end(), the wakeup is
dropped leaving userspace hanging forever on the poll.

Normally the event is explicitly re-enabled by userspace after it
wakes up to read the aux data, but in this case it does not get woken
up so the event remains disabled.

This can be reproduced when using Arm SPE and 'stress' which forks once
before running the workload. By looking at the list of aux buffers read,
it's apparent that they stop after the fork:

  perf record -e arm_spe// -vvv -- stress -c 1

With this patch applied they continue to be printed. This behaviour
doesn't happen when using systemwide or per-cpu mode.

Reported-by: Ruben Ayrapetyan <Ruben.Ayrapetyan@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211206113840.130802-2-james.clark@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16 12:54:20 +01:00
Amadeusz Sławiński
4607218fde PM: hibernate: Remove register_nosave_region_late()
[ Upstream commit 33569ef3c7 ]

It is an unused wrapper forcing kmalloc allocation for registering
nosave regions. Also, rename __register_nosave_region() to
register_nosave_region() now that there is no need for disambiguation.

Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16 12:54:20 +01:00
Tom Zanussi
b4e0c9bcf1 tracing: Propagate is_signed to expression
commit 097f1eefed upstream.

During expression parsing, a new expression field is created which
should inherit the properties of the operands, such as size and
is_signed.

is_signed propagation was missing, causing spurious errors with signed
operands.  Add it in parse_expr() and parse_unary() to fix the problem.

Link: https://lkml.kernel.org/r/f4dac08742fd7a0920bf80a73c6c44042f5eaa40.1643319703.git.zanussi@kernel.org

Cc: stable@vger.kernel.org
Fixes: 100719dcef ("tracing: Add simple expression support to hist triggers")
Reported-by: Yordan Karadzhov <ykaradzhov@vmware.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215513
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16 12:54:17 +01:00
Andrey Konovalov
01047c8c75 BACKPORT: FROMGIT: kasan, vmalloc: add vmalloc tagging for HW_TAGS
(Backport: drop find_vmap_area_exceed_addr changes, as the function
 is not present in 5.10.)

Add vmalloc tagging support to HW_TAGS KASAN.

The key difference between HW_TAGS and the other two KASAN modes when it
comes to vmalloc: HW_TAGS KASAN can only assign tags to physical memory.
The other two modes have shadow memory covering every mapped virtual
memory region.

Make __kasan_unpoison_vmalloc() for HW_TAGS KASAN:

- Skip non-VM_ALLOC mappings as HW_TAGS KASAN can only tag a single
  mapping of normal physical memory; see the comment in the function.
- Generate a random tag, tag the returned pointer and the allocation,
  and initialize the allocation at the same time.
- Propagate the tag into the page stucts to allow accesses through
  page_address(vmalloc_to_page()).

The rest of vmalloc-related KASAN hooks are not needed:

- The shadow-related ones are fully skipped.
- __kasan_poison_vmalloc() is kept as a no-op with a comment.

Poisoning and zeroing of physical pages that are backing vmalloc()
allocations are skipped via __GFP_SKIP_KASAN_UNPOISON and __GFP_SKIP_ZERO:
__kasan_unpoison_vmalloc() does that instead.

Enabling CONFIG_KASAN_VMALLOC with HW_TAGS is not yet allowed.

Link: https://lkml.kernel.org/r/d19b2e9e59a9abc59d05b72dea8429dcaea739c6.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit c9a950bcf1d67298187050bc3179096e4ef248c1
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 217222520
Change-Id: I446b0ae074938389ade70bf503784d4d32b5d09b
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2022-02-15 17:55:27 +01:00
Andrey Konovalov
092c06519c FROMLIST: kasan, fork: reset pointer tags of vmapped stacks
[Combines a FROMGIT patch and FROMLIST fix for it.]

Once tag-based KASAN modes start tagging vmalloc() allocations, kernel
stacks start getting tagged if CONFIG_VMAP_STACK is enabled.

Reset the tag of kernel stack pointers after allocation in
alloc_thread_stack_node().

For SW_TAGS KASAN, when CONFIG_KASAN_STACK is enabled, the instrumentation
can't handle the SP register being tagged.

For HW_TAGS KASAN, there's no instrumentation-related issues.  However,
the impact of having a tagged SP register needs to be properly evaluated,
so keep it non-tagged for now.

Note, that the memory for the stack allocation still gets tagged to catch
vmalloc-into-stack out-of-bounds accesses.

Link: https://lkml.kernel.org/r/c6c96f012371ecd80e1936509ebcd3b07a5956f7.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 9d2dae85d689202c56068ce62e20821ad91c3606
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/linux-mm/f50c5f96ef896d7936192c888b0c0a7674e33184.1644943792.git.andreyknvl@google.com/
Bug: 217222520
Change-Id: Ie723b03f1b857bc841cffc9a424b2791c97044a6
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2022-02-15 17:54:25 +01:00
Jun Miao
e7c11d3363 BACKPORT: rcu: Avoid alloc_pages() when recording stack
(Backport: adjacent lines changes.)

The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT,
which in turn can then call alloc_pages(GFP_NOWAIT, ...).  In general, however,
it is not even possible to use either GFP_ATOMIC nor GFP_NOWAIT in certain
non-preemptive contexts/RT kernel including raw_spin_locks (see gfp.h and ab00db216c).
Fix it by instructing stackdepot to not expand stack storage via alloc_pages()
in case it runs out by using kasan_record_aux_stack_noalloc().

Jianwei Hu reported:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3
INFO: lockdep is turned off.
irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
  softirqs last  enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 6 PID: 15319 Comm: python3 Tainted: G        W  O 5.15-rc7-preempt-rt #1
  Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018
  Call Trace:
    show_stack+0x52/0x58
    dump_stack+0xa1/0xd6
    ___might_sleep.cold+0x11c/0x12d
    rt_spin_lock+0x3f/0xc0
    rmqueue+0x100/0x1460
    rmqueue+0x100/0x1460
    mark_usage+0x1a0/0x1a0
    ftrace_graph_ret_addr+0x2a/0xb0
    rmqueue_pcplist.constprop.0+0x6a0/0x6a0
     __kasan_check_read+0x11/0x20
     __zone_watermark_ok+0x114/0x270
     get_page_from_freelist+0x148/0x630
     is_module_text_address+0x32/0xa0
     __alloc_pages_nodemask+0x2f6/0x790
     __alloc_pages_slowpath.constprop.0+0x12d0/0x12d0
     create_prof_cpu_mask+0x30/0x30
     alloc_pages_current+0xb1/0x150
     stack_depot_save+0x39f/0x490
     kasan_save_stack+0x42/0x50
     kasan_save_stack+0x23/0x50
     kasan_record_aux_stack+0xa9/0xc0
     __call_rcu+0xff/0x9c0
     call_rcu+0xe/0x10
     put_object+0x53/0x70
     __delete_object+0x7b/0x90
     kmemleak_free+0x46/0x70
     slab_free_freelist_hook+0xb4/0x160
     kfree+0xe5/0x420
     kfree_const+0x17/0x30
     kobject_cleanup+0xaa/0x230
     kobject_put+0x76/0x90
     netdev_queue_update_kobjects+0x17d/0x1f0
     ... ...
     ksys_write+0xd9/0x180
     __x64_sys_write+0x42/0x50
     do_syscall_64+0x38/0x50
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642
Fixes: 84109ab585 ("rcu: Record kvfree_call_rcu() call stack for KASAN")
Fixes: 26e760c9a7 ("rcu: kasan: record and print call_rcu() call stack")
Reported-by: Jianwei Hu <jianwei.hu@windriver.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Jun Miao <jun.miao@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 300c0c5e72)
Bug: 217222520
Change-Id: I5b7d4f9dfe5da290627599d8d6de3278debb2a13
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2022-02-15 17:54:14 +01:00
Marco Elver
8f225ea876 UPSTREAM: workqueue, kasan: avoid alloc_pages() when recording stack
Shuah Khan reported:

 | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
 | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
 | it tries to allocate memory attempting to acquire spinlock in page
 | allocation code while holding workqueue pool raw_spinlock.
 |
 | There are several instances of this problem when block layer tries
 | to __queue_work(). Call trace from one of these instances is below:
 |
 |     kblockd_mod_delayed_work_on()
 |       mod_delayed_work_on()
 |         __queue_delayed_work()
 |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
 |             insert_work()
 |               kasan_record_aux_stack()
 |                 kasan_save_stack()
 |                   stack_depot_save()
 |                     alloc_pages()
 |                       __alloc_pages()
 |                         get_page_from_freelist()
 |                           rm_queue()
 |                             rm_queue_pcplist()
 |                               local_lock_irqsave(&pagesets.lock, flags);
 |                               [ BUG: Invalid wait context triggered ]

The default kasan_record_aux_stack() calls stack_depot_save() with
GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
In general, however, it is not even possible to use either GFP_ATOMIC
nor GFP_NOWAIT in certain non-preemptive contexts, including
raw_spin_locks (see gfp.h and commmit ab00db216c).

Fix it by instructing stackdepot to not expand stack storage via
alloc_pages() in case it runs out by using
kasan_record_aux_stack_noalloc().

While there is an increased risk of failing to insert the stack trace,
this is typically unlikely, especially if the same insertion had already
succeeded previously (stack depot hit).

For frequent calls from the same location, it therefore becomes
extremely unlikely that kasan_record_aux_stack_noalloc() fails.

Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
Link: https://lkml.kernel.org/r/20210913112609.2651084-7-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f70da745be)
Bug: 217222520
Change-Id: I81052dba582dbcb492fd7efcef8d56bac4766503
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2022-02-15 17:54:14 +01:00
Zqiang
adebe26902 UPSTREAM: rcu: Record kvfree_call_rcu() call stack for KASAN
This commit adds a call to kasan_record_aux_stack() in kvfree_call_rcu()
in order to record the call stack of the code that caused the object
to be freed.  Please note that this function does not update the
allocated/freed state, which is important because RCU readers might
still be referencing this object.

Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 84109ab585)
Bug: 217222520
Change-Id: Ia7c27babe4b2318ab116b508578c17e8967e70b1
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2022-02-15 17:54:11 +01:00
Yee Lee
5e5294d0e5 UPSTREAM: scs: Release kasan vmalloc poison in scs_free process
Since scs allocation is moved to vmalloc region, the
shadow stack is protected by kasan_posion_vmalloc.
However, the vfree_atomic operation needs to access
its context for scs_free process and causes kasan error
as the dump info below.

This patch Adds kasan_unpoison_vmalloc() before vfree_atomic,
which aligns to the prior flow as using kmem_cache.
The vmalloc region will go back posioned in the following
vumap() operations.

 ==================================================================
 BUG: KASAN: vmalloc-out-of-bounds in llist_add_batch+0x60/0xd4
 Write of size 8 at addr ffff8000100b9000 by task kthreadd/2

 CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.15.0-rc2-11681-g92477dd1faa6-dirty #1
 Hardware name: linux,dummy-virt (DT)
 Call trace:
  dump_backtrace+0x0/0x43c
  show_stack+0x1c/0x2c
  dump_stack_lvl+0x68/0x84
  print_address_description+0x80/0x394
  kasan_report+0x180/0x1dc
  __asan_report_store8_noabort+0x48/0x58
  llist_add_batch+0x60/0xd4
  vfree_atomic+0x60/0xe0
  scs_free+0x1dc/0x1fc
  scs_release+0xa4/0xd4
  free_task+0x30/0xe4
  __put_task_struct+0x1ec/0x2e0
  delayed_put_task_struct+0x5c/0xa0
  rcu_do_batch+0x62c/0x8a0
  rcu_core+0x60c/0xc14
  rcu_core_si+0x14/0x24
  __do_softirq+0x19c/0x68c
  irq_exit+0x118/0x2dc
  handle_domain_irq+0xcc/0x134
  gic_handle_irq+0x7c/0x1bc
  call_on_irq_stack+0x40/0x70
  do_interrupt_handler+0x78/0x9c
  el1_interrupt+0x34/0x60
  el1h_64_irq_handler+0x1c/0x2c
  el1h_64_irq+0x78/0x7c
  _raw_spin_unlock_irqrestore+0x40/0xcc
  sched_fork+0x4f0/0xb00
  copy_process+0xacc/0x3648
  kernel_clone+0x168/0x534
  kernel_thread+0x13c/0x1b0
  kthreadd+0x2bc/0x400
  ret_from_fork+0x10/0x20

 Memory state around the buggy address:
  ffff8000100b8f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  ffff8000100b8f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 >ffff8000100b9000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                    ^
  ffff8000100b9080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  ffff8000100b9100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 ==================================================================

Suggested-by: Kuan-Ying Lee <kuan-ying.lee@mediatek.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Fixes: a2abe7cbd8 ("scs: switch to vmapped shadow stacks")
Link: https://lore.kernel.org/r/20210930081619.30091-1-yee.lee@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 528a4ab453)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: Idc934fd9b43cbec74a9c9e085094ebcbcbc88344
2022-02-14 20:09:32 -08:00
Masami Hiramatsu
2e7174e822 UPSTREAM: tracing/boot: Fix to loop on only subkeys
Since the commit e5efaeb8a8 ("bootconfig: Support mixing
a value and subkeys under a key") allows to co-exist a value
node and key nodes under a node, xbc_node_for_each_child()
is not only returning key node but also a value node.
In the boot-time tracing using xbc_node_for_each_child() to
iterate the events, groups and instances, but those must be
key nodes. Thus it must use xbc_node_for_each_subkey().

Link: https://lkml.kernel.org/r/163112988361.74896.2267026262061819145.stgit@devnote2

Fixes: e5efaeb8a8 ("bootconfig: Support mixing a value and subkeys under a key")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
(cherry picked from commit cfd799837d)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I7cf6b7fbf334edc44e6da1f519f30d467ece6fec
2022-02-14 20:09:31 -08:00
Will Deacon
48879e2416 ANDROID: sched: Don't allow frozen asymmetric tasks to remain on the rq
If a task with a restricted possible CPU mask and PF_FROZEN or
PF_FREEZER_SKIP set blocks, then we must not put it back on the runqueue
to handle a signal because this could lead to migration failures later
on if the suspending CPU is not capable of running it.

Return such a task to the runqueue only if a fatal signal is pending,
and otherwise allow the task to block.

Bug: 202918514
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I04cc9e65751f2bffc556c4da9ef02fe386764324
2022-02-10 10:49:26 +00:00
Greg Kroah-Hartman
193970ba72 Merge 5.10.99 into android13-5.10
Changes in 5.10.99
	selinux: fix double free of cond_list on error paths
	audit: improve audit queue handling when "audit=1" on cmdline
	ASoC: ops: Reject out of bounds values in snd_soc_put_volsw()
	ASoC: ops: Reject out of bounds values in snd_soc_put_volsw_sx()
	ASoC: ops: Reject out of bounds values in snd_soc_put_xr_sx()
	ALSA: usb-audio: Correct quirk for VF0770
	ALSA: hda: Fix UAF of leds class devs at unbinding
	ALSA: hda: realtek: Fix race at concurrent COEF updates
	ALSA: hda/realtek: Add quirk for ASUS GU603
	ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
	ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
	ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
	btrfs: fix deadlock between quota disable and qgroup rescan worker
	drm/nouveau: fix off by one in BIOS boundary checking
	drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
	nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
	mm/debug_vm_pgtable: remove pte entry from the page table
	mm/pgtable: define pte_index so that preprocessor could recognize it
	mm/kmemleak: avoid scanning potential huge holes
	block: bio-integrity: Advance seed correctly for larger interval sizes
	dma-buf: heaps: Fix potential spectre v1 gadget
	IB/hfi1: Fix AIP early init panic
	Revert "ASoC: mediatek: Check for error clk pointer"
	memcg: charge fs_context and legacy_fs_context
	RDMA/cma: Use correct address when leaving multicast group
	RDMA/ucma: Protect mc during concurrent multicast leaves
	IB/rdmavt: Validate remote_addr during loopback atomic tests
	RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
	RDMA/mlx4: Don't continue event handler after memory allocation failure
	iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
	iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
	spi: bcm-qspi: check for valid cs before applying chip select
	spi: mediatek: Avoid NULL pointer crash in interrupt
	spi: meson-spicc: add IRQ check in meson_spicc_probe
	spi: uniphier: fix reference count leak in uniphier_spi_probe()
	net: ieee802154: hwsim: Ensure proper channel selection at probe time
	net: ieee802154: mcr20a: Fix lifs/sifs periods
	net: ieee802154: ca8210: Stop leaking skb's
	net: ieee802154: Return meaningful error codes from the netlink helpers
	net: macsec: Fix offload support for NETDEV_UNREGISTER event
	net: macsec: Verify that send_sci is on when setting Tx sci explicitly
	net: stmmac: dump gmac4 DMA registers correctly
	net: stmmac: ensure PTP time register reads are consistent
	drm/i915/overlay: Prevent divide by zero bugs in scaling
	ASoC: fsl: Add missing error handling in pcm030_fabric_probe
	ASoC: xilinx: xlnx_formatter_pcm: Make buffer bytes multiple of period bytes
	ASoC: cpcap: Check for NULL pointer after calling of_get_child_by_name
	ASoC: max9759: fix underflow in speaker_gain_control_put()
	pinctrl: intel: Fix a glitch when updating IRQ flags on a preconfigured line
	pinctrl: intel: fix unexpected interrupt
	pinctrl: bcm2835: Fix a few error paths
	scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
	nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
	gve: fix the wrong AdminQ buffer queue index check
	bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
	selftests/exec: Remove pipe from TEST_GEN_FILES
	selftests: futex: Use variable MAKE instead of make
	tools/resolve_btfids: Do not print any commands when building silently
	rtc: cmos: Evaluate century appropriate
	Revert "fbcon: Disable accelerated scrolling"
	fbcon: Add option to enable legacy hardware acceleration
	perf stat: Fix display of grouped aliased events
	perf/x86/intel/pt: Fix crash with stop filters in single-range mode
	x86/perf: Default set FREEZE_ON_SMI for all
	EDAC/altera: Fix deferred probing
	EDAC/xgene: Fix deferred probing
	ext4: prevent used blocks from being allocated during fast commit replay
	ext4: modify the logic of ext4_mb_new_blocks_simple
	ext4: fix error handling in ext4_restore_inline_data()
	ext4: fix error handling in ext4_fc_record_modified_inode()
	ext4: fix incorrect type issue during replay_del_range
	net: dsa: mt7530: make NET_DSA_MT7530 select MEDIATEK_GE_PHY
	cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
	selftests: nft_concat_range: add test for reload with no element add/del
	Linux 5.10.99

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3571b1ff035a83999daafa35157f96ebfd65ef0a
2022-02-09 12:26:22 +01:00
Waiman Long
5577273135 cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
commit 2bdfd2825c upstream.

It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks().  It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.

Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.

Fixes: 4716909cc5 ("cpuset: Track cpusets that use parent's effective_cpus")
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08 18:30:41 +01:00
Hou Tao
6304a613a9 bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
commit b293dcc473 upstream.

After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages
after mapping"), non-VM_ALLOC mappings will be marked as accessible
in __get_vm_area_node() when KASAN is enabled. But now the flag for
ringbuf area is VM_ALLOC, so KASAN will complain out-of-bound access
after vmap() returns. Because the ringbuf area is created by mapping
allocated pages, so use VM_MAP instead.

After the change, info in /proc/vmallocinfo also changes from
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmalloc user
to
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmap user

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08 18:30:39 +01:00
Paul Moore
0ff6b80506 audit: improve audit queue handling when "audit=1" on cmdline
commit f26d043313 upstream.

When an admin enables audit at early boot via the "audit=1" kernel
command line the audit queue behavior is slightly different; the
audit subsystem goes to greater lengths to avoid dropping records,
which unfortunately can result in problems when the audit daemon is
forcibly stopped for an extended period of time.

This patch makes a number of changes designed to improve the audit
queuing behavior so that leaving the audit daemon in a stopped state
for an extended period does not cause a significant impact to the
system.

- kauditd_send_queue() is now limited to looping through the
  passed queue only once per call.  This not only prevents the
  function from looping indefinitely when records are returned
  to the current queue, it also allows any recovery handling in
  kauditd_thread() to take place when kauditd_send_queue()
  returns.

- Transient netlink send errors seen as -EAGAIN now cause the
  record to be returned to the retry queue instead of going to
  the hold queue.  The intention of the hold queue is to store,
  perhaps for an extended period of time, the events which led
  up to the audit daemon going offline.  The retry queue remains
  a temporary queue intended to protect against transient issues
  between the kernel and the audit daemon.

- The retry queue is now limited by the audit_backlog_limit
  setting, the same as the other queues.  This allows admins
  to bound the size of all of the audit queues on the system.

- kauditd_rehold_skb() now returns records to the end of the
  hold queue to ensure ordering is preserved in the face of
  recent changes to kauditd_send_queue().

Cc: stable@vger.kernel.org
Fixes: 5b52330bbf ("audit: fix auditd/kernel connection state tracking")
Fixes: f4b3ee3c85 ("audit: improve robustness of the audit queue handling")
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08 18:30:34 +01:00
Greg Kroah-Hartman
a23e639396 Merge 5.10.97 into android13-5.10
Changes in 5.10.97
	PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
	net: ipa: fix atomic update in ipa_endpoint_replenish()
	net: ipa: use a bitmap for endpoint replenish_enabled
	net: ipa: prevent concurrent replenish
	Revert "drivers: bus: simple-pm-bus: Add support for probing simple bus only devices"
	KVM: x86: Forcibly leave nested virt when SMM state is toggled
	psi: Fix uaf issue when psi trigger is destroyed while being polled
	x86/mce: Add Xeon Sapphire Rapids to list of CPUs that support PPIN
	x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
	drm/vc4: hdmi: Make sure the device is powered with CEC
	cgroup-v1: Require capabilities to set release_agent
	net/mlx5e: Fix handling of wrong devices during bond netevent
	net/mlx5: Use del_timer_sync in fw reset flow of halting poll
	net/mlx5: E-Switch, Fix uninitialized variable modact
	ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
	net: amd-xgbe: ensure to reset the tx_timer_active flag
	net: amd-xgbe: Fix skb data length underflow
	fanotify: Fix stale file descriptor in copy_event_to_user()
	net: sched: fix use-after-free in tc_new_tfilter()
	rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
	cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
	af_packet: fix data-race in packet_setsockopt / packet_setsockopt
	tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
	Linux 5.10.97

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5fedd9e1819539df7b733cdfc7ff619bace1ece8
2022-02-05 13:23:26 +01:00
Tianchen Ding
aa9e96db31 cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
commit c80d401c52 upstream.

subparts_cpus should be limited as a subset of cpus_allowed, but it is
updated wrongly by using cpumask_andnot(). Use cpumask_and() instead to
fix it.

Fixes: ee8dde0cd2 ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-05 12:37:56 +01:00
Eric W. Biederman
1fc3444cda cgroup-v1: Require capabilities to set release_agent
commit 24f6008564 upstream.

The cgroup release_agent is called with call_usermodehelper.  The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.

Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Fixes: 81a6a5cdd2 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable@vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-05 12:37:55 +01:00
Suren Baghdasaryan
d4e4e61d4a psi: Fix uaf issue when psi trigger is destroyed while being polled
commit a06247c680 upstream.

With write operation on psi files replacing old trigger with a new one,
the lifetime of its waitqueue is totally arbitrary. Overwriting an
existing trigger causes its waitqueue to be freed and pending poll()
will stumble on trigger->event_wait which was destroyed.
Fix this by disallowing to redefine an existing psi trigger. If a write
operation is used on a file descriptor with an already existing psi
trigger, the operation will fail with EBUSY error.
Also bypass a check for psi_disabled in the psi_trigger_destroy as the
flag can be flipped after the trigger is created, leading to a memory
leak.

Fixes: 0e94682b73 ("psi: introduce psi monitor")
Reported-by: syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Analyzed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com
[surenb: backported to 5.10 kernel]
CC: stable@vger.kernel.org # 5.10
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-05 12:37:55 +01:00
Greg Kroah-Hartman
7859e08c9c Merge 5.10.96 into android13-5.10
Changes in 5.10.96
	Bluetooth: refactor malicious adv data check
	media: venus: core: Drop second v4l2 device unregister
	net: sfp: ignore disabled SFP node
	net: stmmac: skip only stmmac_ptp_register when resume from suspend
	s390/module: fix loading modules with a lot of relocations
	s390/hypfs: include z/VM guests with access control group set
	bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
	scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
	udf: Restore i_lenAlloc when inode expansion fails
	udf: Fix NULL ptr deref when converting from inline format
	efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
	PM: wakeup: simplify the output logic of pm_show_wakelocks()
	tracing/histogram: Fix a potential memory leak for kstrdup()
	tracing: Don't inc err_log entry count if entry allocation fails
	ceph: properly put ceph_string reference after async create attempt
	ceph: set pool_ns in new inode layout for async creates
	fsnotify: fix fsnotify hooks in pseudo filesystems
	Revert "KVM: SVM: avoid infinite loop on NPF from bad address"
	perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX
	drm/etnaviv: relax submit size limits
	KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
	arm64: errata: Fix exec handling in erratum 1418040 workaround
	netfilter: nft_payload: do not update layer 4 checksum when mangling fragments
	serial: 8250: of: Fix mapped region size when using reg-offset property
	serial: stm32: fix software flow control transfer
	tty: n_gsm: fix SW flow control encoding/handling
	tty: Add support for Brainboxes UC cards.
	usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
	usb: xhci-plat: fix crash when suspend if remote wake enable
	usb: common: ulpi: Fix crash in ulpi_match()
	usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
	USB: core: Fix hang in usb_kill_urb by adding memory barriers
	usb: typec: tcpm: Do not disconnect while receiving VBUS off
	ucsi_ccg: Check DEV_INT bit only when starting CCG4
	jbd2: export jbd2_journal_[grab|put]_journal_head
	ocfs2: fix a deadlock when commit trans
	sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
	x86/MCE/AMD: Allow thresholding interface updates after init
	powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
	powerpc/32s: Fix kasan_init_region() for KASAN
	powerpc/32: Fix boot failure with GCC latent entropy plugin
	i40e: Increase delay to 1 s after global EMP reset
	i40e: Fix issue when maximum queues is exceeded
	i40e: Fix queues reservation for XDP
	i40e: Fix for failed to init adminq while VF reset
	i40e: fix unsigned stat widths
	usb: roles: fix include/linux/usb/role.h compile issue
	rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
	rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
	scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
	ipv6_tunnel: Rate limit warning messages
	net: fix information leakage in /proc/net/ptype
	hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
	hwmon: (lm90) Mark alert as broken for MAX6680
	ping: fix the sk_bound_dev_if match in ping_lookup
	ipv4: avoid using shared IP generator for connected sockets
	hwmon: (lm90) Reduce maximum conversion rate for G781
	NFSv4: Handle case where the lookup of a directory fails
	NFSv4: nfs_atomic_open() can race when looking up a non-regular file
	net-procfs: show net devices bound packet types
	drm/msm: Fix wrong size calculation
	drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
	drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
	ipv6: annotate accesses to fn->fn_sernum
	NFS: Ensure the server has an up to date ctime before hardlinking
	NFS: Ensure the server has an up to date ctime before renaming
	powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
	netfilter: conntrack: don't increment invalid counter on NF_REPEAT
	kernel: delete repeated words in comments
	perf: Fix perf_event_read_local() time
	sched/pelt: Relax the sync of util_sum with util_avg
	net: phy: broadcom: hook up soft_reset for BCM54616S
	phylib: fix potential use-after-free
	octeontx2-pf: Forward error codes to VF
	rxrpc: Adjust retransmission backoff
	efi/libstub: arm64: Fix image check alignment at entry
	hwmon: (lm90) Mark alert as broken for MAX6654
	powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
	net: ipv4: Move ip_options_fragment() out of loop
	net: ipv4: Fix the warning for dereference
	ipv4: fix ip option filtering for locally generated fragments
	ibmvnic: init ->running_cap_crqs early
	ibmvnic: don't spin in tasklet
	video: hyperv_fb: Fix validation of screen resolution
	drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
	drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
	yam: fix a memory leak in yam_siocdevprivate()
	net: cpsw: Properly initialise struct page_pool_params
	net: hns3: handle empty unknown interrupt for VF
	Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
	net: bridge: vlan: fix single net device option dumping
	ipv4: raw: lock the socket in raw_bind()
	ipv4: tcp: send zero IPID in SYNACK messages
	ipv4: remove sparse error in ip_neigh_gw4()
	net: bridge: vlan: fix memory leak in __allowed_ingress
	dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
	usr/include/Makefile: add linux/nfc.h to the compile-test coverage
	fsnotify: invalidate dcache before IN_DELETE event
	block: Fix wrong offset in bio_truncate()
	mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip()
	Linux 5.10.96

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3bb702bbb2f32488f2ba15d40ed00982e028d7cb
2022-02-02 09:32:52 +01:00
Vincent Guittot
57b2f3632b sched/pelt: Relax the sync of util_sum with util_avg
[ Upstream commit 98b0d89022 ]

Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lost up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.

Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.

In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above cfs one.

Fixes: 1c35b07e6d ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-01 17:25:45 +01:00
Peter Zijlstra
91b04e83c7 perf: Fix perf_event_read_local() time
[ Upstream commit 09f5e7dc7a ]

Time readers that cannot take locks (due to NMI etc..) currently make
use of perf_event::shadow_ctx_time, which, for that event gives:

  time' = now + (time - timestamp)

or, alternatively arranged:

  time' = time + (now - timestamp)

IOW, the progression of time since the last time the shadow_ctx_time
was updated.

There's problems with this:

 A) the shadow_ctx_time is per-event, even though the ctx_time it
    reflects is obviously per context. The direct concequence of this
    is that the context needs to iterate all events all the time to
    keep the shadow_ctx_time in sync.

 B) even with the prior point, the context itself might not be active
    meaning its time should not advance to begin with.

 C) shadow_ctx_time isn't consistently updated when ctx_time is

There are 3 users of this stuff, that suffer differently from this:

 - calc_timer_values()
   - perf_output_read()
   - perf_event_update_userpage()	/* A */

 - perf_event_read_local()		/* A,B */

In particular, perf_output_read() doesn't suffer at all, because it's
sample driven and hence only relevant when the event is actually
running.

This same was supposed to be true for perf_event_update_userpage(),
after all self-monitoring implies the context is active *HOWEVER*, as
per commit f792565326 ("perf/core: fix userpage->time_enabled of
inactive events") this goes wrong when combined with counter
overcommit, in that case those events that do not get scheduled when
the context becomes active (task events typically) miss out on the
EVENT_TIME update and ENABLED time is inflated (for a little while)
with the time the context was inactive. Once the event gets rotated
in, this gets corrected, leading to a non-monotonic timeflow.

perf_event_read_local() made things even worse, it can request time at
any point, suffering all the problems perf_event_update_userpage()
does and more. Because while perf_event_update_userpage() is limited
by the context being active, perf_event_read_local() users have no
such constraint.

Therefore, completely overhaul things and do away with
perf_event::shadow_ctx_time. Instead have regular context time updates
keep track of this offset directly and provide perf_event_time_now()
to complement perf_event_time().

perf_event_time_now() will, in adition to being context wide, also
take into account if the context is active. For inactive context, it
will not advance time.

This latter property means the cgroup perf_cgroup_info context needs
to grow addition state to track this.

Additionally, since all this is strictly per-cpu, we can use barrier()
to order context activity vs context time.

Fixes: 7d9285e82d ("perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Song Liu <song@kernel.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/YcB06DasOBtU0b00@hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-01 17:25:45 +01:00