Commit Graph

38775 Commits

Author SHA1 Message Date
Liujie Xie
456a8d4c1f ANDROID: vendor_hooks: Add hooks for rwsem and mutex
Add hooks to apply oem's optimization of rwsem and mutex

Bug: 182237112
Signed-off-by: xieliujie <xieliujie@oppo.com>
(cherry picked from commit 80b4341d05)

Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I36895c432e5b6d6bff8781b4a7872badb693284c
Signed-off-by: Carlos Llamas <cmllamas@google.com>
[cmllamas: completes the cherry-pick of original commit 80b4341d05
since commit 0902cc73b793 was only partial]
(cherry picked from commit d4528a28cb5be0c322031f333a6230fa3042931f)
2023-05-06 00:25:54 +00:00
Liujie Xie
a945254842 ANDROID: vendor_hooks: Add hooks for mutex and rwsem optimistic spin
These hooks help us do the following things:
a) Record the number of mutex and rwsem optimistic spin.
b) Monitor the time of mutex and rwsem optimistic spin.
c) Make it possible if oems don't want mutex and rwsem to optimistic spin
for a long time.

Bug: 267565260
Change-Id: I2bee30fb17946be85e026213b481aeaeaee2459f
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit d01f7e1269)
(cherry picked from commit 05b5ff11ad98c5896b352b4c376a84b63684e06c)
2023-05-06 00:25:54 +00:00
Rick Yiu
f9688670ca ANDROID: sched: Add trace_android_rvh_setscheduler
Sync to android13-5.10. This vendor hook is declared already.

Bug: 245675204
Change-Id: Ib081b52542380d22317f225a50b553cda5f2634c
Signed-off-by: Rick Yiu <rickyiu@google.com>
2023-05-05 20:45:38 +00:00
Peifeng Li
3568391d31 ANDROID: vendor_hook: add hooks to protect locking-tsk in cpu scheduler
Providing vendor hooks to record the start time of holding the lock, which
protects rwsem/mutex locking-process from being preemptedfor a short time
in some cases.

- android_vh_record_mutex_lock_starttime
- android_vh_record_rtmutex_lock_starttime
- android_vh_record_rwsem_lock_starttime
- android_vh_record_pcpu_rwsem_starttime

Bug: 241191475

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I0e967a1e8b77c32a1ad588acd54028fae2f90c4e
(cherry picked from commit f7294947672eb6b786f3c16b49e71e6a239101ad)
2023-05-05 20:26:24 +00:00
Liujie Xie
e1f430a487 ANDROID: vendor_hooks: Add hooks to record the time of the process in various states
These hooks will do the following works:
a) record the time of the process in various states
b) Make corresponding optimization strategies in different hooks

Bug: 205938967

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ia3c47bbf0aadd17337ce18fd910343b1b8c3ef93
(cherry picked from commit a61d61bab7)
Signed-off-by: Carlos Llamas <cmllamas@google.com>
(cherry picked from commit b7a1174cc5dfc68cf769547b91958df407e55981)
2023-04-28 23:05:18 +00:00
xieliujie
5491cac560 ANDROID: sched: Add vendor hooks to compute new cpu freq.
add vendor hooks to compute new cpu freq for oem feature.

Bug: 280021175

Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I2b8e1f76f3b9792148f153190b862face679ebbd
2023-04-28 23:02:59 +00:00
Liujie Xie
08d7a5f89e ANDROID: vendor_hooks: Add hooks for futex
We want to use this hook to record the sleeping time due to Futex

Bug: 210947226

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I637f889dce42937116d10979e0c40fddf96cd1a2
(cherry picked from commit a7ab784f60)
2023-04-28 22:58:52 +00:00
Greg Kroah-Hartman
bad74575a1 Revert "sched/fair: Detect capacity inversion"
This reverts commit 99b704ae7a.

It breaks the current Android kernel abi.  It will be brought back at
the next KABI break update.

Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie9a141414435aa17dc97ef1e1ec8c23dbe7816e8
2023-04-28 10:37:41 +00:00
Greg Kroah-Hartman
114ae28faa Revert "sched/fair: Consider capacity inversion in util_fits_cpu()"
This reverts commit 98762616db.

It breaks the current Android kernel abi.  It will be brought back at
the next KABI break update.

Bug: 161946584
Change-Id: I4ed9d6760b8d2e26bad66d9af39d7819e7b464d9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-28 10:37:41 +00:00
Greg Kroah-Hartman
b06a054a03 Revert "sched/uclamp: Fix a uninitialized variable warnings"
This reverts commit 4ee882e0e1.

It breaks the current Android kernel abi.  It will be brought back at
the next KABI break update.

Bug: 161946584
Change-Id: I6f6c769ebcd31248b16f792add4d206c8c1b5c19
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-28 10:37:41 +00:00
Greg Kroah-Hartman
8ea8ecbd12 Revert "sched/fair: Fixes for capacity inversion detection"
This reverts commit e779884c71.

It breaks the current Android kernel abi.  It will be brought back at
the next KABI break update.

Bug: 161946584
Change-Id: Ic6ccde7b57e525a742602d783457c810a1ca0930
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-28 10:37:41 +00:00
Greg Kroah-Hartman
66858c87b1 Merge 5.15.109 into android14-5.15
Changes in 5.15.109
	ARM: dts: rockchip: fix a typo error for rk3288 spdif node
	arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node
	arm64: dts: meson-g12-common: specify full DMC range
	arm64: dts: imx8mm-evk: correct pmic clock source
	netfilter: br_netfilter: fix recent physdev match breakage
	regulator: fan53555: Explicitly include bits header
	regulator: fan53555: Fix wrong TCS_SLEW_MASK
	net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg
	virtio_net: bugfix overflow inside xdp_linearize_page()
	sfc: Split STATE_READY in to STATE_NET_DOWN and STATE_NET_UP.
	sfc: Fix use-after-free due to selftest_work
	netfilter: nf_tables: fix ifdef to also consider nf_tables=m
	i40e: fix accessing vsi->active_filters without holding lock
	i40e: fix i40e_setup_misc_vector() error handling
	netfilter: nf_tables: validate catch-all set elements
	netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
	bnxt_en: Do not initialize PTP on older P3/P4 chips
	mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()
	bonding: Fix memory leak when changing bond type to Ethernet
	net: rpl: fix rpl header size calculation
	mlxsw: pci: Fix possible crash during initialization
	spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe()
	bpf: Fix incorrect verifier pruning due to missing register precision taints
	e1000e: Disable TSO on i219-LM card to increase speed
	f2fs: Fix f2fs_truncate_partial_nodes ftrace event
	Input: i8042 - add quirk for Fujitsu Lifebook A574/H
	platform/x86 (gigabyte-wmi): Add support for A320M-S2H V2
	selftests: sigaltstack: fix -Wuninitialized
	scsi: megaraid_sas: Fix fw_crash_buffer_show()
	scsi: core: Improve scsi_vpd_inquiry() checks
	net: dsa: b53: mmap: add phy ops
	s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling
	nvme-tcp: fix a possible UAF when failing to allocate an io queue
	xen/netback: use same error messages for same errors
	platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE
	rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
	iio: light: tsl2772: fix reading proximity-diodes from device tree
	nilfs2: initialize unused bytes in segment summary blocks
	memstick: fix memory leak if card device is never registered
	kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
	mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25
	drm/i915: Fix fast wake AUX sync len
	mm/khugepaged: check again on anon uffd-wp during isolation
	mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
	sched/uclamp: Fix fits_capacity() check in feec()
	sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
	sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition
	sched/fair: Detect capacity inversion
	sched/fair: Consider capacity inversion in util_fits_cpu()
	sched/uclamp: Fix a uninitialized variable warnings
	sched/fair: Fixes for capacity inversion detection
	MIPS: Define RUNTIME_DISCARD_EXIT in LD script
	docs: futex: Fix kernel-doc references after code split-up preparation
	purgatory: fix disabling debug info
	fuse: fix attr version comparison in fuse_read_update_size()
	fuse: always revalidate rename target dentry
	fuse: fix deadlock between atomic O_TRUNC and page invalidation
	udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
	tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
	inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().
	dccp: Call inet6_destroy_sock() via sk->sk_destruct().
	sctp: Call inet6_destroy_sock() via sk->sk_destruct().
	pwm: meson: Explicitly set .polarity in .get_state()
	pwm: iqs620a: Explicitly set .polarity in .get_state()
	pwm: hibvt: Explicitly set .polarity in .get_state()
	counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
	iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger()
	mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
	ASoC: fsl_asrc_dma: fix potential null-ptr-deref
	ASN.1: Fix check for strdup() success
	soc: sifive: l2_cache: fix missing iounmap() in error path in sifive_l2_init()
	soc: sifive: l2_cache: fix missing free_irq() in error path in sifive_l2_init()
	soc: sifive: l2_cache: fix missing of_node_put() in sifive_l2_init()
	Linux 5.15.109

Change-Id: I2165f73cd4b1056ffb268e8b6a12e71588309188
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-26 13:37:41 +00:00
Qais Yousef
e779884c71 sched/fair: Fixes for capacity inversion detection
commit da07d2f9c1 upstream.

Traversing the Perf Domains requires rcu_read_lock() to be held and is
conditional on sched_energy_enabled(). Ensure right protections applied.

Also skip capacity inversion detection for our own pd; which was an
error.

Fixes: 44c7b80bff ("sched/fair: Detect capacity inversion")
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-3-qyousef@layalina.io
(cherry picked from commit da07d2f9c1)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:53 +02:00
Qais Yousef
4ee882e0e1 sched/uclamp: Fix a uninitialized variable warnings
commit e26fd28db8 upstream.

Addresses the following warnings:

> config: riscv-randconfig-m031-20221111
> compiler: riscv64-linux-gcc (GCC) 12.1.0
>
> smatch warnings:
> kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_min'.
> kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_max'.

Fixes: 244226035a ("sched/uclamp: Fix fits_capacity() check in feec()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-2-qyousef@layalina.io
(cherry picked from commit e26fd28db8)
[Conflict in kernel/sched/fair.c due to new automatic variables being
added on master vs 5.15]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:53 +02:00
Qais Yousef
98762616db sched/fair: Consider capacity inversion in util_fits_cpu()
commit aa69c36f31 upstream.

We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
With the exception of the biggest cores which by definition are the max
performance point of the system and all tasks by definition should fit.

Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches capacity inversion point, then this is no longer true.

We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(). Which not only addresses the problem above, but also
ensure uclamp_max now considers the inverted capacity. Force fitting
a task when a CPU is in this adverse state will contribute to making the
thermal throttling last longer.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com
(cherry picked from commit aa69c36f31)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:53 +02:00
Qais Yousef
99b704ae7a sched/fair: Detect capacity inversion
commit 44c7b80bff upstream.

Check each performance domain to see if thermal pressure is causing its
capacity to be lower than another performance domain.

We assume that each performance domain has CPUs with the same
capacities, which is similar to an assumption made in energy_model.c

We also assume that thermal pressure impacts all CPUs in a performance
domain equally.

If there're multiple performance domains with the same capacity_orig, we
will trigger a capacity inversion if the domain is under thermal
pressure.

The new cpu_in_capacity_inversion() should help users to know when
information about capacity_orig are not reliable and can opt in to use
the inverted capacity as the 'actual' capacity_orig.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com
(cherry picked from commit 44c7b80bff)
[fix trivial conflict in kernel/sched/sched.h due to code shuffling]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:53 +02:00
Qais Yousef
1de6ee9d81 sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition
commit d81304bc61 upstream.

If the utilization of the woken up task is 0, we skip the energy
calculation because it has no impact.

But if the task is boosted (uclamp_min != 0) will have an impact on task
placement and frequency selection. Only skip if the util is truly
0 after applying uclamp values.

Change uclamp_task_cpu() signature to avoid unnecessary additional calls
to uclamp_eff_get(). feec() is the only user now.

Fixes: 732cd75b8c ("sched/fair: Select an energy-efficient CPU on task wake-up")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-8-qais.yousef@arm.com
(cherry picked from commit d81304bc61)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:52 +02:00
Qais Yousef
a77e3c0e06 sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
commit c56ab1b350 upstream.

So that it is now uclamp aware.

This fixes a major problem of busy tasks capped with UCLAMP_MAX keeping
the system in overutilized state which disables EAS and leads to wasting
energy in the long run.

Without this patch running a busy background activity like JIT
compilation on Pixel 6 causes the system to be in overutilized state
74.5% of the time.

With this patch this goes down to  9.79%.

It also fixes another problem when long running tasks that have their
UCLAMP_MIN changed while running such that they need to upmigrate to
honour the new UCLAMP_MIN value. The upmigration doesn't get triggered
because overutilized state never gets set in this state, hence misfit
migration never happens at tick in this case until the task wakes up
again.

Fixes: af24bde8df ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-7-qais.yousef@arm.com
(cherry picked from commit c56ab1b350)
[Fixed trivial conflict in cpu_overutilized() - use cpu_util() instead
of cpu_util_cfs()]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:52 +02:00
Qais Yousef
ac407e5102 sched/uclamp: Fix fits_capacity() check in feec()
commit 244226035a upstream.

As reported by Yun Hsiang [1], if a task has its uclamp_min >= 0.8 * 1024,
it'll always pick the previous CPU because fits_capacity() will always
return false in this case.

The new util_fits_cpu() logic should handle this correctly for us beside
more corner cases where similar failures could occur, like when using
UCLAMP_MAX.

We open code uclamp_rq_util_with() except for the clamp() part,
util_fits_cpu() needs the 'raw' values to be passed to it.

Also introduce uclamp_rq_{set, get}() shorthand accessors to get uclamp
value for the rq. Makes the code more readable and ensures the right
rules (use READ_ONCE/WRITE_ONCE) are respected transparently.

[1] https://lists.linaro.org/pipermail/eas-dev/2020-July/001488.html

Fixes: 1d42509e47 ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
Reported-by: Yun Hsiang <hsiang023167@gmail.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-4-qais.yousef@arm.com
(cherry picked from commit 244226035a)
[Conflict in kernel/sched/fair.c mainly due to new automatic variables
being added on master vs 5.15]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:52 +02:00
Ondrej Mosnacek
1aaa1e0a9a kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
commit 659c0ce1cb upstream.

Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.

The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).

The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not.  It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.

Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.

While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.

Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:52 +02:00
Mel Gorman
c157379654 rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
commit 1c0908d8e4 upstream.

Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.

 kernel BUG at fs/inode.c:625!
 Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
 CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1
 pc : clear_inode+0xa0/0xc0
 lr : clear_inode+0x38/0xc0
 Call trace:
  clear_inode+0xa0/0xc0
  evict+0x160/0x180
  iput+0x154/0x240
  do_unlinkat+0x184/0x300
  __arm64_sys_unlinkat+0x48/0xc0
  el0_svc_common.constprop.4+0xe4/0x2c0
  do_el0_svc+0xac/0x100
  el0_svc+0x78/0x200
  el0t_64_sync_handler+0x9c/0xc0
  el0t_64_sync+0x19c/0x1a0

It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.

Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.

Will Deacon observed:

  "I'd be more inclined to be suspicious of the slowpath tbh, as we need to
   make sure that we have acquire semantics on all paths where the lock can
   be taken. Looking at the rtmutex code, this really isn't obvious to me
   -- for example, try_to_take_rt_mutex() appears to be able to return via
   the 'takeit' label without acquire semantics and it looks like we might
   be relying on the caller's subsequent _unlock_ of the wait_lock for
   ordering, but that will give us release semantics which aren't correct."

Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.

The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.

Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.

It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.

[ bigeasy@linutronix.de: Initial prototype fix ]

Fixes: 700318d1d7 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 13:51:51 +02:00
Daniel Borkmann
e722ea6dae bpf: Fix incorrect verifier pruning due to missing register precision taints
[ Upstream commit 71b547f561 ]

Juan Jose et al reported an issue found via fuzzing where the verifier's
pruning logic prematurely marks a program path as safe.

Consider the following program:

   0: (b7) r6 = 1024
   1: (b7) r7 = 0
   2: (b7) r8 = 0
   3: (b7) r9 = -2147483648
   4: (97) r6 %= 1025
   5: (05) goto pc+0
   6: (bd) if r6 <= r9 goto pc+2
   7: (97) r6 %= 1
   8: (b7) r9 = 0
   9: (bd) if r6 <= r9 goto pc+1
  10: (b7) r6 = 0
  11: (b7) r0 = 0
  12: (63) *(u32 *)(r10 -4) = r0
  13: (18) r4 = 0xffff888103693400 // map_ptr(ks=4,vs=48)
  15: (bf) r1 = r4
  16: (bf) r2 = r10
  17: (07) r2 += -4
  18: (85) call bpf_map_lookup_elem#1
  19: (55) if r0 != 0x0 goto pc+1
  20: (95) exit
  21: (77) r6 >>= 10
  22: (27) r6 *= 8192
  23: (bf) r1 = r0
  24: (0f) r0 += r6
  25: (79) r3 = *(u64 *)(r0 +0)
  26: (7b) *(u64 *)(r1 +0) = r3
  27: (95) exit

The verifier treats this as safe, leading to oob read/write access due
to an incorrect verifier conclusion:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (b7) r6 = 1024                     ; R6_w=1024
  1: (b7) r7 = 0                        ; R7_w=0
  2: (b7) r8 = 0                        ; R8_w=0
  3: (b7) r9 = -2147483648              ; R9_w=-2147483648
  4: (97) r6 %= 1025                    ; R6_w=scalar()
  5: (05) goto pc+0
  6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff00000000; 0xffffffff)) R9_w=-2147483648
  7: (97) r6 %= 1                       ; R6_w=scalar()
  8: (b7) r9 = 0                        ; R9=0
  9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
  10: (b7) r6 = 0                       ; R6_w=0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 9
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
  19: (55) if r0 != 0x0 goto pc+1       ; R0=0
  20: (95) exit

  from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  21: (77) r6 >>= 10                    ; R6_w=0
  22: (27) r6 *= 8192                   ; R6_w=0
  23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
  24: (0f) r0 += r6
  last_idx 24 first_idx 19
  regs=40 stack=0 before 23: (bf) r1 = r0
  regs=40 stack=0 before 22: (27) r6 *= 8192
  regs=40 stack=0 before 21: (77) r6 >>= 10
  regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
  parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  last_idx 18 first_idx 9
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  regs=40 stack=0 before 10: (b7) r6 = 0
  25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  27: (95) exit

  from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 11
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1
  frame 0: propagating r6
  last_idx 19 first_idx 11
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
  last_idx 9 first_idx 9
  regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=0 R10=fp0
  last_idx 8 first_idx 0
  regs=40 stack=0 before 8: (b7) r9 = 0
  regs=40 stack=0 before 7: (97) r6 %= 1
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=40 stack=0 before 5: (05) goto pc+0
  regs=40 stack=0 before 4: (97) r6 %= 1025
  regs=40 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  19: safe
  frame 0: propagating r6
  last_idx 9 first_idx 0
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=40 stack=0 before 5: (05) goto pc+0
  regs=40 stack=0 before 4: (97) r6 %= 1025
  regs=40 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024

  from 6 to 9: safe
  verification time 110 usec
  stack depth 4
  processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2

The verifier considers this program as safe by mistakenly pruning unsafe
code paths. In the above func#0, code lines 0-10 are of interest. In line
0-3 registers r6 to r9 are initialized with known scalar values. In line 4
the register r6 is reset to an unknown scalar given the verifier does not
track modulo operations. Due to this, the verifier can also not determine
precisely which branches in line 6 and 9 are taken, therefore it needs to
explore them both.

As can be seen, the verifier starts with exploring the false/fall-through
paths first. The 'from 19 to 21' path has both r6=0 and r9=0 and the pointer
arithmetic on r0 += r6 is therefore considered safe. Given the arithmetic,
r6 is correctly marked for precision tracking where backtracking kicks in
where it walks back the current path all the way where r6 was set to 0 in
the fall-through branch.

Next, the pruning logics pops the path 'from 9 to 11' from the stack. Also
here, the state of the registers is the same, that is, r6=0 and r9=0, so
that at line 19 the path can be pruned as it is considered safe. It is
interesting to note that the conditional in line 9 turned r6 into a more
precise state, that is, in the fall-through path at the beginning of line
10, it is R6=scalar(umin=1), and in the branch-taken path (which is analyzed
here) at the beginning of line 11, r6 turned into a known const r6=0 as
r9=0 prior to that and therefore (unsigned) r6 <= 0 concludes that r6 must
be 0 (**):

  [...]                                 ; R6_w=scalar()
  9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
  [...]

  from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
  [...]

The next path is 'from 6 to 9'. The verifier considers the old and current
state equivalent, and therefore prunes the search incorrectly. Looking into
the two states which are being compared by the pruning logic at line 9, the
old state consists of R6_rwD=Pscalar() R9_rwD=0 R10=fp0 and the new state
consists of R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968)
R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0. While r6 had the reg->precise flag
correctly set in the old state, r9 did not. Both r6'es are considered as
equivalent given the old one is a superset of the current, more precise one,
however, r9's actual values (0 vs 0x80000000) mismatch. Given the old r9
did not have reg->precise flag set, the verifier does not consider the
register as contributing to the precision state of r6, and therefore it
considered both r9 states as equivalent. However, for this specific pruned
path (which is also the actual path taken at runtime), register r6 will be
0x400 and r9 0x80000000 when reaching line 21, thus oob-accessing the map.

The purpose of precision tracking is to initially mark registers (including
spilled ones) as imprecise to help verifier's pruning logic finding equivalent
states it can then prune if they don't contribute to the program's safety
aspects. For example, if registers are used for pointer arithmetic or to pass
constant length to a helper, then the verifier sets reg->precise flag and
backtracks the BPF program instruction sequence and chain of verifier states
to ensure that the given register or stack slot including their dependencies
are marked as precisely tracked scalar. This also includes any other registers
and slots that contribute to a tracked state of given registers/stack slot.
This backtracking relies on recorded jmp_history and is able to traverse
entire chain of parent states. This process ends only when all the necessary
registers/slots and their transitive dependencies are marked as precise.

The backtrack_insn() is called from the current instruction up to the first
instruction, and its purpose is to compute a bitmask of registers and stack
slots that need precision tracking in the parent's verifier state. For example,
if a current instruction is r6 = r7, then r6 needs precision after this
instruction and r7 needs precision before this instruction, that is, in the
parent state. Hence for the latter r7 is marked and r6 unmarked.

For the class of jmp/jmp32 instructions, backtrack_insn() today only looks
at call and exit instructions and for all other conditionals the masks
remain as-is. However, in the given situation register r6 has a dependency
on r9 (as described above in **), so also that one needs to be marked for
precision tracking. In other words, if an imprecise register influences a
precise one, then the imprecise register should also be marked precise.
Meaning, in the parent state both dest and src register need to be tracked
for precision and therefore the marking must be more conservative by setting
reg->precise flag for both. The precision propagation needs to cover both
for the conditional: if the src reg was marked but not the dst reg and vice
versa.

After the fix the program is correctly rejected:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (b7) r6 = 1024                     ; R6_w=1024
  1: (b7) r7 = 0                        ; R7_w=0
  2: (b7) r8 = 0                        ; R8_w=0
  3: (b7) r9 = -2147483648              ; R9_w=-2147483648
  4: (97) r6 %= 1025                    ; R6_w=scalar()
  5: (05) goto pc+0
  6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff80000000; 0x7fffffff),u32_min=-2147483648) R9_w=-2147483648
  7: (97) r6 %= 1                       ; R6_w=scalar()
  8: (b7) r9 = 0                        ; R9=0
  9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
  10: (b7) r6 = 0                       ; R6_w=0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 9
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
  19: (55) if r0 != 0x0 goto pc+1       ; R0=0
  20: (95) exit

  from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  21: (77) r6 >>= 10                    ; R6_w=0
  22: (27) r6 *= 8192                   ; R6_w=0
  23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
  24: (0f) r0 += r6
  last_idx 24 first_idx 19
  regs=40 stack=0 before 23: (bf) r1 = r0
  regs=40 stack=0 before 22: (27) r6 *= 8192
  regs=40 stack=0 before 21: (77) r6 >>= 10
  regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
  parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  last_idx 18 first_idx 9
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  regs=40 stack=0 before 10: (b7) r6 = 0
  25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  27: (95) exit

  from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 11
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1
  frame 0: propagating r6
  last_idx 19 first_idx 11
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
  last_idx 9 first_idx 9
  regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
  parent didn't have regs=240 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=P0 R10=fp0
  last_idx 8 first_idx 0
  regs=240 stack=0 before 8: (b7) r9 = 0
  regs=40 stack=0 before 7: (97) r6 %= 1
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  19: safe

  from 6 to 9: R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
  9: (bd) if r6 <= r9 goto pc+1
  last_idx 9 first_idx 0
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  last_idx 9 first_idx 0
  regs=200 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  11: R6=scalar(umax=18446744071562067968) R9=-2147483648
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 11
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=3,off=0,ks=4,vs=48,imm=0)
  19: (55) if r0 != 0x0 goto pc+1       ; R0_w=0
  20: (95) exit

  from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=scalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
  21: (77) r6 >>= 10                    ; R6_w=scalar(umax=18014398507384832,var_off=(0x0; 0x3fffffffffffff))
  22: (27) r6 *= 8192                   ; R6_w=scalar(smax=9223372036854767616,umax=18446744073709543424,var_off=(0x0; 0xffffffffffffe000),s32_max=2147475456,u32_max=-8192)
  23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
  24: (0f) r0 += r6
  last_idx 24 first_idx 21
  regs=40 stack=0 before 23: (bf) r1 = r0
  regs=40 stack=0 before 22: (27) r6 *= 8192
  regs=40 stack=0 before 21: (77) r6 >>= 10
  parent didn't have regs=40 stack=0 marks: R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_r=Pscalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
  last_idx 19 first_idx 11
  regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
  last_idx 9 first_idx 0
  regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
  regs=240 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  math between map_value pointer and register with unbounded min value is not allowed
  verification time 886 usec
  stack depth 4
  processed 49 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 2

Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Reported-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
Reported-by: Meador Inge <meadori@google.com>
Reported-by: Simon Scannell <simonscannell@google.com>
Reported-by: Nenad Stojanovski <thenenadx@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
Reviewed-by: Meador Inge <meadori@google.com>
Reviewed-by: Simon Scannell <simonscannell@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26 13:51:49 +02:00
Greg Kroah-Hartman
ab338605b8 Merge changes Icdb539c6,I5d35ddaa,I59838b0a,I88efacf4 into android14-5.15
* changes:
  Merge 5.15.108 into android14-5.15
  ANDROID: db845c: Update list of module outs
  ANDROID: preserve CRC for xhci symbols
  Merge 5.15.107 into android14-5.15
2023-04-26 07:00:10 +00:00
Greg Kroah-Hartman
4ea7053789 Merge 5.15.108 into android14-5.15
Changes in 5.15.108
	Revert "pinctrl: amd: Disable and mask interrupts on resume"
	ALSA: emu10k1: fix capture interrupt handler unlinking
	ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
	ALSA: i2c/cs8427: fix iec958 mixer control deactivation
	ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
	ALSA: emu10k1: don't create old pass-through playback device on Audigy
	ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
	Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
	Bluetooth: Fix race condition in hidp_session_thread
	btrfs: print checksum type and implementation at mount time
	btrfs: fix fast csum implementation detection
	fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
	mtdblock: tolerate corrected bit-flips
	mtd: rawnand: meson: fix bitmask for length in command word
	mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
	mtd: rawnand: stm32_fmc2: use timings.mode instead of checking tRC_min
	KVM: arm64: PMU: Restore the guest's EL0 event counting after migration
	drm/i915/dsi: fix DSS CTL register offsets for TGL+
	clk: sprd: set max_register according to mapping range
	RDMA/irdma: Fix memory leak of PBLE objects
	RDMA/irdma: Increase iWARP CM default rexmit count
	RDMA/irdma: Add ipv4 check to irdma_find_listener()
	IB/mlx5: Add support for 400G_8X lane speed
	RDMA/cma: Allow UD qp_type to join multicast only
	bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
	9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
	niu: Fix missing unwind goto in niu_alloc_channels()
	tcp: restrict net.ipv4.tcp_app_win
	drm/armada: Fix a potential double free in an error handling path
	qlcnic: check pci_reset_function result
	net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
	sctp: fix a potential overflow in sctp_ifwdtsn_skip
	RDMA/core: Fix GID entry ref leak when create_ah fails
	udp6: fix potential access to stale information
	net: macb: fix a memory corruption in extended buffer descriptor mode
	skbuff: Fix a race between coalescing and releasing SKBs
	libbpf: Fix single-line struct definition output in btf_dump
	ARM: 9290/1: uaccess: Fix KASAN false-positives
	power: supply: cros_usbpd: reclassify "default case!" as debug
	wifi: mwifiex: mark OF related data as maybe unused
	i2c: imx-lpi2c: clean rx/tx buffers upon new message
	i2c: hisi: Avoid redundant interrupts
	efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
	drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
	verify_pefile: relax wrapper length check
	asymmetric_keys: log on fatal failures in PE/pkcs7
	wifi: iwlwifi: mvm: fix mvmtxq->stopped handling
	ACPI: resource: Add Medion S17413 to IRQ override quirk
	counter: stm32-lptimer-cnt: Provide defines for clock polarities
	counter: stm32-timer-cnt: Provide defines for slave mode selection
	counter: Internalize sysfs interface code
	counter: 104-quad-8: Fix Synapse action reported for Index signals
	tracing: Add trace_array_puts() to write into instance
	tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
	i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call
	drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
	riscv: add icache flush for nommu sigreturn trampoline
	net: sfp: initialize sfp->i2c_block_size at sfp allocation
	net: phy: nxp-c45-tja11xx: add remove callback
	net: phy: nxp-c45-tja11xx: fix unsigned long multiplication overflow
	scsi: ses: Handle enclosure with just a primary component gracefully
	x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
	cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
	mptcp: use mptcp_schedule_work instead of open-coding it
	mptcp: stricter state check in mptcp_worker
	ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
	ubi: Fix deadlock caused by recursively holding work_sem
	powerpc/papr_scm: Update the NUMA distance table for the target node
	sched/fair: Move calculate of avg_load to a better location
	sched/fair: Fix imbalance overflow
	x86/rtc: Remove __init for runtime functions
	i2c: ocores: generate stop condition after timeout in polling mode
	nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S50
	nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDs
	nvme-pci: Crucial P2 has bogus namespace ids
	nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM610
	nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760
	nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
	nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
	kexec: turn all kexec_mutex acquisitions into trylocks
	panic, kexec: make __crash_kexec() NMI safe
	counter: fix docum. build problems after filename change
	counter: Add the necessary colons and indents to the comments of counter_compi
	nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
	Linux 5.15.108

Change-Id: Icdb539c68f2ea04d37818cb4fe66b08384b77609
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-25 16:02:54 +00:00
xieliujie
226f36eddd ANDROID: vendor_hooks: Add hooks for signal
Add hook to boost thread when process killed.

Bug: 237749933
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I7cc6f248397021f3a8271433144a0e582ed27cfa
(cherry picked from commit 709679142d583b0b7338d931fdd43b27b1bbf9e0)
2023-04-25 18:08:34 +08:00
Liujie Xie
b7f527071c ANDROID: vendor_hooks: Export the tracepoints sched_stat_sleep
and sched_waking to let module probe them

Get task info about sleep and waking

Bug: 190422437
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I828c93f531f84e6133c2c3a7f8faada51683afcf
(cherry picked from commit 13af062abf)
(cherry picked from commit 869954e72dac700580d0ea5734d07b574e41afe9)
2023-04-25 02:13:50 +00:00
Liujie Xie
c3c2917768 ANDROID: vendor_hooks: Export the tracepoints sched_stat_iowait, sched_stat_blocked, sched_stat_wait to let modules probe them
Get task info about scheduling delay, iowait, and block time.

It is used to get thread scheduling info when thread happened abnormal situation.

Bug: 189415303
Change-Id: Ib6b548f8a78de5b26d555e9a89e3cc79ea2d1024
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit a6bb1af39d)
(cherry picked from commit 6d8d2ab52facfd6d5de2715e2470872e6a70cf22)
2023-04-25 02:12:21 +00:00
zhengding chen
38a713dc80 ANDROID: workqueue: export symbol of the function wq_worker_comm()
Export symbol of the function wq_worker_comm() in kernel/workqueue.c for dlkm to get the description of the kworker process. It is used to get the description when kworker thread happened abnormal situation.

Bug: 208394207
Signed-off-by: zhengding chen <chenzhengding@oppo.com>
Change-Id: I2e7ddd52a15e22e99e6596f16be08243af1bb473
(cherry picked from commit 28de741861)
(cherry picked from commit 87e0e98c25ba8e121975708943335e3abad651d9)
2023-04-25 01:58:41 +00:00
Valentin Schneider
0cf2833400 panic, kexec: make __crash_kexec() NMI safe
commit 05c6257433 upstream.

Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
of mutex_trylock():

	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
		return 0;

This prevents an nmi_panic() from executing the main body of
__crash_kexec() which does the actual kexec into the kdump kernel.  The
warning and return are explained by:

  6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
  [...]
  The reasons for this are:

      1) There is a potential deadlock in the slowpath

      2) Another cpu which blocks on the rtmutex will boost the task
	 which allegedly locked the rtmutex, but that cannot work
	 because the hard/softirq context borrows the task context.

Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
and replace it with an atomic variable.  This is somewhat overzealous as
*some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
like crash_shrink_memory()), but this has the benefit of involving a
single unified lock and preventing any future NMI-related surprises.

Tested by triggering NMI panics via:

  $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
  $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
  $ echo 1 > /proc/sys/kernel/panic

  $ ipmitool power diag

Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
Fixes: 6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:13:57 +02:00
Valentin Schneider
9e1e511119 kexec: turn all kexec_mutex acquisitions into trylocks
commit 7bb5da0d49 upstream.

Patch series "kexec, panic: Making crash_kexec() NMI safe", v4.


This patch (of 2):

Most acquistions of kexec_mutex are done via mutex_trylock() - those were
a direct "translation" from:

  8c5a1cf0ad ("kexec: use a mutex for locking rather than xchg()")

there have however been two additions since then that use mutex_lock():
crash_get_memory_size() and crash_shrink_memory().

A later commit will replace said mutex with an atomic variable, and
locking operations will become atomic_cmpxchg().  Rather than having those
mutex_lock() become while (atomic_cmpxchg(&lock, 0, 1)), turn them into
trylocks that can return -EBUSY on acquisition failure.

This does halve the printable size of the crash kernel, but that's still
neighbouring 2G for 32bit kernels which should be ample enough.

Link: https://lkml.kernel.org/r/20220630223258.4144112-1-vschneid@redhat.com
Link: https://lkml.kernel.org/r/20220630223258.4144112-2-vschneid@redhat.com
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:13:57 +02:00
Vincent Guittot
b11ff3ef4d sched/fair: Fix imbalance overflow
[ Upstream commit 91dcf1e806 ]

When local group is fully busy but its average load is above system load,
computing the imbalance will overflow and local group is not the best
target for pulling this load.

Fixes: 0b0695f2b3 ("sched/fair: Rework load_balance()")
Reported-by: Tingjia Cao <tjcao980311@gmail.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingjia Cao <tjcao980311@gmail.com>
Link: https://lore.kernel.org/lkml/CABcWv9_DAhVBOq2=W=2ypKE9dKM5s2DvoV8-U0+GDwwuKZ89jQ@mail.gmail.com/T/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:13:56 +02:00
zgpeng
90e3dc5101 sched/fair: Move calculate of avg_load to a better location
[ Upstream commit 0635490078 ]

In calculate_imbalance function, when the value of local->avg_load is
greater than or equal to busiest->avg_load, the calculated sds->avg_load is
not used. So this calculation can be placed in a more appropriate position.

Signed-off-by: zgpeng <zgpeng@tencent.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Samuel Liao <samuelliao@tencent.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/1649239025-10010-1-git-send-email-zgpeng@tencent.com
Stable-dep-of: 91dcf1e806 ("sched/fair: Fix imbalance overflow")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:13:56 +02:00
Waiman Long
f4f2a1d491 cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
commit ba9182a896 upstream.

After a successful cpuset_can_attach() call which increments the
attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach()
will be called later. In cpuset_attach(), tasks in cpuset_attach_wq,
if present, will be woken up at the end. That is not the case in
cpuset_cancel_attach(). So missed wakeup is possible if the attach
operation is somehow cancelled. Fix that by doing the wakeup in
cpuset_cancel_attach() as well.

Fixes: e44193d39e ("cpuset: let hotplug propagation work wait for task attaching")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:13:55 +02:00
Steven Rostedt (Google)
6b337a13c1 tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
[ Upstream commit 9d52727f80 ]

If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.

Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Fixes: 2824f50332 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:13:55 +02:00
Steven Rostedt (Google)
1403518ed0 tracing: Add trace_array_puts() to write into instance
[ Upstream commit d503b8f747 ]

Add a generic trace_array_puts() that can be used to "trace_puts()" into
an allocated trace_array instance. This is just another variant of
trace_array_printk().

Link: https://lkml.kernel.org/r/20230207173026.584717290@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 9d52727f80 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:13:55 +02:00
ziwei.dai
71cf9c9835 BACKPORT: FROMGIT: rcu: Avoid freeing new kfree_rcu() memory after old grace period
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with an kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode.  The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.

On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories.  Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed.  And
the kfree_rcu_monitor() function does in fact check for this.

Except that the kfree_rcu_monitor() function checks these pointers one
at a time.  This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately.  This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.

To see this, consider the following sequence of events, in which:

o       Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
        then is preempted.

o       CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
        after a later grace period.  Except that "from_cset" is freed
        right after the previous grace period ended, so that "from_cset"
        is immediately freed.  Task A resumes and references "from_cset"'s
        member, after which nothing good happens.

In full detail:

CPU 0                                   CPU 1
----------------------                  ----------------------
count_memcg_event_mm()
|rcu_read_lock()  <---
|mem_cgroup_from_task()
 |// css_set_ptr is the "from_cset" mentioned on CPU 1
 |css_set_ptr = rcu_dereference((task)->cgroups)
 |// Hard irq comes, current task is scheduled out.

                                        cgroup_attach_task()
                                        |cgroup_migrate()
                                        |cgroup_migrate_execute()
                                        |css_set_move_task(task, from_cset, to_cset, true)
                                        |cgroup_move_task(task, to_cset)
                                        |rcu_assign_pointer(.., to_cset)
                                        |...
                                        |cgroup_migrate_finish()
                                        |put_css_set_locked(from_cset)
                                        |from_cset->refcount return 0
                                        |kfree_rcu(cset, rcu_head) // free from_cset after new gp
                                        |add_ptr_to_bulk_krc_lock()
                                        |schedule_delayed_work(&krcp->monitor_work, ..)

                                        kfree_rcu_monitor()
                                        |krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
                                        |queue_rcu_work(system_wq, &krwp->rcu_work)
                                        |if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
                                        |call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp

                                        // There is a perious call_rcu(.., rcu_work_rcufn)
                                        // gp end, rcu_work_rcufn() is called.
                                        rcu_work_rcufn()
                                        |__queue_work(.., rwork->wq, &rwork->work);

                                        |kfree_rcu_work()
                                        |krwp->bulk_head_free[0] bulk is freed before new gp end!!!
                                        |The "from_cset" is freed before new gp end.

// the task resumes some time later.
 |css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.

This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.

v2: Use helper function instead of inserted code block at kfree_rcu_monitor().

Fixes: 34c8817455 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d6204 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Bug: 275289142
Link: https://lore.kernel.org/rcu/1680266529-28429-1-git-send-email-ziwei.dai@unisoc.com/T/#m9d1d6a4542548acddee133a2807511bccf2b01b6
(cherry picked from commit e222f9a512539c3f4093a55d16624d9da614800b
 https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2023.03.30a)
[ziwei.dai: Added missing need_offload_krc() function ]

Change-Id: I63b618ed8454cb2826f04e8789c762be8f1ba1e1
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
(cherry picked from commit 6ccb91c80a501c800cc48a3fcbb5473f1ec72c02)
2023-04-20 01:04:33 +00:00
Greg Kroah-Hartman
82dadab157 Merge 5.15.107 into android14-5.15
Changes in 5.15.107
	ocfs2: ocfs2_mount_volume does cleanup job before return error
	ocfs2: rewrite error handling of ocfs2_fill_super
	ocfs2: fix memory leak in ocfs2_mount_volume()
	NFSD: Fix sparse warning
	NFSD: pass range end to vfs_fsync_range() instead of count
	RDMA/irdma: Do not request 2-level PBLEs for CQ alloc
	platform/x86: int3472: Split into 2 drivers
	platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode
	iavf: return errno code instead of status code
	iavf/iavf_main: actually log ->src mask when talking about it
	serial: 8250_exar: derive nr_ports from PCI ID for Acces I/O cards
	serial: exar: Add support for Sealevel 7xxxC serial cards
	bpf: hash map, avoid deadlock with suitable hash mask
	gpio: GPIO_REGMAP: select REGMAP instead of depending on it
	Drivers: vmbus: Check for channel allocation before looking up relids
	pwm: cros-ec: Explicitly set .polarity in .get_state()
	pwm: sprd: Explicitly set .polarity in .get_state()
	KVM: s390: pv: fix external interruption loop not always detected
	wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
	net: qrtr: combine nameservice into main module
	net: qrtr: Fix a refcount bug in qrtr_recvmsg()
	NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
	icmp: guard against too small mtu
	net: don't let netpoll invoke NAPI if in xmit context
	net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit
	sctp: check send stream number after wait_for_sndbuf
	net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT
	ipv6: Fix an uninit variable access bug in __ip6_make_skb()
	platform/x86: think-lmi: Fix memory leak when showing current settings
	platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
	platform/x86: think-lmi: Clean up display of current_value on Thinkstation
	gpio: davinci: Add irq chip flag to skip set wake
	net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
	net: stmmac: fix up RX flow hash indirection table when setting channels
	sunrpc: only free unix grouplist after RCU settles
	NFSD: callback request does not use correct credential for AUTH_SYS
	ice: fix wrong fallback logic for FDIR
	ice: Reset FDIR counter in FDIR init stage
	ethtool: reset #lanes when lanes is omitted
	gve: Secure enough bytes in the first TX desc for all TCP pkts
	kbuild: refactor single builds of *.ko
	usb: xhci: tegra: fix sleep in atomic call
	xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
	usb: cdnsp: Fixes error: uninitialized symbol 'len'
	usb: dwc3: pci: add support for the Intel Meteor Lake-S
	USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
	usb: typec: altmodes/displayport: Fix configure initial pin assignment
	USB: serial: option: add Telit FE990 compositions
	USB: serial: option: add Quectel RM500U-CN modem
	iio: adis16480: select CONFIG_CRC32
	iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
	iio: dac: cio-dac: Fix max DAC write value check for 12-bit
	iio: light: cm32181: Unregister second I2C client if present
	tty: serial: sh-sci: Fix transmit end interrupt handler
	tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
	tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
	nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
	nilfs2: fix sysfs interface lifetime
	dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
	ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
	ALSA: hda/realtek: Add quirk for Clevo X370SNW
	coresight: etm4x: Do not access TRCIDR1 for identification
	coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
	iio: adc: ad7791: fix IRQ flags
	scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
	scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
	smb3: allow deferred close timeout to be configurable
	smb3: lower default deferred close timeout to address perf regression
	cifs: sanitize paths in cifs_update_super_prepath.
	perf/core: Fix the same task check in perf_event_set_output
	ftrace: Mark get_lock_parent_ip() __always_inline
	ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
	fs: drop peer group ids under namespace lock
	can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
	can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
	tracing: Free error logs of tracing instances
	ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
	mm: vmalloc: avoid warn_alloc noise caused by fatal signal
	drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
	drm/nouveau/disp: Support more modes by checking with lower bpc
	ring-buffer: Fix race while reader and writer are on the same page
	mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
	drm/bridge: lt9611: Fix PLL being unable to lock
	mm: take a page reference when removing device exclusive entries
	kbuild: fix single directory build
	ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
	bpftool: Print newline before '}' for struct with padding only fields
	Linux 5.15.107

Change-Id: I88efacf4aaf63d4b21429eef2350c78da7e2528e
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-17 11:31:12 +00:00
Arve Hjønnevåg
12d161b7ae FROMLIST: sched/wait: Fix a kthread_park race with wait_woken()
kthread_park and wait_woken have a similar race that kthread_stop and
wait_woken used to have before it was fixed in
cb6538e740. Extend that fix to also cover
kthread_park.

Bug: 274686101
Link: https://lore.kernel.org/lkml/20230406194053.876844-1-arve@android.com/
Change-Id: Iaec960d7e30862f4ccac5c98dd43d32bbcf9a72b
Signed-off-by: Arve Hjønnevåg <arve@android.com>
(cherry picked from commit 69eba53950444890063b1e0469a61b69f8301767)
2023-04-14 21:07:17 +00:00
Zheng Yejian
cbe5f7fed7 ring-buffer: Fix race while reader and writer are on the same page
commit 6455b6163d upstream.

When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.

  Call trace:
   rb_get_reader_page+0x248/0x1300
   rb_buffer_peek+0x34/0x160
   ring_buffer_peek+0xbc/0x224
   peek_next_entry+0x98/0xbc
   __find_next_entry+0xc4/0x1c0
   trace_find_next_entry_inc+0x30/0x94
   tracing_read_pipe+0x198/0x304
   vfs_read+0xb4/0x1e0
   ksys_read+0x74/0x100
   __arm64_sys_read+0x24/0x30
   el0_svc_common.constprop.0+0x7c/0x1bc
   do_el0_svc+0x2c/0x94
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page->read is obviously abnormal:
  tail_page == commit_page == reader_page == {
    .write = 0x100d20,
    .read = 0x8f9f4805,  // Far greater than 0xd20, obviously abnormal!!!
    .entries = 0x10004c,
    .real_end = 0x0,
    .page = {
      .time_stamp = 0x857257416af0,
      .commit = 0xd20,  // This page hasn't been full filled.
      // .data[0...0xd20] seems normal.
    }
 }

The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.

To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed0c ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.

Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 77ae365eca ("ring-buffer: make lockless")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:48:26 +02:00
Steven Rostedt (Google)
33d5d4e67a tracing: Free error logs of tracing instances
commit 3357c6e429 upstream.

When a tracing instance is removed, the error messages that hold errors
that occurred in the instance needs to be freed. The following reports a
memory leak:

 # cd /sys/kernel/tracing
 # mkdir instances/foo
 # echo 'hist:keys=x' > instances/foo/events/sched/sched_switch/trigger
 # cat instances/foo/error_log
 [  117.404795] hist:sched:sched_switch: error: Couldn't find field
   Command: hist:keys=x
                      ^
 # rmdir instances/foo

Then check for memory leaks:

 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810d8ec700 (size 192):
  comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
  hex dump (first 32 bytes):
    60 dd 68 61 81 88 ff ff 60 dd 68 61 81 88 ff ff  `.ha....`.ha....
    a0 30 8c 83 ff ff ff ff 26 00 0a 00 00 00 00 00  .0......&.......
  backtrace:
    [<00000000dae26536>] kmalloc_trace+0x2a/0xa0
    [<00000000b2938940>] tracing_log_err+0x277/0x2e0
    [<000000004a0e1b07>] parse_atom+0x966/0xb40
    [<0000000023b24337>] parse_expr+0x5f3/0xdb0
    [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
    [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
    [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
    [<000000002cadc509>] vfs_write+0x162/0x670
    [<0000000059c3b9be>] ksys_write+0xca/0x170
    [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
    [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff888170c35a00 (size 32):
  comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
  hex dump (first 32 bytes):
    0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 68 69 73 74  .  Command: hist
    3a 6b 65 79 73 3d 78 0a 00 00 00 00 00 00 00 00  :keys=x.........
  backtrace:
    [<000000006a747de5>] __kmalloc+0x4d/0x160
    [<000000000039df5f>] tracing_log_err+0x29b/0x2e0
    [<000000004a0e1b07>] parse_atom+0x966/0xb40
    [<0000000023b24337>] parse_expr+0x5f3/0xdb0
    [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
    [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
    [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
    [<000000002cadc509>] vfs_write+0x162/0x670
    [<0000000059c3b9be>] ksys_write+0xca/0x170
    [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
    [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

The problem is that the error log needs to be freed when the instance is
removed.

Link: https://lore.kernel.org/lkml/76134d9f-a5ba-6a0d-37b3-28310b4a1e91@alu.unizg.hr/
Link: https://lore.kernel.org/linux-trace-kernel/20230404194504.5790b95f@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Fixes: 2f754e771b ("tracing: Have the error logs show up in the proper instances")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:48:25 +02:00
Zheng Yejian
33a503b7c3 ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
commit 2a2d8c51de upstream.

Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().

Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but
not restored if error happened on calling ftrace_modify_direct_caller().
Then it can no longer find 'direct' by that 'old_addr'.

To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path.

Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <ast@kernel.org>
Cc: <daniel@iogearbox.net>
Fixes: 8a141dd7f7 ("ftrace: Fix modify_ftrace_direct.")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:48:24 +02:00
Kan Liang
a007b7dc19 perf/core: Fix the same task check in perf_event_set_output
[ Upstream commit 24d3ae2f37 ]

The same task check in perf_event_set_output has some potential issues
for some usages.

For the current perf code, there is a problem if using of
perf_event_open() to have multiple samples getting into the same mmap’d
memory when they are both attached to the same process.
https://lore.kernel.org/all/92645262-D319-4068-9C44-2409EF44888E@gmail.com/
Because the event->ctx is not ready when the perf_event_set_output() is
invoked in the perf_event_open().

Besides the above issue, before the commit bd27568117 ("perf: Rewrite
core context handling"), perf record can errors out when sampling with
a hardware event and a software event as below.
 $ perf record -e cycles,dummy --per-thread ls
 failed to mmap with 22 (Invalid argument)
That's because that prior to the commit a hardware event and a software
event are from different task context.

The problem should be a long time issue since commit c3f00c7027
("perk: Separate find_get_context() from event initialization").

The task struct is stored in the event->hw.target for each per-thread
event. It is a more reliable way to determine whether two events are
attached to the same task.

The event->hw.target was also introduced several years ago by the
commit 50f16a8bf9 ("perf: Remove type specific target pointers"). It
can not only be used to fix the issue with the current code, but also
back port to fix the issues with an older kernel.

Note: The event->hw.target was introduced later than commit
c3f00c7027. The patch may cannot be applied between the commit
c3f00c7027 and commit 50f16a8bf9. Anybody that wants to back-port
this at that period may have to find other solutions.

Fixes: c3f00c7027 ("perf: Separate find_get_context() from event initialization")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lkml.kernel.org/r/20230322202449.512091-1-kan.liang@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13 16:48:24 +02:00
Tonghao Zhang
0a473f8343 bpf: hash map, avoid deadlock with suitable hash mask
[ Upstream commit 9f907439dc ]

The deadlock still may occur while accessed in NMI and non-NMI
context. Because in NMI, we still may access the same bucket but with
different map_locked index.

For example, on the same CPU, .max_entries = 2, we update the hash map,
with key = 4, while running bpf prog in NMI nmi_handle(), to update
hash map with key = 20, so it will have the same bucket index but have
different map_locked index.

To fix this issue, using min mask to hash again.

Fixes: 20b6cc34ea ("bpf: Avoid hashtab deadlock with map_locked")
Signed-off-by: Tonghao Zhang <tong@infragraf.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230111092903.92389-1-tong@infragraf.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13 16:48:16 +02:00
Greg Kroah-Hartman
83e0304b4e Merge 5.15.106 into android14-5.15
Changes in 5.15.106
	fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
	usb: dwc3: gadget: move cmd_endtransfer to extra function
	usb: dwc3: gadget: Add 1ms delay after end transfer command without IOC
	kernel: kcsan: kcsan_test: build without structleak plugin
	kcsan: avoid passing -g for test
	ksmbd: don't terminate inactive sessions after a few seconds
	bus: imx-weim: fix branch condition evaluates to a garbage value
	xfrm: Zero padding when dumping algos and encap
	ASoC: codecs: tx-macro: Fix for KASAN: slab-out-of-bounds
	md: avoid signed overflow in slot_store()
	x86/PVH: obtain VGA console info in Dom0
	net: hsr: Don't log netdev_err message on unknown prp dst node
	ALSA: asihpi: check pao in control_message()
	ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set()
	fbdev: tgafb: Fix potential divide by zero
	sched_getaffinity: don't assume 'cpumask_size()' is fully initialized
	fbdev: nvidia: Fix potential divide by zero
	fbdev: intelfb: Fix potential divide by zero
	fbdev: lxfb: Fix potential divide by zero
	fbdev: au1200fb: Fix potential divide by zero
	tools/power turbostat: Fix /dev/cpu_dma_latency warnings
	tools/power turbostat: fix decoding of HWP_STATUS
	tracing: Fix wrong return in kprobe_event_gen_test.c
	ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
	mips: bmips: BCM6358: disable RAC flush for TP1
	ALSA: usb-audio: Fix recursive locking at XRUN during syncing
	platform/x86: think-lmi: add missing type attribute
	platform/x86: think-lmi: use correct possible_values delimiters
	platform/x86: think-lmi: only display possible_values if available
	platform/x86: think-lmi: Add possible_values for ThinkStation
	mtd: rawnand: meson: invalidate cache on polling ECC bit
	SUNRPC: fix shutdown of NFS TCP client socket
	sfc: ef10: don't overwrite offload features at NIC reset
	scsi: megaraid_sas: Fix crash after a double completion
	scsi: mpt3sas: Don't print sense pool info twice
	ptp_qoriq: fix memory leak in probe()
	net: dsa: microchip: ksz8863_smi: fix bulk access
	r8169: fix RTL8168H and RTL8107E rx crc error
	regulator: Handle deferred clk
	net/net_failover: fix txq exceeding warning
	net: stmmac: don't reject VLANs when IFF_PROMISC is set
	drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
	platform/x86/intel/pmc: Alder Lake PCH slp_s0_residency fix
	can: bcm: bcm_tx_setup(): fix KMSAN uninit-value in vfs_write
	s390/vfio-ap: fix memory leak in vfio_ap device driver
	loop: suppress uevents while reconfiguring the device
	loop: LOOP_CONFIGURE: send uevents for partitions
	net: mvpp2: classifier flow fix fragmentation flags
	net: mvpp2: parser fix QinQ
	net: mvpp2: parser fix PPPoE
	smsc911x: avoid PHY being resumed when interface is not up
	ice: add profile conflict check for AVF FDIR
	ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
	ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
	ALSA: ymfpci: Fix BUG_ON in probe function
	net: ipa: compute DMA pool size properly
	i40e: fix registers dump after run ethtool adapter self test
	bnxt_en: Fix reporting of test result in ethtool selftest
	bnxt_en: Fix typo in PCI id to device description string mapping
	bnxt_en: Add missing 200G link speed reporting
	net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
	net: ethernet: mtk_eth_soc: fix flow block refcounting logic
	pinctrl: ocelot: Fix alt mode for ocelot
	iommu/vt-d: Allow zero SAGAW if second-stage not supported
	Input: alps - fix compatibility with -funsigned-char
	Input: focaltech - use explicitly signed char type
	cifs: prevent infinite recursion in CIFSGetDFSRefer()
	cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
	Input: goodix - add Lenovo Yoga Book X90F to nine_bytes_report DMI table
	btrfs: fix race between quota disable and quota assign ioctls
	btrfs: scan device in non-exclusive mode
	zonefs: Always invalidate last cached page on append write
	can: j1939: prevent deadlock by moving j1939_sk_errqueue()
	xen/netback: don't do grant copy across page boundary
	net: phy: dp83869: fix default value for tx-/rx-internal-delay
	pinctrl: amd: Disable and mask interrupts on resume
	pinctrl: at91-pio4: fix domain name assignment
	powerpc: Don't try to copy PPR for task with NULL pt_regs
	NFSv4: Fix hangs when recovering open state after a server reboot
	ALSA: hda/conexant: Partial revert of a quirk for Lenovo
	ALSA: usb-audio: Fix regression on detection of Roland VS-100
	ALSA: hda/realtek: Add quirks for some Clevo laptops
	ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
	xtensa: fix KASAN report for show_stack
	rcu: Fix rcu_torture_read ftrace event
	drm/etnaviv: fix reference leak when mmaping imported buffer
	drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
	KVM: arm64: Disable interrupts while walking userspace PTs
	s390/uaccess: add missing earlyclobber annotations to __clear_user()
	KVM: VMX: Move preemption timer <=> hrtimer dance to common x86
	KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32
	KVM: x86: Purge "highest ISR" cache when updating APICv state
	zonefs: Fix error message in zonefs_file_dio_append()
	selftests/bpf: Test btf dump for struct with padding only fields
	libbpf: Fix BTF-to-C converter's padding logic
	selftests/bpf: Add few corner cases to test padding handling of btf_dump
	libbpf: Fix btf_dump's packed struct determination
	hsr: ratelimit only when errors are printed
	x86/PVH: avoid 32-bit build warning when obtaining VGA console info
	Linux 5.15.106

Change-Id: I3197b16c9f82b9bd6a17d4637a00b15e9bd5b873
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-05 14:15:04 +00:00
Anton Gusev
5362344e1c tracing: Fix wrong return in kprobe_event_gen_test.c
[ Upstream commit bc4f359b3b ]

Overwriting the error code with the deletion result may cause the
function to return 0 despite encountering an error. Commit b111545d26
("tracing: Remove the useless value assignment in
test_create_synth_event()") solves a similar issue by
returning the original error code, so this patch does the same.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lore.kernel.org/linux-trace-kernel/20230131075818.5322-1-aagusev@ispras.ru

Signed-off-by: Anton Gusev <aagusev@ispras.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05 11:24:54 +02:00
Linus Torvalds
9de1325bc2 sched_getaffinity: don't assume 'cpumask_size()' is fully initialized
[ Upstream commit 6015b1aca1 ]

The getaffinity() system call uses 'cpumask_size()' to decide how big
the CPU mask is - so far so good.  It is indeed the allocation size of a
cpumask.

But the code also assumes that the whole allocation is initialized
without actually doing so itself.  That's wrong, because we might have
fixed-size allocations (making copying and clearing more efficient), but
not all of it is then necessarily used if 'nr_cpu_ids' is smaller.

Having checked other users of 'cpumask_size()', they all seem to be ok,
either using it purely for the allocation size, or explicitly zeroing
the cpumask before using the size in bytes to copy it.

See for example the ublk_ctrl_get_queue_affinity() function that uses
the proper 'zalloc_cpumask_var()' to make sure that the whole mask is
cleared, whether the storage is on the stack or if it was an external
allocation.

Fix this by just zeroing the allocation before using it.  Do the same
for the compat version of sched_getaffinity(), which had the same logic.

Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to
access the bits.  For a cpumask_var_t, it ends up being a pointer to the
same data either way, but it's just a good idea to treat it like you
would a 'cpumask_t'.  The compat case already did that.

Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05 11:24:53 +02:00
Marco Elver
0c873ab68f kcsan: avoid passing -g for test
[ Upstream commit 5eb39cde1e ]

Nathan reported that when building with GNU as and a version of clang that
defaults to DWARF5, the assembler will complain with:

  Error: non-constant .uleb128 is not supported

This is because `-g` defaults to the compiler debug info default. If the
assembler does not support some of the directives used, the above errors
occur. To fix, remove the explicit passing of `-g`.

All the test wants is that stack traces print valid function names, and
debug info is not required for that. (I currently cannot recall why I
added the explicit `-g`.)

Link: https://lkml.kernel.org/r/20230316224705.709984-2-elver@google.com
Fixes: 1fe84fd4a4 ("kcsan: Add test suite")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05 11:24:52 +02:00
Anders Roxell
b27e663cf1 kernel: kcsan: kcsan_test: build without structleak plugin
[ Upstream commit 6fcd4267a8 ]

Building kcsan_test with structleak plugin enabled makes the stack frame
size to grow.

kernel/kcsan/kcsan_test.c:704:1: error: the frame size of 3296 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Turn off the structleak plugin checks for kcsan_test.

Link: https://lkml.kernel.org/r/20221128104358.2660634-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marco Elver <elver@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Gow <davidgow@google.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 5eb39cde1e ("kcsan: avoid passing -g for test")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05 11:24:52 +02:00
Greg Kroah-Hartman
aa7f85d696 Merge 5.15.105 into android14-5.15
Changes in 5.15.105
	interconnect: qcom: osm-l3: fix icc_onecell_data allocation
	perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output
	perf: fix perf_event_context->time
	tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
	serial: fsl_lpuart: Fix comment typo
	tty: serial: fsl_lpuart: switch to new dmaengine_terminate_* API
	tty: serial: fsl_lpuart: fix race on RX DMA shutdown
	serial: 8250: SERIAL_8250_ASPEED_VUART should depend on ARCH_ASPEED
	serial: 8250: ASPEED_VUART: select REGMAP instead of depending on it
	kthread: add the helper function kthread_run_on_cpu()
	trace/hwlat: make use of the helper function kthread_run_on_cpu()
	trace/hwlat: Do not start per-cpu thread if it is already running
	net: tls: fix possible race condition between do_tls_getsockopt_conf() and do_tls_setsockopt_conf()
	power: supply: bq24190_charger: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
	power: supply: bq24190: Fix use after free bug in bq24190_remove due to race condition
	power: supply: da9150: Fix use after free bug in da9150_charger_remove due to race condition
	ARM: dts: imx6sll: e60k02: fix usbotg1 pinctrl
	ARM: dts: imx6sl: tolino-shine2hd: fix usbotg1 pinctrl
	arm64: dts: imx8mn: specify #sound-dai-cells for SAI nodes
	xsk: Add missing overflow check in xdp_umem_reg
	iavf: fix inverted Rx hash condition leading to disabled hash
	iavf: fix non-tunneled IPv6 UDP packet type and hashing
	intel/igbvf: free irq on the error path in igbvf_request_msix()
	igbvf: Regard vf reset nack as success
	igc: fix the validation logic for taprio's gate list
	i2c: imx-lpi2c: check only for enabled interrupt flags
	i2c: hisi: Only use the completion interrupt to finish the transfer
	scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate()
	net: dsa: b53: mmap: fix device tree support
	net: usb: smsc95xx: Limit packet length to skb->len
	qed/qed_sriov: guard against NULL derefs from qed_iov_get_vf_info
	xirc2ps_cs: Fix use after free bug in xirc2ps_detach
	net: phy: Ensure state transitions are processed from phy_stop()
	net: mdio: fix owner field for mdio buses registered using device-tree
	net: mdio: fix owner field for mdio buses registered using ACPI
	drm/i915/gt: perform uc late init after probe error injection
	net: qcom/emac: Fix use after free bug in emac_remove due to race condition
	net/ps3_gelic_net: Fix RX sk_buff length
	net/ps3_gelic_net: Use dma_mapping_error
	octeontx2-vf: Add missing free for alloc_percpu
	bootconfig: Fix testcase to increase max node
	keys: Do not cache key in task struct if key is requested from kernel thread
	iavf: fix hang on reboot with ice
	i40e: fix flow director packet filter programming
	bpf: Adjust insufficient default bpf_jit_limit
	net/mlx5e: Set uplink rep as NETNS_LOCAL
	net/mlx5: Fix steering rules cleanup
	net/mlx5: Read the TC mapping of all priorities on ETS query
	net/mlx5: E-Switch, Fix an Oops in error handling code
	net: dsa: tag_brcm: legacy: fix daisy-chained switches
	atm: idt77252: fix kmemleak when rmmod idt77252
	erspan: do not use skb_mac_header() in ndo_start_xmit()
	net/sonic: use dma_mapping_error() for error check
	nvme-tcp: fix nvme_tcp_term_pdu to match spec
	hvc/xen: prevent concurrent accesses to the shared ring
	ksmbd: add low bound validation to FSCTL_SET_ZERO_DATA
	ksmbd: add low bound validation to FSCTL_QUERY_ALLOCATED_RANGES
	ksmbd: fix possible refcount leak in smb2_open()
	gve: Cache link_speed value from device
	net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup()
	net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup()
	net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case
	net: mdio: thunder: Add missing fwnode_handle_put()
	Bluetooth: btqcomsmd: Fix command timeout after setting BD address
	Bluetooth: L2CAP: Fix responding with wrong PDU type
	Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
	platform/chrome: cros_ec_chardev: fix kernel data leak from ioctl
	thread_info: Add helpers to snapshot thread flags
	entry: Snapshot thread flags
	entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up
	hwmon: fix potential sensor registration fail if of_node is missing
	hwmon (it87): Fix voltage scaling for chips with 10.9mV ADCs
	scsi: qla2xxx: Synchronize the IOCB count to be in order
	scsi: qla2xxx: Perform lockless command completion in abort path
	uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2
	thunderbolt: Use scale field when allocating USB3 bandwidth
	thunderbolt: Call tb_check_quirks() after initializing adapters
	thunderbolt: Disable interrupt auto clear for rings
	thunderbolt: Add missing UNSET_INBOUND_SBTX for retimer access
	thunderbolt: Use const qualifier for `ring_interrupt_index`
	thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit
	ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3Enable
	riscv: Bump COMMAND_LINE_SIZE value to 1024
	drm/cirrus: NULL-check pipe->plane.state->fb in cirrus_pipe_update()
	HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
	ca8210: fix mac_len negative array access
	HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
	m68k: Only force 030 bus error if PC not in exception table
	selftests/bpf: check that modifier resolves after pointer
	scsi: target: iscsi: Fix an error message in iscsi_check_key()
	scsi: hisi_sas: Check devm_add_action() return value
	scsi: ufs: core: Add soft dependency on governor_simpleondemand
	scsi: lpfc: Check kzalloc() in lpfc_sli4_cgn_params_read()
	scsi: lpfc: Avoid usage of list iterator variable after loop
	scsi: storvsc: Handle BlockSize change in Hyper-V VHD/VHDX file
	net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
	net: usb: qmi_wwan: add Telit 0x1080 composition
	sh: sanitize the flags on sigreturn
	net/sched: act_mirred: better wording on protection against excessive stack growth
	act_mirred: use the backlog for nested calls to mirred ingress
	cifs: empty interface list when server doesn't support query interfaces
	cifs: print session id while listing open files
	scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR
	usb: dwc2: fix a devres leak in hw_enable upon suspend resume
	usb: gadget: u_audio: don't let userspace block driver unbind
	efi: sysfb_efi: Fix DMI quirks not working for simpledrm
	mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP
	fscrypt: destroy keyring after security_sb_delete()
	fsverity: Remove WQ_UNBOUND from fsverity read workqueue
	lockd: set file_lock start and end when decoding nlm4 testargs
	arm64: dts: imx8mm-nitrogen-r2: fix WM8960 clock name
	igb: revert rtnl_lock() that causes deadlock
	dm thin: fix deadlock when swapping to thin device
	usb: typec: tcpm: fix warning when handle discover_identity message
	usb: cdns3: Fix issue with using incorrect PCI device function
	usb: cdnsp: Fixes issue with redundant Status Stage
	usb: cdnsp: changes PCI Device ID to fix conflict with CNDS3 driver
	usb: chipdea: core: fix return -EINVAL if request role is the same with current role
	usb: chipidea: core: fix possible concurrent when switch role
	usb: ucsi: Fix NULL pointer deref in ucsi_connector_change()
	kfence: avoid passing -g for test
	KVM: x86: hyper-v: Avoid calling kvm_make_vcpus_request_mask() with vcpu_mask==NULL
	ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATION
	ksmbd: return STATUS_NOT_SUPPORTED on unsupported smb2.0 dialect
	ksmbd: return unsupported error on smb1 mount
	wifi: mac80211: fix qos on mesh interfaces
	nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
	drm/bridge: lt8912b: return EPROBE_DEFER if bridge is not found
	drm/meson: fix missing component unbind on bind errors
	drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi
	drm/i915/active: Fix missing debug object activation
	drm/i915: Preserve crtc_state->inherited during state clearing
	riscv: mm: Fix incorrect ASID argument when flushing TLB
	riscv: Handle zicsr/zifencei issues between clang and binutils
	tee: amdtee: fix race condition in amdtee_open_session
	firmware: arm_scmi: Fix device node validation for mailbox transport
	i2c: xgene-slimpro: Fix out-of-bounds bug in xgene_slimpro_i2c_xfer()
	dm stats: check for and propagate alloc_percpu failure
	dm crypt: add cond_resched() to dmcrypt_write()
	dm crypt: avoid accessing uninitialized tasklet
	sched/fair: sanitize vruntime of entity being placed
	sched/fair: Sanitize vruntime of entity being migrated
	mm: kfence: fix using kfence_metadata without initialization in show_object()
	ocfs2: fix data corruption after failed write
	NFSD: fix use-after-free in __nfs42_ssc_open()
	Linux 5.15.105

Change-Id: I79851567ddb8856f76486b164b96a2456f08df29
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-04 15:14:14 +00:00
Ramji Jiyani
47819edaf6 ANDROID: GKI: Multi arch exports protection support
ABI is being implemented for x86_64, making it necessary
to support protected exports header file generation for
the GKI modules for multiple architecture.

Enable support to select required inputs based on the ARCH
to generate gki_module_protected_exports.h during kernel
build.

Inputs for generating gki_module_protected_exports.h are:

ARCH = arm64:
ABI Protected exports list: abi_gki_protected_exports_aarch64
Protected GKI modules list: gki_aarch64_protected_modules

ARCH = x86_64:
ABI Protected exports list: abi_gki_protected_exports_x86_64
Protected GKI modules list: gki_x86_64_protected_modules

Test: TH
Test: Manual verification of the generated header file
Test: bazel run //common:kernel_aarch64_abi_update_protected_exports
Bug: 151893768
Change-Id: Ic4bcb2732199b71a7973b5ce4c852bcd95d37131
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
(cherry picked from commit bf07621480a22ed6254731244d14fc7d1d5070f1)
2023-03-31 18:52:41 +00:00