[ Upstream commit a22b7388d6 ]
Peek at old qdisc and graft only when deleting a leaf class in the htb,
rather than when deleting the htb itself. Do not peek at the qdisc of the
netdev queue when destroying the htb. The caller may already have grafted a
new qdisc that is not part of the htb structure being destroyed.
This fix resolves two use cases.
1. Using tc to destroy the htb.
- Netdev was being prematurely activated before the htb was fully
destroyed.
2. Using tc to replace the htb with another qdisc (which also leads to
the htb being destroyed).
- Premature netdev activation like previous case. Newly grafted qdisc
was also getting accidentally overwritten when destroying the htb.
Fixes: d03b195b5a ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230113005528.302625-1-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c463721a7 ]
This lockdep splat says it better than I could:
================================
WARNING: inconsistent lock state
6.2.0-rc2-07010-ga9b9500ffaac-dirty #967 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/1:3/179 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff3ec4036ce098 (_xmit_ETHER#2){+.?.}-{3:3}, at: netif_freeze_queues+0x5c/0xc0
{IN-SOFTIRQ-W} state was registered at:
_raw_spin_lock+0x5c/0xc0
sch_direct_xmit+0x148/0x37c
__dev_queue_xmit+0x528/0x111c
ip6_finish_output2+0x5ec/0xb7c
ip6_finish_output+0x240/0x3f0
ip6_output+0x78/0x360
ndisc_send_skb+0x33c/0x85c
ndisc_send_rs+0x54/0x12c
addrconf_rs_timer+0x154/0x260
call_timer_fn+0xb8/0x3a0
__run_timers.part.0+0x214/0x26c
run_timer_softirq+0x3c/0x74
__do_softirq+0x14c/0x5d8
____do_softirq+0x10/0x20
call_on_irq_stack+0x2c/0x5c
do_softirq_own_stack+0x1c/0x30
__irq_exit_rcu+0x168/0x1a0
irq_exit_rcu+0x10/0x40
el1_interrupt+0x38/0x64
irq event stamp: 7825
hardirqs last enabled at (7825): [<ffffdf1f7200cae4>] exit_to_kernel_mode+0x34/0x130
hardirqs last disabled at (7823): [<ffffdf1f708105f0>] __do_softirq+0x550/0x5d8
softirqs last enabled at (7824): [<ffffdf1f7081050c>] __do_softirq+0x46c/0x5d8
softirqs last disabled at (7811): [<ffffdf1f708166e0>] ____do_softirq+0x10/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(_xmit_ETHER#2);
<Interrupt>
lock(_xmit_ETHER#2);
*** DEADLOCK ***
3 locks held by kworker/1:3/179:
#0: ffff3ec400004748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
#1: ffff80000a0bbdc8 ((work_completion)(&priv->tx_onestep_tstamp)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
#2: ffff3ec4036cd438 (&dev->tx_global_lock){+.+.}-{3:3}, at: netif_tx_lock+0x1c/0x34
Workqueue: events enetc_tx_onestep_tstamp
Call trace:
print_usage_bug.part.0+0x208/0x22c
mark_lock+0x7f0/0x8b0
__lock_acquire+0x7c4/0x1ce0
lock_acquire.part.0+0xe0/0x220
lock_acquire+0x68/0x84
_raw_spin_lock+0x5c/0xc0
netif_freeze_queues+0x5c/0xc0
netif_tx_lock+0x24/0x34
enetc_tx_onestep_tstamp+0x20/0x100
process_one_work+0x28c/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x11c
but I'll say it anyway: the enetc_tx_onestep_tstamp() work item runs in
process context, therefore with softirqs enabled (i.o.w., it can be
interrupted by a softirq). If we hold the netif_tx_lock() when there is
an interrupt, and the NET_TX softirq then gets scheduled, this will take
the netif_tx_lock() a second time and deadlock the kernel.
To solve this, use netif_tx_lock_bh(), which blocks softirqs from
running.
Fixes: 7294380c52 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230112105440.1786799-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 488e0bf7f3 ]
If uhdlc_priv_tsa != 1 then utdm is not initialized.
And if ret != NULL then goto undo_uhdlc_init, where
utdm is dereferenced. Same if dev == NULL.
Found by Astra Linux on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 8d68100ab4 ("soc/fsl/qe: fix err handling of ucc_of_parse_tdm")
Signed-off-by: Esina Ekaterina <eesina@astralinux.ru>
Link: https://lore.kernel.org/r/20230112074703.13558-1-eesina@astralinux.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bb4db7f31 ]
Fix a use-after-free that occurs in kfree_skb() called from
local_cleanup(). This could happen when killing nfc daemon (e.g. neard)
after detaching an nfc device.
When detaching an nfc device, local_cleanup() called from
nfc_llcp_unregister_device() frees local->rx_pending and decreases
local->ref by kref_put() in nfc_llcp_local_put().
In the terminating process, nfc daemon releases all sockets and it leads
to decreasing local->ref. After the last release of local->ref,
local_cleanup() called from local_release() frees local->rx_pending
again, which leads to the bug.
Setting local->rx_pending to NULL in local_cleanup() could prevent
use-after-free when local_cleanup() is called twice.
Found by a modified version of syzkaller.
BUG: KASAN: use-after-free in kfree_skb()
Call Trace:
dump_stack_lvl (lib/dump_stack.c:106)
print_address_description.constprop.0.cold (mm/kasan/report.c:306)
kasan_check_range (mm/kasan/generic.c:189)
kfree_skb (net/core/skbuff.c:955)
local_cleanup (net/nfc/llcp_core.c:159)
nfc_llcp_local_put.part.0 (net/nfc/llcp_core.c:172)
nfc_llcp_local_put (net/nfc/llcp_core.c:181)
llcp_sock_destruct (net/nfc/llcp_sock.c:959)
__sk_destruct (net/core/sock.c:2133)
sk_destruct (net/core/sock.c:2181)
__sk_free (net/core/sock.c:2192)
sk_free (net/core/sock.c:2203)
llcp_sock_release (net/nfc/llcp_sock.c:646)
__sock_release (net/socket.c:650)
sock_close (net/socket.c:1365)
__fput (fs/file_table.c:306)
task_work_run (kernel/task_work.c:179)
ptrace_notify (kernel/signal.c:2354)
syscall_exit_to_user_mode_prepare (kernel/entry/common.c:278)
syscall_exit_to_user_mode (kernel/entry/common.c:296)
do_syscall_64 (arch/x86/entry/common.c:86)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:106)
Allocated by task 4719:
kasan_save_stack (mm/kasan/common.c:45)
__kasan_slab_alloc (mm/kasan/common.c:325)
slab_post_alloc_hook (mm/slab.h:766)
kmem_cache_alloc_node (mm/slub.c:3497)
__alloc_skb (net/core/skbuff.c:552)
pn533_recv_response (drivers/nfc/pn533/usb.c:65)
__usb_hcd_giveback_urb (drivers/usb/core/hcd.c:1671)
usb_giveback_urb_bh (drivers/usb/core/hcd.c:1704)
tasklet_action_common.isra.0 (kernel/softirq.c:797)
__do_softirq (kernel/softirq.c:571)
Freed by task 1901:
kasan_save_stack (mm/kasan/common.c:45)
kasan_set_track (mm/kasan/common.c:52)
kasan_save_free_info (mm/kasan/genericdd.c:518)
__kasan_slab_free (mm/kasan/common.c:236)
kmem_cache_free (mm/slub.c:3809)
kfree_skbmem (net/core/skbuff.c:874)
kfree_skb (net/core/skbuff.c:931)
local_cleanup (net/nfc/llcp_core.c:159)
nfc_llcp_unregister_device (net/nfc/llcp_core.c:1617)
nfc_unregister_device (net/nfc/core.c:1179)
pn53x_unregister_nfc (drivers/nfc/pn533/pn533.c:2846)
pn533_usb_disconnect (drivers/nfc/pn533/usb.c:579)
usb_unbind_interface (drivers/usb/core/driver.c:458)
device_release_driver_internal (drivers/base/dd.c:1279)
bus_remove_device (drivers/base/bus.c:529)
device_del (drivers/base/core.c:3665)
usb_disable_device (drivers/usb/core/message.c:1420)
usb_disconnect (drivers/usb/core.c:2261)
hub_event (drivers/usb/core/hub.c:5833)
process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:212 include/trace/events/workqueue.h:108 kernel/workqueue.c:2281)
worker_thread (include/linux/list.h:282 kernel/workqueue.c:2423)
kthread (kernel/kthread.c:319)
ret_from_fork (arch/x86/entry/entry_64.S:301)
Fixes: 3536da06db ("NFC: llcp: Clean local timers and works when removing a device")
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Link: https://lore.kernel.org/r/20230111131914.3338838-1-jisoo.jang@yonsei.ac.kr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e4f4db4779 ]
To mitigate Spectre v4, 2039f26f3a ("bpf: Fix leakage due to
insufficient speculative store bypass mitigation") inserts lfence
instructions after 1) initializing a stack slot and 2) spilling a
pointer to the stack.
However, this does not cover cases where a stack slot is first
initialized with a pointer (subject to sanitization) but then
overwritten with a scalar (not subject to sanitization because
the slot was already initialized). In this case, the second write
may be subject to speculative store bypass (SSB) creating a
speculative pointer-as-scalar type confusion. This allows the
program to subsequently leak the numerical pointer value using,
for example, a branch-based cache side channel.
To fix this, also sanitize scalars if they write a stack slot
that previously contained a pointer. Assuming that pointer-spills
are only generated by LLVM on register-pressure, the performance
impact on most real-world BPF programs should be small.
The following unprivileged BPF bytecode drafts a minimal exploit
and the mitigation:
[...]
// r6 = 0 or 1 (skalar, unknown user input)
// r7 = accessible ptr for side channel
// r10 = frame pointer (fp), to be leaked
//
r9 = r10 # fp alias to encourage ssb
*(u64 *)(r9 - 8) = r10 // fp[-8] = ptr, to be leaked
// lfence added here because of pointer spill to stack.
//
// Ommitted: Dummy bpf_ringbuf_output() here to train alias predictor
// for no r9-r10 dependency.
//
*(u64 *)(r10 - 8) = r6 // fp[-8] = scalar, overwrites ptr
// 2039f26f3a: no lfence added because stack slot was not STACK_INVALID,
// store may be subject to SSB
//
// fix: also add an lfence when the slot contained a ptr
//
r8 = *(u64 *)(r9 - 8)
// r8 = architecturally a scalar, speculatively a ptr
//
// leak ptr using branch-based cache side channel:
r8 &= 1 // choose bit to leak
if r8 == 0 goto SLOW // no mispredict
// architecturally dead code if input r6 is 0,
// only executes speculatively iff ptr bit is 1
r8 = *(u64 *)(r7 + 0) # encode bit in cache (0: slow, 1: fast)
SLOW:
[...]
After running this, the program can time the access to *(r7 + 0) to
determine whether the chosen pointer bit was 0 or 1. Repeat this 64
times to recover the whole address on amd64.
In summary, sanitization can only be skipped if one scalar is
overwritten with another scalar. Scalar-confusion due to speculative
store bypass can not lead to invalid accesses because the pointer
bounds deducted during verification are enforced using branchless
logic. See 979d63d50c ("bpf: prevent out of bounds speculation on
pointer arithmetic") for details.
Do not make the mitigation depend on !env->allow_{uninit_stack,ptr_leaks}
because speculative leaks are likely unexpected if these were enabled.
For example, leaking the address to a protected log file may be acceptable
while disabling the mitigation might unintentionally leak the address
into the cached-state of a map that is accessible to unprivileged
processes.
Fixes: 2039f26f3a ("bpf: Fix leakage due to insufficient speculative store bypass mitigation")
Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Henriette Hofmeier <henriette.hofmeier@rub.de>
Link: https://lore.kernel.org/bpf/edc95bad-aada-9cfc-ffe2-fa9bb206583c@cs.fau.de
Link: https://lore.kernel.org/bpf/20230109150544.41465-1-gerhorst@cs.fau.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 926446ae24 ]
AN restart triggered during KR training not only aborts the KR training
process but also move the HW to unstable state. Driver has to wait upto
500ms or until the KR training is completed before restarting AN cycle.
Fixes: 7c12aa0877 ("amd-xgbe: Move the PHY support into amd-xgbe")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 579923d84b ]
There is difference in the TX Flow Control registers (TFCR) between the
revisions of the hardware. The older revisions of hardware used to have
single register per queue. Whereas, the newer revision of hardware (from
ver 30H onwards) have one register per priority.
Update the driver to use the TFCR based on the reported version of the
hardware.
Fixes: c5aa9e3b81 ("amd-xgbe: Initial AMD 10GbE platform driver")
Co-developed-by: Ajith Nayak <Ajith.Nayak@amd.com>
Signed-off-by: Ajith Nayak <Ajith.Nayak@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17eee264ef ]
sp_usb_phy_probe() will call platform_get_resource_byname() that may fail
and return NULL. devm_ioremap() will use usbphy->moon4_res_mem->start as
input, which may causes null-ptr-deref. Check the ret value of
platform_get_resource_byname() to avoid the null-ptr-deref.
Fixes: 99d9ccd973 ("phy: usb: Add USB2.0 phy driver for Sunplus SP7021")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20221125021222.25687-1-shangxiaojing@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07a2975c65 ]
Commit 374146cad4 ("drm/vc4: Switch to drmm_mutex_init") converted,
among other functions, vc4_create_object() to use drmm_mutex_init().
However, that function is used to allocate a BO, and therefore the
mutex needs to be freed much sooner than when the DRM device is removed
from the system.
For each buffer allocation we thus end up allocating a small structure
as part of the DRM-managed mechanism that is never freed, eventually
leading us to no longer having any free memory anymore.
Let's switch back to mutex_init/mutex_destroy to deal with it properly.
Fixes: 374146cad4 ("drm/vc4: Switch to drmm_mutex_init")
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230112091243.490799-1-maxime@cerno.tech
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6cf91b7b4 ]
If signal_pending() returns true, schedule_timeout() will not be executed,
causing the waiting task to remain in the wait queue.
Fixed by adding a call to finish_wait(), which ensures that the waiting
task will always be removed from the wait queue.
Fixes: f4e44b3933 ("NFSD: delay unmount source's export after inter-server copy completed.")
Signed-off-by: Xingyuan Mo <hdthky0@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 74d3320f6f ]
When CONFIG_DEBUG_INFO_BTF_MODULES=y, running 'make modules'
in the clean kernel tree will get the following error.
$ grep CONFIG_DEBUG_INFO_BTF_MODULES .config
CONFIG_DEBUG_INFO_BTF_MODULES=y
$ make -s clean
$ make modules
[snip]
AR vmlinux.a
ar: ./built-in.a: No such file or directory
make: *** [Makefile:1241: vmlinux.a] Error 1
'modules' depends on 'vmlinux', but builtin objects are not built.
Define KBUILD_BUILTIN.
Fixes: f73edc8951 ("kbuild: unify two modpost invocations")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8debed3efe ]
Nathan Chancellor reports that $(NM) emits an error message when
GNU Make 4.4 is used to build the ARM zImage.
$ make-4.4 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- O=build defconfig zImage
[snip]
LD vmlinux
NM System.map
SORTTAB vmlinux
OBJCOPY arch/arm/boot/Image
Kernel: arch/arm/boot/Image is ready
arm-linux-gnueabi-nm: 'arch/arm/boot/compressed/../../../../vmlinux': No such file
/bin/sh: 1: arithmetic expression: expecting primary: " "
LDS arch/arm/boot/compressed/vmlinux.lds
AS arch/arm/boot/compressed/head.o
GZIP arch/arm/boot/compressed/piggy_data
AS arch/arm/boot/compressed/piggy.o
CC arch/arm/boot/compressed/misc.o
This occurs since GNU Make commit 98da874c4303 ("[SV 10593] Export
variables to $(shell ...) commands"), and the O= option is needed to
reproduce it. The generated zImage is correct despite the error message.
As the commit description of 98da874c4303 [1] says, exported variables
are passed down to $(shell ) functions, which means exported recursive
variables might be expanded earlier than before, in the parse stage.
The following test code demonstrates the change for GNU Make 4.4.
[Test Makefile]
$(shell echo hello > foo)
export foo = $(shell cat bar/../foo)
$(shell mkdir bar)
all:
@echo $(foo)
[GNU Make 4.3]
$ rm -rf bar; make-4.3
hello
[GNU Make 4.4]
$ rm -rf bar; make-4.4
cat: bar/../foo: No such file or directory
hello
The 'foo' is a resursively expanded (i.e. lazily expanded) variable.
GNU Make 4.3 expands 'foo' just before running the recipe '@echo $(foo)',
at this point, the directory 'bar' exists.
GNU Make 4.4 expands 'foo' to evaluate $(shell mkdir bar) because it is
exported. At this point, the directory 'bar' does not exit yet. The cat
command cannot resolve the bar/../foo path, hence the error message.
Let's get back to the kernel Makefile.
In arch/arm/boot/compressed/Makefile, KBSS_SZ is referenced by
LDFLAGS_vmlinux, which is recursive and also exported by the top
Makefile.
GNU Make 4.3 expands KBSS_SZ just before running the recipes, so no
error message.
GNU Make 4.4 expands KBSS_SZ in the parse stage, where the directory
arm/arm/boot/compressed does not exit yet. When compiled with O=,
the output directory is created by $(shell mkdir -p $(obj-dirs))
in scripts/Makefile.build.
There are two ways to fix this particular issue:
- change "$(obj)/../../../../vmlinux" in KBSS_SZ to "vmlinux"
- unexport LDFLAGS_vmlinux
This commit takes the latter course because it is what I originally
intended.
Commit 3ec8a5b33d ("kbuild: do not export LDFLAGS_vmlinux")
unexported LDFLAGS_vmlinux.
Commit 5d4aeffbf7 ("kbuild: rebuild .vmlinux.export.o when its
prerequisite is updated") accidentally exported it again.
We can clean up arch/arm/boot/compressed/Makefile later.
[1]: https://git.savannah.gnu.org/cgit/make.git/commit/?id=98da874c43035a490cdca81331724f233a3d0c9a
Link: https://lore.kernel.org/all/Y7i8+EjwdnhHtlrr@dev-arch.thelio-3990X/
Fixes: 5d4aeffbf7 ("kbuild: rebuild .vmlinux.export.o when its prerequisite is updated")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eef034ac66 ]
When aops->write_begin() does not initialize fsdata, KMSAN may report
an error passing the latter to aops->write_end().
Fix this by unconditionally initializing fsdata.
Fixes: f2b6a16eb8 ("fs: affs convert to new aops")
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e0c4a422f5 ]
Fix three error exit issues in expected receive setup.
Re-arrange error exits to increase readability.
Issues and fixes:
1. Possible missed page unpin if tidlist copyout fails and
not all pinned pages where made part of a TID.
Fix: Unpin the unused pages.
2. Return success with unset return values tidcnt and length
when no pages were pinned.
Fix: Return -ENOSPC if no pages were pinned.
3. Return success with unset return values tidcnt and length when
no rcvarray entries available.
Fix: Return -ENOSPC if no rcvarray entries are available.
Fixes: 7e7a436ecb ("staging/hfi1: Add TID entry program function body")
Fixes: 97736f36db ("IB/hfi1: Validate page aligned for a given virtual addres")
Fixes: f404ca4c7e ("IB/hfi1: Refactor hfi_user_exp_rcv_setup() IOCTL")
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/167328548150.1472310.1492305874804187634.stgit@awfm-02.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0afec5e9ce ]
When registering a new DMA MR after selecting the best aligned page size
for it, we iterate over the given sglist to split each entry to smaller,
aligned to the selected page size, DMA blocks.
In given circumstances where the sg entry and page size fit certain
sizes and the sg entry is not aligned to the selected page size, the
total size of the aligned pages we need to cover the sg entry is >= 4GB.
Under this circumstances, while iterating page aligned blocks, the
counter responsible for counting how much we advanced from the start of
the sg entry is overflowed because its type is u32 and we pass 4GB in
size. This can lead to an infinite loop inside the iterator function
because the overflow prevents the counter to be larger
than the size of the sg entry.
Fix the presented problem by changing the advancement condition to
eliminate overflow.
Backtrace:
[ 192.374329] efa_reg_user_mr_dmabuf
[ 192.376783] efa_register_mr
[ 192.382579] pgsz_bitmap 0xfffff000 rounddown 0x80000000
[ 192.386423] pg_sz [0x80000000] umem_length[0xc0000000]
[ 192.392657] start 0x0 length 0xc0000000 params.page_shift 31 params.page_num 3
[ 192.399559] hp_cnt[3], pages_in_hp[524288]
[ 192.403690] umem->sgt_append.sgt.nents[1]
[ 192.407905] number entries: [1], pg_bit: [31]
[ 192.411397] biter->__sg_nents [1] biter->__sg [0000000008b0c5d8]
[ 192.415601] biter->__sg_advance [665837568] sg_dma_len[3221225472]
[ 192.419823] biter->__sg_nents [1] biter->__sg [0000000008b0c5d8]
[ 192.423976] biter->__sg_advance [2813321216] sg_dma_len[3221225472]
[ 192.428243] biter->__sg_nents [1] biter->__sg [0000000008b0c5d8]
[ 192.432397] biter->__sg_advance [665837568] sg_dma_len[3221225472]
Fixes: a808273a49 ("RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks")
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Link: https://lore.kernel.org/r/20230109133711.13678-1-ynachum@amazon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1aefe5c177 ]
If you create MRs more than 0x10000 times after loading the module,
responder starts to reply NAKs for RDMA/Atomic operations because of rkey
violation detected in check_rkey(). The root cause is that rkeys are
incremented each time a new MR is created and the value overflows into the
range reserved for MWs.
This commit also increases the value of RXE_MAX_MW that has been limited
unlike other parameters.
Fixes: 0994a1bcd5 ("RDMA/rxe: Bump up default maximum values used via uverbs")
Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eaf2213ba5 ]
If *.conf.default is updated, builtin-policy.h should be rebuilt,
but this does not work when compiled with O= option.
[Without this commit]
$ touch security/tomoyo/policy/exception_policy.conf.default
$ make O=/tmp security/tomoyo/
make[1]: Entering directory '/tmp'
GEN Makefile
CALL /home/masahiro/ref/linux/scripts/checksyscalls.sh
DESCEND objtool
make[1]: Leaving directory '/tmp'
[With this commit]
$ touch security/tomoyo/policy/exception_policy.conf.default
$ make O=/tmp security/tomoyo/
make[1]: Entering directory '/tmp'
GEN Makefile
CALL /home/masahiro/ref/linux/scripts/checksyscalls.sh
DESCEND objtool
POLICY security/tomoyo/builtin-policy.h
CC security/tomoyo/common.o
AR security/tomoyo/built-in.a
make[1]: Leaving directory '/tmp'
$(srctree)/ is essential because $(wildcard ) does not follow VPATH.
Fixes: f02dee2d14 ("tomoyo: Do not generate empty policy files")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e325285de2 ]
When unloading the SCMI core stack module, configured to use the virtio
SCMI transport, LOCKDEP reports the splat down below about unsafe locks
dependencies.
In order to avoid this possible unsafe locking scenario call upfront
virtio_break_device() before getting hold of vioch->lock.
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.1.0-00067-g6b934395ba07-dirty #4 Not tainted
-----------------------------------------------------
rmmod/307 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff000080c510e0 (&dev->vqs_list_lock){+.+.}-{3:3}, at: virtio_break_device+0x28/0x68
and this task is already holding:
ffff00008288ada0 (&channels[i].lock){-.-.}-{3:3}, at: virtio_chan_free+0x60/0x168 [scmi_module]
which would create a new lock dependency:
(&channels[i].lock){-.-.}-{3:3} -> (&dev->vqs_list_lock){+.+.}-{3:3}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&channels[i].lock){-.-.}-{3:3}
... which became HARDIRQ-irq-safe at:
lock_acquire+0x128/0x398
_raw_spin_lock_irqsave+0x78/0x140
scmi_vio_complete_cb+0xb4/0x3b8 [scmi_module]
vring_interrupt+0x84/0x120
vm_interrupt+0x94/0xe8
__handle_irq_event_percpu+0xb4/0x3d8
handle_irq_event_percpu+0x20/0x68
handle_irq_event+0x50/0xb0
handle_fasteoi_irq+0xac/0x138
generic_handle_domain_irq+0x34/0x50
gic_handle_irq+0xa0/0xd8
call_on_irq_stack+0x2c/0x54
do_interrupt_handler+0x8c/0x90
el1_interrupt+0x40/0x78
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x64/0x68
_raw_write_unlock_irq+0x48/0x80
ep_start_scan+0xf0/0x128
do_epoll_wait+0x390/0x858
do_compat_epoll_pwait.part.34+0x1c/0xb8
__arm64_sys_epoll_pwait+0x80/0xd0
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.3+0x98/0x120
do_el0_svc+0x34/0xd0
el0_svc+0x40/0x98
el0t_64_sync_handler+0x98/0xc0
el0t_64_sync+0x170/0x174
to a HARDIRQ-irq-unsafe lock:
(&dev->vqs_list_lock){+.+.}-{3:3}
... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x128/0x398
_raw_spin_lock+0x58/0x70
__vring_new_virtqueue+0x130/0x1c0
vring_create_virtqueue+0xc4/0x2b8
vm_find_vqs+0x20c/0x430
init_vq+0x308/0x390
virtblk_probe+0x114/0x9b0
virtio_dev_probe+0x1a4/0x248
really_probe+0xc8/0x3a8
__driver_probe_device+0x84/0x190
driver_probe_device+0x44/0x110
__driver_attach+0x104/0x1e8
bus_for_each_dev+0x7c/0xd0
driver_attach+0x2c/0x38
bus_add_driver+0x1e4/0x258
driver_register+0x6c/0x128
register_virtio_driver+0x2c/0x48
virtio_blk_init+0x70/0xac
do_one_initcall+0x84/0x420
kernel_init_freeable+0x2d0/0x340
kernel_init+0x2c/0x138
ret_from_fork+0x10/0x20
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->vqs_list_lock);
local_irq_disable();
lock(&channels[i].lock);
lock(&dev->vqs_list_lock);
<Interrupt>
lock(&channels[i].lock);
*** DEADLOCK ***
================
Fixes: 42e90eb53b ("firmware: arm_scmi: Add a virtio channel refcount")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221222183823.518856-6-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3d40c3ec3 ]
As the kcalloc may return NULL pointer,
it should be better to check the ishtp_dma_tx_map
before use in order to avoid NULL pointer dereference.
Fixes: 3703f53b99 ("HID: intel_ish-hid: ISH Transport layer")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 10e2f328bd ]
dt_binding_check detects an issue with the pgc_hsiomix power
domain:
pgc: 'power-domains@17' does not match any of the regexes
This is because 'power-domains' should be 'power-domain'
Fixes: 2ae42e0c0b ("arm64: dts: imx8mp: add HSIO power-domains")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3b75ace20 ]
The GPC node references an interrupt parent, but it doesn't
state the interrupt itself. According to the TRM, this IRQ
is 87. This also eliminate an error detected from dt_binding_check
Fixes: fc0f051246 ("arm64: dts: imx8mp: add GPC node with GPU power domains")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21b84ebeee ]
Setting the device name after it has been registered confuses the sysfs
cleanup paths. This has already been fixed for the imx8m-blk-ctrl driver in
b64b46fbaa ("Revert "soc: imx: imx8m-blk-ctrl: set power device name""),
but the same problem exists in imx8mp-blk-ctrl.
Fixes: 556f5cf956 ("soc: imx: add i.MX8MP HSIO blk-ctrl")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>