linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-05 18:41:58 +09:00

Author	SHA1	Message	Date
Hengqi Chen	a19df91b5b	LoongArch: Add more instruction opcodes and emit_* helpers commit add28024405ed600afaa02749989d4fd119f9057 upstream. This patch adds more instruction opcodes and their corresponding emit_* helpers which will be used in later patches. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:59 +01:00
Eric Dumazet	029935507d	arp: do not assume dev_hard_header() does not change skb->head [ Upstream commit c92510f5e3f82ba11c95991824a41e59a9c5ed81 ] arp_create() is the only dev_hard_header() caller making assumption about skb->head being unchanged. A recent commit broke this assumption. Initialize @arp pointer after dev_hard_header() call. Fixes: db5b4e39c4e6 ("ip6_gre: make ip6gre_header() robust") Reported-by: syzbot+58b44a770a1585795351@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260107212250.384552-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:59 +01:00
Wei Fang	0d254b0a15	net: enetc: fix build warning when PAGE_SIZE is greater than 128K [ Upstream commit 4b5bdabb5449b652122e43f507f73789041d4abe ] The max buffer size of ENETC RX BD is 0xFFFF bytes, so if the PAGE_SIZE is greater than 128K, ENETC_RXB_DMA_SIZE and ENETC_RXB_DMA_SIZE_XDP will be greater than 0xFFFF, thus causing a build warning. This will not cause any practical issues because ENETC is currently only used on the ARM64 platform, and the max PAGE_SIZE is 64K. So this patch is only for fixing the build warning that occurs when compiling ENETC drivers for other platforms. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601050637.kHEKKOG7-lkp@intel.com/ Fixes: e59bc32df2e9 ("net: enetc: correct the value of ENETC_RXB_TRUESIZE") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260107091204.1980222-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:59 +01:00
Petko Manolov	93f18eaa19	net: usb: pegasus: fix memory leak in update_eth_regs_async() [ Upstream commit afa27621a28af317523e0836dad430bec551eb54 ] When asynchronously writing to the device registers and if usb_submit_urb() fail, the code fail to release allocated to this point resources. Fixes: `323b34963d` ("drivers: net: usb: pegasus: fix control urb submission") Signed-off-by: Petko Manolov <petkan@nucleusys.com> Link: https://patch.msgid.link/20260106084821.3746677-1-petko.manolov@konsulko.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:59 +01:00
Xiang Mei	11bf913461	net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset [ Upstream commit c1d73b1480235731e35c81df70b08f4714a7d095 ] `qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class itself is active. Two qfq_class objects may point to the same leaf_qdisc. This happens when: 1. one QFQ qdisc is attached to the dev as the root qdisc, and 2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get() / qdisc_put()) and is pending to be destroyed, as in function tc_new_tfilter. When packets are enqueued through the root QFQ qdisc, the shared leaf_qdisc->q.qlen increases. At the same time, the second QFQ qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters qfq_reset() with its own q->q.qlen == 0, but its class's leaf qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate an inactive aggregate and trigger a null-deref in qfq_deactivate_agg: [ 0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 0.903571] #PF: supervisor write access in kernel mode [ 0.903860] #PF: error_code(0x0002) - not-present page [ 0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0 [ 0.904502] Oops: Oops: 0002 [#1] SMP NOPTI [ 0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE [ 0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.909179] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.909572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.910247] PKRU: 55555554 [ 0.910391] Call Trace: [ 0.910527] <TASK> [ 0.910638] qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485) [ 0.910826] qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036) [ 0.911040] __qdisc_destroy (net/sched/sch_generic.c:1076) [ 0.911236] tc_new_tfilter (net/sched/cls_api.c:2447) [ 0.911447] rtnetlink_rcv_msg (net/core/rtnetlink.c:6958) [ 0.911663] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861) [ 0.911894] netlink_rcv_skb (net/netlink/af_netlink.c:2550) [ 0.912100] netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) [ 0.912296] ? __alloc_skb (net/core/skbuff.c:706) [ 0.912484] netlink_sendmsg (net/netlink/af_netlink.c:1894) [ 0.912682] sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1)) [ 0.912880] vfs_write (fs/read_write.c:593 fs/read_write.c:686) [ 0.913077] ksys_write (fs/read_write.c:738) [ 0.913252] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 0.913438] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) [ 0.913687] RIP: 0033:0x424c34 [ 0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9 Code starting with the faulting instruction =========================================== 0: 89 02 mov %eax,(%rdx) 2: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax 9: eb bd jmp 0xffffffffffffffc8 b: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 15: 90 nop 16: f3 0f 1e fa endbr64 1a: 80 3d 2d 44 09 00 00 cmpb $0x0,0x9442d(%rip) # 0x9444e 21: 74 13 je 0x36 23: b8 01 00 00 00 mov $0x1,%eax 28: 0f 05 syscall 2a: 09 .byte 0x9 [ 0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34 [ 0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003 [ 0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000 [ 0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28 [ 0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001 [ 0.917039] </TASK> [ 0.917158] Modules linked in: [ 0.917316] CR2: 0000000000000000 [ 0.917484] ---[ end trace 0000000000000000 ]--- [ 0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.921014] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.921424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.922097] PKRU: 55555554 [ 0.922240] Kernel panic - not syncing: Fatal exception [ 0.922590] Kernel Offset: disabled Fixes: `0545a30377` ("pkt_sched: QFQ - quick fair queue scheduler") Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:59 +01:00
René Rebe	9be8261788	HID: quirks: work around VID/PID conflict for appledisplay [ Upstream commit c7fabe4ad9219866c203164a214c474c95b36bf2 ] For years I wondered why the Apple Cinema Display driver would not just work for me. Turns out the hidraw driver instantly takes it over. Fix by adding appledisplay VID/PIDs to hid_have_special_driver. Fixes: `069e8a65cd` ("Driver for Apple Cinema Display") Signed-off-by: René Rebe <rene@exactco.de> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:59 +01:00
Mohammad Heib	88bea149db	net: fix memory leak in skb_segment_list for GRO packets [ Upstream commit 238e03d0466239410b72294b79494e43d4fabe77 ] When skb_segment_list() is called during packet forwarding, it handles packets that were aggregated by the GRO engine. Historically, the segmentation logic in skb_segment_list assumes that individual segments are split from a parent SKB and may need to carry their own socket memory accounting. Accordingly, the code transfers truesize from the parent to the newly created segments. Prior to commit ed4cccef64c1 ("gro: fix ownership transfer"), this truesize subtraction in skb_segment_list() was valid because fragments still carry a reference to the original socket. However, commit ed4cccef64c1 ("gro: fix ownership transfer") changed this behavior by ensuring that fraglist entries are explicitly orphaned (skb->sk = NULL) to prevent illegal orphaning later in the stack. This change meant that the entire socket memory charge remained with the head SKB, but the corresponding accounting logic in skb_segment_list() was never updated. As a result, the current code unconditionally adds each fragment's truesize to delta_truesize and subtracts it from the parent SKB. Since the fragments are no longer charged to the socket, this subtraction results in an effective under-count of memory when the head is freed. This causes sk_wmem_alloc to remain non-zero, preventing socket destruction and leading to a persistent memory leak. The leak can be observed via KMEMLEAK when tearing down the networking environment: unreferenced object 0xffff8881e6eb9100 (size 2048): comm "ping", pid 6720, jiffies 4295492526 backtrace: kmem_cache_alloc_noprof+0x5c6/0x800 sk_prot_alloc+0x5b/0x220 sk_alloc+0x35/0xa00 inet6_create.part.0+0x303/0x10d0 __sock_create+0x248/0x640 __sys_socket+0x11b/0x1d0 Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST packets constructed by GRO, the truesize adjustment is removed. The call to skb_release_head_state() must be preserved. As documented in commit `cf673ed0e0` ("net: fix fraglist segmentation reference count leak"), it is still required to correctly drop references to SKB extensions that may be overwritten during __copy_skb_header(). Fixes: ed4cccef64c1 ("gro: fix ownership transfer") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260104213101.352887-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:59 +01:00
Srijit Bose	970a1ac903	bnxt_en: Fix potential data corruption with HW GRO/LRO [ Upstream commit ffeafa65b2b26df2f5b5a6118d3174f17bd12ec5 ] Fix the max number of bits passed to find_first_zero_bit() in bnxt_alloc_agg_idx(). We were incorrectly passing the number of long words. find_first_zero_bit() may fail to find a zero bit and cause a wrong ID to be used. If the wrong ID is already in use, this can cause data corruption. Sometimes an error like this can also be seen: bnxt_en 0000:83:00.0 enp131s0np0: TPA end agg_buf 2 != expected agg_bufs 1 Fix it by passing the correct number of bits MAX_TPA_P5. Use DECLARE_BITMAP() to more cleanly define the bitmap. Add a sanity check to warn if a bit cannot be found and reset the ring [MChan]. Fixes: `ec4d8e7cf0` ("bnxt_en: Add TPA ID mapping logic for 57500 chips.") Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Srijit Bose <srijit.bose@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251231083625.3911652-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
Zilin Guan	dc6f73f73c	net: wwan: iosm: Fix memory leak in ipc_mux_deinit() [ Upstream commit 92e6e0a87f6860a4710f9494f8c704d498ae60f8 ] Commit `1f52d7b622` ("net: wwan: iosm: Enable M.2 7360 WWAN card support") allocated memory for pp_qlt in ipc_mux_init() but did not free it in ipc_mux_deinit(). This results in a memory leak when the driver is unloaded. Free the allocated memory in ipc_mux_deinit() to fix the leak. Fixes: `1f52d7b622` ("net: wwan: iosm: Enable M.2 7360 WWAN card support") Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20251230071853.1062223-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
Gal Pressman	81e7205b82	net/mlx5e: Don't print error message due to invalid module [ Upstream commit 144297e2a24e3e54aee1180ec21120ea38822b97 ] Dumping module EEPROM on newer modules is supported through the netlink interface only. Querying with old userspace ethtool (or other tools, such as 'lshw') which still uses the ioctl interface results in an error message that could flood dmesg (in addition to the expected error return value). The original message was added under the assumption that the driver should be able to handle all module types, but now that such flows are easily triggered from userspace, it doesn't serve its purpose. Change the log level of the print in mlx5_query_module_eeprom() to debug. Fixes: `bb64143eee` ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20251225132717.358820-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
Di Zhu	0ab968d9c5	netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updates [ Upstream commit 02d1e1a3f9239cdb3ecf2c6d365fb959d1bf39df ] Directly increment the TSO features incurs a side effect: it will also directly clear the flags in NETIF_F_ALL_FOR_ALL on the master device, which can cause issues such as the inability to enable the nocache copy feature on the bonding driver. The fix is to include NETIF_F_ALL_FOR_ALL in the update mask, thereby preventing it from being cleared. Fixes: `b0ce3508b2` ("bonding: allow TSO being set on bonding master") Signed-off-by: Di Zhu <zhud@hygon.cn> Link: https://patch.msgid.link/20251224012224.56185-1-zhud@hygon.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
Weiming Shi	582a5e922a	net: sock: fix hardened usercopy panic in sock_recv_errqueue [ Upstream commit 2a71a1a8d0ed718b1c7a9ac61f07e5755c47ae20 ] skbuff_fclone_cache was created without defining a usercopy region, [1] unlike skbuff_head_cache which properly whitelists the cb[] field. [2] This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled and the kernel attempts to copy sk_buff.cb data to userspace via sock_recv_errqueue() -> put_cmsg(). The crash occurs when: 1. TCP allocates an skb using alloc_skb_fclone() (from skbuff_fclone_cache) [1] 2. The skb is cloned via skb_clone() using the pre-allocated fclone [3] 3. The cloned skb is queued to sk_error_queue for timestamp reporting 4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE) 5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb [4] 6. __check_heap_object() fails because skbuff_fclone_cache has no usercopy whitelist [5] When cloned skbs allocated from skbuff_fclone_cache are used in the socket error queue, accessing the sock_exterr_skb structure in skb->cb via put_cmsg() triggers a usercopy hardening violation: [ 5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)! [ 5.382796] kernel BUG at mm/usercopy.c:102! [ 5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI [ 5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7 [ 5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 5.384903] RIP: 0010:usercopy_abort+0x6c/0x80 [ 5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490 [ 5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246 [ 5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74 [ 5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0 [ 5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74 [ 5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001 [ 5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00 [ 5.384903] FS: 0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000 [ 5.384903] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0 [ 5.384903] PKRU: 55555554 [ 5.384903] Call Trace: [ 5.384903] <TASK> [ 5.384903] __check_heap_object+0x9a/0xd0 [ 5.384903] __check_object_size+0x46c/0x690 [ 5.384903] put_cmsg+0x129/0x5e0 [ 5.384903] sock_recv_errqueue+0x22f/0x380 [ 5.384903] tls_sw_recvmsg+0x7ed/0x1960 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? schedule+0x6d/0x270 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? mutex_unlock+0x81/0xd0 [ 5.384903] ? __pfx_mutex_unlock+0x10/0x10 [ 5.384903] ? __pfx_tls_sw_recvmsg+0x10/0x10 [ 5.384903] ? _raw_spin_lock_irqsave+0x8f/0xf0 [ 5.384903] ? _raw_read_unlock_irqrestore+0x20/0x40 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 The crash offset 296 corresponds to skb2->cb within skbuff_fclones: - sizeof(struct sk_buff) = 232 - offsetof(struct sk_buff, cb) = 40 - offset of skb2.cb in fclones = 232 + 40 = 272 - crash offset 296 = 272 + 24 (inside sock_exterr_skb.ee) This patch uses a local stack variable as a bounce buffer to avoid the hardened usercopy check failure. [1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885 [2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104 [3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566 [4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491 [5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719 Fixes: `6d07d1cd30` ("usercopy: Restrict non-usercopy caches to size 0") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251223203534.1392218-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
yuan.gao	874794fb4f	inet: ping: Fix icmp out counting [ Upstream commit 4c0856c225b39b1def6c9a6bc56faca79550da13 ] When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO messages, ICMP_MIB_OUTMSGS is counted twice. ping_v4_sendmsg ping_v4_push_pending_frames ip_push_pending_frames ip_finish_skb __ip_make_skb icmp_out_count(net, icmp_type); // first count icmp_out_count(sock_net(sk), user_icmph.type); // second count However, when the ping program uses an IPPROTO_RAW socket, ICMP_MIB_OUTMSGS is counted correctly only once. Therefore, the first count should be removed. Fixes: `c319b4d76b` ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: yuan.gao <yuan.gao@ucloud.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251224063145.3615282-1-yuan.gao@ucloud.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
Jerry Wu	2985712dc7	net: mscc: ocelot: Fix crash when adding interface under a lag [ Upstream commit 34f3ff52cb9fa7dbf04f5c734fcc4cb6ed5d1a95 ] Commit 15faa1f67ab4 ("lan966x: Fix crash when adding interface under a lag") fixed a similar issue in the lan966x driver caused by a NULL pointer dereference. The ocelot_set_aggr_pgids() function in the ocelot driver has similar logic and is susceptible to the same crash. This issue specifically affects the ocelot_vsc7514.c frontend, which leaves unused ports as NULL pointers. The felix_vsc9959.c frontend is unaffected as it uses the DSA framework which registers all ports. Fix this by checking if the port pointer is valid before accessing it. Fixes: `528d3f190c` ("net: mscc: ocelot: drop the use of the "lags" array") Signed-off-by: Jerry Wu <w.7erry@foxmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/tencent_75EF812B305E26B0869C673DD1160866C90A@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
Alexandre Knecht	4c8facf028	bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress [ Upstream commit 3128df6be147768fe536986fbb85db1d37806a9f ] When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is incorrectly stripped from frames during egress processing. br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also moves any "next" VLAN from the payload into hwaccel: /* move next vlan tag to hw accel tag */ __skb_vlan_pop(skb, &vlan_tci); __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci); For QinQ frames where the C-VLAN sits in the payload, this moves it to hwaccel where it gets lost during VXLAN encapsulation. Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only the hwaccel S-VLAN and leaves the payload untouched. This path is only taken when vlan_tunnel is enabled and tunnel_info is configured, so 802.1Q bridges are unaffected. Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN preserved in VXLAN payload via tcpdump. Fixes: `11538d039a` ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:58 +01:00
Alok Tiwari	94e070cd50	net: marvell: prestera: fix NULL dereference on devlink_alloc() failure [ Upstream commit a428e0da1248c353557970848994f35fd3f005e2 ] devlink_alloc() may return NULL on allocation failure, but prestera_devlink_alloc() unconditionally calls devlink_priv() on the returned pointer. This leads to a NULL pointer dereference if devlink allocation fails. Add a check for a NULL devlink pointer and return NULL early to avoid the crash. Fixes: `34dd1710f5` ("net: marvell: prestera: Add basic devlink support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: Elad Nachman <enachman@marvell.com> Link: https://patch.msgid.link/20251230052124.897012-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Fernando Fernandez Mancera	3cd717359e	netfilter: nf_conncount: update last_gc only when GC has been performed [ Upstream commit 7811ba452402d58628e68faedf38745b3d485e3c ] Currently last_gc is being updated everytime a new connection is tracked, that means that it is updated even if a GC wasn't performed. With a sufficiently high packet rate, it is possible to always bypass the GC, causing the list to grow infinitely. Update the last_gc value only when a GC has been actually performed. Fixes: `d265929930` ("netfilter: nf_conncount: reduce unnecessary GC") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Zilin Guan	c6cfd76700	netfilter: nf_tables: fix memory leak in nf_tables_newrule() [ Upstream commit d077e8119ddbb4fca67540f1a52453631a47f221 ] In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to the err_release_rule label without freeing the allocated flow, leading to a memory leak. Fix this by adding a new label err_destroy_flow and jumping to it when nft_use_inc() fails. This ensures that the flow is properly released in this error case. Fixes: `1689f25924` ("netfilter: nf_tables: report use refcount overflow") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Ernest Van Hoecke	76f4218bda	gpio: pca953x: handle short interrupt pulses on PCAL devices [ Upstream commit 014a17deb41201449f76df2b20c857a9c3294a7c ] GPIO drivers with latch input support may miss short pulses on input pins even when input latching is enabled. The generic interrupt logic in the pca953x driver reports interrupts by comparing the current input value against the previously sampled one and only signals an event when a level change is observed between two reads. For short pulses, the first edge is captured when the input register is read, but if the signal returns to its previous level before the read, the second edge is not observed. As a result, successive pulses can produce identical input values at read time and no level change is detected, causing interrupts to be missed. Below timing diagram shows this situation where the top signal is the input pin level and the bottom signal indicates the latched value. ─────┐ ┌─────────────────┐ ┌───────────────────┐ ┌───── │ │ . │ │ . │ │ . │ │ │ │ │ │ │ │ │ └────┘ │ └────┘ │ └────┘ │ Input │ │ │ │ │ │ ▼ │ ▼ │ ▼ │ IRQ │ IRQ │ IRQ │ . . . ─────┐ .┌──────────────┐ .┌────────────────┐ .┌── │ │ │ │ │ │ │ │ │ │ │ │ └────────┘ └────────┘ └────────┘ Latched │ │ │ ▼ ▼ ▼ READ 0 READ 0 READ 0 NO CHANGE NO CHANGE PCAL variants provide an interrupt status register that records which pins triggered an interrupt, but the status and input registers cannot be read atomically. The interrupt status is only cleared when the input port is read, and the input value must also be read to determine the triggering edge. If another interrupt occurs on a different line after the status register has been read but before the input register is sampled, that event will not be reflected in the earlier status snapshot, so relying solely on the interrupt status register is also insufficient. Support for input latching and interrupt status handling was previously added by [1], but the interrupt status-based logic was reverted by [2] due to these issues. This patch addresses the original problem by combining both sources of information. Events indicated by the interrupt status register are merged with events detected through the existing level-change logic. As a result: short pulses, whose second edges are invisible, are detected via the interrupt status register, and * interrupts that occur between the status and input reads are still caught by the generic level-change logic. This significantly improves robustness on devices that signal interrupts as short pulses, while avoiding the issues that led to the earlier reversion. In practice, even if only the first edge of a pulse is observable, the interrupt is reliably detected. This fixes missed interrupts from an Ilitek touch controller with its interrupt line connected to a PCAL6416A, where active-low pulses are approximately 200 us long. [1] commit `44896beae6` ("gpio: pca953x: add PCAL9535 interrupt support for Galileo Gen2") [2] commit d6179f6c6204 ("gpio: pca953x: Improve interrupt support") Fixes: d6179f6c6204 ("gpio: pca953x: Improve interrupt support") Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20251217153050.142057-1-ernestvanhoecke@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Potin Lai	4d7652d1a3	gpio: pca953x: Add support for level-triggered interrupts [ Upstream commit 417b0f8d08f878615de9481c6e8827fbc8b57ed2 ] Adds support for level-triggered interrupts in the PCA953x GPIO expander driver. Previously, the driver only supported edge-triggered interrupts, which could lead to missed events in scenarios where an interrupt condition persists until it is explicitly cleared. By enabling level-triggered interrupts, the driver can now detect and respond to sustained interrupt conditions more reliably. Signed-off-by: Potin Lai <potin.lai.pt@gmail.com> Link: https://lore.kernel.org/r/20250409-gpio-pca953x-level-triggered-irq-v3-1-7f184d814934@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Stable-dep-of: 014a17deb412 ("gpio: pca953x: handle short interrupt pulses on PCAL devices") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Andy Shevchenko	26f64b3ee5	gpio: pca953x: Utilise temporary variable for struct device [ Upstream commit 6811886ac91eb414b1b74920e05e6590c3f2a688 ] We have a temporary variable to keep pointer to struct device. Utilise it where it makes sense. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Stable-dep-of: 014a17deb412 ("gpio: pca953x: handle short interrupt pulses on PCAL devices") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Andy Shevchenko	2a968d1fd7	gpio: pca953x: Utilise dev_err_probe() where it makes sense [ Upstream commit c47f7ff0fe61738a40b1b4fef3cd8317ec314079 ] At least in pca953x_irq_setup() we may use dev_err_probe(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Stable-dep-of: 014a17deb412 ("gpio: pca953x: handle short interrupt pulses on PCAL devices") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Fernando Fernandez Mancera	62ecdf65b8	netfilter: nft_synproxy: avoid possible data-race on update operation [ Upstream commit 36a3200575642846a96436d503d46544533bb943 ] During nft_synproxy eval we are reading nf_synproxy_info struct which can be modified on update operation concurrently. As nf_synproxy_info struct fits in 32 bits, use READ_ONCE/WRITE_ONCE annotations. Fixes: `ee394f96ad` ("netfilter: nft_synproxy: add synproxy stateful object support") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:57 +01:00
Marek Vasut	51ea246778	arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM [ Upstream commit c63749a7ddc59ac6ec0b05abfa0a21af9f2c1d38 ] Add missing 'clocks' property to LAN8740Ai PHY node, to allow the PHY driver to manage LAN8740Ai CLKIN reference clock supply. This fixes sporadic link bouncing caused by interruptions on the PHY reference clock, by letting the PHY driver manage the reference clock and assure there are no interruptions. This follows the matching PHY driver recommendation described in commit `bedd8d78ab` ("net: phy: smsc: LAN8710/20: add phy refclk in support") Fixes: `8d6712695b` ("arm64: dts: imx8mp: Add support for DH electronics i.MX8M Plus DHCOM and PDK2") Signed-off-by: Marek Vasut <marek.vasut@mailbox.org> Tested-by: Christoph Niedermaier <cniedermaier@dh-electronics.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
Ian Ray	b6600d9d89	ARM: dts: imx6q-ba16: fix RTC interrupt level [ Upstream commit e6a4eedd49ce27c16a80506c66a04707e0ee0116 ] RTC interrupt level should be set to "LOW". This was revealed by the introduction of commit: `f181987ef4` ("rtc: m41t80: use IRQ flags obtained from fwnode") which changed the way IRQ type is obtained. Fixes: `56c27310c1` ("ARM: dts: imx: Add Advantech BA-16 Qseven module") Signed-off-by: Ian Ray <ian.ray@gehealthcare.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
Haibo Chen	837fe3df68	arm64: dts: add off-on-delay-us for usdhc2 regulator [ Upstream commit ca643894a37a25713029b36cfe7d1bae515cac08 ] For SD card, according to the spec requirement, for sd card power reset operation, it need sd card supply voltage to be lower than 0.5v and keep over 1ms, otherwise, next time power back the sd card supply voltage to 3.3v, sd card can't support SD3.0 mode again. To match such requirement on imx8qm-mek board, add 4.8ms delay between sd power off and power on. Fixes: `307fd14d4b` ("arm64: dts: imx: add imx8qm mek support") Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
Xingui Yang	c6f7b3cf44	scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" [ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ] This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d. When probing the exp-attached sata device, libsas/libata will issue a hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a broadcast event will be received after the disk probe fails, and this commit causes the probe will be re-executed on the disk, and a faulty disk may get into an indefinite loop of probe. Therefore, revert this commit, although it can fix some temporary issues with disk probe failure. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
Brian Kao	cf73e6020b	scsi: ufs: core: Fix EH failure after W-LUN resume error [ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ] When a W-LUN resume fails, its parent devices in the SCSI hierarchy, including the scsi_target, may be runtime suspended. Subsequently, the error handler in ufshcd_recover_pm_error() fails to set the W-LUN device back to active because the parent target is not active. This results in the following errors: google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ... ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000 ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5 ... ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active Address this by: 1. Ensuring the W-LUN's parent scsi_target is runtime resumed before attempting to set the W-LUN to active within ufshcd_recover_pm_error(). 2. Explicitly checking for power.runtime_error on the HBA and W-LUN devices before calling pm_runtime_set_active() to clear the error state. 3. Adding pm_runtime_get_sync(hba->dev) in ufshcd_err_handling_prepare() to ensure the HBA itself is active during error recovery, even if a child device resume failed. These changes ensure the device power states are managed correctly during error recovery. Signed-off-by: Brian Kao <powenkao@google.com> Tested-by: Brian Kao <powenkao@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
Wen Xiong	ac01c92333	scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset [ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ] A dynamic remove/add storage adapter test hits EEH on PowerPC: EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160 EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650 EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr] EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr] EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr] EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440 EEH: [c00000000017e558] process_one_work+0x298/0x590 EEH: [c00000000017eef8] worker_thread+0xa8/0x620 EEH: [c00000000018be34] kthread+0x124/0x130 EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by irqbalance daemon. If we disable irqbalance daemon, we won't see the issue. With debug enabled in ipr driver: [ 44.103071] ipr: Entering __ipr_remove [ 44.103083] ipr: Entering ipr_initiate_ioa_bringdown [ 44.103091] ipr: Entering ipr_reset_shutdown_ioa [ 44.103099] ipr: Leaving ipr_reset_shutdown_ioa [ 44.103105] ipr: Leaving ipr_initiate_ioa_bringdown [ 44.149918] ipr: Entering ipr_reset_ucode_download [ 44.149935] ipr: Entering ipr_reset_alert [ 44.150032] ipr: Entering ipr_reset_start_timer [ 44.150038] ipr: Leaving ipr_reset_alert [ 44.244343] scsi 1:2:3:0: alua: Detached [ 44.254300] ipr: Entering ipr_reset_start_bist [ 44.254320] ipr: Entering ipr_reset_start_timer [ 44.254325] ipr: Leaving ipr_reset_start_bist [ 44.364329] scsi 1:2:4:0: alua: Detached [ 45.134341] scsi 1:2:5:0: alua: Detached [ 45.860949] ipr: Entering ipr_reset_shutdown_ioa [ 45.860962] ipr: Leaving ipr_reset_shutdown_ioa [ 45.860966] ipr: Entering ipr_reset_alert [ 45.861028] ipr: Entering ipr_reset_start_timer [ 45.861035] ipr: Leaving ipr_reset_alert [ 45.964302] ipr: Entering ipr_reset_start_bist [ 45.964309] ipr: Entering ipr_reset_start_timer [ 45.964313] ipr: Leaving ipr_reset_start_bist [ 46.264301] ipr: Entering ipr_reset_bist_done [ 46.264309] ipr: Leaving ipr_reset_bist_done During adapter reset, ipr device driver blocks config space access but can't block MMIO access for MSI-X entries. There is very small window: irqbalance daemon kicks in during adapter reset before ipr driver calls pci_restore_state(pdev) to restore MSI-X table. irqbalance daemon reads back all 0 for that MSI-X vector in __pci_read_msi_msg(). irqbalance daemon: msi_domain_set_affinity() ->irq_chip_set_affinity_patent() ->xive_irq_set_affinity() ->irq_chip_compose_msi_msg() ->pseries_msi_compose_msg() ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state ->irq_chip_write_msi_msg() -> pci_write_msg_msi(): write 0 to the msix vector entry When ipr driver calls pci_restore_state(pdev) in ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared by irqbalance daemon in pci_write_msg_msix(). pci_restore_state() ->__pci_restore_msix_state() Below is the MSI-X table for ipr adapter after irqbalance daemon kicked in during adapter reset: Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0 Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0 Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0 Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0 Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0 Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0 Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0 Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0 Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0 ---------> Hit EEH since msix vector of index=8 are 0 Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0 Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0 Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0 Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0 Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0 Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0 Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0 [ 46.264312] ipr: Entering ipr_reset_restore_cfg_space [ 46.267439] ipr: Entering ipr_fail_all_ops [ 46.267447] ipr: Leaving ipr_fail_all_ops [ 46.267451] ipr: Leaving ipr_reset_restore_cfg_space [ 46.267454] ipr: Entering ipr_ioa_bringdown_done [ 46.267458] ipr: Leaving ipr_ioa_bringdown_done [ 46.267467] ipr: Entering ipr_worker_thread [ 46.267470] ipr: Leaving ipr_worker_thread IRQ balancing is not required during adapter reset. Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable it after calling pci_restore_state(). The irqbalance daemon is disabled for this short period of time (~2s). Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com> Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
ChenXiaoSong	afd993baba	smb/client: fix NT_STATUS_NO_DATA_DETECTED value [ Upstream commit a1237c203f1757480dc2f3b930608ee00072d3cc ] This was reported by the KUnit tests in the later patches. See MS-ERREF 2.3.1 STATUS_NO_DATA_DETECTED. Keep it consistent with the value in the documentation. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
ChenXiaoSong	bbbc1a48f1	smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value [ Upstream commit b2b50fca34da5ec231008edba798ddf92986bd7f ] This was reported by the KUnit tests in the later patches. See MS-ERREF 2.3.1 STATUS_DEVICE_DOOR_OPEN. Keep it consistent with the value in the documentation. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:56 +01:00
ChenXiaoSong	ca9b4aaa7e	smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value [ Upstream commit 9f99caa8950a76f560a90074e3a4b93cfa8b3d84 ] This was reported by the KUnit tests in the later patches. See MS-ERREF 2.3.1 STATUS_UNABLE_TO_FREE_VM. Keep it consistent with the value in the documentation. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:55 +01:00
Trond Myklebust	f18975f2cd	NFS: Fix up the automount fs_context to use the correct cred [ Upstream commit a2a8fc27dd668e7562b5326b5ed2f1604cb1e2e9 ] When automounting, the fs_context should be fixed up to use the cred from the parent filesystem, since the operation is just extending the namespace. Authorisation to enter that namespace will already have been provided by the preceding lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:55 +01:00
Scott Mayhew	f719a300ea	NFSv4: ensure the open stateid seqid doesn't go backwards [ Upstream commit 2e47c3cc64b44b0b06cd68c2801db92ff143f2b2 ] We have observed an NFSv4 client receiving a LOCK reply with a status of NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an earlier seqid value in the stateid. As this was for a new lockowner, that would imply that nfs_set_open_stateid_locked() had updated the open stateid seqid with an earlier value. Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out of sequence, the task will sleep on the state->waitq for up to 5 seconds. If the task waits for the full 5 seconds, then after finishing the wait it'll update the open stateid seqid with whatever value the incoming seqid has. If there are multiple waiters in this scenario, then the last one to perform said update may not be the one with the highest seqid. Add a check to ensure that the seqid can only be incremented, and add a tracepoint to indicate when old seqids are skipped. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:55 +01:00
Mikulas Patocka	af4fc583fd	dm-snapshot: fix 'scheduling while atomic' on real-time kernels [ Upstream commit 8581b19eb2c5ccf06c195d3b5468c3c9d17a5020 ] There is reported 'scheduling while atomic' bug when using dm-snapshot on real-time kernels. The reason for the bug is that the hlist_bl code does preempt_disable() when taking the lock and the kernel attempts to take other spinlocks while holding the hlist_bl lock. Fix this by converting a hlist_bl spinlock into a regular spinlock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Jiping Ma <jiping.ma2@windriver.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:55 +01:00
Sam James	fc220dae3c	alpha: don't reference obsolete termio struct for TC* constants [ Upstream commit 9aeed9041929812a10a6d693af050846942a1d16 ] Similar in nature to ab107276607af90b13a5994997e19b7b9731e251. glibc-2.42 drops the legacy termio struct, but the ioctls.h header still defines some TC* constants in terms of termio (via sizeof). Hardcode the values instead. This fixes building Python for example, which falls over like: ./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio' Link: https://bugs.gentoo.org/961769 Link: https://bugs.gentoo.org/962600 Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: Magnus Lindholm <linmag7@gmail.com> Link: https://lore.kernel.org/r/6ebd3451908785cad53b50ca6bc46cfe9d6bc03c.1764922497.git.sam@gentoo.org Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:55 +01:00
Sebastian Andrzej Siewior	be3bc3d84a	ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels [ Upstream commit fedadc4137234c3d00c4785eeed3e747fe9036ae ] gup_pgd_range() is invoked with disabled interrupts and invokes __kmap_local_page_prot() via pte_offset_map(), gup_p4d_range(). With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get() which uses a spinlock_t via lock_kmap_any(). This leads to an sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a sleeping lock and must not be acquired in atomic context. The loop in map_new_virtual() uses wait_queue_head_t for wake up which also is using a spinlock_t. Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT to allow the use of get_user_pages_fast(). [arnd: rework patch to turn off HIGHPTE instead of HAVE_PAST_GUP] Co-developed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:55 +01:00
Yang Li	78faf28333	csky: fix csky_cmpxchg_fixup not working [ Upstream commit 809ef03d6d21d5fea016bbf6babeec462e37e68c ] In the csky_cmpxchg_fixup function, it is incorrect to use the global variable csky_cmpxchg_stw to determine the address where the exception occurred.The global variable csky_cmpxchg_stw stores the opcode at the time of the exception, while &csky_cmpxchg_stw shows the address where the exception occurred. Signed-off-by: Yang Li <yang.li85200@gmail.com> Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:29:55 +01:00
Kuniyuki Iwashima	13159c7125	tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock(). commit c65f27b9c3be2269918e1cbad6d8884741f835c5 upstream. get_netdev_for_sock() is called during setsockopt(), so not under RCU. Using sk_dst_get(sk)->dev could trigger UAF. Let's use __sk_dst_get() and dst_dev_rcu(). Note that the only ->ndo_sk_get_lower_dev() user is bond_sk_get_lower_dev(), which uses RCU. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250916214758.650211-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Keerthana: Backport to v6.6.y ] Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:55 +01:00
Chuck Lever	381261f24f	NFSD: NFSv4 file creation neglects setting ACL commit 913f7cf77bf14c13cfea70e89bcb6d0b22239562 upstream. An NFSv4 client that sets an ACL with a named principal during file creation retrieves the ACL afterwards, and finds that it is only a default ACL (based on the mode bits) and not the ACL that was requested during file creation. This violates RFC 8881 section 6.4.1.3: "the ACL attribute is set as given". The issue occurs in nfsd_create_setattr(), which calls nfsd_attrs_valid() to determine whether to call nfsd_setattr(). However, nfsd_attrs_valid() checks only for iattr changes and security labels, but not POSIX ACLs. When only an ACL is present, the function returns false, nfsd_setattr() is skipped, and the POSIX ACL is never applied to the inode. Subsequently, when the client retrieves the ACL, the server finds no POSIX ACL on the inode and returns one generated from the file's mode bits rather than returning the originally-specified ACL. Reported-by: Aurélien Couderc <aurelien.couderc2002@gmail.com> Fixes: `c0cbe70742` ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:55 +01:00
Stephen Smalley	4c06b7cb87	nfsd: set security label during create operations commit 442d27ff09a218b61020ab56387dbc508ad6bfa6 upstream. When security labeling is enabled, the client can pass a file security label as part of a create operation for the new file, similar to mode and other attributes. At present, the security label is received by nfsd and passed down to nfsd_create_setattr(), but nfsd_setattr() is never called and therefore the label is never set on the new file. This bug may have been introduced on or around commit `d6a97d3f58` ("NFSD: add security label to struct nfsd_attrs"). Looking at nfsd_setattr() I am uncertain as to whether the same issue presents for file ACLs and therefore requires a similar fix for those. An alternative approach would be to introduce a new LSM hook to set the "create SID" of the current task prior to the actual file creation, which would atomically label the new inode at creation time. This would be better for SELinux and a similar approach has been used previously (see security_dentry_create_files_as) but perhaps not usable by other LSMs. Reproducer: 1. Install a Linux distro with SELinux - Fedora is easiest 2. git clone https://github.com/SELinuxProject/selinux-testsuite 3. Install the requisite dependencies per selinux-testsuite/README.md 4. Run something like the following script: MOUNT=$HOME/selinux-testsuite sudo systemctl start nfs-server sudo exportfs -o rw,no_root_squash,security_label localhost:$MOUNT sudo mkdir -p /mnt/selinux-testsuite sudo mount -t nfs -o vers=4.2 localhost:$MOUNT /mnt/selinux-testsuite pushd /mnt/selinux-testsuite/ sudo make -C policy load pushd tests/filesystem sudo runcon -t test_filesystem_t ./create_file -f trans_test_file \ -e test_filesystem_filetranscon_t -v sudo rm -f trans_test_file popd sudo make -C policy unload popd sudo umount /mnt/selinux-testsuite sudo exportfs -u localhost:$MOUNT sudo rmdir /mnt/selinux-testsuite sudo systemctl stop nfs-server Expected output: <eliding noise from commands run prior to or after the test itself> Process context: unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023 Created file: trans_test_file File context: unconfined_u:object_r:test_filesystem_filetranscon_t:s0 File context is correct Actual output: <eliding noise from commands run prior to or after the test itself> Process context: unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023 Created file: trans_test_file File context: system_u:object_r:test_file_t:s0 File context error, expected: test_filesystem_filetranscon_t got: test_file_t Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Stable-dep-of: 913f7cf77bf1 ("NFSD: NFSv4 file creation neglects setting ACL") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Trond Myklebust	d761a185f8	nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr() commit 24d92de9186ebc340687caf7356e1070773e67bc upstream. The main point of the guarded SETATTR is to prevent races with other WRITE and SETATTR calls. That requires that the check of the guard time against the inode ctime be done after taking the inode lock. Furthermore, we need to take into account the 32-bit nature of timestamps in NFSv3, and the possibility that files may change at a faster rate than once a second. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Stable-dep-of: 442d27ff09a2 ("nfsd: set security label during create operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Jeff Layton	ab2b575526	nfsd: convert to new timestamp accessors commit 11fec9b9fb04fd1b3330a3b91ab9dcfa81ad5ad3 upstream. Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-50-jlayton@kernel.org Stable-dep-of: 24d92de9186e ("nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()") Signed-off-by: Christian Brauner <brauner@kernel.org> [ cel: d68886bae76a has already been applied ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Sharath Chandra Vurukala	1310640f9a	net: Add locking to protect skb->dev access in ip_output commit 1dbf1d590d10a6d1978e8184f8dfe20af22d680a upstream. In ip_output() skb->dev is updated from the skb_dst(skb)->dev this can become invalid when the interface is unregistered and freed, Introduced new skb_dst_dev_rcu() function to be used instead of skb_dst_dev() within rcu_locks in ip_output.This will ensure that all the skb's associated with the dev being deregistered will be transnmitted out first, before freeing the dev. Given that ip_output() is called within an rcu_read_lock() critical section or from a bottom-half context, it is safe to introduce an RCU read-side critical section within it. Multiple panic call stacks were observed when UL traffic was run in concurrency with device deregistration from different functions, pasting one sample for reference. [496733.627565][T13385] Call trace: [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0 [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498 [496733.627595][T13385] ip_finish_output+0xa4/0xf4 [496733.627605][T13385] ip_output+0x100/0x1a0 [496733.627613][T13385] ip_send_skb+0x68/0x100 [496733.627618][T13385] udp_send_skb+0x1c4/0x384 [496733.627625][T13385] udp_sendmsg+0x7b0/0x898 [496733.627631][T13385] inet_sendmsg+0x5c/0x7c [496733.627639][T13385] __sys_sendto+0x174/0x1e4 [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c [496733.627653][T13385] invoke_syscall+0x58/0x11c [496733.627662][T13385] el0_svc_common+0x88/0xf4 [496733.627669][T13385] do_el0_svc+0x2c/0xb0 [496733.627676][T13385] el0_svc+0x2c/0xa4 [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4 [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8 Changes in v3: - Replaced WARN_ON() with WARN_ON_ONCE(), as suggested by Willem de Bruijn. - Dropped legacy lines mistakenly pulled in from an outdated branch. Changes in v2: - Addressed review comments from Eric Dumazet - Used READ_ONCE() to prevent potential load/store tearing - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250730105118.GA26100@hu-sharathv-hyd.qualcomm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Keerthana: Backported the patch to v6.6.y ] Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Pedro Demarchi Gomes	9c2f8a9b68	ksm: use range-walk function to jump over holes in scan_get_next_rmap_item commit f5548c318d6520d4fa3c5ed6003eeb710763cbc5 upstream. Currently, scan_get_next_rmap_item() walks every page address in a VMA to locate mergeable pages. This becomes highly inefficient when scanning large virtual memory areas that contain mostly unmapped regions, causing ksmd to use large amount of cpu without deduplicating much pages. This patch replaces the per-address lookup with a range walk using walk_page_range(). The range walker allows KSM to skip over entire unmapped holes in a VMA, avoiding unnecessary lookups. This problem was previously discussed in [1]. Consider the following test program which creates a 32 TiB mapping in the virtual address space but only populates a single page: #include <unistd.h> #include <stdio.h> #include <sys/mman.h> /* 32 TiB / const size_t size = 32ul 1024 * 1024 * 1024 * 1024; int main() { char area = mmap(NULL, size, PROT_READ \| PROT_WRITE, MAP_NORESERVE \| MAP_PRIVATE \| MAP_ANON, -1, 0); if (area == MAP_FAILED) { perror("mmap() failed\n"); return -1; } / Populate a single page such that we get an anon_vma. / area = 0; /* Enable KSM. */ madvise(area, size, MADV_MERGEABLE); pause(); return 0; } $ ./ksm-sparse & $ echo 1 > /sys/kernel/mm/ksm/run Without this patch ksmd uses 100% of the cpu for a long time (more then 1 hour in my test machine) scanning all the 32 TiB virtual address space that contain only one mapped page. This makes ksmd essentially deadlocked not able to deduplicate anything of value. With this patch ksmd walks only the one mapped page and skips the rest of the 32 TiB virtual address space, making the scan fast using little cpu. Link: https://lkml.kernel.org/r/20251023035841.41406-1-pedrodemargomes@gmail.com Link: https://lkml.kernel.org/r/20251022153059.22763-1-pedrodemargomes@gmail.com Link: https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/ [1] Fixes: `31dbd01f31` ("ksm: Kernel SamePage Merging") Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: craftfever <craftfever@airmail.cc> Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: xu xin <xu.xin16@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ change page to folios ] Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Ilya Dryomov	4ebc711b73	libceph: make calc_target() set t->paused, not just clear it commit c0fe2994f9a9d0a2ec9e42441ea5ba74b6a16176 upstream. Currently calc_target() clears t->paused if the request shouldn't be paused anymore, but doesn't ever set t->paused even though it's able to determine when the request should be paused. Setting t->paused is left to __submit_request() which is fine for regular requests but doesn't work for linger requests -- since __submit_request() doesn't operate on linger requests, there is nowhere for lreq->t.paused to be set. One consequence of this is that watches don't get reestablished on paused -> unpaused transitions in cases where requests have been paused long enough for the (paused) unwatch request to time out and for the subsequent (re)watch request to enter the paused state. On top of the watch not getting reestablished, rbd_reregister_watch() gets stuck with rbd_dev->watch_mutex held: rbd_register_watch __rbd_register_watch ceph_osdc_watch linger_reg_commit_wait It's waiting for lreq->reg_commit_wait to be completed, but for that to happen the respective request needs to end up on need_resend_linger list and be kicked when requests are unpaused. There is no chance for that if the request in question is never marked paused in the first place. The fact that rbd_dev->watch_mutex remains taken out forever then prevents the image from getting unmapped -- "rbd unmap" would inevitably hang in D state on an attempt to grab the mutex. Cc: stable@vger.kernel.org Reported-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Sam Edwards	90a60fe619	libceph: reset sparse-read state in osd_fault() commit 11194b416ef95012c2cfe5f546d71af07b639e93 upstream. When a fault occurs, the connection is abandoned, reestablished, and any pending operations are retried. The OSD client tracks the progress of a sparse-read reply using a separate state machine, largely independent of the messenger's state. If a connection is lost mid-payload or the sparse-read state machine returns an error, the sparse-read state is not reset. The OSD client will then interpret the beginning of a new reply as the continuation of the old one. If this makes the sparse-read machinery enter a failure state, it may never recover, producing loops like: libceph: [0] got 0 extents libceph: data len 142248331 != extent len 0 libceph: osd0 (1)...:6801 socket error on read libceph: data len 142248331 != extent len 0 libceph: osd0 (1)...:6801 socket error on read Therefore, reset the sparse-read state in osd_fault(), ensuring retries start from a clean state. Cc: stable@vger.kernel.org Fixes: `f628d79997` ("libceph: add sparse read support to OSD client") Signed-off-by: Sam Edwards <CFSworks@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Ilya Dryomov	e097cd8581	libceph: return the handler error from mon_handle_auth_done() commit e84b48d31b5008932c0a0902982809fbaa1d3b70 upstream. Currently any error from ceph_auth_handle_reply_done() is propagated via finish_auth() but isn't returned from mon_handle_auth_done(). This results in higher layers learning that (despite the monitor considering us to be successfully authenticated) something went wrong in the authentication phase and reacting accordingly, but msgr2 still trying to proceed with establishing the session in the background. In the case of secure mode this can trigger a WARN in setup_crypto() and later lead to a NULL pointer dereference inside of prepare_auth_signature(). Cc: stable@vger.kernel.org Fixes: `cd1a677cad` ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:54 +01:00
Tuo Li	8081faaf08	libceph: make free_choose_arg_map() resilient to partial allocation commit e3fe30e57649c551757a02e1cad073c47e1e075e upstream. free_choose_arg_map() may dereference a NULL pointer if its caller fails after a partial allocation. For example, in decode_choose_args(), if allocation of arg_map->args fails, execution jumps to the fail label and free_choose_arg_map() is called. Since arg_map->size is updated to a non-zero value before memory allocation, free_choose_arg_map() will iterate over arg_map->args and dereference a NULL pointer. To prevent this potential NULL pointer dereference and make free_choose_arg_map() more resilient, add checks for pointers before iterating. Cc: stable@vger.kernel.org Co-authored-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Tuo Li <islituo@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:53 +01:00
Ilya Dryomov	d3613770e2	libceph: replace overzealous BUG_ON in osdmap_apply_incremental() commit e00c3f71b5cf75681dbd74ee3f982a99cb690c2b upstream. If the osdmap is (maliciously) corrupted such that the incremental osdmap epoch is different from what is expected, there is no need to BUG. Instead, just declare the incremental osdmap to be invalid. Cc: stable@vger.kernel.org Reported-by: ziming zhang <ezrakiez@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:29:53 +01:00

1 2 3 4 5 ...

1239476 Commits