linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 03:15:31 +09:00

Author	SHA1	Message	Date
Ido Schimmel	f1ecccb5cd	mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devices [ Upstream commit 92ec4855034b2c4d13f117558dc73d20581fa9ff ] The driver only offloads neighbors that are constructed on top of net devices registered by it or their uppers (which are all Ethernet). The device supports GRE encapsulation and decapsulation of forwarded traffic, but the driver will not offload dummy neighbors constructed on top of GRE net devices as they are not uppers of its net devices: # ip link add name gre1 up type gre tos inherit local 192.0.2.1 remote 198.51.100.1 # ip neigh add 0.0.0.0 lladdr 0.0.0.0 nud noarp dev gre1 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 NOARP (Note that the neighbor is not marked with 'offload') When the driver is reloaded and the existing configuration is replayed, the driver does not perform the same check regarding existing neighbors and offloads the previously added one: # devlink dev reload pci/0000:01:00.0 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 offload NOARP If the neighbor is later deleted, the driver will ignore the notification (given the GRE net device is not its upper) and will therefore keep referencing freed memory, resulting in a use-after-free [1] when the net device is deleted: # ip neigh del 0.0.0.0 lladdr 0.0.0.0 dev gre1 # ip link del dev gre1 Fix by skipping neighbor replay if the net device for which the replay is performed is not our upper. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x1ea/0x200 Read of size 8 at addr ffff888155b0e420 by task ip/2282 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 mlxsw_sp_neigh_entry_update+0x1ea/0x200 mlxsw_sp_router_rif_gone_sync+0x2a8/0x440 mlxsw_sp_rif_destroy+0x1e9/0x750 mlxsw_sp_netdevice_ipip_ol_event+0x3c9/0xdc0 mlxsw_sp_router_netdevice_event+0x3ac/0x15e0 notifier_call_chain+0xca/0x150 call_netdevice_notifiers_info+0x7f/0x100 unregister_netdevice_many_notify+0xc8c/0x1d90 rtnl_dellink+0x34e/0xa50 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x131/0x360 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x121/0x1b0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: `8fdb09a767` ("mlxsw: spectrum_router: Replay neighbours when RIF is made") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/c53c02c904fde32dad484657be3b1477884e9ad6.1747225701.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:18 +02:00
Kees Cook	fde33ab3c0	wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request [ Upstream commit 82bbe02b2500ef0a62053fe2eb84773fe31c5a0a ] Make sure that n_channels is set after allocating the struct cfg80211_registered_device::int_scan_req member. Seen with syzkaller: UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:1208:5 index 0 is out of range for type 'struct ieee80211_channel [] __counted_by(n_channels)' (aka 'struct ieee80211_channel []') This was missed in the initial conversions because I failed to locate the allocation likely due to the "sizeof(void *)" not matching the "channels" array type. Reported-by: syzbot+4bcdddd48bb6f0be0da1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/680fd171.050a0220.2b69d1.045e.GAE@google.com/ Fixes: `e3eac9f32e` ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20250509184641.work.542-kees@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:18 +02:00
Hariprasad Kelam	72ee7af615	octeontx2-af: Fix CGX Receive counters [ Upstream commit bf449f35e77fd44017abf991fac1f9ab7705bbe0 ] Each CGX block supports 4 logical MACs (LMACS). Receive counters CGX_CMR_RX_STAT0-8 are per LMAC and CGX_CMR_RX_STAT9-12 are per CGX. Due a bug in previous patch, stale Per CGX counters values observed. Fixes: `66208910e5` ("octeontx2-af: Support to retrieve CGX LMAC stats") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20250513071554.728922-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:18 +02:00
Bo-Cun Chen	398989c780	net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capability [ Upstream commit 1bdea6fad6fb985ff13828373c48e337c4e939f9 ] Since MTK_ESW_BIT is a bit number rather than a bitmap, it causes MTK_HAS_CAPS to produce incorrect results. This leads to the ETH driver not declaring MAC capabilities correctly for the MT7988 ESW. Fixes: `445eb6448e` ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/b8b37f409d1280fad9c4d32521e6207f63cd3213.1747110258.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Subbaraya Sundeep	03c42d35c3	octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy [ Upstream commit 865ab2461375e3a5a2526f91f9a9f17b8931bc9e ] MASCEC hardware block has a field called maximum transmit size for TX secy. Max packet size going out of MCS block has be programmed taking into account full packet size which has L2 header,SecTag and ICV. MACSEC offload driver is configuring max transmit size as macsec interface MTU which is incorrect. Say with 1500 MTU of real device, macsec interface created on top of real device will have MTU of 1468(1500 - (SecTag + ICV)). This is causing packets from macsec interface of size greater than or equal to 1468 are not getting transmitted out because driver programmed max transmit size as 1468 instead of 1514(1500 + ETH_HDR_LEN). Fixes: `c54ffc7360` ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747053756-4529-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Cosmin Tanislav	7bddac8603	regulator: max20086: fix invalid memory access [ Upstream commit 6b0cd72757c69bc2d45da42b41023e288d02e772 ] max20086_parse_regulators_dt() calls of_regulator_match() using an array of struct of_regulator_match allocated on the stack for the matches argument. of_regulator_match() calls devm_of_regulator_put_matches(), which calls devres_alloc() to allocate a struct devm_of_regulator_matches which will be de-allocated using devm_of_regulator_put_matches(). struct devm_of_regulator_matches is populated with the stack allocated matches array. If the device fails to probe, devm_of_regulator_put_matches() will be called and will try to call of_node_put() on that stack pointer, generating the following dmesg entries: max20086 6-0028: Failed to read DEVICE_ID reg: -121 kobject: '\xc0$\xa5\x03' (000000002cebcb7a): is not initialized, yet kobject_put() is being called. Followed by a stack trace matching the call flow described above. Switch to allocating the matches array using devm_kcalloc() to avoid accessing the stack pointer long after it's out of scope. This also has the advantage of allowing multiple max20086 to probe without overriding the data stored inside the global of_regulator_match. Fixes: `bfff546aae` ("regulator: Add MAX20086-MAX20089 driver") Signed-off-by: Cosmin Tanislav <demonsingur@gmail.com> Link: https://patch.msgid.link/20250508064947.2567255-1-demonsingur@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Abdun Nihaal	9c2d0899c6	qlcnic: fix memory leak in qlcnic_sriov_channel_cfg_cmd() [ Upstream commit 9d8a99c5a7c7f4f7eca2c168a4ec254409670035 ] In one of the error paths in qlcnic_sriov_channel_cfg_cmd(), the memory allocated in qlcnic_sriov_alloc_bc_mbx_args() for mailbox arguments is not freed. Fix that by jumping to the error path that frees them, by calling qlcnic_free_mbx_args(). This was found using static analysis. Fixes: `f197a7aa62` ("qlcnic: VF-PF communication channel implementation") Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250512044829.36400-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Carolina Jubran	b48a47e137	net/mlx5e: Disable MACsec offload for uplink representor profile [ Upstream commit 588431474eb7572e57a927fa8558c9ba2f8af143 ] MACsec offload is not supported in switchdev mode for uplink representors. When switching to the uplink representor profile, the MACsec offload feature must be cleared from the netdevice's features. If left enabled, attempts to add offloads result in a null pointer dereference, as the uplink representor does not support MACsec offload even though the feature bit remains set. Clear NETIF_F_HW_MACSEC in mlx5e_fix_uplink_rep_features(). Kernel log: Oops: general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f] CPU: 29 UID: 0 PID: 4714 Comm: ip Not tainted 6.14.0-rc4_for_upstream_debug_2025_03_02_17_35 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x128/0x1dd0 Code: d0 7c 08 84 d2 0f 85 ad 15 00 00 8b 35 91 5c fe 03 85 f6 75 29 49 8d 7e 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a6 15 00 00 4d 3b 76 60 0f 85 fd 0b 00 00 65 ff RSP: 0018:ffff888147a4f160 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000078 RBP: ffff888147a4f2e0 R08: ffffffffa05d2c19 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000018 R15: ffff888152de0000 FS: 00007f855e27d800(0000) GS:ffff88881ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004e5768 CR3: 000000013ae7c005 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? __mutex_lock+0x128/0x1dd0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mutex_lock_io_nested+0x1ae0/0x1ae0 ? lock_acquire+0x1c2/0x530 ? macsec_upd_offload+0x145/0x380 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? kasan_save_stack+0x30/0x40 ? kasan_save_stack+0x20/0x40 ? kasan_save_track+0x10/0x30 ? __kasan_kmalloc+0x77/0x90 ? __kmalloc_noprof+0x249/0x6b0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0xb5/0x240 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mlx5e_macsec_add_rxsa+0x11a0/0x11a0 [mlx5_core] macsec_update_offload+0x26c/0x820 ? macsec_set_mac_address+0x4b0/0x4b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? _raw_spin_unlock_irqrestore+0x47/0x50 macsec_upd_offload+0x2c8/0x380 ? macsec_update_offload+0x820/0x820 ? __nla_parse+0x22/0x30 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x15e/0x240 genl_family_rcv_msg_doit+0x1cc/0x2a0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? cap_capable+0xd4/0x330 genl_rcv_msg+0x3ea/0x670 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? macsec_update_offload+0x820/0x820 netlink_rcv_skb+0x12b/0x390 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? netlink_ack+0xd80/0xd80 ? rwsem_down_read_slowpath+0xf90/0xf90 ? netlink_deliver_tap+0xcd/0xac0 ? netlink_deliver_tap+0x155/0xac0 ? _copy_from_iter+0x1bb/0x12c0 genl_rcv+0x24/0x40 netlink_unicast+0x440/0x700 ? netlink_attachskb+0x760/0x760 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x53f/0x760 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? filter_irq_stacks+0x90/0x90 ? stack_depot_save_flags+0x28/0xa30 ___sys_sendmsg+0xeb/0x170 ? kasan_save_stack+0x30/0x40 ? copy_msghdr_from_user+0x110/0x110 ? do_syscall_64+0x6d/0x140 ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x3b0 ? __virt_addr_valid+0x1da/0x3b0 ? lock_downgrade+0x680/0x680 ? __delete_object+0x21/0x50 __sys_sendmsg+0xf7/0x180 ? __sys_sendmsg_sock+0x20/0x20 ? kmem_cache_free+0x14c/0x4e0 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f855e113367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffd15e90c88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f855e113367 RDX: 0000000000000000 RSI: 00007ffd15e90cf0 RDI: 0000000000000004 RBP: 00007ffd15e90dbc R08: 0000000000000028 R09: 000000000045d100 R10: 00007f855e011dd8 R11: 0000000000000246 R12: 0000000000000019 R13: 0000000067c6b785 R14: 00000000004a1e80 R15: 0000000000000000 </TASK> Modules linked in: 8021q garp mrp sch_ingress openvswitch nsh mlx5_ib mlx5_fwctl mlx5_dpll mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: `8ff0ac5be1` ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1746958552-561295-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Geert Uytterhoeven	4a7d4031fa	ALSA: sh: SND_AICA should depend on SH_DMA_API [ Upstream commit 66e48ef6ef506c89ec1b3851c6f9f5f80b5835ff ] If CONFIG_SH_DMA_API=n: WARNING: unmet direct dependencies detected for G2_DMA Depends on [n]: SH_DREAMCAST [=y] && SH_DMA_API [=n] Selected by [y]: - SND_AICA [=y] && SOUND [=y] && SND [=y] && SND_SUPERH [=y] && SH_DREAMCAST [=y] SND_AICA selects G2_DMA. As the latter depends on SH_DMA_API, the former should depend on SH_DMA_API, too. Fixes: `f477a538c1` ("sh: dma: fix kconfig dependency for G2_DMA") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505131320.PzgTtl9H-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/b90625f8a9078d0d304bafe862cbe3a3fab40082.1747121335.git.geert+renesas@glider.be Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Keith Busch	c70c021be0	nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable [ Upstream commit 3d8932133dcecbd9bef1559533c1089601006f45 ] We need to lock this queue for that condition because the timeout work executes per-namespace and can poll the poll CQ. Reported-by: Hannes Reinecke <hare@kernel.org> Closes: https://lore.kernel.org/all/20240902130728.1999-1-hare@kernel.org/ Fixes: `a0fa9647a5` ("NVMe: add blk polling support") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Kees Cook	64638f3103	nvme-pci: make nvme_pci_npages_prp() __always_inline [ Upstream commit 40696426b8c8c4f13cf6ac52f0470eed144be4b2 ] The only reason nvme_pci_npages_prp() could be used as a compile-time known result in BUILD_BUG_ON() is because the compiler was always choosing to inline the function. Under special circumstances (sanitizer coverage functions disabled for __init functions on ARCH=um), the compiler decided to stop inlining it: drivers/nvme/host/pci.c: In function 'nvme_init': include/linux/compiler_types.h:557:45: error: call to '__compiletime_assert_678' declared with attribute error: BUILD_BUG_ON failed: nvme_pci_npages_prp() > NVME_MAX_NR_ALLOCATIONS 557 \| _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) \| ^ include/linux/compiler_types.h:538:25: note: in definition of macro '__compiletime_assert' 538 \| prefix ## suffix(); \ \| ^~~~~~ include/linux/compiler_types.h:557:9: note: in expansion of macro '_compiletime_assert' 557 \| _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) \| ^~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' 39 \| #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) \| ^~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG' 50 \| BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) \| ^~~~~~~~~~~~~~~~ drivers/nvme/host/pci.c:3804:9: note: in expansion of macro 'BUILD_BUG_ON' 3804 \| BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_ALLOCATIONS); \| ^~~~~~~~~~~~ Force it to be __always_inline to make sure it is always available for use with BUILD_BUG_ON(). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505061846.12FMyRjj-lkp@intel.com/ Fixes: `c372cdd1ef` ("nvme-pci: iod npages fits in s8") Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:17 +02:00
Vladimir Oltean	cbe3f04451	net: dsa: sja1105: discard incoming frames in BR_STATE_LISTENING [ Upstream commit 498625a8ab2c8e1c9ab5105744310e8d6952cc01 ] It has been reported that when under a bridge with stp_state=1, the logs get spammed with this message: [ 251.734607] fsl_dpaa2_eth dpni.5 eth0: Couldn't decode source port Further debugging shows the following info associated with packets: source_port=-1, switch_id=-1, vid=-1, vbid=1 In other words, they are data plane packets which are supposed to be decoded by dsa_tag_8021q_find_port_by_vbid(), but the latter (correctly) refuses to do so, because no switch port is currently in BR_STATE_LEARNING or BR_STATE_FORWARDING - so the packet is effectively unexpected. The error goes away after the port progresses to BR_STATE_LEARNING in 15 seconds (the default forward_time of the bridge), because then, dsa_tag_8021q_find_port_by_vbid() can correctly associate the data plane packets with a plausible bridge port in a plausible STP state. Re-reading IEEE 802.1D-1990, I see the following: "4.4.2 Learning: (...) The Forwarding Process shall discard received frames." IEEE 802.1D-2004 further clarifies: "DISABLED, BLOCKING, LISTENING, and BROKEN all correspond to the DISCARDING port state. While those dot1dStpPortStates serve to distinguish reasons for discarding frames, the operation of the Forwarding and Learning processes is the same for all of them. (...) LISTENING represents a port that the spanning tree algorithm has selected to be part of the active topology (computing a Root Port or Designated Port role) but is temporarily discarding frames to guard against loops or incorrect learning." Well, this is not what the driver does - instead it sets mac[port].ingress = true. To get rid of the log spam, prevent unexpected data plane packets to be received by software by discarding them on ingress in the LISTENING state. In terms of blame attribution: the prints only date back to commit `d7f9787a76` ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID"). However, the settings would permit a LISTENING port to forward to a FORWARDING port, and the standard suggests that's not OK. Fixes: `640f763f98` ("net: dsa: sja1105: Add support for Spanning Tree Protocol") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250509113816.2221992-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Mathieu Othacehe	1d60c0781c	net: cadence: macb: Fix a possible deadlock in macb_halt_tx. [ Upstream commit c92d6089d8ad7d4d815ebcedee3f3907b539ff1f ] There is a situation where after THALT is set high, TGO stays high as well. Because jiffies are never updated, as we are in a context with interrupts disabled, we never exit that loop and have a deadlock. That deadlock was noticed on a sama5d4 device that stayed locked for days. Use retries instead of jiffies so that the timeout really works and we do not have a deadlock anymore. Fixes: `e86cd53afc` ("net/macb: better manage tx errors") Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509121935.16282-1-othacehe@gnu.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Takashi Iwai	4e132f56bc	ALSA: ump: Fix a typo of snd_ump_stream_msg_device_info [ Upstream commit dd33993a9721ab1dae38bd37c9f665987d554239 ] s/devince/device/ It's used only internally, so no any behavior changes. Fixes: `37e0e14128` ("ALSA: ump: Support UMP Endpoint and Function Block parsing") Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20250511141147.10246-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Takashi Iwai	8a8dc7fd1e	ALSA: seq: Fix delivery of UMP events to group ports [ Upstream commit ff7b190aef6cccdb6f14d20c5753081fe6420e0b ] When an event with UMP message is sent to a UMP client, the EP port receives always no matter where the event is sent to, as it's a catch-all port. OTOH, if an event is sent to EP port, and if the event has a certain UMP Group, it should have been delivered to the associated UMP Group port, too, but this was ignored, so far. This patch addresses the behavior. Now a UMP event sent to the Endpoint port will be delivered to the subscribers of the UMP group port the event is associated with. The patch also does a bit of refactoring to simplify the code about __deliver_to_subscribers(). Fixes: `177ccf811d` ("ALSA: seq: Support MIDI 2.0 UMP Endpoint port") Link: https://patch.msgid.link/20250511134528.6314-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Andrew Jeffery	e4d8a51732	net: mctp: Ensure keys maintain only one ref to corresponding dev [ Upstream commit e4f349bd6e58051df698b82f94721f18a02a293d ] mctp_flow_prepare_output() is called in mctp_route_output(), which places outbound packets onto a given interface. The packet may represent a message fragment, in which case we provoke an unbalanced reference count to the underlying device. This causes trouble if we ever attempt to remove the interface: [ 48.702195] usb 1-1: USB disconnect, device number 2 [ 58.883056] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 [ 69.022548] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 [ 79.172568] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 ... Predicate the invocation of mctp_dev_set_key() in mctp_flow_prepare_output() on not already having associated the device with the key. It's not yet realistic to uphold the property that the key maintains only one device reference earlier in the transmission sequence as the route (and therefore the device) may not be known at the time the key is associated with the socket. Fixes: `67737c4572` ("mctp: Pass flow data & flow release events to drivers") Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au> Link: https://patch.msgid.link/20250508-mctp-dev-refcount-v1-1-d4f965c67bb5@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Matt Johnston	acab78ae12	net: mctp: Don't access ifa_index when missing [ Upstream commit f11cf946c0a92c560a890d68e4775723353599e1 ] In mctp_dump_addrinfo, ifa_index can be used to filter interfaces, but only when the struct ifaddrmsg is provided. Otherwise it will be comparing to uninitialised memory - reproducible in the syzkaller case from dhcpd, or busybox "ip addr show". The kernel MCTP implementation has always filtered by ifa_index, so existing userspace programs expecting to dump MCTP addresses must already be passing a valid ifa_index value (either 0 or a real index). BUG: KMSAN: uninit-value in mctp_dump_addrinfo+0x208/0xac0 net/mctp/device.c:128 mctp_dump_addrinfo+0x208/0xac0 net/mctp/device.c:128 rtnl_dump_all+0x3ec/0x5b0 net/core/rtnetlink.c:4380 rtnl_dumpit+0xd5/0x2f0 net/core/rtnetlink.c:6824 netlink_dump+0x97b/0x1690 net/netlink/af_netlink.c:2309 Fixes: `583be982d9` ("mctp: Add device handling and netlink interface") Reported-by: syzbot+e76d52dadc089b9d197f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68135815.050a0220.3a872c.000e.GAE@google.com/ Reported-by: syzbot+1065a199625a388fce60@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/681357d6.050a0220.14dd7d.000d.GAE@google.com/ Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://patch.msgid.link/20250508-mctp-addr-dump-v2-1-c8a53fd2dd66@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Eric Dumazet	2d45eeb7d5	mctp: no longer rely on net->dev_index_head[] [ Upstream commit 2d20773aec14996b6cc4db92d885028319be683d ] mctp_dump_addrinfo() is one of the last users of net->dev_index_head[] in the control path. Switch to for_each_netdev_dump() for better scalability. Use C99 for mctp_device_rtnl_msg_handlers[] to prepare future RTNL removal from mctp_dump_addrinfo() (mdev->addrs is not yet RCU protected) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Matt Johnston <matt@codeconstruct.com.au> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20241206223811.1343076-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: f11cf946c0a9 ("net: mctp: Don't access ifa_index when missing") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Hangbin Liu	7777ca11a4	tools/net/ynl: ethtool: fix crash when Hardware Clock info is missing [ Upstream commit 45375814eb3f4245956c0c85092a4eee4441d167 ] Fix a crash in the ethtool YNL implementation when Hardware Clock information is not present in the response. This ensures graceful handling of devices or drivers that do not provide this optional field. e.g. Traceback (most recent call last): File "/net/tools/net/ynl/pyynl/./ethtool.py", line 438, in <module> main() ~~~~^^ File "/net/tools/net/ynl/pyynl/./ethtool.py", line 341, in main print(f'PTP Hardware Clock: {tsinfo["phc-index"]}') ~~~~~~^^^^^^^^^^^^^ KeyError: 'phc-index' Fixes: `f3d07b02b2` ("tools: ynl: ethtool testing tool") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508035414.82974-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:16 +02:00
Rahul Rameshbabu	95a9e08ea5	tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation [ Upstream commit 2e0e148c727061009d3db5f436f51890bbb49a80 ] Print the nested stats attribute containing timestamping statistics when the --show-time-stamping flag is used. [root@binary-eater-vm-01 linux-ethtool-ts]# ./tools/net/ynl/ethtool.py --show-time-stamping mlx5_1 Time stamping parameters for mlx5_1: Capabilities: hardware-transmit hardware-receive hardware-raw-clock PTP Hardware Clock: 0 Hardware Transmit Timestamp Modes: off on Hardware Receive Filter Modes: none all Statistics: tx-pkts: 8 tx-lost: 0 tx-err: 0 Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-8-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 45375814eb3f ("tools/net/ynl: ethtool: fix crash when Hardware Clock info is missing") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:15 +02:00
Cong Wang	a7d6e0ac0a	net_sched: Flush gso_skb list too during ->change() [ Upstream commit 2d3cbfd6d54a2c39ce3244f33f85c595844bd7b8 ] Previously, when reducing a qdisc's limit via the ->change() operation, only the main skb queue was trimmed, potentially leaving packets in the gso_skb list. This could result in NULL pointer dereference when we only check sch->limit against sch->q.qlen. This patch introduces a new helper, qdisc_dequeue_internal(), which ensures both the gso_skb list and the main queue are properly flushed when trimming excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie) are updated to use this helper in their ->change() routines. Fixes: `76e3cc126b` ("codel: Controlled Delay AQM") Fixes: `4b549a2ef4` ("fq_codel: Fair Queue Codel AQM") Fixes: `afe4fd0624` ("pkt_sched: fq: Fair Queue packet scheduler") Fixes: `ec97ecf1eb` ("net: sched: add Flow Queue PIE packet scheduler") Fixes: `10239edf86` ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Fixes: `d4b36210c2` ("net: pkt_sched: PIE AQM scheme") Reported-by: Will <willsroot@protonmail.com> Reported-by: Savy <savy@syst3mfailure.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:15 +02:00
Luiz Augusto von Dentz	31492b8386	Bluetooth: MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags [ Upstream commit 1e2e3044c1bc64a64aa0eaf7c17f7832c26c9775 ] Device flags could be updated in the meantime while MGMT_OP_ADD_DEVICE is pending on hci_update_passive_scan_sync so instead of setting the current_flags as cmd->user_data just do a lookup using hci_conn_params_lookup and use the latest stored flags. Fixes: a182d9c84f9c ("Bluetooth: MGMT: Fix Add Device to responding before completing") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:15 +02:00
Geert Uytterhoeven	b8084e8135	spi: loopback-test: Do not split 1024-byte hexdumps [ Upstream commit a73fa3690a1f3014d6677e368dce4e70767a6ba2 ] spi_test_print_hex_dump() prints buffers holding less than 1024 bytes in full. Larger buffers are truncated: only the first 512 and the last 512 bytes are printed, separated by a truncation message. The latter is confusing in case the buffer holds exactly 1024 bytes, as all data is printed anyway. Fix this by printing buffers holding up to and including 1024 bytes in full. Fixes: `84e0c4e5e2` ("spi: add loopback test driver to allow for spi_master regression tests") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/37ee1bc90c6554c9347040adabf04188c8f704aa.1746184171.git.geert+renesas@glider.be Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:15 +02:00
Li Lingfeng	a6879a076b	nfs: handle failure of nfs_get_lock_context in unlock path [ Upstream commit c457dc1ec770a22636b473ce5d35614adfe97636 ] When memory is insufficient, the allocation of nfs_lock_context in nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM) as valid and proceed to execute rpc_run_task(), this will trigger a NULL pointer dereference in nfs4_locku_prepare. For example: BUG: kernel NULL pointer dereference, address: 000000000000000c PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty #60 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 Workqueue: rpciod rpc_async_schedule RIP: 0010:nfs4_locku_prepare+0x35/0xc2 Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3 RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246 RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40 RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38 R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030 R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30 FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0 Call Trace: <TASK> __rpc_execute+0xbc/0x480 rpc_async_schedule+0x2f/0x40 process_one_work+0x232/0x5d0 worker_thread+0x1da/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x10d/0x240 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 000000000000000c ---[ end trace 0000000000000000 ]--- Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and return NULL to terminate subsequent rpc_run_task, preventing NULL pointer dereference. Fixes: `f30cb757f6` ("NFS: Always wait for I/O completion before unlock") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20250417072508.3850532-1-lilingfeng3@huawei.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:15 +02:00
Henry Martin	ad6caaf29b	HID: uclogic: Add NULL check in uclogic_input_configured() [ Upstream commit bd07f751208ba190f9b0db5e5b7f35d5bb4a8a1e ] devm_kasprintf() returns NULL when memory allocation fails. Currently, uclogic_input_configured() does not check for this case, which results in a NULL pointer dereference. Add NULL check after devm_kasprintf() to prevent this issue. Fixes: `dd613a4e45` ("HID: uclogic: Correct devm device reference for hidinput input_dev name") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:15 +02:00
Qasim Ijaz	4715f16b1e	HID: thrustmaster: fix memory leak in thrustmaster_interrupts() [ Upstream commit 09d546303b370113323bfff456c4e8cff8756005 ] In thrustmaster_interrupts(), the allocated send_buf is not freed if the usb_check_int_endpoints() check fails, leading to a memory leak. Fix this by ensuring send_buf is freed before returning in the error path. Fixes: 50420d7c79c3 ("HID: hid-thrustmaster: Fix warning in thrustmaster_probe by adding endpoint check") Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:15 +02:00
Zhu Yanjun	ee4c5a2a38	RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug [ Upstream commit f81b33582f9339d2dc17c69b92040d3650bb4bae ] Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x7d/0xa0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xcf/0x610 mm/kasan/report.c:489 kasan_report+0xb5/0xe0 mm/kasan/report.c:602 rxe_queue_cleanup+0xd0/0xe0 drivers/infiniband/sw/rxe/rxe_queue.c:195 rxe_cq_cleanup+0x3f/0x50 drivers/infiniband/sw/rxe/rxe_cq.c:132 __rxe_cleanup+0x168/0x300 drivers/infiniband/sw/rxe/rxe_pool.c:232 rxe_create_cq+0x22e/0x3a0 drivers/infiniband/sw/rxe/rxe_verbs.c:1109 create_cq+0x658/0xb90 drivers/infiniband/core/uverbs_cmd.c:1052 ib_uverbs_create_cq+0xc7/0x120 drivers/infiniband/core/uverbs_cmd.c:1095 ib_uverbs_write+0x969/0xc90 drivers/infiniband/core/uverbs_main.c:679 vfs_write fs/read_write.c:677 [inline] vfs_write+0x26a/0xcc0 fs/read_write.c:659 ksys_write+0x1b8/0x200 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xaa/0x1b0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f In the function rxe_create_cq, when rxe_cq_from_init fails, the function rxe_cleanup will be called to handle the allocated resources. In fact, some memory resources have already been freed in the function rxe_cq_from_init. Thus, this problem will occur. The solution is to let rxe_cleanup do all the work. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://paste.ubuntu.com/p/tJgC42wDf6/ Tested-by: liuyi <liuy22@mails.tsinghua.edu.cn> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250412075714.3257358-1-yanjun.zhu@linux.dev Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:14 +02:00
David Lechner	99173e6f28	iio: chemical: sps30: use aligned_s64 for timestamp [ Upstream commit bb49d940344bcb8e2b19e69d7ac86f567887ea9a ] Follow the pattern of other drivers and use aligned_s64 for the timestamp. This will ensure that the timestamp is correctly aligned on all architectures. Fixes: `a5bf6fdd19` ("iio:chemical:sps30: Fix timestamp alignment") Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Link: https://patch.msgid.link/20250417-iio-more-timestamp-alignment-v1-5-eafac1e22318@baylibre.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:14 +02:00
Jonathan Cameron	02023c2899	iio: adc: ad7768-1: Fix insufficient alignment of timestamp. [ Upstream commit ffbc26bc91c1f1eb3dcf5d8776e74cbae21ee13a ] On architectures where an s64 is not 64-bit aligned, this may result insufficient alignment of the timestamp and the structure being too small. Use aligned_s64 to force the alignment. Fixes: `a1caeebab0` ("iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()") # aligned_s64 newer Reported-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250413103443.2420727-3-jic23@kernel.org Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:14 +02:00
Alex Deucher	41ade94ad4	Revert "drm/amd: Stop evicting resources on APUs in suspend" [ Upstream commit d0ce1aaa8531a4a4707711cab5721374751c51b0 ] This reverts commit 3a9626c816db901def438dc2513622e281186d39. This breaks S4 because we end up setting the s3/s0ix flags even when we are entering s4 since prepare is used by both flows. The causes both the S3/s0ix and s4 flags to be set which breaks several checks in the driver which assume they are mutually exclusive. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ce8f7d95899c2869b47ea6ce0b3e5bf304b2fff4) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:14 +02:00
Mario Limonciello	c1a4d21a15	drm/amd: Add Suspend/Hibernate notification callback support [ Upstream commit 2965e6355dcdf157b5fafa25a2715f00064da8bf ] As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend. Another class of issues exist though where due to memory fragementation there isn't a large enough contiguous space and swap isn't accessible. Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary. Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:14 +02:00
Zhigang Luo	d59f455951	drm/amdgpu: trigger flr_work if reading pf2vf data failed [ Upstream commit ab66c832847fcdffc97d4591ba5547e3990d9d33 ] if reading pf2vf data failed 30 times continuously, it means something is wrong. Need to trigger flr_work to recover the issue. also use dev_err to print the error message to get which device has issue and add warning message if waiting IDH_FLR_NOTIFICATION_CMPL timeout. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:14 +02:00
Ma Jun	9a6d2e1944	drm/amdgpu: Fix the runtime resume failure issue [ Upstream commit bbfaf2aea7164db59739728d62d9cc91d64ff856 ] Don't set power state flag when system enter runtime suspend, or it may cause runtime resume failure issue. Fixes: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:14 +02:00
Mario Limonciello	95db39dbaa	drm/amd: Stop evicting resources on APUs in suspend [ Upstream commit 226db36032c61d8717dfdd052adac351b22d3e83 ] commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") intentionally moved the eviction of resources to earlier in the suspend process, but this introduced a subtle change that it occurs before adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to evict resources at suspend time as well. Explicitly set s0ix or s3 in the prepare() stage, and unset them if the prepare() stage failed. v2: squash in warning fix from Stephen Rothwell Reported-by: Jürg Billeter <j@bitron.ch> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038 Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Jonathan Cameron	220192d514	iio: adc: ad7266: Fix potential timestamp alignment issue. [ Upstream commit 52d349884738c346961e153f195f4c7fe186fcf4 ] On architectures where an s64 is only 32-bit aligned insufficient padding would be left between the earlier elements and the timestamp. Use aligned_s64 to enforce the correct placement and ensure the storage is large enough. Fixes: `54e018da31` ("iio:ad7266: Mark transfer buffer as __be16") # aligned_s64 is much newer. Reported-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250413103443.2420727-2-jic23@kernel.org Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Mikhail Lobanov	ec24e62a1d	KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception [ Upstream commit a2620f8932fa9fdabc3d78ed6efb004ca409019f ] Previously, commit `ed129ec905` ("KVM: x86: forcibly leave nested mode on vCPU reset") addressed an issue where a triple fault occurring in nested mode could lead to use-after-free scenarios. However, the commit did not handle the analogous situation for System Management Mode (SMM). This omission results in triggering a WARN when KVM forces a vCPU INIT after SHUTDOWN interception while the vCPU is in SMM. This situation was reprodused using Syzkaller by: 1) Creating a KVM VM and vCPU 2) Sending a KVM_SMI ioctl to explicitly enter SMM 3) Executing invalid instructions causing consecutive exceptions and eventually a triple fault The issue manifests as follows: WARNING: CPU: 0 PID: 25506 at arch/x86/kvm/x86.c:12112 kvm_vcpu_reset+0x1d2/0x1530 arch/x86/kvm/x86.c:12112 Modules linked in: CPU: 0 PID: 25506 Comm: syz-executor.0 Not tainted 6.1.130-syzkaller-00157-g164fe5dde9b6 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:kvm_vcpu_reset+0x1d2/0x1530 arch/x86/kvm/x86.c:12112 Call Trace: <TASK> shutdown_interception+0x66/0xb0 arch/x86/kvm/svm/svm.c:2136 svm_invoke_exit_handler+0x110/0x530 arch/x86/kvm/svm/svm.c:3395 svm_handle_exit+0x424/0x920 arch/x86/kvm/svm/svm.c:3457 vcpu_enter_guest arch/x86/kvm/x86.c:10959 [inline] vcpu_run+0x2c43/0x5a90 arch/x86/kvm/x86.c:11062 kvm_arch_vcpu_ioctl_run+0x50f/0x1cf0 arch/x86/kvm/x86.c:11283 kvm_vcpu_ioctl+0x570/0xf00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4122 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x19a/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Architecturally, INIT is blocked when the CPU is in SMM, hence KVM's WARN() in kvm_vcpu_reset() to guard against KVM bugs, e.g. to detect improper emulation of INIT. SHUTDOWN on SVM is a weird edge case where KVM needs to do _something_ sane with the VMCB, since it's technically undefined, and INIT is the least awful choice given KVM's ABI. So, double down on stuffing INIT on SHUTDOWN, and force the vCPU out of SMM to avoid any weirdness (and the WARN). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: `ed129ec905` ("KVM: x86: forcibly leave nested mode on vCPU reset") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Link: https://lore.kernel.org/r/20250414171207.155121-1-m.lobanov@rosa.ru [sean: massage changelog, make it clear this isn't architectural behavior] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Peter Gonda	635b3cc639	KVM: SVM: Update SEV-ES shutdown intercepts with more metadata [ Upstream commit bc3d7c5570a03ab45bde4bae83697c80900fb714 ] Currently if an SEV-ES VM shuts down userspace sees KVM_RUN struct with only errno=EINVAL. This is a very limited amount of information to debug the situation. Instead return KVM_EXIT_SHUTDOWN to alert userspace the VM is shutting down and is not usable any further. Signed-off-by: Peter Gonda <pgonda@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20230907162449.1739785-1-pgonda@google.com [sean: tweak changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: a2620f8932fa ("KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Cristian Marussi	2402a3ef9e	firmware: arm_scmi: Fix timeout checks on polling path [ Upstream commit c23c03bf1faa1e76be1eba35bad6da6a2a7c95ee ] Polling mode transactions wait for a reply busy-looping without holding a spinlock, but currently the timeout checks are based only on elapsed time: as a result we could hit a false positive whenever our busy-looping thread is pre-empted and scheduled out for a time greater than the polling timeout. Change the checks at the end of the busy-loop to make sure that the polling wasn't indeed successful or an out-of-order reply caused the polling to be forcibly terminated. Fixes: `31d2f803c1` ("firmware: arm_scmi: Add sync_cmds_completed_on_ret transport flag") Reported-by: Huangjie <huangjie1663@phytium.com.cn> Closes: https://lore.kernel.org/arm-scmi/20250123083323.2363749-1-jackhuang021@gmail.com/ Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Cc: stable@vger.kernel.org # 5.18.x Message-Id: <20250310175800.1444293-1-cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Luke Parkin	b388129425	firmware: arm_scmi: Track basic SCMI communication debug metrics [ Upstream commit 0b3d48c4726e1b20dffd2ff81a9d94d5d930220b ] Add the support for counting some of the SCMI communication debug metrics like how many were sent successfully or with some errors, responses received, notifications and delayed responses, transfer timeouts and errors from the firmware/platform. In many cases, the traces exists. But the traces are not always necessarily enabled and getting such cumulative SCMI communication debug metrics helps in understanding if there are any possible improvements that can be made on either side of SCMI communication. Signed-off-by: Luke Parkin <luke.parkin@arm.com> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Message-Id: <20240805131013.587016-4-sudeep.holla@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Stable-dep-of: c23c03bf1faa ("firmware: arm_scmi: Fix timeout checks on polling path") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Luke Parkin	0b175f7c4f	firmware: arm_scmi: Add support for debug metrics at the interface [ Upstream commit 1b18d4295f9d1125bc7a799fc12924cd45fc04b1 ] Since SCMI involves interaction with the entity(software, firmware and/or hardware) providing services or features, it is quite useful to track certain metrics(for pure debugging purposes) like how many messages were sent or received, were there any failures, what kind of failures, ..etc. Add a new optional config option for the above purpose and the initial support for counting such key debug metrics. Signed-off-by: Luke Parkin <luke.parkin@arm.com> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Message-Id: <20240805131013.587016-3-sudeep.holla@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Stable-dep-of: c23c03bf1faa ("firmware: arm_scmi: Fix timeout checks on polling path") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Cristian Marussi	844b899f78	firmware: arm_scmi: Add message dump traces for bad and unexpected replies [ Upstream commit 5076ab66db1671a5cd9ecfb857d1949e36a33142 ] It is useful to have message dump traces for any invalid/bad/unexpected replies. Let us add traces for the same as well as late-timed-out, out-of-order and unexpected/spurious messages. Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20240325204620.1437237-4-cristian.marussi@arm.com Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Stable-dep-of: c23c03bf1faa ("firmware: arm_scmi: Fix timeout checks on polling path") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:13 +02:00
Cristian Marussi	6bddbdbeb7	firmware: arm_scmi: Add helper to trace bad messages [ Upstream commit 5dc0e0b1f0ea2b55031f84a365962b9b45869b98 ] Upon reception of malformed and unexpected timed-out SCMI messages, it is not possible to trace those bad messages in their entirety, because usually we cannot even retrieve the payload, or it is just not reliable. Add a helper to trace at least the content of the header of the received message while associating a meaningful tag and error code. Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20240325204620.1437237-3-cristian.marussi@arm.com Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Stable-dep-of: c23c03bf1faa ("firmware: arm_scmi: Fix timeout checks on polling path") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:12 +02:00
Michal Suchanek	7ea1913daa	tpm: tis: Double the timeout B to 4s [ Upstream commit 2f661f71fda1fc0c42b7746ca5b7da529eb6b5be ] With some Infineon chips the timeouts in tpm_tis_send_data (both B and C) can reach up to about 2250 ms. Timeout C is retried since commit de9e33df7762 ("tpm, tpm_tis: Workaround failed command reception on Infineon devices") Timeout B still needs to be extended. The problem is most commonly encountered with context related operation such as load context/save context. These are issued directly by the kernel, and there is no retry logic for them. When a filesystem is set up to use the TPM for unlocking the boot fails, and restarting the userspace service is ineffective. This is likely because ignoring a load context/save context result puts the real TPM state and the TPM state expected by the kernel out of sync. Chips known to be affected: tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54) Description: SLB9672 Firmware Revision: 15.22 tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22) Firmware Revision: 7.83 tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1A, rev-id 16) Firmware Revision: 5.63 Link: https://lore.kernel.org/linux-integrity/Z5pI07m0Muapyu9w@kitsune.suse.cz/ Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:12 +02:00
Masami Hiramatsu (Google)	70417bada4	tracing: probes: Fix a possible race in trace_probe_log APIs [ Upstream commit fd837de3c9cb1a162c69bc1fb1f438467fe7f2f5 ] Since the shared trace_probe_log variable can be accessed and modified via probe event create operation of kprobe_events, uprobe_events, and dynamic_events, it should be protected. In the dynamic_events, all operations are serialized by `dyn_event_ops_mutex`. But kprobe_events and uprobe_events interfaces are not serialized. To solve this issue, introduces dyn_event_create(), which runs create() operation under the mutex, for kprobe_events and uprobe_events. This also uses lockdep to check the mutex is held when using trace_probe_log* APIs. Link: https://lore.kernel.org/all/174684868120.551552.3068655787654268804.stgit@devnote2/ Reported-by: Paul Cacheux <paulcacheux@gmail.com> Closes: https://lore.kernel.org/all/20250510074456.805a16872b591e2971a4d221@kernel.org/ Fixes: `ab105a4fb8` ("tracing: Use tracing error_log with probe events") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:12 +02:00
Waiman Long	4e63b6907d	cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks [ Upstream commit 39b5ef791d109dd54c7c2e6e87933edfcc0ad1ac ] Commit `ec5fbdfb99` ("cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset") enabled us to pull CPUs dedicated to child partitions from tasks in top_cpuset by ignoring per cpu kthreads. However, there can be other kthreads that are not per cpu but have PF_NO_SETAFFINITY flag set to indicate that we shouldn't mess with their CPU affinity. For other kthreads, their affinity will be changed to skip CPUs dedicated to child partitions whether it is an isolating or a scheduling one. As all the per cpu kthreads have PF_NO_SETAFFINITY set, the PF_NO_SETAFFINITY tasks are essentially a superset of per cpu kthreads. Fix this issue by dropping the kthread_is_per_cpu() check and checking the PF_NO_SETAFFINITY flag instead. Fixes: `ec5fbdfb99` ("cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset") Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:12 +02:00
Hans de Goede	44a4c0dba8	platform/x86: asus-wmi: Fix wlan_ctrl_by_user detection [ Upstream commit bfcfe6d335a967f8ea0c1980960e6f0205b5de6e ] The wlan_ctrl_by_user detection was introduced by commit `a50bd128f2` ("asus-wmi: record wlan status while controlled by userapp"). Quoting from that commit's commit message: """ When you call WMIMethod(DSTS, 0x00010011) to get WLAN status, it may return (1) 0x00050001 (On) (2) 0x00050000 (Off) (3) 0x00030001 (On) (4) 0x00030000 (Off) (5) 0x00000002 (Unknown) (1), (2) means that the model has hardware GPIO for WLAN, you can call WMIMethod(DEVS, 0x00010011, 1 or 0) to turn WLAN on/off. (3), (4) means that the model doesn’t have hardware GPIO, you need to use API or driver library to turn WLAN on/off, and call WMIMethod(DEVS, 0x00010012, 1 or 0) to set WLAN LED status. After you set WLAN LED status, you can see the WLAN status is changed with WMIMethod(DSTS, 0x00010011). Because the status is recorded lastly (ex: Windows), you can use it for synchronization. (5) means that the model doesn’t have WLAN device. WLAN is the ONLY special case with upper rule. """ The wlan_ctrl_by_user flag should be set on 0x0003000? ((3), (4) above) return values, but the flag mistakenly also gets set on laptops with 0x0005000? ((1), (2)) return values. This is causing rfkill problems on laptops where 0x0005000? is returned. Fix the check to only set the wlan_ctrl_by_user flag for 0x0003000? return values. Fixes: `a50bd128f2` ("asus-wmi: record wlan status while controlled by userapp") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219786 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20250501131702.103360-2-hdegoede@redhat.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:12 +02:00
Runhua He	6b92eee6b9	platform/x86/amd/pmc: Declare quirk_spurious_8042 for MECHREVO Wujie 14XA (GX4HRXL) [ Upstream commit 0887817e4953885fbd6a5c1bec2fdd339261eb19 ] MECHREVO Wujie 14XA (GX4HRXL) wakes up immediately after s2idle entry. This happens regardless of whether the laptop is plugged into AC power, or whether any peripheral is plugged into the laptop. Similar to commit a55bdad5dfd1 ("platform/x86/amd/pmc: Disable keyboard wakeup on AMD Framework 13"), the MECHREVO Wujie 14XA wakes up almost instantly after s2idle suspend entry (IRQ1 is the keyboard): 2025-04-18 17:23:57,588 DEBUG: PM: Triggering wakeup from IRQ 9 2025-04-18 17:23:57,588 DEBUG: PM: Triggering wakeup from IRQ 1 Add this model to the spurious_8042 quirk to workaround this. This patch does not affect the wake-up function of the built-in keyboard. Because the firmware of this machine adds an insurance for keyboard wake-up events, as it always triggers an additional IRQ 9 to wake up the system. Suggested-by: Mingcong Bai <jeffbai@aosc.io> Suggested-by: Xinhui Yang <cyan@cyano.uk> Suggested-by: Rong Zhang <i@rong.moe> Fixes: a55bdad5dfd1 ("platform/x86/amd/pmc: Disable keyboard wakeup on AMD Framework 13") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4166 Cc: Mario Limonciello <mario.limonciello@amd.com> Link: https://zhuanldan.zhihu.com/p/730538041 Tested-by: Yemu Lu <prcups@krgm.moe> Signed-off-by: Runhua He <hua@aosc.io> Link: https://lore.kernel.org/r/20250507100103.995395-1-hua@aosc.io Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:12 +02:00
Kees Cook	bfe0dd3b73	binfmt_elf: Move brk for static PIE even if ASLR disabled [ Upstream commit 11854fe263eb1b9a8efa33b0c087add7719ea9b4 ] In commit `bbdc6076d2` ("binfmt_elf: move brk out of mmap when doing direct loader exec"), the brk was moved out of the mmap region when loading static PIE binaries (ET_DYN without INTERP). The common case for these binaries was testing new ELF loaders, so the brk needed to be away from mmap to avoid colliding with stack, future mmaps (of the loader-loaded binary), etc. But this was only done when ASLR was enabled, in an attempt to minimize changes to memory layouts. After adding support to respect alignment requirements for static PIE binaries in commit 3545deff0ec7 ("binfmt_elf: Honor PT_LOAD alignment for static PIE"), it became possible to have a large gap after the final PT_LOAD segment and the top of the mmap region. This means that future mmap allocations might go after the last PT_LOAD segment (where brk might be if ASLR was disabled) instead of before them (where they traditionally ended up). On arm64, running with ASLR disabled, Ubuntu 22.04's "ldconfig" binary, a static PIE, has alignment requirements that leaves a gap large enough after the last PT_LOAD segment to fit the vdso and vvar, but still leave enough space for the brk (which immediately follows the last PT_LOAD segment) to be allocated by the binary. fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426 /sbin/ldconfig.real fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426 /sbin/ldconfig.real fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 *[brk will go here at fffff7ffa000]* fffff7ffc000-fffff7ffe000 r--p 00000000 00:00 0 [vvar] fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0 [vdso] fffffffdf000-1000000000000 rw-p 00000000 00:00 0 [stack] After commit 0b3bc3354eb9 ("arm64: vdso: Switch to generic storage implementation"), the arm64 vvar grew slightly, and suddenly the brk collided with the allocation. fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426 /sbin/ldconfig.real fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426 /sbin/ldconfig.real fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 *[oops, no room any more, vvar is at fffff7ffa000!]* fffff7ffa000-fffff7ffe000 r--p 00000000 00:00 0 [vvar] fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0 [vdso] fffffffdf000-1000000000000 rw-p 00000000 00:00 0 [stack] The solution is to unconditionally move the brk out of the mmap region for static PIE binaries. Whether ASLR is enabled or not does not change if there may be future mmap allocation collisions with a growing brk region. Update memory layout comments (with kernel-doc headings), consolidate the setting of mm->brk to later (it isn't needed early), move static PIE brk out of mmap unconditionally, and make sure brk(2) knows to base brk position off of mm->start_brk not mm->end_data no matter what the cause of moving it is (via current->brk_randomized). For the CONFIG_COMPAT_BRK case, though, leave the logic unchanged, as we can never safely move the brk. These systems, however, are not using specially aligned static PIE binaries. Reported-by: Ryan Roberts <ryan.roberts@arm.com> Closes: https://lore.kernel.org/lkml/f93db308-4a0e-4806-9faf-98f890f5a5e6@arm.com/ Fixes: `bbdc6076d2` ("binfmt_elf: move brk out of mmap when doing direct loader exec") Link: https://lore.kernel.org/r/20250425224502.work.520-kees@kernel.org Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:12 +02:00
Kees Cook	622b7267d7	binfmt_elf: Honor PT_LOAD alignment for static PIE [ Upstream commit 3545deff0ec7a37de7ed9632e262598582b140e9 ] The p_align values in PT_LOAD were ignored for static PIE executables (i.e. ET_DYN without PT_INTERP). This is because there is no way to request a non-fixed mmap region with a specific alignment. ET_DYN with PT_INTERP uses a separate base address (ELF_ET_DYN_BASE) and binfmt_elf performs the ASLR itself, which means it can also apply alignment. For the mmap region, the address selection happens deep within the vm_mmap() implementation (when the requested address is 0). The earlier attempt to implement this: commit `9630f0d60f` ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") commit `925346c129` ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") did not take into account the different base address origins, and were eventually reverted: `aeb7923733` ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") In order to get the correct alignment from an mmap base, binfmt_elf must perform a 0-address load first, then tear down the mapping and perform alignment on the resulting address. Since this is slightly more overhead, only do this when it is needed (i.e. the alignment is not the default ELF alignment). This does, however, have the benefit of being able to use MAP_FIXED_NOREPLACE, to avoid potential collisions. With this fixed, enable the static PIE self tests again. Reported-by: H.J. Lu <hjl.tools@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215275 Link: https://lore.kernel.org/r/20240508173149.677910-3-keescook@chromium.org Signed-off-by: Kees Cook <kees@kernel.org> Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:11 +02:00
Kees Cook	d9c1758756	binfmt_elf: Calculate total_size earlier [ Upstream commit 2d4cf7b190bbfadd4986bf5c34da17c1a88adf8e ] In preparation to support PT_LOAD with large p_align values on non-PT_INTERP ET_DYN executables (i.e. "static pie"), we'll need to use the total_size details earlier. Move this separately now to make the next patch more readable. As total_size and load_bias are currently calculated separately, this has no behavioral impact. Link: https://lore.kernel.org/r/20240508173149.677910-2-keescook@chromium.org Signed-off-by: Kees Cook <kees@kernel.org> Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-22 14:12:11 +02:00

1 2 3 4 5 ...

1234271 Commits