linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 19:30:30 +09:00

Author	SHA1	Message	Date
Russell King (Oracle)	db3667c9bb	net: stmmac: fix TSO DMA API usage causing oops [ Upstream commit 4c49f38e20a57f8abaebdf95b369295b153d1f8e ] Commit 66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data") moved the assignment of tx_skbuff_dma[]'s members to be later in stmmac_tso_xmit(). The buf (dma cookie) and len stored in this structure are passed to dma_unmap_single() by stmmac_tx_clean(). The DMA API requires that the dma cookie passed to dma_unmap_single() is the same as the value returned from dma_map_single(). However, by moving the assignment later, this is not the case when priv->dma_cap.addr64 > 32 as "des" is offset by proto_hdr_len. This causes problems such as: dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed and with DMA_API_DEBUG enabled: DMA-API: dwc-eth-dwmac 2490000.ethernet: device driver tries to +free DMA memory it has not allocated [device address=0x000000ffffcf65c0] [size=66 bytes] Fix this by maintaining "des" as the original DMA cookie, and use tso_des to pass the offset DMA cookie to stmmac_tso_allocator(). Full details of the crashes can be found at: https://lore.kernel.org/all/d8112193-0386-4e14-b516-37c2d838171a@nvidia.com/ https://lore.kernel.org/all/klkzp5yn5kq5efgtrow6wbvnc46bcqfxs65nz3qy77ujr5turc@bwwhelz2l4dw/ Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Thierry Reding <thierry.reding@gmail.com> Fixes: 66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data") Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/E1tJXcx-006N4Z-PC@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:43 +01:00
Roger Quadros	43fb5b0974	usb: cdns3: Add quirk flag to enable suspend residency [ Upstream commit 0aca19e4037a4143273e90f1b44666b78b4dde9b ] Some platforms (e.g. ti,j721e-usb, ti,am64-usb) require this bit to be set to workaround a lockup issue with PHY short suspend intervals [1]. Add a platform quirk flag to indicate if Suspend Residency should be enabled. [1] - https://www.ti.com/lit/er/sprz457h/sprz457h.pdf i2409 - USB: USB2 PHY locks up due to short suspend Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Acked-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20240516044537.16801-2-r-gunasekaran@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:43 +01:00
Kai-Heng Feng	c7cc4152c0	PCI/AER: Disable AER service on suspend [ Upstream commit 5afc2f763edc5daae4722ee46fea4e627d01fa90 ] If the link is powered off during suspend, electrical noise may cause errors that are logged via AER. If the AER interrupt is enabled and shares an IRQ with PME, that causes a spurious wakeup during suspend. Disable the AER interrupt during suspend to prevent this. Clear error status before re-enabling IRQ interrupts during resume so we don't get an interrupt for errors that occurred during the suspend/resume process. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Link: https://lore.kernel.org/r/20240416043225.1462548-2-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [bhelgaas: drop pci_ancestor_pr3_present() etc, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:42 +01:00
Vidya Sagar	3e221877dd	PCI: Use preserve_config in place of pci_flags [ Upstream commit 7246a4520b4bf1494d7d030166a11b5226f6d508 ] Use preserve_config in place of checking for PCI_PROBE_ONLY flag to enable support for "linux,pci-probe-only" on a per host bridge basis. This also obviates the use of adding PCI_REASSIGN_ALL_BUS flag if !PCI_PROBE_ONLY, as pci_assign_unassigned_root_bus_resources() takes care of reassigning the resources that are not already claimed. Link: https://lore.kernel.org/r/20240508174138.3630283-5-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:42 +01:00
Pierre-Louis Bossart	0d1d7e0c64	ASoC: Intel: sof_sdw: add quirk for Dell SKU 0B8C [ Upstream commit 92d5b5930e7d55ca07b483490d6298eee828bbe4 ] Jack detection needs to rely on JD2, as most other Dell AlderLake-based devices. Closes: https://github.com/thesofproject/linux/issues/5021 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20240624121119.91552-4-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:42 +01:00
Pierre-Louis Bossart	9a6a33eb6b	ASoC: Intel: sof_sdw: fix jack detection on ADL-N variant RVP [ Upstream commit 65c90df918205bc84f5448550cde76a54dae5f52 ] Experimental tests show that JD2_100K is required, otherwise the jack is detected always even with nothing plugged-in. To avoid matching with other known quirks the SKU information is used. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://patch.msgid.link/20240624121119.91552-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:42 +01:00
Jiaxun Yang	dbdee8456a	MIPS: Loongson64: DTS: Fix msi node for ls7a [ Upstream commit 98a9e2ac3755a353eefea8c52e23d5b0c50f3899 ] Add it to silent warning: arch/mips/boot/dts/loongson/ls7a-pch.dtsi:68.16-416.5: Warning (interrupt_provider): /bus@10000000/pci@1a000000: '#interrupt-cells' found, but node is not an interrupt provider arch/mips/boot/dts/loongson/loongson64g_4core_ls7a.dts:32.31-40.4: Warning (interrupt_provider): /bus@10000000/msi-controller@2ff00000: Missing '#interrupt-cells' in interrupt provider arch/mips/boot/dts/loongson/loongson64g_4core_ls7a.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider' Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:42 +01:00
Roger Quadros	d10b8db990	usb: cdns3-ti: Add workaround for Errata i2409 [ Upstream commit b50a2da03bd95784541b3f9058e452cc38f9ba05 ] TI USB2 PHY is known to have a lockup issue on very short suspend intervals. Enable the Suspend Residency quirk flag to workaround this as described in Errata i2409 [1]. [1] - https://www.ti.com/lit/er/sprz457h/sprz457h.pdf Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Acked-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20240516044537.16801-3-r-gunasekaran@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:42 +01:00
Ajit Khaparde	25f760c9ec	PCI: Add ACS quirk for Broadcom BCM5760X NIC [ Upstream commit 524e057b2d66b61f9b63b6db30467ab7b0bb4796 ] The Broadcom BCM5760X NIC may be a multi-function device. While it does not advertise an ACS capability, peer-to-peer transactions are not possible between the individual functions. So it is ok to treat them as fully isolated. Add an ACS quirk for this device so the functions can be in independent IOMMU groups and attached individually to userspace applications using VFIO. [kwilczynski: commit log] Link: https://lore.kernel.org/linux-pci/20240510204228.73435-1-ajit.khaparde@broadcom.com Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:42 +01:00
Jiwei Sun	db7d50a5d7	PCI: vmd: Create domain symlink before pci_bus_add_devices() [ Upstream commit f24c9bfcd423e2b2bb0d198456412f614ec2030a ] The vmd driver creates a "domain" symlink in sysfs for each VMD bridge. Previously this symlink was created after pci_bus_add_devices() added devices below the VMD bridge and emitted udev events to announce them to userspace. This led to a race between userspace consumers of the udev events and the kernel creation of the symlink. One such consumer is mdadm, which assembles block devices into a RAID array, and for devices below a VMD bridge, mdadm depends on the "domain" symlink. If mdadm loses the race, it may be unable to assemble a RAID array, which may cause a boot failure or other issues, with complaints like this: (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device'' (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.' (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1' (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.' (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1. This symptom prevents the OS from booting successfully. After a NVMe disk is probed/added by the nvme driver, udevd invokes mdadm to detect if there is a mdraid associated with this NVMe disk, and mdadm determines if a NVMe device is connected to a particular VMD domain by checking the "domain" symlink. For example: Thread A Thread B Thread mdadm vmd_enable_domain pci_bus_add_devices __driver_probe_device ... work_on_cpu schedule_work_on : wakeup Thread B nvme_probe : wakeup scan_work to scan nvme disk and add nvme disk then wakeup udevd : udevd executes mdadm command flush_work main : wait for nvme_probe done ... __driver_probe_device find_driver_devices : probe next nvme device : 1) Detect domain symlink ... 2) Find domain symlink ... from vmd sysfs ... 3) Domain symlink not ... created yet; failed sysfs_create_link : create domain symlink Create the VMD "domain" symlink before invoking pci_bus_add_devices() to avoid this race. Suggested-by: Adrian Huang <ahuang12@lenovo.com> Link: https://lore.kernel.org/linux-pci/20240605124844.24293-1-sjiwei@163.com Signed-off-by: Jiwei Sun <sunjw10@lenovo.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:41 +01:00
Peng Hongchi	4f3cb0f96a	usb: dwc2: gadget: Don't write invalid mapped sg entries into dma_desc with iommu enabled [ Upstream commit 1134289b6b93d73721340b66c310fd985385e8fa ] When using dma_map_sg() to map the scatterlist with iommu enabled, the entries in the scatterlist can be mergerd into less but longer entries in the function __finalise_sg(). So that the number of valid mapped entries is actually smaller than ureq->num_reqs,and there are still some invalid entries in the scatterlist with dma_addr=0xffffffff and len=0. Writing these invalid sg entries into the dma_desc can cause a data transmission error. The function dma_map_sg() returns the number of valid map entries and the return value is assigned to usb_request::num_mapped_sgs in function usb_gadget_map_request_by_dev(). So that just write valid mapped entries into dma_desc according to the usb_request::num_mapped_sgs, and set the IOC bit if it's the last valid mapped entry. This patch poses no risk to no-iommu situation, cause ureq->num_mapped_sgs equals ureq->num_sgs while using dma_direct_map_sg() to map the scatterlist whith iommu disabled. Signed-off-by: Peng Hongchi <hongchi.peng@siengine.com> Link: https://lore.kernel.org/r/20240523100315.7226-1-hongchi.peng@siengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 13:58:41 +01:00
Lion Ackermann	97e13434b5	net: sched: fix ordering of qlen adjustment commit 5eb7de8cd58e73851cd37ff8d0666517d9926948 upstream. Changes to sch->q.qlen around qdisc_tree_reduce_backlog() need to happen _before_ a call to said function because otherwise it may fail to notify parent qdiscs when the child is about to become empty. Signed-off-by: Lion Ackermann <nnamrec@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Artem Metla <ametla@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-27 13:58:41 +01:00
Greg Kroah-Hartman	ab6cc4ef42	Linux 6.6.67 Link: https://lore.kernel.org/r/20241217170533.329523616@linuxfoundation.org Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:37 +01:00
Juergen Gross	e68cbbef3d	x86/static-call: fix 32-bit build commit 349f0086ba8b2a169877d21ff15a4d9da3a60054 upstream. In 32-bit x86 builds CONFIG_STATIC_CALL_INLINE isn't set, leading to static_call_initialized not being available. Define it as "0" in that case. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:37 +01:00
Dan Carpenter	44a7b0419d	ALSA: usb-audio: Fix a DMA to stack memory bug commit f7d306b47a24367302bd4fe846854e07752ffcd9 upstream. The usb_get_descriptor() function does DMA so we're not allowed to use a stack buffer for that. Doing DMA to the stack is not portable all architectures. Move the "new_device_descriptor" from being stored on the stack and allocate it with kmalloc() instead. Fixes: b909df18ce2a ("ALSA: usb-audio: Fix potential out-of-bound accesses for Extigy and Mbox devices") Cc: stable@kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/60e3aa09-039d-46d2-934c-6f123026c2eb@stanley.mountain Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Benoît Sevens <bsevens@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:37 +01:00
Juergen Gross	bcf0e2fda8	x86/xen: remove hypercall page commit 7fa0da5373685e7ed249af3fa317ab1e1ba8b0a6 upstream. The hypercall page is no longer needed. It can be removed, as from the Xen perspective it is optional. But, from Linux's perspective, it removes naked RET instructions that escape the speculative protections that Call Depth Tracking and/or Untrain Ret are trying to achieve. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:36 +01:00
Juergen Gross	bcca7e0679	x86/xen: use new hypercall functions instead of hypercall page commit b1c2cb86f4a7861480ad54bb9a58df3cbebf8e92 upstream. Call the Xen hypervisor via the new xen_hypercall_func static-call instead of the hypercall page. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:36 +01:00
Juergen Gross	31f29270c1	x86/xen: add central hypercall functions commit b4845bb6383821a9516ce30af3a27dc873e37fd4 upstream. Add generic hypercall functions usable for all normal (i.e. not iret) hypercalls. Depending on the guest type and the processor vendor different functions need to be used due to the to be used instruction for entering the hypervisor: - PV guests need to use syscall - HVM/PVH guests on Intel need to use vmcall - HVM/PVH guests on AMD and Hygon need to use vmmcall As PVH guests need to issue hypercalls very early during boot, there is a 4th hypercall function needed for HVM/PVH which can be used on Intel and AMD processors. It will check the vendor type and then set the Intel or AMD specific function to use via static_call(). This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:36 +01:00
Juergen Gross	82c211ead1	x86/xen: don't do PV iret hypercall through hypercall page commit a2796dff62d6c6bfc5fbebdf2bee0d5ac0438906 upstream. Instead of jumping to the Xen hypercall page for doing the iret hypercall, directly code the required sequence in xen-asm.S. This is done in preparation of no longer using hypercall page at all, as it has shown to cause problems with speculation mitigations. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:36 +01:00
Juergen Gross	cd95149561	x86/static-call: provide a way to do very early static-call updates commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream. Add static_call_update_early() for updating static-call targets in very early boot. This will be needed for support of Xen guest type specific hypercall functions. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:36 +01:00
Juergen Gross	8fb54fe2e7	objtool/x86: allow syscall instruction commit dda014ba59331dee4f3b773a020e109932f4bd24 upstream. The syscall instruction is used in Xen PV mode for doing hypercalls. Allow syscall to be used in the kernel in case it is tagged with an unwind hint for objtool. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:36 +01:00
Juergen Gross	aac984c87e	x86: make get_cpu_vendor() accessible from Xen code commit efbcd61d9bebb771c836a3b8bfced8165633db7c upstream. In order to be able to differentiate between AMD and Intel based systems for very early hypercalls without having to rely on the Xen hypercall page, make get_cpu_vendor() non-static. Refactor early_cpu_init() for the same reason by splitting out the loop initializing cpu_devs() into an externally callable function. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:35 +01:00
Juergen Gross	fe9a8f5250	xen/netfront: fix crash when removing device commit f9244fb55f37356f75c739c57323d9422d7aa0f8 upstream. When removing a netfront device directly after a suspend/resume cycle it might happen that the queues have not been setup again, causing a crash during the attempt to stop the queues another time. Fix that by checking the queues are existing before trying to stop them. This is XSA-465 / CVE-2024-53240. Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Fixes: `d50b7914fa` ("xen-netfront: Fix NULL sring after live migration") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:35 +01:00
Radu Rendec	4a41bb9f2b	net: rswitch: Avoid use-after-free in rswitch_poll() commit 9a0c28efeec6383ef22e97437616b920e7320b67 upstream. The use-after-free is actually in rswitch_tx_free(), which is inlined in rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the same pointer, the skb is first freed using dev_kfree_skb_any(), then the value in skb->len is used to update the interface statistics. Let's move around the instructions to use skb->len before the skb is freed. This bug is trivial to reproduce using KFENCE. It will trigger a splat every few packets. A simple ARP request or ICMP echo request is enough. Fixes: 271e015b9153 ("net: rswitch: Add unmap_addrs instead of dma address in each desc") Signed-off-by: Radu Rendec <rrendec@redhat.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://patch.msgid.link/20240702210838.2703228-1-rrendec@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:35 +01:00
Shung-Hsi Yu	9f7a9f95df	selftests/bpf: remove use of __xlated() Commit `68ec5395bc`, backport of mainline commit a41b3828ec05 ("selftests/bpf: Verify that sync_linked_regs preserves subreg_def") uses the __xlated() that wasn't in the v6.6 code-base, and causes BPF selftests to fail compilation. Remove the use of the __xlated() macro in tools/testing/selftests/bpf/progs/verifier_scalar_ids.c to fix compilation failure. Without the __xlated() checks the coverage is reduced, however the test case still functions just fine. Fixes: `68ec5395bc` ("selftests/bpf: Verify that sync_linked_regs preserves subreg_def") Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:35 +01:00
Daniel Borkmann	ce444a0041	selftests/bpf: Add netlink helper library commit 51f1892b5289f0c09745d3bedb36493555d6d90c upstream. Add a minimal netlink helper library for the BPF selftests. This has been taken and cut down and cleaned up from iproute2. This covers basics such as netdevice creation which we need for BPF selftests / BPF CI given iproute2 package cannot cover it yet. Stanislav Fomichev suggested that this could be replaced in future by ynl tool generated C code once it has RTNL support to create devices. Once we get to this point the BPF CI would also need to add libmnl. If no further extensions are needed, a second option could be that we remove this code again once iproute2 package has support. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-7-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:35 +01:00
Nikolay Kuratov	3a7d88f981	tracing/kprobes: Skip symbol counting logic for module symbols in create_local_trace_kprobe() commit `b022f0c7e4` ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols") avoids checking number_of_same_symbols() for module symbol in __trace_kprobe_create(), but create_local_trace_kprobe() should avoid this check too. Doing this check leads to ENOENT for module_name:symbol_name constructions passed over perf_event_open. No bug in newer kernels as it was fixed more generally by commit 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads") Link: https://lore.kernel.org/linux-trace-kernel/20240705161030.b3ddb33a8167013b9b1da202@kernel.org Fixes: `b022f0c7e4` ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols") Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:35 +01:00
Eduard Zingerman	bfe9446ea1	bpf: sync_linked_regs() must preserve subreg_def commit e9bd9c498cb0f5843996dbe5cbce7a1836a83c70 upstream. Range propagation must not affect subreg_def marks, otherwise the following example is rewritten by verifier incorrectly when BPF_F_TEST_RND_HI32 flag is set: 0: call bpf_ktime_get_ns call bpf_ktime_get_ns 1: r0 &= 0x7fffffff after verifier r0 &= 0x7fffffff 2: w1 = w0 rewrites w1 = w0 3: if w0 < 10 goto +0 --------------> r11 = 0x2f5674a6 (r) 4: r1 >>= 32 r11 <<= 32 (r) 5: r0 = r1 r1 \|= r11 (r) 6: exit; if w0 < 0xa goto pc+0 r1 >>= 32 r0 = r1 exit (or zero extension of w1 at (2) is missing for architectures that require zero extension for upper register half). The following happens w/o this patch: - r0 is marked as not a subreg at (0); - w1 is marked as subreg at (2); - w1 subreg_def is overridden at (3) by copy_register_state(); - w1 is read at (5) but mark_insn_zext() does not mark (2) for zero extension, because w1 subreg_def is not set; - because of BPF_F_TEST_RND_HI32 flag verifier inserts random value for hi32 bits of (2) (marked (r)); - this random value is read at (5). Fixes: `75748837b7` ("bpf: Propagate scalar ranges through register assignments.") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Closes: https://lore.kernel.org/bpf/7e2aa30a62d740db182c170fdd8f81c596df280d.camel@gmail.com Link: https://lore.kernel.org/bpf/20240924210844.1758441-1-eddyz87@gmail.com [ shung-hsi.yu: sync_linked_regs() was called find_equal_scalars() before commit 4bf79f9be434 ("bpf: Track equal scalars history on per-instruction level"), and modification is done because there is only a single call to copy_register_state() before commit 98d7ca374ba4 ("bpf: Track delta between "linked" registers."). ] Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:35 +01:00
James Morse	4e76efda1f	KVM: arm64: Disable MPAM visibility by default and ignore VMM writes commit 6685f5d572c22e1003e7c0d089afe1c64340ab1f upstream. commit `011e5f5bf5` ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling. A previous patch supplied the missing trap handling. Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to be migratable, but there is little point enabling the MPAM CPU interface on new VMs until there is something a guest can do with it. Clear the MPAM field from the guest's ID_AA64PFR0_EL1 and on hardware that supports MPAM, politely ignore the VMMs attempts to set this bit. Guests exposed to this bug have the sanitised value of the MPAM field, so only the correct value needs to be ignored. This means the field can continue to be used to block migration to incompatible hardware (between MPAM=1 and MPAM=5), and the VMM can't rely on the field being ignored. Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-19 18:11:34 +01:00
Weizhao Ouyang	e2ccaf2d0e	kselftest/arm64: abi: fix SVCR detection [ Upstream commit ce03573a1917532da06057da9f8e74a2ee9e2ac9 ] When using svcr_in to check ZA and Streaming Mode, we should make sure that the value in x2 is correct, otherwise it may trigger an Illegal instruction if FEAT_SVE and !FEAT_SME. Fixes: `43e3f85523` ("kselftest/arm64: Add SME support to syscall ABI test") Signed-off-by: Weizhao Ouyang <o451686892@gmail.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241211111639.12344-1-o451686892@gmail.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:34 +01:00
Nathan Chancellor	4a54211845	blk-iocost: Avoid using clamp() on inuse in __propagate_weights() [ Upstream commit 57e420c84f9ab55ba4c5e2ae9c5f6c8e1ea834d2 ] After a recent change to clamp() and its variants [1] that increases the coverage of the check that high is greater than low because it can be done through inlining, certain build configurations (such as s390 defconfig) fail to build with clang with: block/blk-iocost.c:1101:11: error: call to '__compiletime_assert_557' declared with 'error' attribute: clamp() low limit 1 greater than high limit active 1101 \| inuse = clamp_t(u32, inuse, 1, active); \| ^ include/linux/minmax.h:218:36: note: expanded from macro 'clamp_t' 218 \| #define clamp_t(type, val, lo, hi) __careful_clamp(type, val, lo, hi) \| ^ include/linux/minmax.h:195:2: note: expanded from macro '__careful_clamp' 195 \| __clamp_once(type, val, lo, hi, __UNIQUE_ID(v_), __UNIQUE_ID(l_), __UNIQUE_ID(h_)) \| ^ include/linux/minmax.h:188:2: note: expanded from macro '__clamp_once' 188 \| BUILD_BUG_ON_MSG(statically_true(ulo > uhi), \ \| ^ __propagate_weights() is called with an active value of zero in ioc_check_iocgs(), which results in the high value being less than the low value, which is undefined because the value returned depends on the order of the comparisons. The purpose of this expression is to ensure inuse is not more than active and at least 1. This could be written more simply with a ternary expression that uses min(inuse, active) as the condition so that the value of that condition can be used if it is not zero and one if it is. Do this conversion to resolve the error and add a comment to deter people from turning this back into clamp(). Fixes: `7caa47151a` ("blkcg: implement blk-iocost") Link: https://lore.kernel.org/r/34d53778977747f19cce2abb287bb3e6@AcuMS.aculab.com/ [1] Suggested-by: David Laight <david.laight@aculab.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/llvm/CA+G9fYsD7mw13wredcZn0L-KBA3yeoVSTuxnss-AEWMN3ha0cA@mail.gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412120322.3GfVe3vF-lkp@intel.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:34 +01:00
Jesse Van Gavere	ee11eaa151	net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries [ Upstream commit 5af53577c64fa84da032d490b701127fe8d1a6aa ] Commit `8d7ae22ae9` ("net: dsa: microchip: KSZ9477 register regmap alignment to 32 bit boundaries") fixed an issue whereby regmap_reg_range did not allow writes as 32 bit words to KSZ9477 PHY registers, this fix for KSZ9896 is adapted from there as the same errata is present in KSZ9896C as "Module 5: Certain PHY registers must be written as pairs instead of singly" the explanation below is likewise taken from this commit. The commit provided code to apply "Module 6: Certain PHY registers must be written as pairs instead of singly" errata for KSZ9477 as this chip for certain PHY registers (0xN120 to 0xN13F, N=1,2,3,4,5) must be accessed as 32 bit words instead of 16 or 8 bit access. Otherwise, adjacent registers (no matter if reserved or not) are overwritten with 0x0. Without this patch some registers (e.g. 0x113c or 0x1134) required for 32 bit access are out of valid regmap ranges. As a result, following error is observed and KSZ9896 is not properly configured: ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0 The solution is to modify regmap_reg_range to allow accesses with 4 bytes boundaries. Fixes: `5c844d57aa` ("net: dsa: microchip: fix writes to phy registers >= 0x10") Signed-off-by: Jesse Van Gavere <jesse.vangavere@scioteq.com> Link: https://patch.msgid.link/20241211092932.26881-1-jesse.vangavere@scioteq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:34 +01:00
Nikita Yushchenko	f5fcb1ff9f	net: renesas: rswitch: fix initial MPIC register setting [ Upstream commit fb9e6039c325cc205a368046dc03c56c87df2310 ] MPIC.PIS must be set per phy interface type. MPIC.LSC must be set per speed. Do that strictly per datasheet, instead of hardcoding MPIC.PIS to GMII. Fixes: `3590918b5d` ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20241211053012.368914-1-nikita.yoush@cogentembedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:34 +01:00
Thadeu Lima de Souza Cascardo	ecdcaea0e4	Bluetooth: btmtk: avoid UAF in btmtk_process_coredump [ Upstream commit b548f5e9456c568155499d9ebac675c0d7a296e8 ] hci_devcd_append may lead to the release of the skb, so it cannot be accessed once it is called. ================================================================== BUG: KASAN: slab-use-after-free in btmtk_process_coredump+0x2a7/0x2d0 [btmtk] Read of size 4 at addr ffff888033cfabb0 by task kworker/0:3/82 CPU: 0 PID: 82 Comm: kworker/0:3 Tainted: G U 6.6.40-lockdep-03464-g1d8b4eb3060e #1 b0b3c1cc0c842735643fb411799d97921d1f688c Hardware name: Google Yaviks_Ufs/Yaviks_Ufs, BIOS Google_Yaviks_Ufs.15217.552.0 05/07/2024 Workqueue: events btusb_rx_work [btusb] Call Trace: <TASK> dump_stack_lvl+0xfd/0x150 print_report+0x131/0x780 kasan_report+0x177/0x1c0 btmtk_process_coredump+0x2a7/0x2d0 [btmtk 03edd567dd71a65958807c95a65db31d433e1d01] btusb_recv_acl_mtk+0x11c/0x1a0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec] btusb_rx_work+0x9e/0xe0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec] worker_thread+0xe44/0x2cc0 kthread+0x2ff/0x3a0 ret_from_fork+0x51/0x80 ret_from_fork_asm+0x1b/0x30 </TASK> Allocated by task 82: stack_trace_save+0xdc/0x190 kasan_set_track+0x4e/0x80 __kasan_slab_alloc+0x4e/0x60 kmem_cache_alloc+0x19f/0x360 skb_clone+0x132/0xf70 btusb_recv_acl_mtk+0x104/0x1a0 [btusb] btusb_rx_work+0x9e/0xe0 [btusb] worker_thread+0xe44/0x2cc0 kthread+0x2ff/0x3a0 ret_from_fork+0x51/0x80 ret_from_fork_asm+0x1b/0x30 Freed by task 1733: stack_trace_save+0xdc/0x190 kasan_set_track+0x4e/0x80 kasan_save_free_info+0x28/0xb0 ____kasan_slab_free+0xfd/0x170 kmem_cache_free+0x183/0x3f0 hci_devcd_rx+0x91a/0x2060 [bluetooth] worker_thread+0xe44/0x2cc0 kthread+0x2ff/0x3a0 ret_from_fork+0x51/0x80 ret_from_fork_asm+0x1b/0x30 The buggy address belongs to the object at ffff888033cfab40 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 112 bytes inside of freed 232-byte region [ffff888033cfab40, ffff888033cfac28) The buggy address belongs to the physical page: page:00000000a174ba93 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x33cfa head:00000000a174ba93 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0 anon flags: 0x4000000000000840(slab\|head\|zone=1) page_type: 0xffffffff() raw: 4000000000000840 ffff888100848a00 0000000000000000 0000000000000001 raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888033cfaa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc ffff888033cfab00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb >ffff888033cfab80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888033cfac00: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc ffff888033cfac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Check if we need to call hci_devcd_complete before calling hci_devcd_append. That requires that we check data->cd_info.cnt >= MTK_COREDUMP_NUM instead of data->cd_info.cnt > MTK_COREDUMP_NUM, as we increment data->cd_info.cnt only once the call to hci_devcd_append succeeds. Fixes: `0b70151328` ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:34 +01:00
Frédéric Danis	3bf09c685e	Bluetooth: SCO: Add support for 16 bits transparent voice setting [ Upstream commit 29a651451e6c264f58cd9d9a26088e579d17b242 ] The voice setting is used by sco_connect() or sco_conn_defer_accept() after being set by sco_sock_setsockopt(). The PCM part of the voice setting is used for offload mode through PCM chipset port. This commits add support for mSBC 16 bits offloading, i.e. audio data not transported over HCI. The BCM4349B1 supports 16 bits transparent data on its I2S port. If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives correct audio. This has been tested with connection to iPhone 14 and Samsung S24. Fixes: `ad10b1a487` ("Bluetooth: Add Bluetooth socket voice option") Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:34 +01:00
Iulia Tanasescu	263b390a82	Bluetooth: iso: Fix recursive locking warning [ Upstream commit 9bde7c3b3ad0e1f39d6df93dd1c9caf63e19e50f ] This updates iso_sock_accept to use nested locking for the parent socket, to avoid lockdep warnings caused because the parent and child sockets are locked by the same thread: [ 41.585683] ============================================ [ 41.585688] WARNING: possible recursive locking detected [ 41.585694] 6.12.0-rc6+ #22 Not tainted [ 41.585701] -------------------------------------------- [ 41.585705] iso-tester/3139 is trying to acquire lock: [ 41.585711] ffff988b29530a58 (sk_lock-AF_BLUETOOTH) at: bt_accept_dequeue+0xe3/0x280 [bluetooth] [ 41.585905] but task is already holding lock: [ 41.585909] ffff988b29533a58 (sk_lock-AF_BLUETOOTH) at: iso_sock_accept+0x61/0x2d0 [bluetooth] [ 41.586064] other info that might help us debug this: [ 41.586069] Possible unsafe locking scenario: [ 41.586072] CPU0 [ 41.586076] ---- [ 41.586079] lock(sk_lock-AF_BLUETOOTH); [ 41.586086] lock(sk_lock-AF_BLUETOOTH); [ 41.586093] * DEADLOCK * [ 41.586097] May be due to missing lock nesting notation [ 41.586101] 1 lock held by iso-tester/3139: [ 41.586107] #0: ffff988b29533a58 (sk_lock-AF_BLUETOOTH) at: iso_sock_accept+0x61/0x2d0 [bluetooth] Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:33 +01:00
Luiz Augusto von Dentz	0108132d7d	Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating [ Upstream commit 581dd2dc168fe0ed2a7a5534a724f0d3751c93ae ] The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is not safe since for the most part entries fetched this way shall be treated as rcu_dereference: Note that the value returned by rcu_dereference() is valid only within the enclosing RCU read-side critical section [1]_. For example, the following is not legal:: rcu_read_lock(); p = rcu_dereference(head.next); rcu_read_unlock(); x = p->address; /* BUG!!! / rcu_read_lock(); y = p->data; / BUG!!! */ rcu_read_unlock(); Fixes: `a0bfde167b` ("Bluetooth: ISO: Add support for connecting multiple BISes") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:33 +01:00
Iulia Tanasescu	11dc486ed5	Bluetooth: ISO: Reassociate a socket with an active BIS [ Upstream commit fa224d0c094a458e9ebf5ea9b1c696136b7af427 ] For ISO Broadcast, all BISes from a BIG have the same lifespan - they cannot be created or terminated independently from each other. This links together all BIS hcons that are part of the same BIG, so all hcons are kept alive as long as the BIG is active. If multiple BIS sockets are opened for a BIG handle, and only part of them are closed at some point, the associated hcons will be marked as open. If new sockets will later be opened for the same BIG, they will be reassociated with the open BIS hcons. All BIS hcons will be cleaned up and the BIG will be terminated when the last BIS socket is closed from userspace. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 581dd2dc168f ("Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:33 +01:00
Daniil Tatianin	81c4b9529e	ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired [ Upstream commit c53d96a4481f42a1635b96d2c1acbb0a126bfd54 ] This bug was first introduced in `c27f3d011b`, where the author of the patch probably meant to do DeleteMutex instead of ReleaseMutex. The mutex leak was noticed later on and fixed in `e4dfe10837`, but the bogus MutexRelease line was never removed, so do it now. Link: https://github.com/acpica/acpica/pull/982 Fixes: `c27f3d011b` ("ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Link: https://patch.msgid.link/20241122082954.658356-1-d-tatianin@yandex-team.ru Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:33 +01:00
Daniel Borkmann	c6c217c6e2	team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL [ Upstream commit 98712844589e06d9aa305b5077169942139fd75c ] Similar to bonding driver, add NETIF_F_GSO_ENCAP_ALL to TEAM_VLAN_FEATURES in order to support slave devices which propagate NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM as vlan_features. Fixes: `3625920b62` ("teaming: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-5-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:33 +01:00
Daniel Borkmann	679b5884e6	bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL [ Upstream commit 77b11c8bf3a228d1c63464534c2dcc8d9c8bf7ff ] Drivers like mlx5 expose NIC's vlan_features such as NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are later not propagated when the underlying devices are bonded and a vlan device created on top of the bond. Right now, the more cumbersome workaround for this is to create the vlan on top of the mlx5 and then enslave the vlan devices to a bond. To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES such that bond_compute_features() can probe and propagate the vlan_features from the slave devices up to the vlan device. Given the following bond: # ethtool -i enp2s0f{0,1}np{0,1} driver: mlx5_core [...] # ethtool -k enp2s0f0np0 \| grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: on rx-udp-gro-forwarding: off # ethtool -k enp2s0f1np1 \| grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: on rx-udp-gro-forwarding: off # ethtool -k bond0 \| grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: off [fixed] rx-udp-gro-forwarding: off Before: # ethtool -k bond0.100 \| grep udp tx-udp_tnl-segmentation: off [requested on] tx-udp_tnl-csum-segmentation: off [requested on] tx-udp-segmentation: on rx-udp_tunnel-port-offload: off [fixed] rx-udp-gro-forwarding: off After: # ethtool -k bond0.100 \| grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: off [fixed] rx-udp-gro-forwarding: off Various users have run into this reporting performance issues when configuring Cilium in vxlan tunneling mode and having the combination of bond & vlan for the core devices connecting the Kubernetes cluster to the outside world. Fixes: `a9b3ace44c` ("bonding: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:33 +01:00
Martin Ottens	3824c5fad1	net/sched: netem: account for backlog updates from child qdisc [ Upstream commit f8d4bc455047cf3903cd6f85f49978987dbb3027 ] In general, 'qlen' of any classful qdisc should keep track of the number of packets that the qdisc itself and all of its children holds. In case of netem, 'qlen' only accounts for the packets in its internal tfifo. When netem is used with a child qdisc, the child qdisc can use 'qdisc_tree_reduce_backlog' to inform its parent, netem, about created or dropped SKBs. This function updates 'qlen' and the backlog statistics of netem, but netem does not account for changes made by a child qdisc. 'qlen' then indicates the wrong number of packets in the tfifo. If a child qdisc creates new SKBs during enqueue and informs its parent about this, netem's 'qlen' value is increased. When netem dequeues the newly created SKBs from the child, the 'qlen' in netem is not updated. If 'qlen' reaches the configured sch->limit, the enqueue function stops working, even though the tfifo is not full. Reproduce the bug: Ensure that the sender machine has GSO enabled. Configure netem as root qdisc and tbf as its child on the outgoing interface of the machine as follows: $ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100 $ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms Send bulk TCP traffic out via this interface, e.g., by running an iPerf3 client on the machine. Check the qdisc statistics: $ tc -s qdisc show dev <oif> Statistics after 10s of iPerf3 TCP test before the fix (note that netem's backlog > limit, netem stopped accepting packets): qdisc netem 1: root refcnt 2 limit 1000 delay 100ms Sent 2767766 bytes 1848 pkt (dropped 652, overlimits 0 requeues 0) backlog 4294528236b 1155p requeues 0 qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms Sent 2767766 bytes 1848 pkt (dropped 327, overlimits 7601 requeues 0) backlog 0b 0p requeues 0 Statistics after the fix: qdisc netem 1: root refcnt 2 limit 1000 delay 100ms Sent 37766372 bytes 24974 pkt (dropped 9, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms Sent 37766372 bytes 24974 pkt (dropped 327, overlimits 96017 requeues 0) backlog 0b 0p requeues 0 tbf segments the GSO SKBs (tbf_segment) and updates the netem's 'qlen'. The interface fully stops transferring packets and "locks". In this case, the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at its limit and no more packets are accepted. This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is only decreased when a packet is returned by its dequeue function, and not during enqueuing into the child qdisc. External updates to 'qlen' are thus accounted for and only the behavior of the backlog statistics changes. As in other qdiscs, 'qlen' then keeps track of how many packets are held in netem and all of its children. As before, sch->limit remains as the maximum number of packets in the tfifo. The same applies to netem's backlog statistics. Fixes: `50612537e9` ("netem: fix classful handling") Signed-off-by: Martin Ottens <martin.ottens@fau.de> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20241210131412.1837202-1-martin.ottens@fau.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:33 +01:00
Vladimir Oltean	72dc88eca7	net: dsa: felix: fix stuck CPU-injected packets with short taprio windows [ Upstream commit acfcdb78d5d4cdb78e975210c8825b9a112463f6 ] With this port schedule: tc qdisc replace dev $send_if parent root handle 100 taprio \ num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ map 0 1 2 3 4 5 6 7 \ base-time 0 cycle-time 10000 \ sched-entry S 01 1250 \ sched-entry S 02 1250 \ sched-entry S 04 1250 \ sched-entry S 08 1250 \ sched-entry S 10 1250 \ sched-entry S 20 1250 \ sched-entry S 40 1250 \ sched-entry S 80 1250 \ flags 2 ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug ptp4l[4134.168]: port 2: send peer delay response failed It turns out that the driver can't take their TX timestamps because it can't transmit them in the first place. And there's nothing special about the Pdelay_Resp packets - they're just regular 68 byte packets. But with this taprio configuration, the switch would refuse to send even the ETH_ZLEN minimum packet size. This should have definitely not been the case. When applying the taprio config, the driver prints: mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent without problems. Yet it's not. For the forwarding path, the configuration is fine, yet packets injected from Linux get stuck with this schedule no matter what. The first hint that the static guard bands are the cause of the problem is that reverting Michael Walle's commit `297c4de6f7` ("net: dsa: felix: re-enable TAS guard band mode") made things work. It must be that the guard bands are calculated incorrectly. I remembered that there is a magic constant in the driver, set to 33 ns for no logical reason other than experimentation, which says "never let the static guard bands get so large as to leave less than this amount of remaining space in the time slot, because the queue system will refuse to schedule packets otherwise, and they will get stuck". I had a hunch that my previous experimentally-determined value was only good for packets coming from the forwarding path, and that the CPU injection path needed more. I came to the new value of 35 ns through binary search, after seeing that with 544 ns (the bit time required to send the Pdelay_Resp packet at gigabit) it works. Again, this is purely experimental, there's no logic and the manual doesn't say anything. The new driver prints for this schedule look like this: mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS So yes, the maximum MTU is now even smaller by 1 byte than before. This is maybe counter-intuitive, but makes more sense with a diagram of one time slot. Before: Gate open Gate close \| \| v 1250 ns total time slot duration v <----------------------------------------------------> <----><----------------------------------------------> 33 ns 1217 ns static guard band useful Gate open Gate close \| \| v 1250 ns total time slot duration v <----------------------------------------------------> <-----><---------------------------------------------> 35 ns 1215 ns static guard band useful The static guard band implemented by this switch hardware directly determines the maximum allowable MTU for that traffic class. The larger it is, the earlier the switch will stop scheduling frames for transmission, because otherwise they might overrun the gate close time (and avoiding that is the entire purpose of Michael's patch). So, we now have guard bands smaller by 2 ns, thus, in this particular case, we lose a byte of the maximum MTU. Fixes: `11afdc6526` ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:32 +01:00
Florian Westphal	27f0574253	netfilter: nf_tables: do not defer rule destruction via call_rcu [ Upstream commit b04df3da1b5c6f6dc7cdccc37941740c078c4043 ] nf_tables_chain_destroy can sleep, it can't be used from call_rcu callbacks. Moreover, nf_tables_rule_release() is only safe for error unwinding, while transaction mutex is held and the to-be-desroyed rule was not exposed to either dataplane or dumps, as it deactives+frees without the required synchronize_rcu() in-between. nft_rule_expr_deactivate() callbacks will change ->use counters of other chains/sets, see e.g. nft_lookup .deactivate callback, these must be serialized via transaction mutex. Also add a few lockdep asserts to make this more explicit. Calling synchronize_rcu() isn't ideal, but fixing this without is hard and way more intrusive. As-is, we can get: WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x.. Workqueue: events nf_tables_trans_destroy_work RIP: 0010:nft_set_destroy+0x3fe/0x5c0 Call Trace: <TASK> nf_tables_trans_destroy_work+0x6b7/0xad0 process_one_work+0x64a/0xce0 worker_thread+0x613/0x10d0 In case the synchronize_rcu becomes an issue, we can explore alternatives. One way would be to allocate nft_trans_rule objects + one nft_trans_chain object, deactivate the rules + the chain and then defer the freeing to the nft destroy workqueue. We'd still need to keep the synchronize_rcu path as a fallback to handle -ENOMEM corner cases though. Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67478d92.050a0220.253251.0062.GAE@google.com/T/ Fixes: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:32 +01:00
Phil Sutter	8c2c8445cd	netfilter: IDLETIMER: Fix for possible ABBA deadlock [ Upstream commit f36b01994d68ffc253c8296e2228dfe6e6431c03 ] Deletion of the last rule referencing a given idletimer may happen at the same time as a read of its file in sysfs: \| ====================================================== \| WARNING: possible circular locking dependency detected \| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted \| ------------------------------------------------------ \| iptables/3303 is trying to acquire lock: \| ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20 \| \| but task is already holding lock: \| ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v] \| \| which lock already depends on the new lock. A simple reproducer is: \| #!/bin/bash \| \| while true; do \| iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" \| iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" \| done & \| while true; do \| cat /sys/class/xt_idletimer/timers/testme >/dev/null \| done Avoid this by freeing list_mutex right after deleting the element from the list, then continuing with the teardown. Fixes: `0902b469bd` ("netfilter: xtables: idletimer target implementation") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:32 +01:00
Nikita Yushchenko	01b2c76150	net: renesas: rswitch: handle stop vs interrupt race [ Upstream commit 3dd002f20098b9569f8fd7f8703f364571e2e975 ] Currently the stop routine of rswitch driver does not immediately prevent hardware from continuing to update descriptors and requesting interrupts. It can happen that when rswitch_stop() executes the masking of interrupts from the queues of the port being closed, napi poll for that port is already scheduled or running on a different CPU. When execution of this napi poll completes, it will unmask the interrupts. And unmasked interrupt can fire after rswitch_stop() returns from napi_disable() call. Then, the handler won't mask it, because napi_schedule_prep() will return false, and interrupt storm will happen. This can't be fixed by making rswitch_stop() call napi_disable() before masking interrupts. In this case, the interrupt storm will happen if interrupt fires between napi_disable() and masking. Fix this by checking for priv->opened_ports bit when unmasking interrupts after napi poll. For that to be consistent, move priv->opened_ports changes into spinlock-protected areas, and reorder other operations in rswitch_open() and rswitch_stop() accordingly. Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Fixes: `3590918b5d` ("net: ethernet: renesas: Add support for "Ethernet Switch"") Link: https://patch.msgid.link/20241209113204.175015-1-nikita.yoush@cogentembedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:32 +01:00
Nikita Yushchenko	bf8c6755f0	net: renesas: rswitch: avoid use-after-put for a device tree node [ Upstream commit 66b7e9f85b8459c823b11e9af69dbf4be5eb6be8 ] The device tree node saved in the rswitch_device structure is used at several driver locations. So passing this node to of_node_put() after the first use is wrong. Move of_node_put() for this node to exit paths. Fixes: `b46f1e5793` ("net: renesas: rswitch: Simplify struct phy * handling") Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://patch.msgid.link/20241208095004.69468-5-nikita.yoush@cogentembedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:32 +01:00
Nikita Yushchenko	78aa0aabb0	net: renesas: rswitch: fix leaked pointer on error path [ Upstream commit bb617328bafa1023d8e9c25a25345a564c66c14f ] If error path is taken while filling descriptor for a frame, skb pointer is left in the entry. Later, on the ring entry reuse, the same entry could be used as a part of a multi-descriptor frame, and skb for that new frame could be stored in a different entry. Then, the stale pointer will reach the completion routine, and passed to the release operation. Fix that by clearing the saved skb pointer at the error path. Fixes: d2c96b9d5f83 ("net: rswitch: Add jumbo frames handling for TX") Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://patch.msgid.link/20241208095004.69468-4-nikita.yoush@cogentembedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:32 +01:00
Nikita Yushchenko	0c316b6e0a	net: renesas: rswitch: fix race window between tx start and complete [ Upstream commit 0c9547e6ccf40455b0574cf589be3b152a3edf5b ] If hardware is already transmitting, it can start handling the descriptor being written to immediately after it observes updated DT field, before the queue is kicked by a write to GWTRC. If the start_xmit() execution is preempted at unfortunate moment, this transmission can complete, and interrupt handled, before gq->cur gets updated. With the current implementation of completion, this will cause the last entry not completed. Fix that by changing completion loop to check DT values directly, instead of depending on gq->cur. Fixes: `3590918b5d` ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://patch.msgid.link/20241208095004.69468-3-nikita.yoush@cogentembedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:31 +01:00
Yoshihiro Shimoda	af327c0f41	net: rswitch: Add jumbo frames handling for TX [ Upstream commit d2c96b9d5f83e4327cf044d00d7f713edd7fecfd ] If the driver would like to transmit a jumbo frame like 2KiB or more, it should be split into multiple queues. In the near future, to support this, add handling specific descriptor types F{START,MID,END}. However, such jumbo frames will not happen yet because the maximum MTU size is still default for now. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 0c9547e6ccf4 ("net: renesas: rswitch: fix race window between tx start and complete") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-19 18:11:31 +01:00

1 2 3 4 5 ...

1230919 Commits