linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-05 18:41:58 +09:00

Author	SHA1	Message	Date
Prashanth K	cb174344bf	usb: gadget: u_serial: Avoid spinlock recursion in __gs_console_push [ Upstream commit `e599046994` ] When serial console over USB is enabled, gs_console_connect queues gs_console_work, where it acquires the spinlock and queues the usb request, and this request goes to gadget layer. Now consider a situation where gadget layer prints something to dmesg, this will eventually call gs_console_write() which requires cons->lock. And this causes spinlock recursion. Avoid this by excluding usb_ep_queue from the spinlock. spin_lock_irqsave //needs cons->lock gs_console_write . . _printk __warn_printk dev_warn/pr_err . . [USB Gadget Layer] . . usb_ep_queue gs_console_work __gs_console_push // acquires cons->lock process_one_work Signed-off-by: Prashanth K <quic_prashk@quicinc.com> Link: https://lore.kernel.org/r/1683638872-6885-1-git-send-email-quic_prashk@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
Yunfei Dong	1676748aa2	media: v4l2-mem2mem: add lock to protect parameter num_rdy [ Upstream commit `56b5c3e67b` ] Getting below error when using KCSAN to check the driver. Adding lock to protect parameter num_rdy when getting the value with function: v4l2_m2m_num_src_bufs_ready/v4l2_m2m_num_dst_bufs_ready. kworker/u16:3: [name:report&]BUG: KCSAN: data-race in v4l2_m2m_buf_queue kworker/u16:3: [name:report&] kworker/u16:3: [name:report&]read-write to 0xffffff8105f35b94 of 1 bytes by task 20865 on cpu 7: kworker/u16:3: v4l2_m2m_buf_queue+0xd8/0x10c Signed-off-by: Pina Chen <pina.chen@mediatek.com> Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
Paulo Alcantara	9850867042	smb: client: fix warning in cifs_smb3_do_mount() [ Upstream commit `12c30f33cc` ] This fixes the following warning reported by kernel test robot fs/smb/client/cifsfs.c:982 cifs_smb3_do_mount() warn: possible memory leak of 'cifs_sb' Link: https://lore.kernel.org/all/202306170124.CtQqzf0I-lkp@intel.com/ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
Christian Brauner	a3f252436e	ovl: check type and offset of struct vfsmount in ovl_entry [ Upstream commit `f723edb8a5` ] Porting overlayfs to the new amount api I started experiencing random crashes that couldn't be explained easily. So after much debugging and reasoning it became clear that struct ovl_entry requires the point to struct vfsmount to be the first member and of type struct vfsmount. During the port I added a new member at the beginning of struct ovl_entry which broke all over the place in the form of random crashes and cache corruptions. While there's a comment in ovl_free_fs() to the effect of "Hack! Reuse ofs->layers as a vfsmount array before freeing it" there's no such comment on struct ovl_entry which makes this easy to trip over. Add a comment and two static asserts for both the offset and the type of pointer in struct ovl_entry. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
Patrisious Haddad	1a650d3ccd	RDMA/mlx5: Return the firmware result upon destroying QP/RQ [ Upstream commit `22664c06e9` ] Previously when destroying a QP/RQ, the result of the firmware destruction function was ignored and upper layers weren't informed about the failure. Which in turn could lead to various problems since when upper layer isn't aware of the failure it continues its operation thinking that the related QP/RQ was successfully destroyed while it actually wasn't, which could lead to the below kernel WARN. Currently, we return the correct firmware destruction status to upper layers which in case of the RQ would be mlx5_ib_destroy_wq() which was already capable of handling RQ destruction failure or in case of a QP to destroy_qp_common(), which now would actually warn upon qp destruction failure. WARNING: CPU: 3 PID: 995 at drivers/infiniband/core/rdma_core.c:940 uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse CPU: 3 PID: 995 Comm: python3 Not tainted 5.16.0-rc5+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs] Code: 41 5c 41 5d 41 5e e9 44 34 f0 e0 48 89 df e8 4c 77 ff ff 49 8b 86 10 01 00 00 48 85 c0 74 a1 4c 89 e7 ff d0 eb 9a 0f 0b eb c1 <0f> 0b be 04 00 00 00 48 89 df e8 b6 f6 ff ff e9 75 ff ff ff 90 0f RSP: 0018:ffff8881533e3e78 EFLAGS: 00010287 RAX: ffff88811b2cf3e0 RBX: ffff888106209700 RCX: 0000000000000000 RDX: ffff888106209780 RSI: ffff8881533e3d30 RDI: ffff888109b101a0 RBP: 0000000000000001 R08: ffff888127cb381c R09: 0de9890000000009 R10: ffff888127cb3800 R11: 0000000000000000 R12: ffff888106209780 R13: ffff888106209750 R14: ffff888100f20660 R15: 0000000000000000 FS: 00007f8be353b740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8bd5b117c0 CR3: 000000012cd8a004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ib_uverbs_close+0x1a/0x90 [ib_uverbs] __fput+0x82/0x230 task_work_run+0x59/0x90 exit_to_user_mode_prepare+0x138/0x140 syscall_exit_to_user_mode+0x1d/0x50 ? __x64_sys_close+0xe/0x40 do_syscall_64+0x4a/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f8be3ae0abb Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 83 43 f9 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 c1 43 f9 ff 8b 44 RSP: 002b:00007ffdb51909c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000557bb7f7c020 RCX: 00007f8be3ae0abb RDX: 0000557bb7c74010 RSI: 0000557bb7f14ca0 RDI: 0000000000000005 RBP: 0000557bb7fbd598 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 0000557bb7fbd5b8 R13: 0000557bb7fbd5a8 R14: 0000000000001000 R15: 0000557bb7f7c020 </TASK> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/c6df677f931d18090bafbe7f7dbb9524047b7d9b.1685953497.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
Marco Morandini	9631d88503	HID: add quirk for 03f0:464a HP Elite Presenter Mouse [ Upstream commit `0db117359e` ] HP Elite Presenter Mouse HID Record Descriptor shows two mouses (Repord ID 0x1 and 0x2), one keypad (Report ID 0x5), two Consumer Controls (Report IDs 0x6 and 0x3). Previous to this commit it registers one mouse, one keypad and one Consumer Control, and it was usable only as a digitl laser pointer (one of the two mouses). This patch defines the 464a USB device ID and enables the HID_QUIRK_MULTI_INPUT quirk for it, allowing to use the device both as a mouse and a digital laser pointer. Signed-off-by: Marco Morandini <marco.morandini@polimi.it> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
Lang Yu	4921792e04	drm/amdgpu: install stub fence into potential unused fence pointers [ Upstream commit `187916e6ed` ] When using cpu to update page tables, vm update fences are unused. Install stub fence into these fence pointers instead of NULL to avoid NULL dereference when calling dma_fence_wait() on them. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
stuarthayhurst	fd41646d43	HID: logitech-hidpp: Add USB and Bluetooth IDs for the Logitech G915 TKL Keyboard [ Upstream commit `48aea8b445` ] Adds the USB and Bluetooth IDs for the Logitech G915 TKL keyboard, for device detection For this device, this provides battery reporting on top of hid-generic Reviewed-by: Bastien Nocera <hadess@hadess.net> Signed-off-by: Stuart Hayhurst <stuart.a.hayhurst@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:23 +02:00
gaoxu	ff10cd3e9b	dma-remap: use kvmalloc_array/kvfree for larger dma memory remap [ Upstream commit `51ff97d54f` ] If dma_direct_alloc() alloc memory in size of 64MB, the inner function dma_common_contiguous_remap() will allocate 128KB memory by invoking the function kmalloc_array(). and the kmalloc_array seems to fail to try to allocate 128KB mem. Call trace: [14977.928623] qcrosvm: page allocation failure: order:5, mode:0x40cc0 [14977.928638] dump_backtrace.cfi_jt+0x0/0x8 [14977.928647] dump_stack_lvl+0x80/0xb8 [14977.928652] warn_alloc+0x164/0x200 [14977.928657] __alloc_pages_slowpath+0x9f0/0xb4c [14977.928660] __alloc_pages+0x21c/0x39c [14977.928662] kmalloc_order+0x48/0x108 [14977.928666] kmalloc_order_trace+0x34/0x154 [14977.928668] __kmalloc+0x548/0x7e4 [14977.928673] dma_direct_alloc+0x11c/0x4f8 [14977.928678] dma_alloc_attrs+0xf4/0x138 [14977.928680] gh_vm_ioctl_set_fw_name+0x3c4/0x610 [gunyah] [14977.928698] gh_vm_ioctl+0x90/0x14c [gunyah] [14977.928705] __arm64_sys_ioctl+0x184/0x210 work around by doing kvmalloc_array instead. Signed-off-by: Gao Xu <gaoxu2@hihonor.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:22 +02:00
Pierre-Louis Bossart	e9ce774052	ASoC: SOF: Intel: fix SoundWire/HDaudio mutual exclusion [ Upstream commit `f751b99255` ] The functionality described in Commit `61bef9e68d` ("ASoC: SOF: Intel: hda: enforce exclusion between HDaudio and SoundWire") does not seem to be properly implemented with two issues that need to be corrected. a) The test used is incorrect when DisplayAudio codecs are not supported. b) Conversely when only Display Audio codecs can be found, we do want to start the SoundWire links, if any. That will help add the relevant topologies and machine descriptors, and identify cases where the SoundWire information in ACPI needs to be modified with a quirk. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20230606222529.57156-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:22 +02:00
Geert Uytterhoeven	7d53d1e476	iopoll: Call cpu_relax() in busy loops [ Upstream commit `b407460ee9` ] It is considered good practice to call cpu_relax() in busy loops, see Documentation/process/volatile-considered-harmful.rst. This can not only lower CPU power consumption or yield to a hyperthreaded twin processor, but also allows an architecture to mitigate hardware issues (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the architecture-specific cpu_relax() implementation. In addition, cpu_relax() is also a compiler barrier. It is not immediately obvious that the @op argument "function" will result in an actual function call (e.g. in case of inlining). Where a function call is a C sequence point, this is lost on inlining. Therefore, with agressive enough optimization it might be possible for the compiler to hoist the: (val) = op(args); "load" out of the loop because it doesn't see the value changing. The addition of cpu_relax() would inhibit this. As the iopoll helpers lack calls to cpu_relax(), people are sometimes reluctant to use them, and may fall back to open-coded polling loops (including cpu_relax() calls) instead. Fix this by adding calls to cpu_relax() to the iopoll helpers: - For the non-atomic case, it is sufficient to call cpu_relax() in case of a zero sleep-between-reads value, as a call to usleep_range() is a safe barrier otherwise. However, it doesn't hurt to add the call regardless, for simplicity, and for similarity with the atomic case below. - For the atomic case, cpu_relax() must be called regardless of the sleep-between-reads value, as there is no guarantee all architecture-specific implementations of udelay() handle this. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/45c87bec3397fdd704376807f0eec5cc71be440f.1685692810.git.geert+renesas@glider.be Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:22 +02:00
Oleksij Rempel	ca66e9dd98	ARM: dts: imx6dl: prtrvt, prtvt7, prti6q, prtwd2: fix USB related warnings [ Upstream commit `1d14bd943f` ] Fix USB-related warnings in prtrvt, prtvt7, prti6q and prtwd2 device trees by disabling unused usbphynop1 and usbphynop2 USB PHYs and providing proper configuration for the over-current detection. This fixes the following warnings with the current kernel: usb_phy_generic usbphynop1: dummy supplies not allowed for exclusive requests usb_phy_generic usbphynop2: dummy supplies not allowed for exclusive requests imx_usb 2184200.usb: No over current polarity defined By the way, fix over-current detection on usbotg port for prtvt7, prti6q and prtwd2 boards. Only prtrvt do not have OC on USB OTG port. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:22 +02:00
Sumit Gupta	cc15908308	PCI: tegra194: Fix possible array out of bounds access [ Upstream commit `205b3d02d5` ] Add check to fix the possible array out of bounds violation by making speed equal to GEN1_CORE_CLK_FREQ when its value is more than the size of "pcie_gen_freq" array. This array has size of four but possible speed (CLS) values are from "0 to 0xF". So, "speed - 1" values are "-1 to 0xE". Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Link: https://lore.kernel.org/lkml/72b9168b-d4d6-4312-32ea-69358df2f2d0@nvidia.com/ Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:22 +02:00
Jakub Kicinski	e2d10f1de1	net: tls: avoid discarding data on record close [ Upstream commit `6b47808f22` ] TLS records end with a 16B tag. For TLS device offload we only need to make space for this tag in the stream, the device will generate and replace it with the actual calculated tag. Long time ago the code would just re-reference the head frag which mostly worked but was suboptimal because it prevented TCP from combining the record into a single skb frag. I'm not sure if it was correct as the first frag may be shorter than the tag. The commit under fixes tried to replace that with using the page frag and if the allocation failed rolling back the data, if record was long enough. It achieves better fragment coalescing but is also buggy. We don't roll back the iterator, so unless we're at the end of send we'll skip the data we designated as tag and start the next record as if the rollback never happened. There's also the possibility that the record was constructed with MSG_MORE and the data came from a different syscall and we already told the user space that we "got it". Allocate a single dummy page and use it as fallback. Found by code inspection, and proven by forcing allocation failures. Fixes: `e7b159a48b` ("net/tls: remove the record tail optimization") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:22 +02:00
Tariq Toukan	9a15ca8939	net/tls: Multi-threaded calls to TX tls_dev_del [ Upstream commit `7adc91e0c9` ] Multiple TLS device-offloaded contexts can be added in parallel via concurrent calls to .tls_dev_add, while calls to .tls_dev_del are sequential in tls_device_gc_task. This is not a sustainable behavior. This creates a rate gap between add and del operations (addition rate outperforms the deletion rate). When running for enough time, the TLS device resources could get exhausted, failing to offload new connections. Replace the single-threaded garbage collector work with a per-context alternative, so they can be handled on several cores in parallel. Use a new dedicated destruct workqueue for this. Tested with mlx5 device: Before: 22141 add/sec, 103 del/sec After: 11684 add/sec, 11684 del/sec Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: `6b47808f22` ("net: tls: avoid discarding data on record close") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:22 +02:00
Tariq Toukan	2d93157b7e	net/tls: Perform immediate device ctx cleanup when possible [ Upstream commit `113671b255` ] TLS context destructor can be run in atomic context. Cleanup operations for device-offloaded contexts could require access and interaction with the device callbacks, which might sleep. Hence, the cleanup of such contexts must be deferred and completed inside an async work. For all others, this is not necessary, as cleanup is atomic. Invoke cleanup immediately for them, avoiding queueing redundant gc work. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: `6b47808f22` ("net: tls: avoid discarding data on record close") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:21 +02:00
Eric Dumazet	51222e1c77	macsec: use DEV_STATS_INC() [ Upstream commit `32d0a49d36` ] syzbot/KCSAN reported data-races in macsec whenever dev->stats fields are updated. It appears all of these updates can happen from multiple cpus. Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. Fixes: `c09440f7dc` ("macsec: introduce IEEE 802.1AE driver") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:21 +02:00
Clayton Yager	3d64a232e4	macsec: Fix traffic counters/statistics [ Upstream commit `91ec9bd57f` ] OutOctetsProtected, OutOctetsEncrypted, InOctetsValidated, and InOctetsDecrypted were incrementing by the total number of octets in frames instead of by the number of octets of User Data in frames. The Controlled Port statistics ifOutOctets and ifInOctets were incrementing by the total number of octets instead of the number of octets of the MSDUs plus octets of the destination and source MAC addresses. The Controlled Port statistics ifInDiscards and ifInErrors were not incrementing each time the counters they aggregate were. The Controlled Port statistic ifInErrors was not included in the output of macsec_get_stats64 so the value was not present in ip commands output. The ReceiveSA counters InPktsNotValid, InPktsNotUsingSA, and InPktsUnusedSA were not incrementing. Signed-off-by: Clayton Yager <Clayton_Yager@selinc.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: `32d0a49d36` ("macsec: use DEV_STATS_INC()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:21 +02:00
Ido Schimmel	396a192140	selftests: forwarding: tc_actions: Use ncat instead of nc [ Upstream commit `5e8670610b` ] The test relies on 'nc' being the netcat version from the nmap project. While this seems to be the case on Fedora, it is not the case on Ubuntu, resulting in failures such as [1]. Fix by explicitly using the 'ncat' utility from the nmap project and the skip the test in case it is not installed. [1] # timeout set to 0 # selftests: net/forwarding: tc_actions.sh # TEST: gact drop and ok (skip_hw) [ OK ] # TEST: mirred egress flower redirect (skip_hw) [ OK ] # TEST: mirred egress flower mirror (skip_hw) [ OK ] # TEST: mirred egress matchall mirror (skip_hw) [ OK ] # TEST: mirred_egress_to_ingress (skip_hw) [ OK ] # nc: invalid option -- '-' # usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl] # [-m minttl] [-O length] [-P proxy_username] [-p source_port] # [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit] # [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]] # [destination] [port] # nc: invalid option -- '-' # usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl] # [-m minttl] [-O length] [-P proxy_username] [-p source_port] # [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit] # [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]] # [destination] [port] # TEST: mirred_egress_to_ingress_tcp (skip_hw) [FAIL] # server output check failed # INFO: Could not test offloaded functionality not ok 80 selftests: net/forwarding: tc_actions.sh # exit=1 Fixes: `ca22da2fbd` ("act_mirred: use the backlog for nested calls to mirred ingress") Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.unizg.hr/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230808141503.4060661-12-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:21 +02:00
Davide Caratti	d61a0886d3	selftests: forwarding: tc_actions: cleanup temporary files when test is aborted [ Upstream commit `f58531716c` ] remove temporary files created by 'mirred_egress_to_ingress_tcp' test in the cleanup() handler. Also, change variable names to avoid clashing with globals from lib.sh. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/091649045a017fc00095ecbb75884e5681f7025f.1676368027.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: `5e8670610b` ("selftests: forwarding: tc_actions: Use ncat instead of nc") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:21 +02:00
Kunihiko Hayashi	a798977df6	mmc: sdhci-f-sdh30: Replace with sdhci_pltfm [ Upstream commit `5def5c1c15` ] Even if sdhci_pltfm_pmops is specified for PM, this driver doesn't apply sdhci_pltfm, so the structure is not correctly referenced in PM functions. This applies sdhci_pltfm to this driver to fix this issue. - Call sdhci_pltfm_init() instead of sdhci_alloc_host() and other functions that covered by sdhci_pltfm. - Move ops and quirks to sdhci_pltfm_data - Replace sdhci_priv() with own private function sdhci_f_sdh30_priv(). Fixes: `87a507459f` ("mmc: sdhci: host: add new f_sdh30") Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230630004533.26644-1-hayashi.kunihiko@socionext.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-26 14:23:21 +02:00
Greg Kroah-Hartman	f6f7927ac6	Linux 5.15.127 Link: https://lore.kernel.org/r/20230813211710.787645394@linuxfoundation.org Tested-by: Thierry Reding <treding@nvidia.com> Tested-by: SeongJae Park <sj@kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Allen Pais <apais@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:04 +02:00
Frederic Weisbecker	c597d8cb0d	timers/nohz: Last resort update jiffies on nohz_full IRQ entry [ Upstream commit `53e87e3cdc` ] When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU is guaranteed to stay online and to never stop its tick. Meanwhile on some rare case, the dedicated timekeeper may be running with interrupts disabled for a while, such as in stop_machine. If jiffies stop being updated, a nohz_full CPU may end up endlessly programming the next tick in the past, taking the last jiffies update monotonic timestamp as a stale base, resulting in an tick storm. Here is a scenario where it matters: 0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU. 1) A stop machine callback is queued to execute somewhere. 2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1 can still take IRQs. 3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward. 4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking last_jiffies_update as a base. But last_jiffies_update hasn't been updated for 2 jiffies since the timekeeper has interrupts disabled. 5) clockevents_program_event(), which relies on ktime_get(), observes that the expiration is in the past and therefore programs the min delta event on the clock. 6) The tick fires immediately, goto 3) 7) Tick storm, the nohz_full CPU is drown and takes ages to reach MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation. Solve this with unconditionally updating jiffies if the value is stale on nohz_full IRQ entry. IRQs and other disturbances are expected to be rare enough on nohz_full for the unconditional call to ktime_get() to actually matter. Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:04 +02:00
Nicholas Piggin	b4d36e6c5d	timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped [ Upstream commit `62c1256d54` ] When tick_nohz_stop_tick() stops the tick and high resolution timers are disabled, then the clock event device is not put into ONESHOT_STOPPED mode. This can lead to spurious timer interrupts with some clock event device drivers that don't shut down entirely after firing. Eliminate these by putting the device into ONESHOT_STOPPED mode at points where it is not being reprogrammed. When there are no timers active, then tick_program_event() with KTIME_MAX can be used to stop the device. When there is a timer active, the device can be stopped at the next tick (any new timer added by timers will reprogram the tick). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220422141446.915024-1-npiggin@gmail.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:04 +02:00
Frederic Weisbecker	c3b954a51b	tick: Detect and fix jiffies update stall [ Upstream commit `a1ff03cd6f` ] tick: Detect and fix jiffies update stall On some rare cases, the timekeeper CPU may be delaying its jiffies update duty for a while. Known causes include: * The timekeeper is waiting on stop_machine in a MULTI_STOP_DISABLE_IRQ or MULTI_STOP_RUN state. Disabled interrupts prevent from timekeeping updates while waiting for the target CPU to complete its stop_machine() callback. * The timekeeper vcpu has VMEXIT'ed for a long while due to some overload on the host. Detect and fix these situations with emergency timekeeping catchups. Original-patch-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:04 +02:00
Eric Dumazet	af99918f0e	sch_netem: fix issues in netem_change() vs get_dist_table() commit `11b73313c1` upstream. In blamed commit, I missed that get_dist_table() was allocating memory using GFP_KERNEL, and acquiring qdisc lock to perform the swap of newly allocated table with current one. In this patch, get_dist_table() is allocating memory and copy user data before we acquire the qdisc lock. Then we perform swap operations while being protected by the lock. Note that after this patch netem_change() no longer can do partial changes. If an error is returned, qdisc conf is left unchanged. Fixes: `2174a08db8` ("sch_netem: acquire qdisc lock in netem_change()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230622181503.2327695-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:04 +02:00
Masahiro Yamada	5d094d4e7b	alpha: remove __init annotation from exported page_is_ram() commit `6ccbd7fd47` upstream. EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Commit `c5a130325f` ("ACPI/APEI: Add parameter check before error injection") exported page_is_ram(), hence the __init annotation should be removed. This fixes the modpost warning in ARCH=alpha builds: WARNING: modpost: vmlinux: page_is_ram: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL. Fixes: `c5a130325f` ("ACPI/APEI: Add parameter check before error injection") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Nilesh Javali	f8d6d25756	scsi: qedf: Fix firmware halt over suspend and resume commit `ef222f551e` upstream. While performing certain power-off sequences, PCI drivers are called to suspend and resume their underlying devices through PCI PM (power management) interface. However the hardware does not support PCI PM suspend/resume operations so system wide suspend/resume leads to bad MFW (management firmware) state which causes various follow-up errors in driver when communicating with the device/firmware. To fix this driver implements PCI PM suspend handler to indicate unsupported operation to the PCI subsystem explicitly, thus avoiding system to go into suspended/standby mode. Fixes: `61d8658b4a` ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.") Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20230807093725.46829-1-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Nilesh Javali	85db1cd174	scsi: qedi: Fix firmware halt over suspend and resume commit `1516ee035d` upstream. While performing certain power-off sequences, PCI drivers are called to suspend and resume their underlying devices through PCI PM (power management) interface. However the hardware does not support PCI PM suspend/resume operations so system wide suspend/resume leads to bad MFW (management firmware) state which causes various follow-up errors in driver when communicating with the device/firmware. To fix this driver implements PCI PM suspend handler to indicate unsupported operation to the PCI subsystem explicitly, thus avoiding system to go into suspended/standby mode. Fixes: `ace7f46ba5` ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20230807093725.46829-2-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Karan Tilak Kumar	e70469c289	scsi: fnic: Replace return codes in fnic_clean_pending_aborts() commit `5a43b07a87` upstream. fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset. Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com> Tested-by: Karan Tilak Kumar <kartilak@cisco.com> Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com> Link: https://lore.kernel.org/r/20230727193919.2519-1-kartilak@cisco.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Zhu Wang	6bc7f4c8c2	scsi: core: Fix possible memory leak if device_add() fails commit `04b5b5cb01` upstream. If device_add() returns error, the name allocated by dev_set_name() needs be freed. As the comment of device_add() says, put_device() should be used to decrease the reference count in the error path. So fix this by calling put_device(), then the name can be freed in kobject_cleanp(). Fixes: `ee959b00c3` ("SCSI: convert struct class_device to struct device") Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Link: https://lore.kernel.org/r/20230803020230.226903-1-wangzhu9@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Zhu Wang	461f8ac666	scsi: snic: Fix possible memory leak if device_add() fails commit `41320b18a0` upstream. If device_add() returns error, the name allocated by dev_set_name() needs be freed. As the comment of device_add() says, put_device() should be used to give up the reference in the error path. So fix this by calling put_device(), then the name can be freed in kobject_cleanp(). Fixes: `c8806b6c9e` ("snic: driver for Cisco SCSI HBA") Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Acked-by: Narsimhulu Musini <nmusini@cisco.com> Link: https://lore.kernel.org/r/20230801111421.63651-1-wangzhu9@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Alexandra Diupina	171e117cdc	scsi: 53c700: Check that command slot is not NULL commit `8366d1f124` upstream. Add a check for the command slot value to avoid dereferencing a NULL pointer. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Co-developed-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru> Signed-off-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru> Signed-off-by: Alexandra Diupina <adiupina@astralinux.ru> Link: https://lore.kernel.org/r/20230728123521.18293-1-adiupina@astralinux.ru Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Michael Kelley	7a792b3d88	scsi: storvsc: Fix handling of virtual Fibre Channel timeouts commit `175544ad48` upstream. Hyper-V provides the ability to connect Fibre Channel LUNs to the host system and present them in a guest VM as a SCSI device. I/O to the vFC device is handled by the storvsc driver. The storvsc driver includes a partial integration with the FC transport implemented in the generic portion of the Linux SCSI subsystem so that FC attributes can be displayed in /sys. However, the partial integration means that some aspects of vFC don't work properly. Unfortunately, a full and correct integration isn't practical because of limitations in what Hyper-V provides to the guest. In particular, in the context of Hyper-V storvsc, the FC transport timeout function fc_eh_timed_out() causes a kernel panic because it can't find the rport and dereferences a NULL pointer. The original patch that added the call from storvsc_eh_timed_out() to fc_eh_timed_out() is faulty in this regard. In many cases a timeout is due to a transient condition, so the situation can be improved by just continuing to wait like with other I/O requests issued by storvsc, and avoiding the guaranteed panic. For a permanent failure, continuing to wait may result in a hung thread instead of a panic, which again may be better. So fix the panic by removing the storvsc call to fc_eh_timed_out(). This allows storvsc to keep waiting for a response. The change has been tested by users who experienced a panic in fc_eh_timed_out() due to transient timeouts, and it solves their problem. In the future we may want to deprecate the vFC functionality in storvsc since it can't be fully fixed. But it has current users for whom it is working well enough, so it should probably stay for a while longer. Fixes: `3930d73098` ("scsi: storvsc: use default I/O timeout handler for FC devices") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1690606764-79669-1-git-send-email-mikelley@microsoft.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Tony Battersby	0f52d7b782	scsi: core: Fix legacy /proc parsing buffer overflow commit `9426d3cef5` upstream. (lightly modified commit message mostly by Linus Torvalds) The parsing code for /proc/scsi/scsi is disgusting and broken. We should have just used 'sscanf()' or something simple like that, but the logic may actually predate our kernel sscanf library routine for all I know. It certainly predates both git and BK histories. And we can't change it to be something sane like that now, because the string matching at the start is done case-insensitively, and the separator parsing between numbers isn't done at all, so any separator will work, including a possible terminating NUL character. This interface is root-only, and entirely for legacy use, so there is absolutely no point in trying to tighten up the parsing. Because any separator has traditionally worked, it's entirely possible that people have used random characters rather than the suggested space. So don't bother to try to pretty it up, and let's just make a minimal patch that can be back-ported and we can forget about this whole sorry thing for another two decades. Just make it at least not read past the end of the supplied data. Link: https://lore.kernel.org/linux-scsi/b570f5fe-cb7c-863a-6ed9-f6774c219b88@cybernetics.com/ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin K Petersen <martin.petersen@oracle.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@kernel.org Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Martin K Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Pablo Neira Ayuso	b757ef99df	netfilter: nf_tables: report use refcount overflow commit `1689f25924` upstream. Overflow use refcount checks are not complete. Add helper function to deal with object reference counter tracking. Report -EMFILE in case UINT_MAX is reached. nft_use_dec() splats in case that reference counter underflows, which should not ever happen. Add nft_use_inc_restore() and nft_use_dec_restore() which are used to restore reference counter from error and abort paths. Use u32 in nft_flowtable and nft_object since helper functions cannot work on bitfields. Remove the few early incomplete checks now that the helper functions are in place and used to check for refcount overflow. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Ming Lei	9bdbbcf9d1	nvme-rdma: fix potential unbalanced freeze & unfreeze commit `29b434d1e4` upstream. Move start_freeze into nvme_rdma_configure_io_queues(), and there is at least two benefits: 1) fix unbalanced freeze and unfreeze, since re-connection work may fail or be broken by removal 2) IO during error recovery can be failfast quickly because nvme fabrics unquiesces queues after teardown. One side-effect is that !mpath request may timeout during connecting because of queue topo change, but that looks not one big deal: 1) same problem exists with current code base 2) compared with !mpath, mpath use case is dominant Fixes: `9f98772ba3` ("nvme-rdma: fix controller reset hang during traffic") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Ming Lei	d68f8ef6ef	nvme-tcp: fix potential unbalanced freeze & unfreeze commit `99dc264014` upstream. Move start_freeze into nvme_tcp_configure_io_queues(), and there is at least two benefits: 1) fix unbalanced freeze and unfreeze, since re-connection work may fail or be broken by removal 2) IO during error recovery can be failfast quickly because nvme fabrics unquiesces queues after teardown. One side-effect is that !mpath request may timeout during connecting because of queue topo change, but that looks not one big deal: 1) same problem exists with current code base 2) compared with !mpath, mpath use case is dominant Fixes: `2875b0aeca` ("nvme-tcp: fix controller reset hang during traffic") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Josef Bacik	ae6e21f8bb	btrfs: set cache_block_group_error if we find an error commit `92fb94b69c` upstream. We set cache_block_group_error if btrfs_cache_block_group() returns an error, this is because we could end up not finding space to allocate and mistakenly return -ENOSPC, and which could then abort the transaction with the incorrect errno, and in the case of ENOSPC result in a WARN_ON() that will trip up tests like generic/475. However there's the case where multiple threads can be racing, one thread gets the proper error, and the other thread doesn't actually call btrfs_cache_block_group(), it instead sees ->cached == BTRFS_CACHE_ERROR. Again the result is the same, we fail to allocate our space and return -ENOSPC. Instead we need to set cache_block_group_error to -EIO in this case to make sure that if we do not make our allocation we get the appropriate error returned back to the caller. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:03 +02:00
Qu Wenruo	314135b7ba	btrfs: reject invalid reloc tree root keys with stack dump commit `6ebcd021c9` upstream. [BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). That ASSERT() makes sure the reloc tree is properly pointed back by its subvolume tree. [CAUSE] After more debugging output, it turns out we had an invalid reloc tree: BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17 Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM, QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree. But reloc trees can only exist for subvolumes, as for non-subvolume trees, we just COW the involved tree block, no need to create a reloc tree since those tree blocks won't be shared with other trees. Only subvolumes tree can share tree blocks with other trees (thus they have BTRFS_ROOT_SHAREABLE flag). Thus this new debug output proves my previous assumption that corrupted on-disk data can trigger that ASSERT(). [FIX] Besides the dedicated fix and the graceful exit, also let tree-checker to check such root keys, to make sure reloc trees can only exist for subvolumes. CC: stable@vger.kernel.org # 5.15+ Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Qu Wenruo	69dd147de4	btrfs: exit gracefully if reloc roots don't match commit `05d7ce5045` upstream. [BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). [CAUSE] The root cause of the triggered ASSERT() is we can have a race between quota tree creation and relocation. This leads us to create a duplicated quota tree in the btrfs_read_fs_root() path, and since it's treated as fs tree, it would have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. The bug itself is fixed by a dedicated patch for it, but this already taught us the ASSERT() is not something straightforward for developers. [ENHANCEMENT] Instead of using an ASSERT(), let's handle it gracefully and output extra info about the mismatch reloc roots to help debug. Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside merge_reloc_roots() later. Also replace those ASSERT(0)s with WARN_ON()s. CC: stable@vger.kernel.org # 5.15+ Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Christoph Hellwig	c40d4b60c5	btrfs: don't stop integrity writeback too early commit `effa24f689` upstream. extent_write_cache_pages stops writing pages as soon as nr_to_write hits zero. That is the right thing for opportunistic writeback, but incorrect for data integrity writeback, which needs to ensure that no dirty pages are left in the range. Thus only stop the writeback for WB_SYNC_NONE if nr_to_write hits 0. This is a port of write_cache_pages changes in commit `05fe478dd0` ("mm: write_cache_pages integrity fix"). Note that I've only trigger the problem with other changes to the btrfs writeback code, but this condition seems worthwhile fixing anyway. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ updated comment ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Nick Child	555e126dd3	ibmvnic: Handle DMA unmapping of login buffs in release functions commit `d78a671eb8` upstream. Rather than leaving the DMA unmapping of the login buffers to the login response handler, move this work into the login release functions. Previously, these functions were only used for freeing the allocated buffers. This could lead to issues if there are more than one outstanding login buffer requests, which is possible if a login request times out. If a login request times out, then there is another call to send login. The send login function makes a call to the login buffer release function. In the past, this freed the buffers but did not DMA unmap. Therefore, the VIOS could still write to the old login (now freed) buffer. It is for this reason that it is a good idea to leave the DMA unmap call to the login buffers release function. Since the login buffer release functions now handle DMA unmapping, remove the duplicate DMA unmapping in handle_login_rsp(). Fixes: `dff515a3e7` ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-3-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Nick Child	34fcc82382	ibmvnic: Unmap DMA login rsp buffer on send login fail commit `411c565b4b` upstream. If the LOGIN CRQ fails to send then we must DMA unmap the response buffer. Previously, if the CRQ failed then the memory was freed without DMA unmapping. Fixes: `c98d9cc417` ("ibmvnic: send_login should check for crq errors") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Nick Child	cee62753cf	ibmvnic: Enforce stronger sanity checks on login response commit `db17ba719b` upstream. Ensure that all offsets in a login response buffer are within the size of the allocated response buffer. Any offsets or lengths that surpass the allocation are likely the result of an incomplete response buffer. In these cases, a full reset is necessary. When attempting to login, the ibmvnic device will allocate a response buffer and pass a reference to the VIOS. The VIOS will then send the ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled with data. If the ibmvnic device does not get a response in 20 seconds, the old buffer is freed and a new login request is sent. With 2 outstanding requests, any LOGIN_RSP CRQ's could be for the older login request. If this is the case then the login response buffer (which is for the newer login request) could be incomplete and contain invalid data. Therefore, we must enforce strict sanity checks on the response buffer values. Testing has shown that the `off_rxadd_buff_size` value is filled in last by the VIOS and will be the smoking gun for these circumstances. Until VIOS can implement a mechanism for tracking outstanding response buffers and a method for mapping a LOGIN_RSP CRQ to a particular login response buffer, the best ibmvnic can do in this situation is perform a full reset. Fixes: `dff515a3e7` ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Moshe Shemesh	27e8db8380	net/mlx5: Skip clock update work when device is in error state commit `d006207625` upstream. When device is in error state, marked by the flag MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible and so clock update work should be skipped. Furthermore, such access through PCI in error state, after calling mlx5_pci_disable_device() can result in failing to recover from pci errors. Fixes: `ef9814deaf` ("net/mlx5e: Add HW timestamping (TS) support") Reported-and-tested-by: Ganesh G R <ganeshgr@linux.ibm.com> Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.com Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Daniel Jurgens	f638fc2f73	net/mlx5: Allow 0 for total host VFs commit `2dc2b3922d` upstream. When querying eswitch functions 0 is a valid number of host VFs. After introducing ARM SRIOV falling through to getting the max value from PCI results in using the total VFs allowed on the ARM for the host. Fixes: `86eec50bea` ("net/mlx5: Support querying max VFs from device"); Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Christophe JAILLET	086a80eb62	dmaengine: mcf-edma: Fix a potential un-allocated memory access commit `0a46781c89` upstream. When 'mcf_edma' is allocated, some space is allocated for a flexible array at the end of the struct. 'chans' item are allocated, that is to say 'pdata->dma_channels'. Then, this number of item is stored in 'mcf_edma->n_chans'. A few lines later, if 'mcf_edma->n_chans' is 0, then a default value of 64 is set. This ends to no space allocated by devm_kzalloc() because chans was 0, but 64 items are read and/or written in some not allocated memory. Change the logic to define a default value before allocating the memory. Fixes: `e7a3ff92ea` ("dmaengine: fsl-edma: add ColdFire mcf5441x edma support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/f55d914407c900828f6fad3ea5fa791a5f17b9a4.1685172449.git.christophe.jaillet@wanadoo.fr Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Ido Schimmel	7e1dc94b2d	nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID commit `8743aeff5b` upstream. A netlink dump callback can return a positive number to signal that more information needs to be dumped or zero to signal that the dump is complete. In the second case, the core netlink code will append the NLMSG_DONE message to the skb in order to indicate to user space that the dump is complete. The nexthop bucket dump callback always returns a positive number if nexthop buckets were filled in the provided skb, even if the dump is complete. This means that a dump will span at least two recvmsg() calls as long as nexthop buckets are present. In the last recvmsg() call the dump callback will not fill in any nexthop buckets because the previous call indicated that the dump should restart from the last dumped nexthop ID plus one. # ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id 10 group 1 type resilient buckets 2 # strace -e sendto,recvmsg -s 5 ip nexthop bucket sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST\|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? /, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK\|MSG_TRUNC) = 128 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128 id 10 index 0 idle_time 6.66 nhid 1 id 10 index 1 idle_time 6.66 nhid 1 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK\|MSG_TRUNC) = 20 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20 +++ exited with 0 +++ This behavior is both inefficient and buggy. If the last nexthop to be dumped had the maximum ID of 0xffffffff, then the dump will restart from 0 (0xffffffff + 1) and never end: # ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id $((232-1)) group 1 type resilient buckets 2 # ip nexthop bucket id 4294967295 index 0 idle_time 5.55 nhid 1 id 4294967295 index 1 idle_time 5.55 nhid 1 id 4294967295 index 0 idle_time 5.55 nhid 1 id 4294967295 index 1 idle_time 5.55 nhid 1 [...] Fix by adjusting the dump callback to return zero when the dump is complete. After the fix only one recvmsg() call is made and the NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses: # ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id $((232-1)) group 1 type resilient buckets 2 # strace -e sendto,recvmsg -s 5 ip nexthop bucket sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST\|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 / NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK\|MSG_TRUNC) = 148 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148 id 4294967295 index 0 idle_time 6.61 nhid 1 id 4294967295 index 1 idle_time 6.61 nhid 1 +++ exited with 0 +++ Note that if the NLMSG_DONE message cannot be appended because of size limitations, then another recvmsg() will be needed, but the core netlink code will not invoke the dump callback and simply reply with a NLMSG_DONE message since it knows that the callback previously returned zero. Add a test that fails before the fix: # ./fib_nexthops.sh -t basic_res [...] TEST: Maximum nexthop ID dump [FAIL] [...] And passes after it: # ./fib_nexthops.sh -t basic_res [...] TEST: Maximum nexthop ID dump [ OK ] [...] Fixes: `8a1bbabb03` ("nexthop: Add netlink handlers for bucket dump") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00
Ido Schimmel	608a4327c2	nexthop: Make nexthop bucket dump more efficient commit `f10d3d9df4` upstream. rtm_dump_nexthop_bucket_nh() is used to dump nexthop buckets belonging to a specific resilient nexthop group. The function returns a positive return code (the skb length) upon both success and failure. The above behavior is problematic. When a complete nexthop bucket dump is requested, the function that walks the different nexthops treats the non-zero return code as an error. This causes buckets belonging to different resilient nexthop groups to be dumped using different buffers even if they can all fit in the same buffer: # ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id 10 group 1 type resilient buckets 1 # ip nexthop add id 20 group 1 type resilient buckets 1 # strace -e recvmsg -s 0 ip nexthop bucket [...] recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64 id 10 index 0 idle_time 10.27 nhid 1 [...] recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64 id 20 index 0 idle_time 6.44 nhid 1 [...] Fix by only returning a non-zero return code when an error occurred and restarting the dump from the bucket index we failed to fill in. This allows buckets belonging to different resilient nexthop groups to be dumped using the same buffer: # ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id 10 group 1 type resilient buckets 1 # ip nexthop add id 20 group 1 type resilient buckets 1 # strace -e recvmsg -s 0 ip nexthop bucket [...] recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128 id 10 index 0 idle_time 30.21 nhid 1 id 20 index 0 idle_time 26.7 nhid 1 [...] While this change is more of a performance improvement change than an actual bug fix, it is a prerequisite for a subsequent patch that does fix a bug. Fixes: `8a1bbabb03` ("nexthop: Add netlink handlers for bucket dump") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230808075233.3337922-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:22:02 +02:00

1 2 3 4 5 ...

1063038 Commits