[ Upstream commit eb7e33d01db3aec128590391b2397384bab406b6 ]
Switch the delay loop during the Arbiter empty check from
vsc73xx_adjust_link() to use read_poll_timeout(). Functionally,
one msleep() call is eliminated at the end of the loop in the timeout
case.
As Russell King suggested:
"This [change] avoids the issue that on the last iteration, the code reads
the register, tests it, finds the condition that's being waiting for is
false, _then_ waits and end up printing the error message - that last
wait is rather useless, and as the arbiter state isn't checked after
waiting, it could be that we had success during the last wait."
Suggested-by: Russell King <linux@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-2-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b9eebc2c7a5f0cc7950d918c1e8a4ad4bed5010 ]
In the 'vsc73xx_phy_write' function, the register value is missing,
and the phy write operation always sends zeros.
This commit passes the value variable into the proper register.
Fixes: 05bd97fc55 ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ff2f816e2aa65ca9a1cdf0954842f8173c0f48d ]
In axiethernet header fix register defines comment description to be
inline with IP documentation. It updates MAC configuration register,
MDIO configuration register and frame filter control description.
Fixes: 8a3b7a252d ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9a18e8f770c9b0703dab93580d0b02e199a4c79 ]
We can't dereference "skb" after calling vcc->push() because the skb
is released.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cbc796be1779c4dbc9a482c7233995e2a8b6bfb3 ]
Previously, an ethtool rx flow with no attrs would not be added to the
NIC as it has no rules to configure the hw with, but it would be
reported as successful to the caller (return code 0). This is confusing
for the user as ethtool then reports "Added rule $num", but no rule was
actually added.
This change corrects that by instead reporting these wrong rules as
-EINVAL.
Fixes: b29c61dac3 ("net/mlx5e: Ethtool steering flow validation refactoring")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20240808144107.2095424-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e037a26ead187901f83cad9c503ccece5ff6817a ]
Testing uncovered that even when the taprio gate is closed, some packets
still transmit.
According to i225/6 hardware errata [1], traffic might overflow the
planned QBV window. This happens because MAC maintains an internal buffer,
primarily for supporting half duplex retries. Therefore, even when the
gate closes, residual MAC data in the buffer may still transmit.
To mitigate this for i226, reduce the MAC's internal buffer from 192 bytes
to the recommended 88 bytes by modifying the RETX_CTL register value.
This follows guidelines from:
[1] Ethernet Controller I225/I22 Spec Update Rev 2.1 Errata Item 9:
TSN: Packet Transmission Might Cross Qbv Window
[2] I225/6 SW User Manual Rev 1.2.4: Section 8.11.5 Retry Buffer Control
Note that the RETX_CTL register can't be used in TSN mode because half
duplex feature cannot coexist with TSN.
Test Steps:
1. Send taprio cmd to board A:
tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 4 \
map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
queues 1@0 1@1 1@2 1@3 \
base-time 0 \
sched-entry S 0x07 500000 \
sched-entry S 0x0f 500000 \
flags 0x2 \
txtime-delay 0
Note that for TC3, gate should open for 500us and close for another
500us.
3. Take tcpdump log on Board B.
4. Send udp packets via UDP tai app from Board A to Board B.
5. Analyze tcpdump log via wireshark log on Board B. Ensure that the
total time from the first to the last packet received during one cycle
for TC3 does not exceed 500us.
Fixes: 4354621173 ("igc: Add new device ID's")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 790835fcc0 ]
The launchtime offset should be corrected according to sections 7.5.2.6
Transmit Scheduling Latency of the Intel Ethernet I225/I226 Software
User Manual.
Software can compensate the latency between the transmission scheduling
and the time that packet is transmitted to the network by setting this
GTxOffset register. Without setting this register, there may be a
significant delay between the packet scheduling and the network point.
This patch helps to reduce the latency for each of the link speed.
Before:
10Mbps : 11000 - 13800 nanosecond
100Mbps : 1300 - 1700 nanosecond
1000Mbps : 190 - 600 nanosecond
2500Mbps : 1400 - 1700 nanosecond
After:
10Mbps : less than 750 nanosecond
100Mbps : less than 192 nanosecond
1000Mbps : less than 128 nanosecond
2500Mbps : less than 128 nanosecond
Test Setup:
Talker : Use l2_tai.c to generate the launchtime into packet payload.
Listener: Use timedump.c to compute the delta between packet arrival and
LaunchTime packet payload.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: e037a26ead18 ("igc: Fix packet still tx after gate close by reducing i226 MAC retry buffer")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce6dede912f064a855acf6f04a04cbb2c25b8c8c ]
[syzbot reported]
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfs_dtree.c:3713
...
[Analyze]
In dtInsertEntry(), when the pointer h has the same value as p, after writing
name in UniStrncpy_to_le(), p->header.flag will be cleared. This will cause the
previously true judgment "p->header.flag & BT-LEAF" to change to no after writing
the name operation, this leads to entering an incorrect branch and accessing the
uninitialized object ih when judging this condition for the second time.
[Fix]
After got the page, check freelist first, if freelist == 0 then exit dtInsert()
and return -EINVAL.
Reported-by: syzbot+bba84aef3a26fb93deb9@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd89a81d850fa9a65f67b4527c0e420d15bf836c ]
Drop the WARN_ON_ONCE inn gue_gro_receive if the encapsulated type is
not known or does not have a GRO handler.
Such a packet is easily constructed. Syzbot generates them and sets
off this warning.
Remove the warning as it is expected and not actionable.
The warning was previously reduced from WARN_ON to WARN_ON_ONCE in
commit 270136613b ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad
proto callbacks").
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240614122552.1649044-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 27ab33854873e6fb958cb074681a0107cc2ecc4c ]
Syzbot reports uninitialized memory access in udf_rename() when updating
checksum of '..' directory entry of a moved directory. This is indeed
true as we pass on-stack diriter.fi to the udf_update_tag() and because
that has only struct fileIdentDesc included in it and not the impUse or
name fields, the checksumming function is going to checksum random stack
contents beyond the end of the structure. This is actually harmless
because the following udf_fiiter_write_fi() will recompute the checksum
from on-disk buffers where everything is properly included. So all that
is needed is just removing the bogus calculation.
Fixes: e9109a92d2 ("udf: Convert udf_rename() to new directory iteration code")
Link: https://lore.kernel.org/all/000000000000cf405f060d8f75a9@google.com/T/
Link: https://patch.msgid.link/20240617154201.29512-1-jack@suse.cz
Reported-by: syzbot+d31185aa54170f7fc1f5@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a46ef234756dca04623b7591e8ebb3440622f0b ]
ext4_xattr_set_entry() creates new EA inodes while holding buffer lock
on the external xattr block. This is problematic as it nests all the
allocation locking (which acquires locks on other buffers) under the
buffer lock. This can even deadlock when the filesystem is corrupted and
e.g. quota file is setup to contain xattr block as data block. Move the
allocation of EA inode out of ext4_xattr_set_entry() into the callers.
Reported-by: syzbot+a43d4f48b8397d0e41a9@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240321162657.27420-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8208c41c43ad5e9b63dce6c45a73e326109ca658 ]
When allocating EA inode, quota accounting is done just before
ext4_xattr_inode_lookup_create(). Logically these two operations belong
together so just fold quota accounting into
ext4_xattr_inode_lookup_create(). We also make
ext4_xattr_inode_lookup_create() return the looked up / created inode to
convert the function to a more standard calling convention.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240209112107.10585-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 0a46ef234756 ("ext4: do not create EA inode under buffer lock")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a97de7bff13b1cc825c1b1344eaed8d6c2d3e695 ]
syzbot reported rfcomm_sock_setsockopt_old() is copying data without
checking user input length.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt_old
net/bluetooth/rfcomm/sock.c:632 [inline]
BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt+0x893/0xa70
net/bluetooth/rfcomm/sock.c:673
Read of size 4 at addr ffff8880209a8bc3 by task syz-executor632/5064
Fixes: 9f2c8a03fb ("Bluetooth: Replace RFCOMM link mode with security level")
Fixes: bb23c0ab82 ("Bluetooth: Add support for deferring RFCOMM connection setup")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59f2f841179aa6a0899cb9cf53659149a35749b7 ]
syzbot reported the following lock sequence:
cpu 2:
grabs timer_base lock
spins on bpf_lpm lock
cpu 1:
grab rcu krcp lock
spins on timer_base lock
cpu 0:
grab bpf_lpm lock
spins on rcu krcp lock
bpf_lpm lock can be the same.
timer_base lock can also be the same due to timer migration.
but rcu krcp lock is always per-cpu, so it cannot be the same lock.
Hence it's a false positive.
To avoid lockdep complaining move kfree_rcu() after spin_unlock.
Reported-by: syzbot+1fa663a2100308ab6eab@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240329171439.37813-1-alexei.starovoitov@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 896880ff30866f386ebed14ab81ce1ad3710cfc4 ]
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:
../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
207 | *(__be16 *)&key->data[i]);
| ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
| ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
97 | #define be16_to_cpu __be16_to_cpu
| ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
| ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
82 | __u8 data[0]; /* Arbitrary size */
| ^~~~
And found at run-time under CONFIG_FORTIFY_SOURCE:
UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
index 0 is out of range for type '__u8 [*]'
Changing struct bpf_lpm_trie_key is difficult since has been used by
userspace. For example, in Cilium:
struct egress_gw_policy_key {
struct bpf_lpm_trie_key lpm_key;
__u32 saddr;
__u32 daddr;
};
While direct references to the "data" member haven't been found, there
are static initializers what include the final member. For example,
the "{}" here:
struct egress_gw_policy_key in_key = {
.lpm_key = { 32 + 24, {} },
.saddr = CLIENT_IP,
.daddr = EXTERNAL_SVC_IP & 0Xffffff,
};
To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpr_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.
Unfortunately, C++ refuses to parse the __struct_group() helper, so
it is not possible to define struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8, so we must open-code the union directly.
Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
Stable-dep-of: 59f2f841179a ("bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie.")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e4c0d0460 ]
At least ath10k and ath11k supported hardware (maybe more) does not implement
mesh A-MSDU aggregation in a standard compliant way.
802.11-2020 9.3.2.2.2 declares that the Mesh Control field is part of the
A-MSDU header (and little-endian).
As such, its length must not be included in the subframe length field.
Hardware affected by this bug treats the mesh control field as part of the
MSDU data and sets the length accordingly.
In order to avoid packet loss, keep track of which stations are affected
by this and take it into account when converting A-MSDU to 802.3 + mesh control
packets.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-5-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 986e43b19a ]
The current mac80211 mesh A-MSDU receive path fails to parse A-MSDU packets
on mesh interfaces, because it assumes that the Mesh Control field is always
directly after the 802.11 header.
802.11-2020 9.3.2.2.2 Figure 9-70 shows that the Mesh Control field is
actually part of the A-MSDU subframe header.
This makes more sense, since it allows packets for multiple different
destinations to be included in the same A-MSDU, as long as RA and TID are
still the same.
Another issue is the fact that the A-MSDU subframe length field was apparently
accidentally defined as little-endian in the standard.
In order to fix this, the mesh forwarding path needs happen at a different
point in the receive path.
ieee80211_data_to_8023_exthdr is changed to ignore the mesh control field
and leave it in after the ethernet header. This also affects the source/dest
MAC address fields, which now in the case of mesh point to the mesh SA/DA.
ieee80211_amsdu_to_8023s is changed to deal with the endian difference and
to add the Mesh Control length to the subframe length, since it's not covered
by the MSDU length field.
With these changes, the mac80211 will get the same packet structure for
converted regular data packets and unpacked A-MSDU subframes.
The mesh forwarding checks are now only performed after the A-MSDU decap.
For locally received packets, the Mesh Control header is stripped away.
For forwarded packets, a new 802.11 header gets added.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-4-nbd@nbd.name
[fix fortify build error]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5c1e269aa5 ]
Now that all drivers use iTXQ, it does not make sense to check to drop
tx forwarding packets when the driver has stopped the queues.
fq_codel will take care of dropping packets when the queues fill up
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94b9b9de05 ]
ieee80211_drop_unencrypted is called from ieee80211_rx_h_mesh_fwding and
ieee80211_frame_allowed.
Since ieee80211_rx_h_mesh_fwding can forward packets for other mesh nodes
and is called earlier, it needs to check the decryptions status and if the
packet is using the control protocol on its own, instead of deferring to
the later call from ieee80211_frame_allowed.
Because of that, ieee80211_drop_unencrypted has a mesh specific check
that skips over the mesh header in order to check the payload protocol.
This code is invalid when called from ieee80211_frame_allowed, since that
happens after the 802.11->802.3 conversion.
Fix this by moving the mesh specific check directly into
ieee80211_rx_h_mesh_fwding.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221201135730.19723-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc34ebd5c018b0edf47f39d11083ad8312733034 ]
syzbot reports a memory leak in pppoe_sendmsg [1].
The problem is in the pppoe_recvmsg() function that handles errors
in the wrong order. For the skb_recv_datagram() function, check
the pointer to skb for NULL first, and then check the 'error' variable,
because the skb_recv_datagram() function can set 'error'
to -EAGAIN in a loop but return a correct pointer to socket buffer
after a number of attempts, though 'error' remains set to -EAGAIN.
skb_recv_datagram
__skb_recv_datagram // Loop. if (err == -EAGAIN) then
// go to the next loop iteration
__skb_try_recv_datagram // if (skb != NULL) then return 'skb'
// else if a signal is received then
// return -EAGAIN
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.
Link: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9 [1]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+6bdfd184eac7709e5cc9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20240214085814.3894917-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1acf1ac84d2ae97b7889b87223c1064df850069 ]
Functions rds_still_queued and rds_clear_recv_queue lock a given socket
in order to safely iterate over the incoming rds messages. However
calling rds_inc_put while under this lock creates a potential deadlock.
rds_inc_put may eventually call rds_message_purge, which will lock
m_rs_lock. This is the incorrect locking order since m_rs_lock is
meant to be locked before the socket. To fix this, we move the message
item to a local list or variable that wont need rs_recv_lock protection.
Then we can safely call rds_inc_put on any item stored locally after
rs_recv_lock is released.
Fixes: bdbe6fbc6a ("RDS: recv.c")
Reported-by: syzbot+f9db6ff27b9bfdcfeca0@syzkaller.appspotmail.com
Reported-by: syzbot+dcd73ff9291e6d34b3ab@syzkaller.appspotmail.com
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://lore.kernel.org/r/20240209022854.200292-1-allison.henderson@oracle.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a898cb621ac589b0b9e959309689a027e765aa12 ]
Syzbot has found that when it creates corrupted quota files where the
quota tree contains a loop, we will deadlock when tryling to insert a
dquot. Add loop detection into functions traversing the quota tree.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 496530c7c1dfc159d59a75ae00b572f570710c53 ]
Syzbot reported a KMSAN warning,
erofs: (device loop0): z_erofs_lz4_decompress_mem: failed to decompress -12 in[46, 4050] out[917]
=====================================================
BUG: KMSAN: uninit-value in hex_dump_to_buffer+0xae9/0x10f0 lib/hexdump.c:194
..
print_hex_dump+0x13d/0x3e0 lib/hexdump.c:276
z_erofs_lz4_decompress_mem fs/erofs/decompressor.c:252 [inline]
z_erofs_lz4_decompress+0x257e/0x2a70 fs/erofs/decompressor.c:311
z_erofs_decompress_pcluster fs/erofs/zdata.c:1290 [inline]
z_erofs_decompress_queue+0x338c/0x6460 fs/erofs/zdata.c:1372
z_erofs_runqueue+0x36cd/0x3830
z_erofs_read_folio+0x435/0x810 fs/erofs/zdata.c:1843
The root cause is that the printed decompressed buffer may be filled
incompletely due to decompression failure. Since they were once only
used for debugging, get rid of them now.
Reported-and-tested-by: syzbot+6c746eea496f34b3161d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/000000000000321c24060d7cfa1c@google.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231227151903.2900413-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e316dd1cf1358ff9c44b37c7be273a7dc4349986 ]
The top syzbot report for networking (#14 for the entire kernel)
is the queue timeout splat. We kept it around for a long time,
because in real life it provides pretty strong signal that
something is wrong with the driver or the device.
Removing it is also likely to break monitoring for those who
track it as a kernel warning.
Nevertheless, WARN()ings are best suited for catching kernel
programming bugs. If a Tx queue gets starved due to a pause
storm, priority configuration, or other weirdness - that's
obviously a problem, but not a problem we can fix at
the kernel level.
Bite the bullet and convert the WARN() to a print.
Before:
NETDEV WATCHDOG: eni1np1 (netdevsim): transmit queue 0 timed out 1975 ms
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x39e/0x3b0
[... completely pointless stack trace of a timer follows ...]
Now:
netdevsim netdevsim1 eni1np1: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 1769 ms
Alternatively we could mark the drivers which syzbot has
learned to abuse as "print-instead-of-WARN" selectively.
Reported-by: syzbot+d55372214aff0faa1f1f@syzkaller.appspotmail.com
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f0f9465ad ]
The kernel will print several warnings in a short period of time
when it stalls. Like this:
First warning:
[ 7100.097547] ------------[ cut here ]------------
[ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out
[ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467
dev_watchdog+0x260/0x270
...
Second warning:
[ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000
softirq=367 3137/3673146 fqs=13844
[ 7147.756960] (t=60001 jiffies g=4322709 q=133381)
[ 7147.756962] NMI backtrace for cpu 24
...
We calculate that the transmit queue start stall should occur before
7095s according to watchdog_timeo, the rcu start stall at 7087s.
These two times are close together, it is difficult to confirm which
happened first.
To let users know the exact time the stall started, print msecs when
the transmit queue time out.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: e316dd1cf135 ("net: don't dump stack on queue timeout")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a26787aa13974fb0b3fb42bfeb4256c1b686e305 ]
We want to ensure everything holds the wiphy lock,
so also extend that to the MAC change callback.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: 74a7c93f45ab ("wifi: mac80211: fix change_address deadlock during unregister")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7eaf80bfb ]
Syzbot found a bug "BUG: sleeping function called from invalid context
at kernel/locking/mutex.c:580". It is because hci_link_tx_to holds an
RCU read lock and calls hci_disconnect which would hold a mutex lock
since the commit a13f316e90 ("Bluetooth: hci_conn: Consolidate code
for aborting connections"). Here's an example call trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xfc/0x174 lib/dump_stack.c:106
___might_sleep+0x4a9/0x4d3 kernel/sched/core.c:9663
__mutex_lock_common kernel/locking/mutex.c:576 [inline]
__mutex_lock+0xc7/0x6e7 kernel/locking/mutex.c:732
hci_cmd_sync_queue+0x3a/0x287 net/bluetooth/hci_sync.c:388
hci_abort_conn+0x2cd/0x2e4 net/bluetooth/hci_conn.c:1812
hci_disconnect+0x207/0x237 net/bluetooth/hci_conn.c:244
hci_link_tx_to net/bluetooth/hci_core.c:3254 [inline]
__check_timeout net/bluetooth/hci_core.c:3419 [inline]
__check_timeout+0x310/0x361 net/bluetooth/hci_core.c:3399
hci_sched_le net/bluetooth/hci_core.c:3602 [inline]
hci_tx_work+0xe8f/0x12d0 net/bluetooth/hci_core.c:3652
process_one_work+0x75c/0xba1 kernel/workqueue.c:2310
worker_thread+0x5b2/0x73a kernel/workqueue.c:2457
kthread+0x2f7/0x30b kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
This patch releases RCU read lock before calling hci_disconnect and
reacquires it afterward to fix the bug.
Fixes: a13f316e90 ("Bluetooth: hci_conn: Consolidate code for aborting connections")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f66af88e33 ]
[ 81.372851][ T5532] CPU: 1 PID: 5532 Comm: syz-executor.0 Not tainted 6.2.0-rc1-syzkaller-dirty #0
[ 81.382080][ T5532] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
[ 81.392343][ T5532] Call Trace:
[ 81.395654][ T5532] <TASK>
[ 81.398603][ T5532] dump_stack_lvl+0x1b1/0x290
[ 81.418421][ T5532] gfs2_assert_warn_i+0x19a/0x2e0
[ 81.423480][ T5532] gfs2_quota_cleanup+0x4c6/0x6b0
[ 81.428611][ T5532] gfs2_make_fs_ro+0x517/0x610
[ 81.457802][ T5532] gfs2_withdraw+0x609/0x1540
[ 81.481452][ T5532] gfs2_inode_refresh+0xb2d/0xf60
[ 81.506658][ T5532] gfs2_instantiate+0x15e/0x220
[ 81.511504][ T5532] gfs2_glock_wait+0x1d9/0x2a0
[ 81.516352][ T5532] do_sync+0x485/0xc80
[ 81.554943][ T5532] gfs2_quota_sync+0x3da/0x8b0
[ 81.559738][ T5532] gfs2_sync_fs+0x49/0xb0
[ 81.564063][ T5532] sync_filesystem+0xe8/0x220
[ 81.568740][ T5532] generic_shutdown_super+0x6b/0x310
[ 81.574112][ T5532] kill_block_super+0x79/0xd0
[ 81.578779][ T5532] deactivate_locked_super+0xa7/0xf0
[ 81.584064][ T5532] cleanup_mnt+0x494/0x520
[ 81.593753][ T5532] task_work_run+0x243/0x300
[ 81.608837][ T5532] exit_to_user_mode_loop+0x124/0x150
[ 81.614232][ T5532] exit_to_user_mode_prepare+0xb2/0x140
[ 81.619820][ T5532] syscall_exit_to_user_mode+0x26/0x60
[ 81.625287][ T5532] do_syscall_64+0x49/0xb0
[ 81.629710][ T5532] entry_SYSCALL_64_after_hwframe+0x63/0xcd
In this backtrace, gfs2_quota_sync() takes quota data references and
then calls do_sync(). Function do_sync() encounters filesystem
corruption and withdraws the filesystem, which (among other things) calls
gfs2_quota_cleanup(). Function gfs2_quota_cleanup() wrongly assumes
that nobody is holding any quota data references anymore, and destroys
all quota data objects. When gfs2_quota_sync() then resumes and
dereferences the quota data objects it is holding, those objects are no
longer there.
Function gfs2_quota_cleanup() deals with resource deallocation and can
easily be delayed until gfs2_put_super() in the case of a filesystem
withdraw. In fact, most of the other work gfs2_make_fs_ro() does is
unnecessary during a withdraw as well, so change signal_our_withdraw()
to skip gfs2_make_fs_ro() and perform the necessary steps directly
instead.
Thanks to Edward Adam Davis <eadavis@sina.com> for the initial patches.
Link: https://lore.kernel.org/all/0000000000002b5e2405f14e860f@google.com
Reported-by: syzbot+3f6a670108ce43356017@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b77b4a4815 ]
So far, at mount time, gfs2 would take the freeze glock in shared mode
and then immediately drop it again, turning it into a cached glock that
can be reclaimed at any time. To freeze the filesystem cluster-wide,
the node initiating the freeze would take the freeze glock in exclusive
mode, which would cause the freeze glock's freeze_go_sync() callback to
run on each node. There, gfs2 would freeze the filesystem and schedule
gfs2_freeze_func() to run. gfs2_freeze_func() would re-acquire the
freeze glock in shared mode, thaw the filesystem, and drop the freeze
glock again. The initiating node would keep the freeze glock held in
exclusive mode. To thaw the filesystem, the initiating node would drop
the freeze glock again, which would allow gfs2_freeze_func() to resume
on all nodes, leaving the filesystem in the thawed state.
It turns out that in freeze_go_sync(), we cannot reliably and safely
freeze the filesystem. This is primarily because the final unmount of a
filesystem takes a write lock on the s_umount rw semaphore before
calling into gfs2_put_super(), and freeze_go_sync() needs to call
freeze_super() which also takes a write lock on the same semaphore,
causing a deadlock. We could work around this by trying to take an
active reference on the super block first, which would prevent unmount
from running at the same time. But that can fail, and freeze_go_sync()
isn't actually allowed to fail.
To get around this, this patch changes the freeze glock locking scheme
as follows:
At mount time, each node takes the freeze glock in shared mode. To
freeze a filesystem, the initiating node first freezes the filesystem
locally and then drops and re-acquires the freeze glock in exclusive
mode. All other nodes notice that there is contention on the freeze
glock in their go_callback callbacks, and they schedule
gfs2_freeze_func() to run. There, they freeze the filesystem locally
and drop and re-acquire the freeze glock before re-thawing the
filesystem. This is happening outside of the glock state engine, so
there, we are allowed to fail.
From a cluster point of view, taking and immediately dropping a glock is
indistinguishable from taking the glock and only dropping it upon
contention, so this new scheme is compatible with the old one.
Thanks to Li Dong <lidong@vivo.com> for reporting a locking bug in
gfs2_freeze_func() in a previous version of this commit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: f66af88e33 ("gfs2: Stop using gfs2_make_fs_ro for withdraw")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cad1e15804 ]
Rename the SDF_FS_FROZEN flag to SDF_FREEZE_INITIATOR to indicate more
clearly that the node that has this flag set is the initiator of the
freeze.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com
Stable-dep-of: f66af88e33 ("gfs2: Stop using gfs2_make_fs_ro for withdraw")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e392edd5d5 ]
Rename gfs2_freeze_lock to gfs2_freeze_lock_shared to make it a bit more
obvious that this function establishes the "thawed" state of the freeze
glock.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: f66af88e33 ("gfs2: Stop using gfs2_make_fs_ro for withdraw")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 097cca525a ]
Rename gfs2_freeze to gfs2_freeze_super and gfs2_unfreeze to
gfs2_thaw_super to match the names of the corresponding super
operations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: f66af88e33 ("gfs2: Stop using gfs2_make_fs_ro for withdraw")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af1abe1146 ]
The transaction glock was repurposed to serve as the new freeze glock
years ago. Don't refer to it as the transaction glock anymore.
Also, to be more precise, call it the "freeze glock" instead of the
"freeze lock". Ditto for the journal glock.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: f66af88e33 ("gfs2: Stop using gfs2_make_fs_ro for withdraw")
Signed-off-by: Sasha Levin <sashal@kernel.org>