MattB reported a lockdep splat in the mptcp listener code cleanup:
WARNING: possible circular locking dependency detected
packetdrill/14278 is trying to acquire lock:
ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069)
but task is already holding lock:
ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
__lock_acquire (kernel/locking/lockdep.c:5055)
lock_acquire (kernel/locking/lockdep.c:466)
lock_sock_nested (net/core/sock.c:3463)
mptcp_worker (net/mptcp/protocol.c:2614)
process_one_work (kernel/workqueue.c:2294)
worker_thread (include/linux/list.h:292)
kthread (kernel/kthread.c:376)
ret_from_fork (arch/x86/entry/entry_64.S:312)
-> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
check_prev_add (kernel/locking/lockdep.c:3098)
validate_chain (kernel/locking/lockdep.c:3217)
__lock_acquire (kernel/locking/lockdep.c:5055)
lock_acquire (kernel/locking/lockdep.c:466)
__flush_work (kernel/workqueue.c:3070)
__cancel_work_timer (kernel/workqueue.c:3160)
mptcp_cancel_work (net/mptcp/protocol.c:2758)
mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817)
__mptcp_close_ssk (net/mptcp/protocol.c:2363)
mptcp_destroy_common (net/mptcp/protocol.c:3170)
mptcp_destroy (include/net/sock.h:1495)
__mptcp_destroy_sock (net/mptcp/protocol.c:2886)
__mptcp_close (net/mptcp/protocol.c:2959)
mptcp_close (net/mptcp/protocol.c:2974)
inet_release (net/ipv4/af_inet.c:432)
__sock_release (net/socket.c:651)
sock_close (net/socket.c:1367)
__fput (fs/file_table.c:320)
task_work_run (kernel/task_work.c:181 (discriminator 1))
exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
syscall_exit_to_user_mode (kernel/entry/common.c:130)
do_syscall_64 (arch/x86/entry/common.c:87)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET);
lock((work_completion)(&msk->work));
lock(sk_lock-AF_INET);
lock((work_completion)(&msk->work));
*** DEADLOCK ***
The report is actually a false positive, since the only existing lock
nesting is the msk socket lock acquired by the mptcp work.
cancel_work_sync() is invoked without the relevant socket lock being
held, but under a different (the msk listener) socket lock.
We could silence the splat adding a per workqueue dynamic lockdep key,
but that looks overkill. Instead just tell lockdep the msk socket lock
is not held around cancel_work_sync().
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322
Fixes: 30e51b923e ("mptcp: fix unreleased socket in accept queue")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MatM reported a deadlock at fastopening time:
INFO: task syz-executor.0:11454 blocked for more than 143 seconds.
Tainted: G S 6.1.0-rc5-03226-gdb0157db5153 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:25104 pid:11454 ppid:424 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5191 [inline]
__schedule+0x5c2/0x1550 kernel/sched/core.c:6503
schedule+0xe8/0x1c0 kernel/sched/core.c:6579
__lock_sock+0x142/0x260 net/core/sock.c:2896
lock_sock_nested+0xdb/0x100 net/core/sock.c:3466
__mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328
mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171
mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019
__inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720
tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200
mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline]
mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721
inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xf7/0x190 net/socket.c:734
____sys_sendmsg+0x336/0x970 net/socket.c:2476
___sys_sendmsg+0x122/0x1c0 net/socket.c:2530
__sys_sendmmsg+0x18d/0x460 net/socket.c:2616
__do_sys_sendmmsg net/socket.c:2645 [inline]
__se_sys_sendmmsg net/socket.c:2642 [inline]
__x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5920a75e7d
RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d
RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005
RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000
</TASK>
In the error path, tcp_sendmsg_fastopen() ends-up calling
mptcp_disconnect(), and the latter tries to close each
subflow, acquiring the socket lock on each of them.
At fastopen time, we have a single subflow, and such subflow
socket lock is already held by the called, causing the deadlock.
We already track the 'fastopen in progress' status inside the msk
socket. Use it to address the issue, making mptcp_disconnect() a
no op when invoked from the fastopen (error) path and doing the
relevant cleanup after releasing the subflow socket lock.
While at the above, rename the fastopen status bit to something
more meaningful.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321
Fixes: fa9e57468a ("mptcp: fix abba deadlock on fastopen")
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload
support") added support for encapsulation offload. However, the
pathc did not report correctly the csum_level for encapsulated packet.
This patch fixes this issue by reporting correct csum level for the
encapsulated packet.
Fixes: dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload support")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Peng Li <lpeng@vmware.com>
Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multicast packets received on an interface bound to a VRF are marked as
belonging to the VRF and the skb device is updated to point to the VRF
device itself. This was fine even when a route was associated to a
device as when performing a fib table lookup 'oif' in fib6_table_lookup
(coming from 'skb->dev->ifindex' in ip6_route_input) was set to 0 when
FLOWI_FLAG_SKIP_NH_OIF was set.
With commit 40867d74c3 ("net: Add l3mdev index to flow struct and
avoid oif reset for port devices") this is not longer true and multicast
traffic is not received on the original interface.
Instead of adding back a similar check in fib6_table_lookup determine
the dst using the original ifindex for multicast VRF traffic. To make
things consistent across the function do the above for all strict
packets, which was the logic before commit 6f12fa7755 ("vrf: mark skb
for multicast or link-local as enslaved to VRF"). Note that reverting to
this behavior should be fine as the change was about marking packets
belonging to the VRF, not about their dst.
Fixes: 40867d74c3 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221220171825.1172237-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously ice XDP xmit routine was changed in a way that it avoids
xdp_buff->xdp_frame conversion as it is simply not needed for handling
XDP_TX action and what is more it saves us CPU cycles. This routine is
re-used on ZC driver to handle XDP_TX action.
Although for XDP_TX on Rx ZC xdp_buff that comes from xsk_buff_pool is
converted to xdp_frame, xdp_frame itself is not stored inside
ice_tx_buf, we only store raw data pointer. Casting this pointer to
xdp_frame and calling against it xdp_return_frame in
ice_clean_xdp_tx_buf() results in undefined behavior.
To fix this, simply call page_frag_free() on tx_buf->raw_buf.
Later intention is to remove the buff->frame conversion in order to
simplify the codebase and improve XDP_TX performance on ZC.
Fixes: 126cdfe100 ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@.intel.com>
Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless fixes for v6.2
First set of fixes for v6.2. Fix for a link error in mt76, fix for an
iwlwifi firmware crash and two cleanups.
* tag 'wireless-2022-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: ath9k: use proper statements in conditionals
wifi: mt76: mt7996: select CONFIG_RELAY
wifi: iwlwifi: fw: skip PPAG for JF
wifi: ti: remove obsolete lines in the Makefile
====================
Link: https://lore.kernel.org/r/20221221180808.96A8AC433EF@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull block fixes from Jens Axboe:
- Various fixes for BFQ (Yu, Yuwei)
- Fix for loop command line parsing (Isaac)
- No need to specifically clear REQ_ALLOC_CACHE on IOPOLL downgrade
anymore (me)
- blk-iocost enum fix for newer gcc (Jiri)
- UAF fix for queue release (Ming)
- blk-iolatency error handling memory leak fix (Tejun)
* tag 'block-6.2-2022-12-19' of git://git.kernel.dk/linux:
block: don't clear REQ_ALLOC_CACHE for non-polled requests
block: fix use-after-free of q->q_usage_counter
block, bfq: only do counting of pending-request for BFQ_GROUP_IOSCHED
blk-iolatency: Fix memory leak on add_disk() failures
loop: Fix the max_loop commandline argument treatment when it is set to 0
block/blk-iocost (gcc13): keep large values in a new enum
block, bfq: replace 0/1 with false/true in bic apis
block, bfq: don't return bfqg from __bfq_bic_change_cgroup()
block, bfq: fix possible uaf for 'bfqq->bic'
Pull io_uring fixes from Jens Axboe:
- Improve the locking for timeouts. This was originally queued up for
the initial pull, but I messed up and it got missed. (Pavel)
- Fix an issue with running task_work from the wait path, causing some
inefficiencies (me)
- Add a clear of ->free_iov upfront in the 32-bit compat data
importing, so we ensure that it's always sane at completion time (me)
- Use call_rcu_hurry() for the eventfd signaling (Dylan)
- Ordering fix for multishot recv completions (Pavel)
- Add the io_uring trace header to the MAINTAINERS entry (Ammar)
* tag 'io_uring-6.2-2022-12-19' of git://git.kernel.dk/linux:
MAINTAINERS: io_uring: Add include/trace/events/io_uring.h
io_uring/net: fix cleanup after recycle
io_uring/net: ensure compat import handlers clear free_iov
io_uring: include task_work run after scheduling in wait for events
io_uring: don't use TIF_NOTIFY_SIGNAL to test for availability of task_work
io_uring: use call_rcu_hurry if signaling an eventfd
io_uring: fix overflow handling regression
io_uring: ease timeout flush locking requirements
io_uring: revise completion_lock locking
io_uring: protect cq_timeouts with timeout_lock
When the bpf_skb_adjust_room() shrinks the skb such that its csum_start
is invalid, the skb->ip_summed should be reset from CHECKSUM_PARTIAL to
CHECKSUM_NONE.
The commit 54c3f1a814 ("bpf: pull before calling skb_postpull_rcsum()")
fixed it.
This patch adds a test to ensure the skb->ip_summed changed from
CHECKSUM_PARTIAL to CHECKSUM_NONE after bpf_skb_adjust_room().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221221185653.1589961-1-martin.lau@linux.dev
This is needed for the vmap/vunmap declarations:
mm/kmsan/kmsan_test.c:316:9: error: implicit declaration of function 'vmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:316:29: error: use of undeclared identifier 'VM_MAP'
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:322:3: error: implicit declaration of function 'vunmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vunmap(vbuf);
^
Link: https://lkml.kernel.org/r/20221215163046.4079767-1-arnd@kernel.org
Fixes: 8ed691b02a ("kmsan: add tests for KMSAN")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The build was failing on archlinux because it has a newer libtraceevent
that added a new entry to the tep_print_arg_type enum:
19.72 archlinux:base : FAIL gcc version 12.2.0 (GCC)
util/scripting-engines/trace-event-python.c: In function ‘define_event_symbols’:
util/scripting-engines/trace-event-python.c:281:9: error: enumeration value ‘TEP_PRINT_CPUMASK’ not handled in switch [-Werror=switch-enum]
281 | switch (args->type) {
| ^~~~~~
cc1: all warnings being treated as errors
Since we build with distros that have different versions of
libtraceevent and there is no way to easily test if these enum entries
are available, just disable -Werror=switch-enum for that specific
object.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch isn't intended to have any effect on the compiled code. It
just removes one level of indirection: calling the *host* compiler to
build and then run a program that just printf:s the numerical entries of
the syscall-table. In other words, the generated syscalls.c changes
from:
[46] = "ftruncate",
to:
[__NR3264_ftruncate] = "ftruncate",
The latter is as good as the former to the user of perf, and this can be
done directly by the shell-script. The syscalls defined as non-literal
values (like "#define __NR_ftruncate __NR3264_ftruncate") are trivially
resolved at compile-time without namespace-leaking and/or collision for
its sole user, perf/util/syscalltbl.c, that just #includes the generated
file. A future "-mabi=32" support would probably have to handle this
differently, but that is a pre-existing problem not affected by this
simplification.
Calling the *host* compiler only complicates things and accidentally can
get a completely wrong set of files and syscall numbers, see earlier
commits. Note that the script parameter hostcc is now unused.
At the time of this patch, powerpc (the origin, see comments), and also
e.g. x86 has moved on, from filtering "gcc -dM -E" output to reading
separate specific text-file, a table of syscall numbers. IMHO should
arm64 consider adopting this.
Signed-off-by: Hans-Peter Nilsson <hp@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20201228024159.2BB66203B5@pchp3.se.axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull cifs fixes from Steve French:
"cifs/smb3 client fixes, mostly related to reconnect and/or DFS:
- two important reconnect fixes: cases where status of recently
connected IPCs and shares were not being updated leaving them in an
incorrect state
- fix for older Windows servers that would return
STATUS_OBJECT_NAME_INVALID to query info requests on DFS links in a
namespace that contained non-ASCII characters, reducing number of
wasted roundtrips.
- fix for leaked -ENOMEM to userspace when cifs.ko couldn't perform
I/O due to a disconnected server, expired or deleted session.
- removal of all unneeded DFS related mount option string parsing
(now using fs_context for automounts)
- improve clarity/readability, moving various DFS related functions
out of fs/cifs/connect.c (which was getting too big to be readable)
to new file.
- Fix problem when large number of DFS connections. Allow sharing of
DFS connections and fix how the referral paths are matched
- Referral caching fix: Instead of looking up ipc connections to
refresh cached referrals, store direct dfs root server's IPC
pointer in new sessions so it can simply be accessed to either
refresh or create a new referral that such connections belong to.
- Fix to allow dfs root server's connections to also failover
- Optimized reconnect of nested DFS links
- Set correct status of IPC connections marked for reconnect"
* tag '6.2-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
cifs: don't leak -ENOMEM in smb2_open_file()
cifs: use origin fullpath for automounts
cifs: set correct status of tcon ipc when reconnecting
cifs: optimize reconnect of nested links
cifs: fix source pathname comparison of dfs supers
cifs: fix confusing debug message
cifs: don't block in dfs_cache_noreq_update_tgthint()
cifs: refresh root referrals
cifs: fix refresh of cached referrals
cifs: don't refresh cached referrals from unactive mounts
cifs: share dfs connections and supers
cifs: split out ses and tcon retrieval from mount_get_conns()
cifs: set resolved ip in sockaddr
cifs: remove unused smb3_fs_context::mount_options
cifs: get rid of mount options string parsing
cifs: use fs_context for automounts
cifs: reduce roundtrips on create/qinfo requests
cifs: set correct ipc status after initial tree connect
cifs: set correct tcon status after initial tree connect
Pull ntfs3 updates from Konstantin Komarov:
- added mount options 'hidedotfiles', 'nocase' and 'windows_names'
- fixed xfstests (tested on x86_64): generic/083 generic/263
generic/307 generic/465
- fix some logic errors
- code refactoring and dead code removal
* tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3: (61 commits)
fs/ntfs3: Make if more readable
fs/ntfs3: Improve checking of bad clusters
fs/ntfs3: Fix wrong if in hdr_first_de
fs/ntfs3: Use ALIGN kernel macro
fs/ntfs3: Fix incorrect if in ntfs_set_acl_ex
fs/ntfs3: Check fields while reading
fs/ntfs3: Correct ntfs_check_for_free_space
fs/ntfs3: Restore correct state after ENOSPC in attr_data_get_block
fs/ntfs3: Changing locking in ntfs_rename
fs/ntfs3: Fixing wrong logic in attr_set_size and ntfs_fallocate
fs/ntfs3: atomic_open implementation
fs/ntfs3: Fix wrong indentations
fs/ntfs3: Change new sparse cluster processing
fs/ntfs3: Fixing work with sparse clusters
fs/ntfs3: Simplify ntfs_update_mftmirr function
fs/ntfs3: Remove unused functions
fs/ntfs3: Fix sparse problems
fs/ntfs3: Add ntfs_bitmap_weight_le function and refactoring
fs/ntfs3: Use _le variants of bitops functions
fs/ntfs3: Add functions to modify LE bitmaps
...
Pull mount propagation fix from Christian Brauner:
"The propagate_mnt() function handles mount propagation when creating
mounts and propagates the source mount tree @source_mnt to all
applicable nodes of the destination propagation mount tree headed by
@dest_mnt.
Unfortunately it contains a bug where it fails to terminate at peers
of @source_mnt when looking up copies of the source mount that become
masters for copies of the source mount tree mounted on top of slaves
in the destination propagation tree causing a NULL dereference.
This fixes that bug (with a long commit message for a seven character
fix but hopefully it'll help us fix issues faster in the future rather
than having to go through the pain of having to relearn everything
once more)"
* tag 'fs.mount.propagation.fix.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
pnode: terminate at peers of source
If the libpython feature test (tools/build/feature/test-libpython.c)
fails, then the python-devel is missing, it doesn't mattere if it is for
python2 or 3, remove that explicit 2.x reference.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>