If a Netlink command for the MPTCP path-managers is not valid, it is
important to check whether the kernel reported an error. If it did, the
error needs to be reported instead of being ignored, with the tool
exiting as if nothing went wrong.
Now, when no reply is expected, userspace asks the kernel for an ACK so
that there is always a reply to wait for. We can reuse the same buffer,
which is currently always >1024 bytes. Then we can check if there is an
error (err->error), print it if any and report the error.
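As a rough illustration of the idea (not the selftest's actual code), the
check could look like the sketch below. The helper names nl_request_ack()
and nl_reply_error() are hypothetical; only the uapi definitions from
<linux/netlink.h> are real.

```c
#include <errno.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: ask the kernel to always answer with an ACK. */
static void nl_request_ack(struct nlmsghdr *nh)
{
	nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK;
}

/*
 * Hypothetical helper: inspect the first message of a reply buffer.
 * Returns 0 for a plain ACK, a negative errno if the kernel reported an
 * error (or the reply is truncated), and 1 if the message is not an
 * NLMSG_ERROR message at all.
 */
static int nl_reply_error(const void *buf, size_t len)
{
	struct nlmsghdr nh;
	struct nlmsgerr err;

	if (len < sizeof(nh))
		return -EBADMSG;
	memcpy(&nh, buf, sizeof(nh));

	if (nh.nlmsg_type != NLMSG_ERROR)
		return 1;

	if (len < NLMSG_LENGTH(sizeof(err)) ||
	    nh.nlmsg_len < NLMSG_LENGTH(sizeof(err)))
		return -EBADMSG;
	memcpy(&err, (const char *)buf + NLMSG_HDRLEN, sizeof(err));

	/* err.error is 0 for an ACK and a negative errno otherwise. */
	if (err.error)
		fprintf(stderr, "netlink error: %s\n", strerror(-err.error));

	return err.error;
}
```

A caller would set the ACK flag before sending, read the reply into its
existing buffer, and exit with a failure status when nl_reply_error()
returns a negative value.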
After this modification, it is required to mute expected errors in
mptcp_join.sh and pm_netlink.sh selftests:
- when trying to add a bad endpoint, e.g. duplicated
- when trying to set the two limits above the hard limit
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-3-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch modifies how the detailed results are printed, mainly to
improve what is displayed in case of issue:
- Now the test name (title) is printed earlier, when starting the test
if it is not intentionally skipped: by doing that, errors linked to
a test will be printed after the test name, which avoids confusion.
- Due to the previous item, it is required to add a new line after
having printed the test name because in case of error with a command,
it is better not to have the output in the middle of the screen.
- Each check is printed on a dedicated line with aligned status (ok,
skip, fail): it is easier to spot which one has failed, the code is
simpler because alignment no longer has to be handled case by case, and
helpers can be used to make the output uniform. These helpers can also
be useful later to do more actions depending on the results or to
change in one place what is printed.
- Info messages have been reduced and aligned as well. And info messages
about the creation of the default test files of 1 KB are no longer
printed.
Example:
  001 no JOIN
        syn                                 [ ok ]
        synack                              [ ok ]
        ack                                 [ ok ]
Or with a skip and a failure:
  001 no JOIN
        syn                                 [ ok ]
        synack                              [fail] got 42 JOIN[s] synack expected 0
  Server ns stats
  (...)
  Client ns stats
  (...)
        ack                                 [skip]
Or with info:
  104 Infinite map
        Test file (size 128 KB) for client
        Test file (size 128 KB) for server
        file received by server has inverted byte at 169
        5 corrupted pkts
        syn                                 [ ok ]
        synack                              [ ok ]
While at it, verify_listener_events() now also prints more info in case
of failure and, in pm_nl_check_endpoint(), the test is marked as failed
instead of skipped if no ID has been given (internal selftest issue).
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-1-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During unregister_netdevice_many_notify(), the relevant function calls
are ordered like this:
  unregister_netdevice_many_notify
    dev_shutdown
      qdisc_put
        clsact_destroy
    tcx_uninstall
The syzbot reproducer triggered a case in which the qdisc refcnt is not
zero during dev_shutdown().
tcx_uninstall() will then WARN_ON_ONCE(tcx_entry(entry)->miniq_active)
because the miniq is still active and the entry should not be freed.
The latter assumed that qdisc destruction happens before tcx teardown.
The fix is to avoid tcx_uninstall() calling tcx_entry_free() while the
miniq is still alive and to let clsact_destroy() do the free later, so
that we do not assume any specific ordering for either of them.
If the miniq is still active, tcx_uninstall() only clears the entry when
flushing out the prog/link. clsact_destroy() will then notice
"!tcx_entry_is_active()" and eventually call tcx_entry_free().
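The ordering-independent release can be modelled in userspace as in the
sketch below. This is a toy model under stated assumptions, not the kernel
code: every toy_* name is hypothetical, and "freeing" only bumps a counter
so that a double free or a leak would be visible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the tcx entry (hypothetical, userspace only). */
struct toy_entry {
	bool miniq_active;	/* the clsact qdisc still uses the entry */
	int nprogs;		/* attached programs/links */
};

/* Toy stand-in for the device holding the entry. */
struct toy_dev {
	struct toy_entry *entry;	/* NULL once the entry is released */
	int frees;			/* how many times the entry was freed */
};

static bool toy_entry_is_active(const struct toy_entry *e)
{
	return e->miniq_active || e->nprogs > 0;
}

static void toy_entry_free(struct toy_dev *dev)
{
	dev->entry = NULL;
	dev->frees++;
}

/* Models tcx_uninstall(): flush progs, free only if the miniq is gone. */
static void toy_tcx_uninstall(struct toy_dev *dev)
{
	struct toy_entry *e = dev->entry;

	if (!e)
		return;
	e->nprogs = 0;
	if (!e->miniq_active)
		toy_entry_free(dev);
}

/* Models clsact_destroy(): drop the miniq, free if nothing else is left. */
static void toy_clsact_destroy(struct toy_dev *dev)
{
	struct toy_entry *e = dev->entry;

	if (!e)
		return;
	e->miniq_active = false;
	if (!toy_entry_is_active(e))
		toy_entry_free(dev);
}
```

Whichever of the two teardown functions runs last ends up freeing the
entry, so in this model it is released exactly once in both orderings.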
Fixes: e420bed025 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/222255fe07cb58f15ee662e7ee78328af5b438e4.1690549248.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shuah Khan says:
====================
Connector/proc_filter test fixes
The first patch fixes the LKFT reported compile error, second
one adds .gitignore.
====================
Applying the first 2 patches, third one resent separately.
Link: https://lore.kernel.org/r/cover.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test fails to compile with the following errors. Fix the Makefile
CFLAGS to include KHDR_INCLUDES to pull in uapi defines.
gcc -Wall proc_filter.c -o ../tools/testing/selftests/connector/proc_filter
proc_filter.c: In function ‘send_message’:
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
22 | sizeof(struct proc_input))
| ^~~~~~
proc_filter.c:42:19: note: in expansion of macro ‘NL_MESSAGE_SIZE’
42 | char buff[NL_MESSAGE_SIZE];
| ^~~~~~~~~~~~~~~
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
22 | sizeof(struct proc_input))
| ^~~~~~
proc_filter.c:48:34: note: in expansion of macro ‘NL_MESSAGE_SIZE’
48 | hdr->nlmsg_len = NL_MESSAGE_SIZE;
| ^~~~~~~~~~~~~~~
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/all/CA+G9fYt=6ysz636XcQ=-KJp7vJcMZ=NjbQBrn77v7vnTcfP2cA@mail.gmail.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/d0055c8cdf18516db8ba9edec99cfc5c08f32a7c.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit df8fc4e934 ("kbuild: Enable -fstrict-flex-arrays=3") started
applying strict rules to standard string functions.
It does not work well with conventional socket code around each protocol-
specific sockaddr_XXX struct, which is cast from sockaddr_storage and has
a bigger size than fortified functions expect. See these commits:
commit 06d4c8a808 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
commit ecb4534b6a ("af_unix: Terminate sun_path when bind()ing pathname socket.")
commit a0ade8404c ("af_packet: Fix warning of fortified memcpy() in packet_getname().")
We must cast the protocol-specific address back to sockaddr_storage
to call such functions.
However, in the case of getsockopt(SO_PEERNAME), the rationale is a bit
unclear as the buffer is defined as char[128], which is the same size as
sockaddr_storage.
Let's use sockaddr_storage explicitly.
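A userspace analogue of the idea, as a minimal sketch: declare the buffer
as struct sockaddr_storage rather than as a raw char array, so that its
size and purpose are explicit. The helper copy_peer_name() is hypothetical,
and the 128-byte size check assumes Linux.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: on Linux, sockaddr_storage is 128 bytes, like char[128]. */
_Static_assert(sizeof(struct sockaddr_storage) == 128,
	       "sockaddr_storage is expected to be 128 bytes");

/*
 * Hypothetical helper: copy an address of addr_len bytes out of a
 * sockaddr_storage buffer, never reading past the storage and never
 * writing past the caller's buffer. Returns the number of bytes copied.
 */
static socklen_t copy_peer_name(const struct sockaddr_storage *ss,
				socklen_t addr_len,
				void *out, socklen_t out_len)
{
	socklen_t n = addr_len < out_len ? addr_len : out_len;

	if (n > sizeof(*ss))
		n = sizeof(*ss);
	memcpy(out, ss, n);
	return n;
}
```

Because the source object has the full sockaddr_storage type, a bounds
check against sizeof(*ss) matches the real size of the buffer.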
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the number of control transfers if all bytes of the first or the
last DWORD are written.
The original method splits the control transfer into three parts
(the first DWORD, the middle continuous data, and the last DWORD).
However, they can be combined if all bytes of the first DWORD or of the
last DWORD are written. That is, the first DWORD or the last DWORD can be
combined with the middle continuous data if the byte_en is 0xff.
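The effect on the number of transfers can be sketched as below. This is an
illustration under assumptions, not the driver code: count_transfers() is a
hypothetical helper, the low nibble of byteen is assumed to select the
bytes of the first DWORD and the high nibble those of the last DWORD, and
size is assumed to be a non-zero multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the control transfers needed to write
 * "size" bytes with the given byte enables.
 */
static int count_transfers(uint8_t byteen, size_t size)
{
	uint8_t start = byteen & 0x0f;	/* enables for the first DWORD */
	uint8_t end = byteen & 0xf0;	/* enables for the last DWORD */
	int transfers = 0;

	/* A partially written first DWORD needs its own transfer. */
	if ((start | (start << 4)) != 0xff) {
		transfers++;
		size -= 4;
	}

	if (size) {
		int split_last = (end | (end >> 4)) != 0xff;

		/* A partially written last DWORD is sent separately. */
		if (split_last)
			size -= 4;
		/* Whatever remains goes out as one continuous transfer. */
		if (size)
			transfers++;
		if (split_last)
			transfers++;
	}

	return transfers;
}
```

With all bytes enabled at both ends, a 16-byte write takes a single
transfer in this model, while partial enables at both ends still take
three.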
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230726030808.9093-418-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chuck Lever says:
====================
In-kernel support for the TLS Alert protocol
IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS
Alert protocol. Instead, a set of small helper functions can be used
to handle sending and receiving TLS Alerts for in-kernel TLS
consumers.
====================
Merged on top of a tag in case it's needed in the NFS tree.
Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I'm about to add support for kernel handshake API consumers to send
TLS Alerts, so introduce the needed protocol definitions in the new
header tls_prot.h.
This presages support for Closure alerts. Also, support for alerts
is a prerequisite for handling session re-keying, where one peer will
signal the need for a re-key by sending a TLS Alert.
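For orientation, a TLS Alert is a two-byte message: a level followed by a
description. The sketch below shows what such definitions and a small
parser might look like. The numeric values are the ones from RFC 8446,
but every ex_*/EX_* identifier is hypothetical and does not reflect the
contents of tls_prot.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Alert levels (values from RFC 8446; names are hypothetical). */
enum {
	EX_ALERT_LEVEL_WARNING = 1,
	EX_ALERT_LEVEL_FATAL = 2,
};

/* A few alert descriptions (values from RFC 8446; names hypothetical). */
enum {
	EX_ALERT_CLOSE_NOTIFY = 0,
	EX_ALERT_UNEXPECTED_MESSAGE = 10,
	EX_ALERT_HANDSHAKE_FAILURE = 40,
};

struct ex_tls_alert {
	uint8_t level;
	uint8_t description;
};

/* Parse an alert payload; returns 0 on success, -1 on a bad length. */
static int ex_parse_alert(const uint8_t *payload, size_t len,
			  struct ex_tls_alert *out)
{
	if (len != 2)
		return -1;
	out->level = payload[0];
	out->description = payload[1];
	return 0;
}

/* True if the alert announces an orderly close of the connection. */
static int ex_alert_is_close_notify(const struct ex_tls_alert *alert)
{
	return alert->description == EX_ALERT_CLOSE_NOTIFY;
}
```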
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047934064.5241.8377890858495063518.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a W=1 warning with gcc 13.1:
In function ‘fortify_memcpy_chk’,
inlined from ‘bnxt_hwrm_queue_cos2bw_cfg’ at drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:133:3:
include/linux/fortify-string.h:592:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
592 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The field group is already defined and starts at queue_id:
struct bnxt_cos2bw_cfg {
u8 pad[3];
struct_group_attr(cfg, __packed,
u8 queue_id;
__le32 min_bw;
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230727190726.1859515-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5-updates-2023-07-24
1) Generalize devcom implementation to be independent of number of ports
or device's GUID.
2) Save memory on command interface statistics.
3) General code cleanups
* tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
net/mlx5: Make mlx5_eswitch_load/unload_vport() static
net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static
net/mlx5: Remove pointless devlink_rate checks
net/mlx5: Don't check vport->enabled in port ops
net/mlx5e: Make flow classification filters static
net/mlx5e: Remove duplicate code for user flow
net/mlx5: Allocate command stats with xarray
net/mlx5: split mlx5_cmd_init() to probe and reload routines
net/mlx5: Remove redundant cmdif revision check
net/mlx5: Re-organize mlx5_cmd struct
net/mlx5e: E-Switch, Allow devcom initialization on more vports
net/mlx5e: E-Switch, Register devcom device with switch id key
net/mlx5: Devcom, Infrastructure changes
net/mlx5: Use shared code for checking lag is supported
====================
Link: https://lore.kernel.org/r/20230727183914.69229-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata says:
====================
mlxsw: Avoid non-tracker helpers when holding and putting netdevices
Using the tracking helpers, netdev_hold() and netdev_put(), makes it easier
to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER
is enabled. For example, the following traceback shows the callpath to the
point of an outstanding hold that was never put:
unregister_netdevice: waiting for swp3 to become free. Usage count = 6
ref_tracker: eth%d@ffff888123c9a580 has 1/5 users at
mlxsw_sp_switchdev_event+0x6bd/0xcc0 [mlxsw_spectrum]
notifier_call_chain+0xbf/0x3b0
atomic_notifier_call_chain+0x78/0x200
br_switchdev_fdb_notify+0x25f/0x2c0 [bridge]
fdb_notify+0x16a/0x1a0 [bridge]
[...]
In this patchset, get rid of all non-ref-tracking helpers in mlxsw.
- Patch #1 drops two functions that are not used anymore, but contain
dev_hold() / dev_put() calls.
- Patch #2 avoids taking a reference in one function which is called
under RTNL.
- The remaining patches convert individual hold/put sites one by one
from trackerless to tracker-enabled.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/netdev/4c056da27c19d95ffeaba5acf1427ecadfc3f94c.camel@redhat.com/
====================
Link: https://lore.kernel.org/r/cover.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
accept_ra_min_rtr_lft only considered the lifetime of the default route
and discarded entire RAs accordingly.
This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and
applies the value to individual RA sections; in particular, router
lifetime, PIO preferred lifetime, and RIO lifetime. If any of those
lifetimes is lower than the configured value, the specific RA section
is ignored.
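The per-section check can be pictured with the sketch below. It is only an
illustration: ra_section_ignored() is a hypothetical helper, and the
special case for a zero lifetime (treated here as a withdrawal that is
still processed) is an assumption that the description above does not
spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether one RA section (router lifetime,
 * PIO preferred lifetime or RIO lifetime, in seconds) is ignored for a
 * given accept_ra_min_lft value.
 */
static bool ra_section_ignored(uint32_t lifetime, uint32_t min_lft)
{
	/* Assumption: a zero lifetime is a withdrawal and is not ignored. */
	if (lifetime == 0)
		return false;

	return lifetime < min_lft;
}
```

For example, with accept_ra_min_lft set to 180, a router lifetime of 30s
is ignored while a PIO with a 2h preferred lifetime in the same RA is
still processed.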
In order for the sysctl to be useful to Android, it should really apply
to all lifetimes in the RA, since that is what determines the minimum
frequency at which RAs must be processed by the kernel. Android uses
hardware offloads to drop RAs for a fraction of the minimum of all
lifetimes present in the RA (some networks have very frequent RAs (5s)
with high lifetimes (2h)). Despite this, we have encountered networks
that set the router lifetime to 30s which results in very frequent CPU
wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the
WiFi firmware) entirely on such networks, it seems better to ignore the
misconfigured routers while still processing RAs from other IPv6 routers
on the same network (i.e. to support IoT applications).
The previous implementation dropped the entire RA based on router
lifetime. This turned out to be hard to expand to the other lifetimes
present in the RA in a consistent manner; dropping the entire RA based
on RIO/PIO lifetimes would essentially require parsing the whole thing
twice.
Fixes: 1671bcfd76 ("net: add sysctl accept_ra_min_rtr_lft")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>