Commit Graph

1202612 Commits

Author SHA1 Message Date
Gaosheng Cui
151e34d1c6 ring-buffer: Fix kernel-doc warnings in ring_buffer.c
Fix kernel-doc warnings:

kernel/trace/ring_buffer.c:954: warning: Function parameter or
member 'cpu' not described in 'ring_buffer_wake_waiters'
kernel/trace/ring_buffer.c:3383: warning: Excess function parameter
'event' description in 'ring_buffer_unlock_commit'
kernel/trace/ring_buffer.c:5359: warning: Excess function parameter
'cpu' description in 'ring_buffer_reset_online_cpus'

Link: https://lkml.kernel.org/r/20230724140827.1023266-2-cuigaosheng1@huawei.com

Cc: <mhiramat@kernel.org>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28 19:59:03 -04:00
Linus Torvalds
c06f9091a2 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
 "Several smaller driver fixes and a core RDMA CM regression fix:

   - Fix improperly accepting flags from userspace in mlx4

   - Add missing DMA barriers for irdma

   - Fix two kcsan warnings in irdma

   - Report the correct CQ op code to userspace in irdma

   - Report the correct MW bind error code for irdma

   - Load the destination address in RDMA CM to resolve a recent
     regression

   - Fix a QP regression in mthca

   - Remove a race processing completions in bnxt_re resulting in a
     crash

   - Fix driver unloading races with interrupts and tasklets in bnxt_re

   - Fix missing error unwind in rxe"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/irdma: Report correct WC error
  RDMA/irdma: Fix op_type reporting in CQEs
  RDMA/rxe: Fix an error handling path in rxe_bind_mw()
  RDMA/bnxt_re: Fix hang during driver unload
  RDMA/bnxt_re: Prevent handling any completions after qp destroy
  RDMA/mthca: Fix crash when polling CQ for shared QPs
  RDMA/core: Update CMA destination address on rdma_resolve_addr
  RDMA/irdma: Fix data race on CQP request done
  RDMA/irdma: Fix data race on CQP completion stats
  RDMA/irdma: Add missing read barriers
  RDMA/mlx4: Make check for invalid flags stricter
2023-07-28 16:55:56 -07:00
Alexei Starovoitov
eb03993a60 Merge branch 'support-defragmenting-ipv-4-6-packets-in-bpf'
Daniel Xu says:

====================
Support defragmenting IPv(4|6) packets in BPF

=== Context ===

In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:

1. Enforce policy on first fragment and accept all subsequent fragments.
   This works but may let in certain attacks or allow data exfiltration.

2. Enforce policy on first fragment and drop all subsequent fragments.
   This does not really work b/c some protocols may rely on
   fragmentation. For example, DNS may rely on oversized UDP packets for
   large responses.

So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:

    Middleboxes [...] should process IP fragments in a manner that is
    consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
    must maintain state in order to achieve this goal.

=== BPF related bits ===

Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.

The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.

=== Changelog ===

Changes from v5:

* Fix defrag disable codepaths

Changes from v4:

* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings

Changes from v3:

* Correctly initialize `addrlen` stack var for recvmsg()

Changes from v2:

* module_put() if ->enable() fails
* Fix CI build errors

Changes from v1:

* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active

[0]: https://datatracker.ietf.org/doc/html/rfc8900
====================

Link: https://lore.kernel.org/r/cover.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:09 -07:00
Daniel Xu
c313eae739 bpf: selftests: Add defrag selftests
These selftests tests 2 major scenarios: the BPF based defragmentation
can successfully be done and that packet pointers are invalidated after
calls to the kfunc. The logic is similar for both ipv4 and ipv6.

In the first scenario, we create a UDP client and UDP echo server. The
the server side is fairly straightforward: we attach the prog and simply
echo back the message.

The on the client side, we send fragmented packets to and expect the
reassembled message back from the server.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/33e40fdfddf43be93f2cb259303f132f46750953.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:08 -07:00
Daniel Xu
e15a220956 bpf: selftests: Support custom type and proto for client sockets
Extend connect_to_fd_opts() to take optional type and protocol
parameters for the client socket. These parameters are useful when
opening a raw socket to send IP fragments.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/9067db539efdfd608aa86a2b143c521337c111fc.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:08 -07:00
Daniel Xu
3495e89cdc bpf: selftests: Support not connecting client socket
For connectionless protocols or raw sockets we do not want to actually
connect() to the server.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/525c13d66dac2d640a1db922546842c051c6f2e6.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:08 -07:00
Daniel Xu
91721c2d02 netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link the active. Checks are also added to ensure the prog
requesting defrag support is run _after_ netfilter defrag hooks.

We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:08 -07:00
Daniel Xu
9abddac583 netfilter: defrag: Add glue hooks for enabling/disabling defrag
We want to be able to enable/disable IP packet defrag from core
bpf/netfilter code. In other words, execute code from core that could
possibly be built as a module.

To help avoid symbol resolution errors, use glue hooks that the modules
will register callbacks with during module init.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:08 -07:00
Yonghong Song
ee932bf940 docs/bpf: Improve documentation for cpu=v4 instructions
Improve documentation for cpu=v4 instructions based on
David's suggestions.

Cc: bpf@ietf.org
Suggested-by: David Vernet <void@manifault.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728225105.919595-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:50:06 -07:00
Linus Torvalds
2b17e90d3f Merge tag 'tpmdd-v6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
 "I picked up three small scale updates that I think would improve the
  quality of the release"

* tag 'tpmdd-v6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm_tis: Explicitly check for error code
  tpm: Switch i2c drivers back to use .probe()
  security: keys: perform capable check only on privileged operations
2023-07-28 16:44:32 -07:00
Zheng Yejian
2d093282b0 ring-buffer: Fix wrong stat of cpu_buffer->read
When pages are removed in rb_remove_pages(), 'cpu_buffer->read' is set
to 0 in order to make sure any read iterators reset themselves. However,
this will mess 'entries' stating, see following steps:

  # cd /sys/kernel/tracing/
  # 1. Enlarge ring buffer prepare for later reducing:
  # echo 20 > per_cpu/cpu0/buffer_size_kb
  # 2. Write a log into ring buffer of cpu0:
  # taskset -c 0 echo "hello1" > trace_marker
  # 3. Read the log:
  # cat per_cpu/cpu0/trace_pipe
       <...>-332     [000] .....    62.406844: tracing_mark_write: hello1
  # 4. Stop reading and see the stats, now 0 entries, and 1 event readed:
  # cat per_cpu/cpu0/stats
   entries: 0
   [...]
   read events: 1
  # 5. Reduce the ring buffer
  # echo 7 > per_cpu/cpu0/buffer_size_kb
  # 6. Now entries became unexpected 1 because actually no entries!!!
  # cat per_cpu/cpu0/stats
   entries: 1
   [...]
   read events: 0

To fix it, introduce 'page_removed' field to count total removed pages
since last reset, then use it to let read iterators reset themselves
instead of changing the 'read' pointer.

Link: https://lore.kernel.org/linux-trace-kernel/20230724054040.3489499-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Fixes: 83f40318da ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28 19:16:23 -04:00
Colin Ian King
3bdd85e2e3 net: ethernet: slicoss: remove redundant increment of pointer data
The pointer data is being incremented but this change to the pointer
is not used afterwards. The increment is redundant and can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Link: https://lore.kernel.org/r/20230726164522.369206-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 15:37:08 -07:00
Jakub Kicinski
05191d8896 Merge branch 'in-kernel-support-for-the-tls-alert-protocol'
Chuck Lever says:

====================
In-kernel support for the TLS Alert protocol

IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS
Alert protocol. Instead, a set of small helper functions can be used
to handle sending and receiving TLS Alerts for in-kernel TLS
consumers.
====================

Merged on top of a tag in case it's needed in the NFS tree.

Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:08:02 -07:00
Chuck Lever
b470985c76 net/handshake: Trace events for TLS Alert helpers
Add observability for the new TLS Alert infrastructure.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047947409.5241.14548832149596892717.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
39067dda1d SUNRPC: Use new helpers to handle TLS Alerts
Use the helpers to parse the level and description fields in
incoming alerts. "Warning" alerts are discarded, and "fatal"
alerts mean the session is no longer valid.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047944747.5241.1974889594004407123.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
39d0e38dcc net/handshake: Add helpers for parsing incoming TLS Alerts
Kernel TLS consumers can replace common TLS Alert parsing code with
these helpers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047942074.5241.13791647439480672048.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
5dd5ad682c SUNRPC: Send TLS Closure alerts before closing a TCP socket
Before closing a TCP connection, the TLS protocol wants peers to
send session close Alert notifications. Add those in both the RPC
client and server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047939404.5241.14392506226409865832.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
35b1b538d4 net/handshake: Add API for sending TLS Closure alerts
This helper sends an alert only if a TLS session was established.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047936730.5241.618595693821012638.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
0257427146 net/tls: Add TLS Alert definitions
I'm about to add support for kernel handshake API consumers to send
TLS Alerts, so introduce the needed protocol definitions in the new
header tls_prot.h.

This presages support for Closure alerts. Also, support for alerts
is a pre-requite for handling session re-keying, where one peer will
signal the need for a re-key by sending a TLS Alert.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047934064.5241.8377890858495063518.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
6a7eccef47 net/tls: Move TLS protocol elements to a separate header
Kernel TLS consumers will need definitions of various parts of the
TLS protocol, but often do not need the function declarations and
other infrastructure provided in <net/tls.h>.

Break out existing standardized protocol elements into a separate
header, and make room for a few more elements in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Suman Ghosh
222a6c42e9 octeontx2-af: Initialize 'cntr_val' to fix uninitialized symbol error
drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c:860
otx2_tc_update_mcam_table_del_req()
error: uninitialized symbol 'cntr_val'.

Fixes: ec87f05402 ("octeontx2-af: Install TC filter rules in hardware based on priority")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20230727163101.2793453-1-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:02:11 -07:00
Jakub Kicinski
a4989bee92 Merge branch 'eth-bnxt-fix-a-couple-of-w-1-c-1-warnings'
Jakub Kicinski says:

====================
eth: bnxt: fix a couple of W=1 C=1 warnings

Fix a couple of build warnings.
====================

Link: https://lore.kernel.org/r/20230727190726.1859515-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:47:36 -07:00
Jakub Kicinski
9f49db62f5 eth: bnxt: fix warning for define in struct_group
Fix C=1 warning with sparse 0.6.4:

drivers/net/ethernet/broadcom/bnxt/bnxt.c: note: in included file:
drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h:30:1: warning: directive in macro's argument list

Don't put defines in a struct_group().

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230727190726.1859515-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:47:34 -07:00
Jakub Kicinski
833c4a8105 eth: bnxt: fix one of the W=1 warnings about fortified memcpy()
Fix a W=1 warning with gcc 13.1:

In function ‘fortify_memcpy_chk’,
    inlined from ‘bnxt_hwrm_queue_cos2bw_cfg’ at drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:133:3:
include/linux/fortify-string.h:592:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  592 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The field group is already defined and starts at queue_id:

struct bnxt_cos2bw_cfg {
	u8			pad[3];
	struct_group_attr(cfg, __packed,
		u8		queue_id;
		__le32		min_bw;

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230727190726.1859515-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:47:34 -07:00
Jakub Kicinski
b10d10a7c1 Merge tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2023-07-24

1) Generalize devcom implementation to be independent of number of ports
   or device's GUID.

2) Save memory on command interface statistics.

3) General code cleanups

* tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
  net/mlx5: Make mlx5_eswitch_load/unload_vport() static
  net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static
  net/mlx5: Remove pointless devlink_rate checks
  net/mlx5: Don't check vport->enabled in port ops
  net/mlx5e: Make flow classification filters static
  net/mlx5e: Remove duplicate code for user flow
  net/mlx5: Allocate command stats with xarray
  net/mlx5: split mlx5_cmd_init() to probe and reload routines
  net/mlx5: Remove redundant cmdif revision check
  net/mlx5: Re-organize mlx5_cmd struct
  net/mlx5e: E-Switch, Allow devcom initialization on more vports
  net/mlx5e: E-Switch, Register devcom device with switch id key
  net/mlx5: Devcom, Infrastructure changes
  net/mlx5: Use shared code for checking lag is supported
====================

Link: https://lore.kernel.org/r/20230727183914.69229-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:41:59 -07:00
Jakub Kicinski
97d0dca794 Merge branch 'mlxsw-avoid-non-tracker-helpers-when-holding-and-putting-netdevices'
Petr Machata says:

====================
mlxsw: Avoid non-tracker helpers when holding and putting netdevices

Using the tracking helpers, netdev_hold() and netdev_put(), makes it easier
to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER
is enabled. For example, the following traceback shows the callpath to the
point of an outstanding hold that was never put:

    unregister_netdevice: waiting for swp3 to become free. Usage count = 6
    ref_tracker: eth%d@ffff888123c9a580 has 1/5 users at
	mlxsw_sp_switchdev_event+0x6bd/0xcc0 [mlxsw_spectrum]
	notifier_call_chain+0xbf/0x3b0
	atomic_notifier_call_chain+0x78/0x200
	br_switchdev_fdb_notify+0x25f/0x2c0 [bridge]
	fdb_notify+0x16a/0x1a0 [bridge]
	[...]

In this patchset, get rid of all non-ref-tracking helpers in mlxsw.

- Patch #1 drops two functions that are not used anymore, but contain
  dev_hold() / dev_put() calls.

- Patch #2 avoids taking a reference in one function which is called
  under RTNL.

- The remaining patches convert individual hold/put sites one by one
  from trackerless to tracker-enabled.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/netdev/4c056da27c19d95ffeaba5acf1427ecadfc3f94c.camel@redhat.com/
====================

Link: https://lore.kernel.org/r/cover.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:48 -07:00
Petr Machata
cb21162041 mlxsw: spectrum_router: IPv6 events: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with IPv6 address events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f0af6ad4722b4ca6e598fd4fda8311a3041651ec.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:46 -07:00
Petr Machata
d0e0e88012 mlxsw: spectrum_router: RIF: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with RIF allocation.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/8b7701a7b439ac268e4be4040eff99d01e27ae47.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
b17b2d57b7 mlxsw: spectrum_router: hw_stats: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with hw_stats events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/b972314cfef4f4c24e66e60d13cffa5d606d1bf3.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
deeaa3716f mlxsw: spectrum_router: FIB: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with FIB events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/5221a92e751c40447c55959f622267ccc999ed04.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
1ae489ab43 mlxsw: spectrum_switchdev: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
switchdev module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/774c3d7b5b0231f1435df2ec9dd660192e382756.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
16f8c846cd mlxsw: spectrum_nve: Do not take reference when looking up netdevice
mlxsw_sp_nve_fid_disable() is always called under RTNL. It is therefore
safe to call __dev_get_by_index() to get the netdevice pointer without
bumping the reference count, because we can be sure the netdevice is not
going away. That then obviates the need to put the netdevice later in the
function.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/341d1046f89d8d839d9d00e4a3d58cdc351e9397.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
569f98b36b mlxsw: spectrum: Drop unused functions mlxsw_sp_port_lower_dev_hold/_put()
As of commit 151b89f602 ("mlxsw: spectrum_router: Reuse work neighbor
initialization in work scheduler"), the functions
mlxsw_sp_port_lower_dev_hold() and mlxsw_sp_port_dev_put() have no users.
Drop them.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/d0adcd7cb4ea19416294a0f861100edba84c9f36.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Patrick Rohr
5027d54a9c net: change accept_ra_min_rtr_lft to affect all RA lifetimes
accept_ra_min_rtr_lft only considered the lifetime of the default route
and discarded entire RAs accordingly.

This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and
applies the value to individual RA sections; in particular, router
lifetime, PIO preferred lifetime, and RIO lifetime. If any of those
lifetimes are lower than the configured value, the specific RA section
is ignored.

In order for the sysctl to be useful to Android, it should really apply
to all lifetimes in the RA, since that is what determines the minimum
frequency at which RAs must be processed by the kernel. Android uses
hardware offloads to drop RAs for a fraction of the minimum of all
lifetimes present in the RA (some networks have very frequent RAs (5s)
with high lifetimes (2h)). Despite this, we have encountered networks
that set the router lifetime to 30s which results in very frequent CPU
wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the
WiFi firmware) entirely on such networks, it seems better to ignore the
misconfigured routers while still processing RAs from other IPv6 routers
on the same network (i.e. to support IoT applications).

The previous implementation dropped the entire RA based on router
lifetime. This turned out to be hard to expand to the other lifetimes
present in the RA in a consistent manner; dropping the entire RA based
on RIO/PIO lifetimes would essentially require parsing the whole thing
twice.

Fixes: 1671bcfd76 ("net: add sysctl accept_ra_min_rtr_lft")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:30:51 -07:00
Davidlohr Bueso
ad64f5952c cxl/memdev: Only show sanitize sysfs files when supported
If the device does not support Sanitize or Secure Erase commands,
hide the respective sysfs interfaces such that the operation can
never be attempted.

In order to be generic, keep track of the enabled security commands
found in the CEL - the driver does not support Security Passthrough.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230726051940.3570-4-dave@stgolabs.net
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-28 13:16:54 -06:00
Davidlohr Bueso
3de8cd2242 cxl/memdev: Document security state in kern-doc
... as is the case with all members of struct cxl_memdev_state.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230726051940.3570-3-dave@stgolabs.net
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-28 13:16:54 -06:00
Davidlohr Bueso
0fcde5989e cxl/memdev: Improve sanitize ABI descriptions
Be more detailed about the CPU cache management situation. The same
goes for both sanitize and secure erase.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230726051940.3570-2-dave@stgolabs.net
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-28 13:16:54 -06:00
Jakub Kicinski
5bdc312c1d Merge branch 'net-store-netdevs-in-an-xarray'
Jakub Kicinski says:

====================
net: store netdevs in an xarray

One of more annoying developer experience gaps we have in netlink
is iterating over netdevs. It's painful. Add an xarray to make
it trivial.

v1: https://lore.kernel.org/all/20230722014237.4078962-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230726185530.2247698-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 11:36:00 -07:00
Jakub Kicinski
84e00d9bd4 net: convert some netlink netdev iterators to depend on the xarray
Reap the benefits of easier iteration thanks to the xarray.
Convert just the genetlink ones, those are easier to test.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 11:35:58 -07:00
Jakub Kicinski
759ab1edb5 net: store netdevs in an xarray
Iterating over the netdev hash table for netlink dumps is hard.
Dumps are done in "chunks" so we need to save the position
after each chunk, so we know where to restart from. Because
netdevs are stored in a hash table we remember which bucket
we were in and how many devices we dumped.

Since we don't hold any locks across the "chunks" - devices may
come and go while we're dumping. If that happens we may miss
a device (if device is deleted from the bucket we were in).
We indicate to user space that this may have happened by setting
NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
if it sees that. Somehow I doubt most user space gets this right..

To illustrate let's look at an example:

               System state:
  start:       # [A, B, C]
  del:  B      # [A, C]

with the hash table we may dump [A, B], missing C completely even
tho it existed both before and after the "del B".

Add an xarray and use it to allocate ifindexes. This way we
can iterate ifindexes in order, without the worry that we'll
skip one. We may still generate a dump of a state which "never
existed", for example for a set of values and sequence of ops:

               System state:
  start:       # [A, B]
  add:  C      # [A, C, B]
  del:  B      # [A, C]

we may generate a dump of [A], if C got an index between A and B.
System has never been in such state. But I'm 90% sure that's perfectly
fine, important part is that we can't _miss_ devices which exist before
and after. User space which wants to mirror kernel's state subscribes
to notifications and does periodic dumps so it will know that C exists
from the notification about its creation or from the next dump
(next dump is _guaranteed_ to include C, if it doesn't get removed).

To avoid any perf regressions keep the hash table for now. Most
net namespaces have very few devices and microbenchmarking 1M lookups
on Skylake I get the following results (not counting loopback
to number of devs):

 #devs | hash |  xa  | delta
    2  | 18.3 | 20.1 | + 9.8%
   16  | 18.3 | 20.1 | + 9.5%
   64  | 18.3 | 26.3 | +43.8%
  128  | 20.4 | 26.3 | +28.6%
  256  | 20.0 | 26.4 | +32.1%
 1024  | 26.6 | 26.7 | + 0.2%
 8192  |541.3 | 33.5 | -93.8%

No surprises since the hash table has 256 entries.
The microbenchmark scans indexes in order, if the pattern is more
random xa starts to win at 512 devices already. But that's a lot
of devices, in practice.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 11:35:58 -07:00
Georg Müller
98ce8e4a9d perf test uprobe_from_different_cu: Skip if there is no gcc
Without gcc, the test will fail.

On cleanup, ignore probe removal errors. Otherwise, in case of an error
adding the probe, the temporary directory is not removed.

Fixes: 56cbeacf14 ("perf probe: Add test for regression introduced by switch to die_get_decl_file()")
Signed-off-by: Georg Müller <georgmueller@gmx.net>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Georg Müller <georgmueller@gmx.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230728151812.454806-2-georgmueller@gmx.net
Link: https://lore.kernel.org/r/CAP-5=fUP6UuLgRty3t2=fQsQi3k4hDMz415vWdp1x88QMvZ8ug@mail.gmail.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-07-28 15:31:21 -03:00
Linus Torvalds
f837f0a3c9 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - A couple of SME updates for recent fixes (one of which went to
   stable): reverting the flushing of the SME hardware state along with
   the thread flushing and making sure we have the correct vector length
   before reallocating.

 - An ACPI/IORT fix to avoid skipping ID mappings whose "number of IDs"
   is 0 (the spec reports the number of IDs in the mapping range minus
   1).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ACPI/IORT: Remove erroneous id_count check in iort_node_get_rmr_info()
  arm64/sme: Set new vector length before reallocating
  arm64/fpsimd: Don't flush SME register hardware state along with thread
2023-07-28 11:21:57 -07:00
Linus Torvalds
81eef8909d Merge tag 'for-linus-6.5a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - A fix for a performance problem in QubesOS, adding a way to drain the
   queue of grants experiencing delayed unmaps faster

 - A patch enabling the use of static event channels from user mode,
   which was omitted when introducing supporting static event channels

 - A fix for a problem where Xen related code didn't check properly for
   running in a Xen environment, resulting in a WARN splat

* tag 'for-linus-6.5a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: speed up grant-table reclaim
  xen/evtchn: Introduce new IOCTL to bind static evtchn
  xenbus: check xen_domain in xenbus_probe_initcall
2023-07-28 11:17:30 -07:00
Alexander Steffen
513253f8c2 tpm_tis: Explicitly check for error code
recv_data either returns the number of received bytes, or a negative value
representing an error code. Adding the return value directly to the total
number of received bytes therefore looks a little weird, since it might add
a negative error code to a sum of bytes.

The following check for size < expected usually makes the function return
ETIME in that case, so it does not cause too many problems in practice. But
to make the code look cleaner and because the caller might still be
interested in the original error code, explicitly check for the presence of
an error code and pass that through.

Cc: stable@vger.kernel.org
Fixes: cb5354253a ("[PATCH] tpm: spacing cleanups 2")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28 18:13:39 +00:00
Uwe Kleine-König
be6f48a7c8 tpm: Switch i2c drivers back to use .probe()
After commit b8a1a4cd5a ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers being converted to .probe_new() and then
03c835f498 ("i2c: Switch .probe() to not take an id parameter")
convert back to (the new) .probe() to be able to eventually drop
.probe_new() from struct i2c_driver.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28 18:12:40 +00:00
Christian Göttsche
2d7f105edb security: keys: perform capable check only on privileged operations
If the current task fails the check for the queried capability via
`capable(CAP_SYS_ADMIN)` LSMs like SELinux generate a denial message.
Issuing such denial messages unnecessarily can lead to a policy author
granting more privileges to a subject than needed to silence them.

Reorder CAP_SYS_ADMIN checks after the check whether the operation is
actually privileged.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28 18:07:41 +00:00
Linus Torvalds
e62e26d3e9 Merge tag 'ceph-for-6.5-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "A patch to reduce the potential for erroneous RBD exclusive lock
  blocklisting (fencing) with a couple of prerequisites and a fixup to
  prevent metrics from being sent to the MDS even just once after that
  has been disabled by the user. All marked for stable"

* tag 'ceph-for-6.5-rc4' of https://github.com/ceph/ceph-client:
  rbd: retrieve and check lock owner twice before blocklisting
  rbd: harden get_lock_owner_info() a bit
  rbd: make get_lock_owner_info() return a single locker or NULL
  ceph: never send metrics if disable_send_metrics is set
2023-07-28 10:47:24 -07:00
Linus Torvalds
28d79b746c Merge tag '9p-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p fixes from Eric Van Hensbergen:
 "Misc set of fixes for 9p.

  Most of these clean up warnings we've gotten out of compilation tools,
  but several of them were from inspection while hunting down a couple
  of regressions.

  The most important one is 75b396821c ("fs/9p: remove unnecessary and
  overrestrictive check") which caused a regression for some folks by
  restricting mmap in any case where writeback caches weren't enabled.

  Most of the other bugs caught via inspection were type mismatches"

* tag '9p-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Remove unused extern declaration
  9p: remove dead stores (variable set again without being read)
  9p: virtio: skip incrementing unused variable
  9p: virtio: make sure 'offs' is initialized in zc_request
  9p: virtio: fix unlikely null pointer deref in handle_rerror
  9p: fix ignored return value in v9fs_dir_release
  fs/9p: remove unnecessary invalidate_inode_pages2
  fs/9p: fix type mismatch in file cache mode helper
  fs/9p: fix typo in comparison logic for cache mode
  fs/9p: remove unnecessary and overrestrictive check
  fs/9p: Fix a datatype used with V9FS_DIRECT_IO
2023-07-28 10:43:16 -07:00
Linus Torvalds
818680d154 Merge tag 'block-6.5-2023-07-28' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
 "A few fixes that should go into the current kernel release, mainly:

   - Set of fixes for dasd (Stefan)

   - Handle interruptible waits returning because of a signal for ublk
     (Ming)"

* tag 'block-6.5-2023-07-28' of git://git.kernel.dk/linux:
  ublk: return -EINTR if breaking from waiting for existed users in DEL_DEV
  ublk: fail to recover device if queue setup is interrupted
  ublk: fail to start device if queue setup is interrupted
  block: Fix a source code comment in include/uapi/linux/blkzoned.h
  s390/dasd: print copy pair message only for the correct error
  s390/dasd: fix hanging device after request requeue
  s390/dasd: use correct number of retries for ERP requests
  s390/dasd: fix hanging device after quiesce/resume
2023-07-28 10:23:41 -07:00
Linus Torvalds
9c65505826 Merge tag 'io_uring-6.5-2023-07-28' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
 "Just a single tweak to a patch from last week, to avoid having idle
  cqring waits be attributed as iowait"

* tag 'io_uring-6.5-2023-07-28' of git://git.kernel.dk/linux:
  io_uring: gate iowait schedule on having pending requests
2023-07-28 10:19:44 -07:00