Currently, xsk_socket__delete frees BPF resources regardless of ctx
refcount. Xdpxceiver has a test to verify whether underlying BPF
resources would not be wiped out after closing XSK socket that was
bound to interface with other active sockets. From library's xsk part
perspective it also means that the internal xsk context is shared and
its refcount is bumped accordingly.
After a switch to loading XDP prog based on previously opened XSK
socket, mentioned xdpxceiver test fails with:
not ok 16 [xdpxceiver.c:swap_xsk_resources:1334]: ERROR: 9/"Bad file descriptor
which means that in swap_xsk_resources(), xsk_socket__delete() released
xskmap which in turn caused a failure of xsk_socket__update_xskmap().
To fix this, when deleting socket, decrement ctx refcount before
releasing BPF resources and do so only when refcount dropped to 0 which
means there are no more active sockets for this ctx so BPF resources can
be freed safely.
Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220629143458.934337-5-maciej.fijalkowski@intel.com
Currently, xsk_setup_xdp_prog() uses anonymous xsk_socket struct which
means that during xsk_create_bpf_link() call, xsk->config.xdp_flags is
always 0. This in turn means that from xdpxceiver it is impossible to
use xdpgeneric attachment, so since commit 3b22523bca ("selftests,
xsk: Fix bpf_res cleanup test") we were not testing SKB mode at all.
To fix this, introduce a function, called xsk_setup_xdp_prog_xsk(), that
will load XDP prog based on the existing xsk_socket, so that xsk
context's refcount is correctly bumped and flags from application side
are respected. Use this from xdpxceiver side so we get coverage of
generic and native XDP program attach points.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220629143458.934337-3-maciej.fijalkowski@intel.com
Currently bpf_link probe is done for each call of xsk_socket__create().
For cases where xsk context was previously created and current socket
creation uses it, has_bpf_link will be overwritten, where it has already
been initialized.
Optimize this by moving the query to the xsk_create_ctx() so that when
xsk_get_ctx() finds a ctx then no further bpf_link probes are needed.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220629143458.934337-2-maciej.fijalkowski@intel.com
Commit f49169c97f ("NFSD: Remove svc_serv_ops::svo_module") removed
calls to module_put_and_kthread_exit() from threads that acted as SUNRPC
servers and had a related svc_serv_ops structure. This was correct.
It ALSO removed the module_put_and_kthread_exit() call from
nfs4_run_state_manager() which is NOT a SUNRPC service.
Consequently every time the NFSv4 state manager runs the module count
increments and won't be decremented. So the nfsv4 module cannot be
unloaded.
So restore the module_put_and_kthread_exit() call.
Fixes: f49169c97f ("NFSD: Remove svc_serv_ops::svo_module")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Clear VF MAC from parent PF and remove VF filter from VSI when both
conditions are true:
-VIRTCHNL_VF_OFFLOAD_USO is not used
-VM MAC was not set from PF level
It affects older version of IAVF and it allow them to change MAC
Address on VM, newer IAVF won't change their behaviour.
Previously it wasn't possible to change VF's MAC Address on VM
because there is flag on IAVF driver that won't allow to
change MAC Address if this address is given from PF driver.
Fixes: 155f0ac2c9 ("iavf: allow permanent MAC address to change")
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dropped packets caused by too large frames were not included in
dropped RX packets statistics.
Issue was caused by not reading the GL_RXERR1 register. That register
stores count of packet which was have been dropped due to too large
size.
Fix it by reading GL_RXERR1 register for each interface.
Repro steps:
Send a packet larger than the set MTU to SUT
Observe rx statists: ethtool -S <interface> | grep rx | grep -v ": 0"
Fixes: 41a9e55c89 ("i40e: add missing VSI statistics")
Signed-off-by: Lukasz Cieplicki <lukaszx.cieplicki@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.19
- more quirks (Lamarque Vieira Souza, Pablo Greco)
- fix a fabrics disconnect regression (Ruozhu Li)
- fix a nvmet-tcp data_digest calculation regression (Sagi Grimberg)
- fix nvme-tcp send failure handling (Sagi Grimberg)
- fix a regression with nvmet-loop and passthrough controllers
(Alan Adamson)"
* tag 'nvme-5.19-2022-06-30' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
nvmet: add a clear_ids attribute for passthru targets
nvme: fix regression when disconnect a recovering ctrl
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)
nvme-tcp: always fail a request when sending it failed
nvmet-tcp: fix regression in data_digest calculation
The current link mode of the phylink instance may not require an
attached PCS. However, phylink_major_config() unconditionally
dereferences this potentially NULL pointer when restarting the link poll
timer, which will panic the kernel.
Fix the problem by checking whether a PCS exists in phylink_pcs_poll_start(),
otherwise do nothing. The code prior to the blamed patch also only
looked at pcs->poll within an "if (pcs)" block.
Fixes: bfac8c490d ("net: phylink: disable PCS polling over major configuration")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-kbox-a-230-ls
Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com> # on sam9x60ek
Link: https://lore.kernel.org/r/20220629193358.4007923-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both PSFP stats and the port stats read by ocelot_check_stats_work() are
indirectly read through the same mechanism - write to STAT_CFG:STAT_VIEW,
read from SYS:STAT:CNT[n].
It's just that for port stats, we write STAT_VIEW with the index of the
port, and for PSFP stats, we write STAT_VIEW with the filter index.
So if we allow them to run concurrently, ocelot_check_stats_work() may
change the view from vsc9959_psfp_counters_get(), and vice versa.
Fixes: 7d4b564d6a ("net: dsa: felix: support psfp filter on vsc9959")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220629183007.3808130-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric reports that syzbot made short work out of my speculative
fix. Indeed when queue gets detached its tfile->tun remains,
so we would try to stop NAPI twice with a detach(), close()
sequence.
Alternative fix would be to move tun_napi_disable() to
tun_detach_all() and let the NAPI run after the queue
has been detached.
Fixes: a8fc8cb569 ("net: tun: stop NAPI when detaching queues")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220629181911.372047-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Delete the redundant word 'frames'.
Delete the redundant word 'set'.
Delete the redundant word 'slot'.
Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add RaptorLake to the list of processor models supported by the Intel
TCC cooling driver.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
[ rjw: Subject edits, new changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
there is an unexpected word 'for' in the comments that need to be dropped
file - drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
line - 5164
* ixgbe_lpbthresh - calculate low water mark for for flow control
changed to:
* ixgbe_lpbthresh - calculate low water mark for flow control
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
s390x appears to present two RNG interfaces:
- a "TRNG" that gathers entropy using some hardware function; and
- a "DRBG" that takes in a seed and expands it.
Previously, the TRNG was wired up to arch_get_random_{long,int}(), but
it was observed that this was being called really frequently, resulting
in high overhead. So it was changed to be wired up to arch_get_random_
seed_{long,int}(), which was a reasonable decision. Later on, the DRBG
was then wired up to arch_get_random_{long,int}(), with a complicated
buffer filling thread, to control overhead and rate.
Fortunately, none of the performance issues matter much now. The RNG
always attempts to use arch_get_random_seed_{long,int}() first, which
means a complicated implementation of arch_get_random_{long,int}() isn't
really valuable or useful to have around. And it's only used when
reseeding, which means it won't hit the high throughput complications
that were faced before.
So this commit returns to an earlier design of just calling the TRNG in
arch_get_random_seed_{long,int}(), and returning false in arch_get_
random_{long,int}().
Part of what makes the simplification possible is that the RNG now seeds
itself using the TRNG at bootup. But this only works if the TRNG is
detected early in boot, before random_init() is called. So this commit
also causes that check to happen in setup_arch().
Cc: stable@vger.kernel.org
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Ingo Franzki <ifranzki@linux.ibm.com>
Cc: Juergen Christ <jchrist@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20220610222023.378448-1-Jason@zx2c4.com
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
io_import_iovec uses the s pointer, but this was changed immediately
after the iovec was re-imported and so it was imported into the wrong
place.
Change the ordering.
Fixes: 2be2eb02e2 ("io_uring: ensure reads re-import for selected buffers")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630132006.2825668-1-dylany@fb.com
[axboe: ensure we don't half-import as well]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull rdma fixes from Jason Gunthorpe:
"Three minor bug fixes:
- qedr not setting the QP timeout properly toward userspace
- Memory leak on error path in ib_cm
- Divide by 0 in RDMA interrupt moderation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
linux/dim: Fix divide by 0 in RDMA DIM
RDMA/cm: Fix memory leak in ib_cm_insert_listen
RDMA/qedr: Fix reporting QP timeout attribute
Pull fanotify fix from Jan Kara:
"A fix for recently added fanotify API to have stricter checks and
refuse some invalid flag combinations to make our life easier in the
future"
* tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: refine the validation checks on non-dir inode mask
Pull crypto fix from Herbert Xu:
"Fix a regression that breaks the ccp driver"
* tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Fix device IRQ counting by using platform_irq_count()
As found by the compile option -Wunused-macros, remove these macros
that are never used by the code.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Now that bpftool is able to produce a list of known program, map, attach
types, let's use as much of this as we can in the bash completion file,
so that we don't have to expand the list each time a new type is added
to the kernel.
Also update the relevant test script to remove some checks that are no
longer needed.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/20220629203637.138944-3-quentin@isovalent.com
Add a "bpftool feature list" subcommand to list BPF "features".
Contrarily to "bpftool feature probe", this is not about the features
available on the system. Instead, it lists all features known to bpftool
from compilation time; in other words, all program, map, attach, link
types known to the libbpf version in use, and all helpers found in the
UAPI BPF header.
The first use case for this feature is bash completion: running the
command provides a list of types that can be used to produce the list of
candidate map types, for example.
Now that bpftool uses "standard" names provided by libbpf for the
program, map, link, and attach types, having the ability to list these
types and helpers could also be useful in scripts to loop over existing
items.
Sample output:
# bpftool feature list prog_types | grep -vw unspec | head -n 6
socket_filter
kprobe
sched_cls
sched_act
tracepoint
xdp
# bpftool -p feature list map_types | jq '.[1]'
"hash"
# bpftool feature list attach_types | grep '^cgroup_'
cgroup_inet_ingress
cgroup_inet_egress
[...]
cgroup_inet_sock_release
# bpftool feature list helpers | grep -vw bpf_unspec | wc -l
207
The "unspec" types and helpers are not filtered out by bpftool, so as to
remain closer to the enums, and to preserve the indices in the JSON
arrays (e.g. "hash" at index 1 == BPF_MAP_TYPE_HASH in map types list).
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/20220629203637.138944-2-quentin@isovalent.com
Pull devfreq fixes for 5.19-rc5 from Chanwoo Choi:
"1. Fix devfreq passive governor issue when cpufreq policies are not
ready during kernel boot because some CPUs turn on after kernel
booting or others.
- Re-initialize the vairables of struct devfreq_passive_data when
PROBE_DEFER happens when cpufreq_get() returns NULL.
- Use dev_err_probe to mute warning when PROBE_DEFER.
- Fix cpufreq passive unregister erroring on PROBE_DEFER
by using the allocated parent_cpu_data list to free resouce
instead of for_each_possible_cpu().
- Remove duplicate cpufreq passive unregister and warning when
PROBE_DEFER.
- Use HZ_PER_KZH macro in units.h.
- Fix wrong indentation in SPDX-License line.
2. Fix reference count leak in exynos-ppmu.c by using of_node_put().
3. Rework freq_table to be local to devfreq struct
- struct devfreq_dev_profile includes freq_table array to store
the supported frequencies. If devfreq driver doesn't initialize
the freq_table, devfreq core allocates the memory and initializes
the freq_table.
On a devfreq PROBE_DEFER, the freq_table in the driver profile
struct is never reset and may be left in an undefined state. To fix
this and correctly handle PROBE_DEFER, use a local freq_table and
max_state in the devfreq struct."
* tag 'devfreq-fixes-for-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: passive: revert an editing accident in SPDX-License line
PM / devfreq: Fix kernel warning with cpufreq passive register fail
PM / devfreq: Rework freq_table to be local to devfreq struct
PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h
PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER
PM / devfreq: Mute warning on governor PROBE_DEFER
PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
We waste a u64 SQE field for flags even though we don't need as many
bits and it can be used for something more useful later. Store io_uring
specific send/recv flags in sqe->ioprio instead of ->addr2.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: 0455d4ccec ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg")
[axboe: change comment in io_uring.h as well]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yuwei Wang says:
====================
net, neigh: introduce interval_probe_time for periodic probe
This series adds a new option `interval_probe_time_ms` in net, neigh
for periodic probe, and add a limitation to prevent it set to 0
====================
Link: https://lore.kernel.org/r/20220629084832.2842973-1-wangyuweihx@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
commit ed6cd6a178 ("net, neigh: Set lower cap for neigh_managed_work rearming")
fixed a case when DELAY_PROBE_TIME is configured to 0, the processing of the
system work queue hog CPU to 100%, and further more we should introduce
a new option used by periodic probe
Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
add proc_dointvec_ms_jiffies_minmax to fit read msecs value to jiffies
with a limited range of values
Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>