The switch can autocast MIB counter using Ethernet packet.
Add support for this and provide a handler for the tagger.
The switch will send packet with MIB counter for each port, the switch
will use completion API to wait for the correct packet to be received
and will complete the task only when each packet is received.
Although the handler will drop all the other packet, we still have to
consume each MIB packet to complete the request. This is done to prevent
mixed data with concurrent ethtool request.
connect_tag_protocol() is used to add the handler to the tag_qca tagger,
master_state_change() use the MIB lock to make sure no MIB Ethernet is
in progress.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add qca8k side support for mgmt read/write in Ethernet packet.
qca8k supports some specially crafted Ethernet packet that can be used
for mgmt read/write instead of the legacy method uart/internal mdio.
This add support for the qca8k side to craft the packet and enqueue it.
Each port and the qca8k_priv have a special struct to put data in it.
The completion API is used to wait for the packet to be received back
with the requested data.
The various steps are:
1. Craft the special packet with the qca hdr set to mgmt read/write
mode.
2. Set the lock in the dedicated mgmt struct.
3. Increment the seq number and set it in the mgmt pkt
4. Reinit the completion.
5. Enqueue the packet.
6. Wait the packet to be received.
7. Use the data set by the tagger to complete the mdio operation.
If the completion timeouts or the ack value is not true, the legacy
mdio way is used.
It has to be considered that in the initial setup mdio is still used and
mdio is still used until DSA is ready to accept and tag packet.
tag_proto_connect() is used to fill the required handler for the tagger
to correctly parse and elaborate the special Ethernet mdio packet.
Locking is added to qca8k_master_change() to make sure no mgmt Ethernet
are in progress.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MDIO/MIB Ethernet require the master port and the tagger availabale to
correctly work. Use the new api master_state_change to track when master
is operational or not and set a bool in qca8k_priv.
We cache the first cached master available and we check if other cpu
port are operational when the cached one goes down.
This cached master will later be used by mdio read/write and mib request to
correctly use the working function.
qca8k implementation for MDIO/MIB Ethernet is bad. CPU port0 is the only
one that answers with the ack packet or sends MIB Ethernet packets. For
this reason the master_state_change ignore CPU port6 and only checks
CPU port0 if it's operational and enables this mode.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add connect/disconnect helper to assign private struct to the DSA switch.
Add support for Ethernet mgmt and MIB if the DSA driver provide an handler
to correctly parse and elaborate the data.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add all the required define to prepare support for mgmt read/write in
Ethernet packet. Any packet of this type has to be dropped as the only
use of these special packet is receive ack for an mgmt write request or
receive data for an mgmt read request.
A struct is used that emulates the Ethernet header but is used for a
different purpose.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethernet MDIO packets are non-standard and DSA master expects the first
6 octets to be the MAC DA. To address these kind of packet, enable
promisc_on_master flag for the tagger.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move tag_qca define to include dir linux/dsa as the qca8k require access
to the tagger define to support in-band mdio read/write using ethernet
packet.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order for switch driver to be able to make simple and reliable use of
the master tracking operations, they must also be notified of the
initial state of the DSA master, not just of the changes. This is
because they might enable certain features only during the time when
they know that the DSA master is up and running.
Therefore, this change explicitly checks the state of the DSA master
under the same rtnl_mutex as we were holding during the
dsa_master_setup() and dsa_master_teardown() call. The idea being that
if the DSA master became operational in between the moment in which it
became a DSA master (dsa_master_setup set dev->dsa_ptr) and the moment
when we checked for the master being up, there is a chance that we
would emit a ->master_state_change() call with no actual state change.
We need to avoid that by serializing the concurrent netdevice event with
us. If the netdevice event started before, we force it to finish before
we begin, because we take rtnl_lock before making netdev_uses_dsa()
return true. So we also handle that early event and do nothing on it.
Similarly, if the dev_open() attempt is concurrent with us, it will
attempt to take the rtnl_mutex, but we're holding it. We'll see that
the master flag IFF_UP isn't set, then when we release the rtnl_mutex
we'll process the NETDEV_UP notifier.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Certain drivers may need to send management traffic to the switch for
things like register access, FDB dump, etc, to accelerate what their
slow bus (SPI, I2C, MDIO) can already do.
Ethernet is faster (especially in bulk transactions) but is also more
unreliable, since the user may decide to bring the DSA master down (or
not bring it up), therefore severing the link between the host and the
attached switch.
Drivers needing Ethernet-based register access already should have
fallback logic to the slow bus if the Ethernet method fails, but that
fallback may be based on a timeout, and the I/O to the switch may slow
down to a halt if the master is down, because every Ethernet packet will
have to time out. The driver also doesn't have the option to turn off
Ethernet-based I/O momentarily, because it wouldn't know when to turn it
back on.
Which is where this change comes in. By tracking NETDEV_CHANGE,
NETDEV_UP and NETDEV_GOING_DOWN events on the DSA master, we should know
the exact interval of time during which this interface is reliably
available for traffic. Provide this information to switches so they can
use it as they wish.
An helper is added dsa_port_master_is_operational() to check if a master
port is operational.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5 fixes 2022-02-01
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
Sorry about the long series, but I had to move the top two patches from
net-next to net to help avoiding a build break when kspp branch is merged
into linus-next on next merge window.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor says:
====================
This series allows CONFIG_DEBUG_INFO_DWARF5 to be selected with
CONFIG_DEBUG_INFO_BTF=y by checking the pahole version.
The first four patches add CONFIG_PAHOLE_VERSION and
scripts/pahole-version.sh to clean up all the places that pahole's
version is transformed into a 3-digit form.
The fourth patch adds a PAHOLE_VERSION dependency to DEBUG_INFO_DWARF5
so that there are no build errors when it is selected with
DEBUG_INFO_BTF.
I build tested Fedora's aarch64 and x86_64 config with ToT clang 14.0.0
and GCC 11 with CONFIG_DEBUG_INFO_DWARF5 enabled with both pahole 1.21
and 1.23.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There are a few different places where pahole's version is turned into a
three digit form with the exact same command. Move this command into
scripts/pahole-version.sh to reduce the amount of duplication across the
tree.
Create CONFIG_PAHOLE_VERSION so the version code can be used in Kconfig
to enable and disable configuration options based on the pahole version,
which is already done in a couple of places.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-3-nathan@kernel.org
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-02-01
This series contains updates to e1000e driver only.
Sasha removes CSME handshake with TGL platform as this is not supported
and is causing hardware unit hangs to be reported.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: Handshake with CSME starts from ADL platforms
e1000e: Separate ADP board type from TGP
====================
Link: https://lore.kernel.org/r/20220201173754.580305-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.
Use struct_group() in struct vlan_ethhdr around members h_dest and
h_source, so they can be referenced together. This will allow memcpy()
and sizeof() to more easily reason about sizes, improve readability,
and avoid future warnings about writing beyond the end of h_dest.
"pahole" shows no size nor member offset changes to struct vlan_ethhdr.
"objdump -d" shows no object code changes.
Fixes: 34802a42b3 ("net/mlx5e: Do not modify the TX SKB")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the driver adds implicit modify hdr action for
decap rules on tunnel devices if the port is an ovs port.
This is also done if the action is drop and makes the modify
hdr redundant and also the FW doesn't support it and will generate
a syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fix it by adding the implicit modify hdr only for fwd actions.
Fixes: b16eb3c81f ("net/mlx5: Support internal port as decap route device")
Fixes: 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
IPsec Tunnel mode crypto offload software parser (SWP) setting in data
path currently always set the inner L4 offset regardless of the
encapsulated L4 header type and whether it exists in the first place,
this breaks non TCP/UDP traffic as such.
Set the SWP inner L4 offset only when the IPsec tunnel encapsulated L4
header protocol is TCP/UDP.
While at it fix inner ip protocol read for setting MLX5_ETH_WQE_SWP_INNER_L4_UDP
flag to address the case where the ip header protocol is IPv6.
Fixes: f1267798c9 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
IPsec crypto offload always set the ethernet segment checksum flags with
the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
packet regardless of the encapsulated L4 header type, and even if it
doesn't exists in the first place, this breaks non TCP/UDP traffic as
such.
Set the inner L4 checksum flag only when the encapsulated L4 header
protocol is TCP/UDP using software parser swp_inner_l4_offset field as
indication.
Fixes: 5cfb540ef2 ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The hardware spec defines max_average_bw == 0 as "unlimited bandwidth".
max_average_bw is calculated as `ceil / BYTES_IN_MBIT`, which can become
0 when ceil is small, leading to an undesired effect of having no
bandwidth limit.
This commit fixes it by rounding up small values of ceil to 1 Mbit/s.
Fixes: 214baf2287 ("net/mlx5e: Support HTB offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The variable modact is not initialized before used in command
modify header allocation which can cause command to fail.
Fix by initializing modact with zeros.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 8f1e0b97cc ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case the HW doesn't perform header-data split, it will write the whole
packet into the data buffer in the WQ, in this case the SHAMPO CQE handler
couldn't use the header entry to build the SKB, instead it should allocate
a new memory to build the SKB using the function:
mlx5e_skb_from_cqe_mpwrq_nonlinear.
Fixes: f97d5c2a45 ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The HW doesn't wrap the CQE.shampo.header_index field according to the
headers buffer size, instead it always increases it until reaching overflow
of u16 size.
Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the
CQE header_index field to find the actual header index in the headers buffer.
Fixes: f97d5c2a45 ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Only prio 1 is supported for nic mode when there is no ignore flow level
support in firmware. But for switchdev mode, which supports fixed number
of statically pre-allocated prios, this restriction is not relevant so
it can be relaxed.
Fixes: d671e109bd ("net/mlx5: Fix tc max supported prio for nic mode")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6e ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Substitute del_timer() with del_timer_sync() in fw reset polling
deactivation flow, in order to prevent a race condition which occurs
when del_timer() is called and timer is deactivated while another
process is handling the timer interrupt. A situation that led to
the following call trace:
RIP: 0010:run_timer_softirq+0x137/0x420
<IRQ>
recalibrate_cpu_khz+0x10/0x10
ktime_get+0x3e/0xa0
? sched_clock_cpu+0xb/0xc0
__do_softirq+0xf5/0x2ea
irq_exit_rcu+0xc1/0xf0
sysvec_apic_timer_interrupt+0x9e/0xc0
asm_sysvec_apic_timer_interrupt+0x12/0x20
</IRQ>
Fixes: 38b9f903f2 ("net/mlx5: Handle sync reset request event")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When querying the module EEPROM, there was a misusage of the 'offset'
variable vs the 'query.offset' field.
Fix that by always using 'offset' and assigning its value to
'query.offset' right before the mcia register read call.
While at it, the cross-pages read size adjustment was changed to be more
intuitive.
Fixes: e19b0a3474 ("net/mlx5: Refactor module EEPROM query")
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325c ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Even though net_device->name is guaranteed to be null-terminated string of
size<=IFNAMSIZ, the test robot complains that return value of netdev_name()
can be larger:
In file included from include/trace/define_trace.h:102,
from drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:113,
from drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c:12:
drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h: In function 'trace_event_raw_event_mlx5_esw_bridge_fdb_template':
>> drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:24:29: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
24 | strncpy(__entry->dev_name,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
25 | netdev_name(fdb->dev),
| ~~~~~~~~~~~~~~~~~~~~~~
26 | IFNAMSIZ);
| ~~~~~~~~~
This is caused by the fact that default value of IFNAMSIZ is 16, while
placeholder value that is returned by netdev_name() for unnamed net devices
is larger than that.
The offending code is in a tracing function that is only called for mlx5
representors, so there is no straightforward way to reproduce the issue but
let's fix it for correctness sake by replacing strncpy() with strscpy() to
ensure that resulting string is always null-terminated.
Fixes: 9724fd5d9c ("net/mlx5: Bridge, add tracepoints")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The mlx5_esw_bridge_cleanup() is expected to be called with rtnl lock
taken, which is true for mlx5e_rep_bridge_cleanup() function but not for
error handling code in mlx5e_rep_bridge_init(). Add missing rtnl
lock/unlock calls and extend both mlx5_esw_bridge_cleanup() and its dual
function mlx5_esw_bridge_init() with ASSERT_RTNL() to verify the invariant
from now on.
Fixes: 7cd6a54a82 ("net/mlx5: Bridge, handle FDB events")
Fixes: 19e9bfa044 ("net/mlx5: Bridge, add offload infrastructure")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-01-31
This series contains updates to i40e driver only.
Jedrzej fixes a condition check which would cause an error when
resetting bandwidth when DCB is active with one TC.
Karen resolves a null pointer dereference that could occur when removing
the driver while VSI rings are being disabled.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix reset path while removing the driver
i40e: Fix reset bw limit when DCB enabled with 1 TC
====================
Link: https://lore.kernel.org/r/20220201000522.505909-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ideally the size would depend on the link speed, but the recycle
ring is created when the interface is brought up before the driver
knows the link speed. So size it for the maximum speed of a given NIC.
PowerPC is only supported on SFN7xxx and SFN8xxx NICs.
With this patch on a 40G NIC the number of calls to alloc_pages and
friends went down from about 18% to under 2%.
On a 10G NIC the number of calls to alloc_pages and friends went down
from about 15% to 0 (perf did not capture any calls during the 60
second test).
On a 100G NIC the number of calls to alloc_pages and friends went down
from about 23% to 4%.
Reported-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220131111054.cp4f6foyinaarwbn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When setting Tx sci explicit, the Rx side is expected to use this
sci and not recalculate it from the packet.However, in case of Tx sci
is explicit and send_sci is off, the receiver is wrongly recalculate
the sci from the source MAC address which most likely be different
than the explicit sci.
Fix by preventing such configuration when macsec newlink is established
and return EINVAL error code on such cases.
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Link: https://lore.kernel.org/r/1643542672-29403-1-git-send-email-raeds@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We got reports of following warning in inet_sock_destruct()
WARN_ON(sk_forward_alloc_get(sk));
Whenever we add a non zero-copy fragment to a pure zerocopy skb,
we have to anticipate that whole skb->truesize will be uncharged
when skb is finally freed.
skb->data_len is the payload length. But the memory truesize
estimated by __zerocopy_sg_from_iter() is page aligned.
Fixes: 9b65b17db7 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20220201065254.680532-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When packet_setsockopt( PACKET_FANOUT_DATA ) reads po->fanout,
no lock is held, meaning that another thread can change po->fanout.
Given that po->fanout can only be set once during the socket lifetime
(it is only cleared from fanout_release()), we can use
READ_ONCE()/WRITE_ONCE() to document the race.
BUG: KCSAN: data-race in packet_setsockopt / packet_setsockopt
write to 0xffff88813ae8e300 of 8 bytes by task 14653 on cpu 0:
fanout_add net/packet/af_packet.c:1791 [inline]
packet_setsockopt+0x22fe/0x24a0 net/packet/af_packet.c:3931
__sys_setsockopt+0x209/0x2a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88813ae8e300 of 8 bytes by task 14654 on cpu 1:
packet_setsockopt+0x691/0x24a0 net/packet/af_packet.c:3935
__sys_setsockopt+0x209/0x2a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000000000000000 -> 0xffff888106f8c000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 14654 Comm: syz-executor.3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 47dceb8ecd ("packet: add classic BPF fanout mode")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220201022358.330621-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov says:
====================
CO-RE in the kernel support allows bpf preload to switch to light
skeleton and remove libbpf dependency.
This reduces the size of bpf_preload_umd from 300kbyte to 19kbyte
and eventually will make "kernel skeleton" possible.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>