The page pool keeps track of the number of pages in flight, and
it isn't safe to remove the pool until all pages are returned.
Disallow removing the pool until all pages are back, so the pool
is always available for page producers.
Make the page pool responsible for its own delayed destruction
instead of relying on XDP, so the page pool can be used without
the xdp memory model.
When all pages are returned, free the pool and notify xdp if the
pool is registered with the xdp memory system. Have the callback
perform a table walk since some drivers (cpsw) may share the pool
among multiple xdp_rxq_info.
Note that the increment of pages_state_release_cnt may result in
inflight == 0, resulting in the pool being released.
Fixes: d956a048cd ("xdp: force mem allocator removal and periodic warning")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While it is entirely possible that this tagger format is in fact more
generic than just these 2 switch families, I don't have that knowledge.
The Seville switch in NXP T1040 has a similar frame format, but there
are enough differences (e.g. DEST field starts at bit 57 instead of 56)
that calling this file tag_vitesse.c is a bit of a stretch at the
moment. The frame format has been listed in a comment so that people who
add support for further Vitesse switches can rework this tagger while
keeping compatibility with Felix.
The "ocelot" name was chosen instead of "felix" because even the Ocelot
switch can act as a DSA device when it is used in NPI mode, and the Felix
tagger format is almost identical. Currently it is only used for the
Felix switch embedded in the NXP LS1028A chip.
The ABI for this tagger should be considered "not stable" at the moment.
The DSA tag is always placed before the Ethernet header and therefore,
we are using the long prefix for RX tags to avoid putting the DSA master
port in promiscuous mode. Once there will be an API in DSA for drivers
to request DSA masters to be in promiscuous mode unconditionally, we
will switch to the "no prefix" extraction frame header, which will save
16 padding bytes for each RX frame.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Felix DSA driver needs to write to SYS_RAM_INIT_RAM_INIT for its own
chip initialization process.
Also update the MAINTAINERS file such that the headers exported by the
ocelot driver are under the same maintainers' umbrella as the driver
itself.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will be registering another switch driver based on ocelot, which
lives under drivers/net/dsa.
Make sure the Felix DSA front-end has the necessary abstractions to
implement a new Ocelot driver instantiation. This includes the function
prototypes for implementing DSA callbacks.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the ism module is unloaded return control from exit routine only,
if all link groups are freed.
If an IB device is thrown away return control from device removal only,
if all link groups belonging to this device are freed.
A counters for the total number of SMCD link groups per ISM device is
introduced. ism module unloading continues only if the total number of
SMCD link groups for all ISM devices is zero. ISM device
removal continues only it the total number of SMCD link groups per ISM
device has decreased to zero.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SMCD link group termination is called when peer signals its shutdown
of its corresponding link group. For regular shutdowns no connections
exist anymore. For abnormal shutdowns connections must be killed and
their DMBs must be unregistered immediately. That means the SMCR method
to delay the link group freeing several seconds does not fit.
This patch adds immediate termination of a link group and its SMCD
connections and makes sure all SMCD link group related cleanup steps
are finished.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds 'module' member in the 'struct vsock_transport'
in order to get/put the transport module. This prevents the
module unloading while sockets are assigned to it.
We increase the module refcnt when a socket is assigned to a
transport, and we decrease the module refcnt when the socket
is destructed.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow other transports to be loaded with vmci_transport,
we register the vmci_transport as G2H or H2G only when a VMCI guest
or host is active.
To do that, this patch adds a callback registered in the vmci driver
that will be called when the host or guest becomes active.
This callback will register the vmci_transport in the VSOCK core.
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of multiple transports in the
VSOCK core.
With the multi-transports support, we can use vsock with nested VMs
(using also different hypervisors) loading both guest->host and
host->guest transports at the same time.
Major changes:
- vsock core module can be loaded regardless of the transports
- vsock_core_init() and vsock_core_exit() are renamed to
vsock_core_register() and vsock_core_unregister()
- vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
to identify which directions the transport can handle and if it's
support DGRAM (only vmci)
- each stream socket is assigned to a transport when the remote CID
is set (during the connect() or when we receive a connection request
on a listener socket).
The remote CID is used to decide which transport to use:
- remote CID <= VMADDR_CID_HOST will use guest->host transport;
- remote CID == local_cid (guest->host transport) will use guest->host
transport for loopback (host->guest transports don't support loopback);
- remote CID > VMADDR_CID_HOST will use host->guest transport;
- listener sockets are not bound to any transports since no transport
operations are done on it. In this way we can create a listener
socket, also if the transports are not loaded or with VMADDR_CID_ANY
to listen on all transports.
- DGRAM sockets are handled as before, since only the vmci_transport
provides this feature.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All transports call __vsock_create() with the same parameters,
most of them depending on the parent socket. In order to simplify
the VSOCK core APIs exposed to the transports, this patch adds
the vsock_create_connected() callable from transports to create
a new socket when a connection request is received.
We also unexported the __vsock_create().
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_transport and vmci_transport handle the buffer_size
sockopts in a very similar way.
In order to support multiple transports, this patch moves this
handling in the core to allow the user to change the options
also if the socket is not yet assigned to any transport.
This patch also adds the '.notify_buffer_size' callback in the
'struct virtio_transport' in order to inform the transport,
when the buffer_size is changed by the user. It is also useful
to limit the 'buffer_size' requested (e.g. virtio transports).
Acked-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now the 'struct vsock_sock' object contains a pointer to
the transport, this patch adds a parameter to the
vsock_core_get_transport() to return the right transport
assigned to the socket.
This patch modifies also the virtio_transport_get_ops(), that
uses the vsock_core_get_transport(), adding the
'struct vsock_sock *' parameter.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are going to add 'struct vsock_sock *' parameter to
virtio_transport_get_ops().
In some cases, like in the virtio_transport_reset_no_sock(),
we don't have any socket assigned to the packet received,
so we can't use the virtio_transport_get_ops().
In order to allow virtio_transport_reset_no_sock() to use the
'.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
we add the 'struct virtio_transport *' to it and to its caller:
virtio_transport_recv_pkt().
We moved the 'vhost_transport' and 'virtio_transport' definition,
to pass their address to the virtio_transport_recv_pkt().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation to support multiple transports, this patch adds
the 'transport' member at the 'struct vsock_sock'.
This new field is initialized during the creation in the
__vsock_create() function.
This patch also renames the global 'transport' pointer to
'transport_single', since for now we're only supporting a single
transport registered at run-time.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This header file now only includes the "uapi/linux/vm_sockets.h".
We can include directly it when needed.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
We can replace it calling the virtio_transport_get_ops() and
using the get_local_cid() callback registered by the transport.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using the kernel datapath, the upcall don't
include skb hash info relatived. That will introduce
some problem, because the hash of skb is important
in kernel stack. For example, VXLAN module uses
it to select UDP src port. The tx queue selection
may also use the hash in stack.
Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculation hash is not easy.
Hash of TCP socket is computed:
tcp_v4_connect
-> sk_set_txhash (is random)
__tcp_transmit_skb
-> skb_set_hash_from_sk
There will be one upcall, without information of skb
hash, to ovs-vswitchd, for the first packet of a TCP
session. The rest packets will be processed in Open vSwitch
modules, hash kept. If this tcp session is forward to
VXLAN module, then the UDP src port of first tcp packet
is different from rest packets.
TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info to upcall, and restore hash
when packets sent back.
+---------------+ +-------------------------+
| Docker/VMs | | ovs-vswitchd |
+----+----------+ +-+--------------------+--+
| ^ |
| | |
| | upcall v restore packet hash (not recalculate)
| +-+--------------------+--+
| tap netdev | | vxlan module
+---------------> +--> Open vSwitch ko +-->
or internal type | |
+-------------------------+
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) New generic devlink param "enable_roce", for downstream devlink
reload support
2) Do vport ACL configuration on per vport basis when
enabling/disabling a vport. This enables to have vports enabled/disabled
outside of eswitch config for future
3) Split the code for legacy vs offloads mode and make it clear
4) Tide up vport locking and workqueue usage
5) Fix metadata enablement for ECPF
6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION
7) E-Switch and flow steering core low level support and refactoring for
netfilter flowtables offload
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Netfilter tables (nftables) implements a software datapath that
comes after tc ingress datapath. The datapath supports offloading
such rules via the flow table offload API.
This API is currently only used by NFT and it doesn't provide the
global priority in regards to tc offload, so we assume offloading such
rules must come after tc. It does provide a flow table priority
parameter, so we need to provide some supported priority range.
For that, split fastpath prio to two, flow table offload and tc offload,
with one dedicated priority chain for flow table offload.
Next patch will re-use the multi chain API to access this chain by
allowing access to this chain by the fdb_sub_namespace.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2019-11-13
1) Remove a unnecessary net_exit function from the xfrm interface.
From Xin Long.
2) Assign xfrm4_udp_encap_rcv to a UDP socket only if xfrm
is configured. From Alexey Dobriyan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the dataplane hardware offload to the flowtable
infrastructure. Three new flags represent the hardware state of this
flow:
* FLOW_OFFLOAD_HW: This flow entry resides in the hardware.
* FLOW_OFFLOAD_HW_DYING: This flow entry has been scheduled to be remove
from hardware. This might be triggered by either packet path (via TCP
RST/FIN packet) or via aging.
* FLOW_OFFLOAD_HW_DEAD: This flow entry has been already removed from
the hardware, the software garbage collector can remove it from the
software flowtable.
This patch supports for:
* IPv4 only.
* Aging via FLOW_CLS_STATS, no packet and byte counter synchronization
at this stage.
This patch also adds the action callback that specifies how to convert
the flow entry into the flow_rule object that is passed to the driver.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the NFTA_FLOWTABLE_FLAGS attribute that allows users to
specify the NF_FLOWTABLE_HW_OFFLOAD flag. This patch also adds a new
setup interface for the flowtable type to perform the flowtable offload
block callback configuration.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the infrastructure to support for flow entry types.
The initial type is NF_FLOW_OFFLOAD_ROUTE that stores the routing
information into the flow entry to define a fastpath for the classic
forwarding path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move rcu_head to struct flow_offload, then remove the flow_offload_entry
structure definition.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers do not have access to the flow_offload structure, hence remove
this union from this flow_offload object as well as the original comment
on top of it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify this code by storing the pointer to conntrack object in the
flow_offload structure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2019-10-07
this is a pull request for net-next/master consisting of 32 patches.
The first patch is by Gustavo A. R. Silva and removes unused code in the
generic CAN infrastructure.
The next three patches target the mcp251x driver. The one by Andy
Shevchenko removes the legacy platform data support from the driver. The
other two are by Timo Schlüßler and reset the device only when needed,
to prevent glitches on the output when GPIO support is added.
I'm contributing two patches fixing checkpatch warnings in the
c_can_platform and peak_canfd driver.
Stephane Grosjean's patch for the peak_canfd driver adds hw timestamps
support in rx skbs.
The next three patches target the xilinx_can driver. One patch by me to
fix checkpatch warnings, one patch by Anssi Hannula to avoid non
requested bus error frames, and a patch by YueHaibing that switches the
driver to devm_platform_ioremap_resource().
Pankaj Sharma contributes two patches for the m_can driver, the first
one adds support for one shot mode, the other support for handling
arbitration errors.
Followed by four patches by YueHaibing, switching the grcan, ifi, rcar,
and sun4i drivers to devm_platform_ioremap_resource()
I'm contributing cleanup patches for the rx-offload helper, while Joakim
Zhang's patch prepares the rx-offload helper for CAN-FD support. The rx
offload users flexcan and ti_hecc are converted accordingly.
The remaining twelve patches target the flexcan driver. First Joakim
Zhang switches the driver to devm_platform_ioremap_resource(). The
remaining eleven patch are by me and clean up the abstract the access of
the iflag1 and iflag2 register both for RX and TX mailboxes. This is a
preparation for the upcoming CAN-FD support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
after commit 4097e9d250 ("net: sched: don't use tc_action->order during
action dump"), 'act->order' is initialized but then it's no more read, so
we can just remove this member of struct tc_action.
CC: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink supports pair output of name and value. When the value is
binary, it must be presented in an array. If the length of the binary
value exceeds fmsg limitation, break the value into chunks internally.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building with SFP disabled, the stub for sfp_bus_add_upstream()
missed "inline". Add it.
Fixes: 727b3668b7 ("net: sfp: rework upstream interface")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skbs for classic CAN and CAN FD frames are allocated with seperate
functions: alloc_can_skb() and alloc_canfd_skb().
In order to support CAN FD frames via the rx-offload helper, the driver
itself has to allocate the skb (depending whether it received a classic
CAN or CAN FD frame), as the rx-offload helper cannot know which kind of
CAN frame the driver has received.
This patch moves the allocation of the skb into the struct
can_rx_offload::mailbox_read callbacks of the the flexcan and ti_hecc
driver and adjusts the rx-offload helper accordingly.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch removes the function can_rx_offload_reset(), as it does
nothing. If we ever need this function, add it back again.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Register "enable_roce" param, default value is RoCE enabled.
Current configuration is stored on mlx5_core_dev and exposed to user
through the cmode runtime devlink param.
Changing configuration requires changing the cmode driverinit devlink
param and calling devlink reload.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current upstream interface is an all-or-nothing, which is
sub-optimal for future changes, as it doesn't allow the upstream driver
to prepare for the SFP module becoming available, as it is at boot.
Switch to a find-sfp-bus, add-upstream, del-upstream, put-sfp-bus
interface structure instead, which allows the upstream driver to
prepare for a module being available as soon as add-upstream is called.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
One conflict in the BPF samples Makefile, some fixes in 'net' whilst
we were converting over to Makefile.target rules in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) BPF sample build fixes from Björn Töpel
2) Fix powerpc bpf tail call implementation, from Eric Dumazet.
3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet.
4) Fix crash in ebtables when using dnat target, from Florian Westphal.
5) Fix port disable handling whne removing bcm_sf2 driver, from Florian
Fainelli.
6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski.
7) Various KCSAN fixes all over the networking, from Eric Dumazet.
8) Memory leaks in mlx5 driver, from Alex Vesker.
9) SMC interface refcounting fix, from Ursula Braun.
10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu.
11) Add a TX lock to synchonize the kTLS TX path properly with crypto
operations. From Jakub Kicinski.
12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano
Garzarella.
13) Infinite loop in Intel ice driver, from Colin Ian King.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
ixgbe: need_wakeup flag might not be set for Tx
i40e: need_wakeup flag might not be set for Tx
igb/igc: use ktime accessors for skb->tstamp
i40e: Fix for ethtool -m issue on X722 NIC
iavf: initialize ITRN registers with correct values
ice: fix potential infinite loop because loop counter being too small
qede: fix NULL pointer deref in __qede_remove()
net: fix data-race in neigh_event_send()
vsock/virtio: fix sock refcnt holding during the shutdown
net: ethernet: octeon_mgmt: Account for second possible VLAN header
mac80211: fix station inactive_time shortly after boot
net/fq_impl: Switch to kvmalloc() for memory allocation
mac80211: fix ieee80211_txq_setup_flows() failure path
ipv4: Fix table id reference in fib_sync_down_addr
ipv6: fixes rt6_probe() and fib6_nh->last_probe init
net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
net: usb: qmi_wwan: add support for DW5821e with eSIM support
CDC-NCM: handle incomplete transfer of MTU
nfc: netlink: fix double device reference drop
NFC: st21nfca: fix double free
...
Pull block fixes from Jens Axboe:
- Two NVMe device removal crash fixes, and a compat fixup for for an
ioctl that was introduced in this release (Anton, Charles, Max - via
Keith)
- Missing error path mutex unlock for drbd (Dan)
- cgroup writeback fixup on dead memcg (Tejun)
- blkcg online stats print fix (Tejun)
* tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
block: drbd: remove a stray unlock in __drbd_send_protocol()
blkcg: make blkcg_print_stat() print stats only for online blkgs
nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
nvme-rdma: fix a segmentation fault during module unload
Section 7.2 of rfc7829: "Peer Address Thresholds (SCTP_PEER_ADDR_THLDS)
Socket Option" extends 'struct sctp_paddrthlds' with 'spt_pathcpthld'
added to allow a user to change ps_retrans per sock/asoc/transport, as
other 2 paddrthlds: pf_retrans, pathmaxrxt.
Note: to not break the user's program, here to support pf_retrans dump
and setting by adding a new sockopt SCTP_PEER_ADDR_THLDS_V2, and a new
structure sctp_paddrthlds_v2 instead of extending sctp_paddrthlds.
Also, when setting ps_retrans, the value is not allowed to be greater
than pf_retrans.
v1->v2:
- use SCTP_PEER_ADDR_THLDS_V2 to set/get pf_retrans instead,
as Marcelo and David Laight suggested.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a new feature defined in section 5 of rfc7829: "Primary Path
Switchover". By introducing a new tunable parameter:
Primary.Switchover.Max.Retrans (PSMR)
The primary path will be changed to another active path when the path
error counter on the old primary path exceeds PSMR, so that "the SCTP
sender is allowed to continue data transmission on a new working path
even when the old primary destination address becomes active again".
This patch is to add this tunable parameter, 'ps_retrans' per netns,
sock, asoc and transport. It also allows a user to change ps_retrans
per netns by sysctl, and ps_retrans per sock/asoc/transport will be
initialized with it.
The check will be done in sctp_do_8_2_transport_strike() when this
feature is enabled.
Note this feature is disabled by initializing 'ps_retrans' per netns
as 0xffff by default, and its value can't be less than 'pf_retrans'
when changing by sysctl.
v3->v4:
- add define SCTP_PS_RETRANS_MAX 0xffff, and use it on extra2 of
sysctl 'ps_retrans'.
- add a new entry for ps_retrans on ip-sysctl.txt.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a sockopt defined in section 7.3 of rfc7829: "Exposing
the Potentially Failed Path State", by which users can change
pf_expose per sock and asoc.
The new sockopt SCTP_EXPOSE_POTENTIALLY_FAILED_STATE is also
known as SCTP_EXPOSE_PF_STATE for short.
v2->v3:
- return -EINVAL if params.assoc_value > SCTP_PF_EXPOSE_MAX.
- define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.
v3->v4:
- improve changelog.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP Quick failover draft section 5.1, point 5 has been removed
from rfc7829. Instead, "the sender SHOULD (i) notify the Upper
Layer Protocol (ULP) about this state transition", as said in
section 3.2, point 8.
So this patch is to add SCTP_ADDR_POTENTIALLY_FAILED, defined
in section 7.1, "which is reported if the affected address
becomes PF". Also remove transport cwnd's update when moving
from PF back to ACTIVE , which is no longer in rfc7829 either.
Note that ulp_notify will be set to false if asoc->expose is
not 'enabled', according to last patch.
v2->v3:
- define SCTP_ADDR_PF SCTP_ADDR_POTENTIALLY_FAILED.
v3->v4:
- initialize spc_state with SCTP_ADDR_AVAILABLE, as Marcelo suggested.
- check asoc->pf_expose in sctp_assoc_control_transport(), as Marcelo
suggested.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As said in rfc7829, section 3, point 12:
The SCTP stack SHOULD expose the PF state of its destination
addresses to the ULP as well as provide the means to notify the
ULP of state transitions of its destination addresses from
active to PF, and vice versa. However, it is recommended that
an SCTP stack implementing SCTP-PF also allows for the ULP to be
kept ignorant of the PF state of its destinations and the
associated state transitions, thus allowing for retention of the
simpler state transition model of [RFC4960] in the ULP.
Not only does it allow to expose the PF state to ULP, but also
allow to ignore sctp-pf to ULP.
So this patch is to add pf_expose per netns, sock and asoc. And in
sctp_assoc_control_transport(), ulp_notify will be set to false if
asoc->expose is not 'enabled' in next patch.
It also allows a user to change pf_expose per netns by sysctl, and
pf_expose per sock and asoc will be initialized with it.
Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt,
to not allow a user to query the state of a sctp-pf peer address
when pf_expose is 'disabled', as said in section 7.3.
v1->v2:
- Fix a build warning noticed by Nathan Chancellor.
v2->v3:
- set pf_expose to UNUSED by default to keep compatible with old
applications.
v3->v4:
- add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested.
- change this patch to 1/5, and move sctp_assoc_control_transport
change into 2/5, as Marcelo suggested.
- use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and
set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:
while true; do
echo 10 > /sys/bus/netdevsim/new_device
devlink dev reload netdevsim/netdevsim10 &
echo 10 > /sys/bus/netdevsim/del_device
done
Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.
Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds two netlink commands to TIPC in order for user to be
able to set or remove AEAD keys:
- TIPC_NL_KEY_SET
- TIPC_NL_KEY_FLUSH
When the 'KEY_SET' is given along with the key data, the key will be
initiated and attached to TIPC crypto. On the other hand, the
'KEY_FLUSH' command will remove all existing keys if any.
Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new structure 'tipc_aead_key' is added to the 'tipc.h' for user to
be able to transfer a key to TIPC in kernel. Netlink will be used for
this purpose in the later commits.
Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>