We have netdev_alloc_pcpu_stats(), and we have devm_alloc_percpu().
Add a managed version of netdev_alloc_pcpu_stats, e.g. for allocating
the per-cpu stats in the probe() callback of a driver. It needs to be
a macro for dealing properly with the type argument.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add dev_sw_netstats_tx_add(), complementing already existing
dev_sw_netstats_rx_add(). Other than dev_sw_netstats_rx_add allow to
pass the number of packets as function argument.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is one main difference in mscc_ocelot between IP multicast and L2
multicast. With IP multicast, destination ports are encoded into the
upper bytes of the multicast MAC address. Example: to deliver the
address 01:00:5E:11:22:33 to ports 3, 8, and 9, one would need to
program the address of 00:03:08:11:22:33 into hardware. Whereas for L2
multicast, the MAC table entry points to a Port Group ID (PGID), and
that PGID contains the port mask that the packet will be forwarded to.
As to why it is this way, no clue. My guess is that not all port
combinations can be supported simultaneously with the limited number of
PGIDs, and this was somehow an issue for IP multicast but not for L2
multicast. Anyway.
Prior to this change, the raw L2 multicast code was bogus, due to the
fact that there wasn't really any way to test it using the bridge code.
There were 2 issues:
- A multicast PGID was allocated for each MDB entry, but it wasn't in
fact programmed to hardware. It was dummy.
- In fact we don't want to reserve a multicast PGID for every single MDB
entry. That would be odd because we can only have ~60 PGIDs, but
thousands of MDB entries. So instead, we want to reserve a multicast
PGID for every single port combination for multicast traffic. And
since we can have 2 (or more) MDB entries delivered to the same port
group (and therefore PGID), we need to reference-count the PGIDs.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend the bridge multicast control and data path to configure routes
for L2 (non-IP) multicast groups.
The uapi struct br_mdb_entry union u is extended with another variant,
mac_addr, which does not change the structure size, and which is valid
when the proto field is zero.
To be compatible with the forwarding code that is already in place,
which acts as an IGMP/MLD snooping bridge with querier capabilities, we
need to declare that for L2 MDB entries (for which there exists no such
thing as IGMP/MLD snooping/querying), that there is always a querier.
Otherwise, these entries would be flooded to all bridge ports and not
just to those that are members of the L2 multicast group.
Needless to say, only permanent L2 multicast groups can be installed on
a bridge port.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch is to add the function to make the abort chunk with
the error cause for new encapsulation port restart, defined
on Section 4.4 in draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
v1->v2:
- no change.
v2->v3:
- no need to call htons() when setting nep.cur_port/new_port.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sctp_mtu_payload() is for calculating the frag size before making
chunks from a msg. So we should only add udphdr size to overhead
when udp socks are listening, as only then sctp can handle the
incoming sctp over udp packets and outgoing sctp over udp packets
will be possible.
Note that we can't do this according to transport->encap_port, as
different transports may be set to different values, while the
chunks were made before choosing the transport, we could not be
able to meet all rfc6951#section-5.6 recommends.
v1->v2:
- Add udp_port for sctp_sock to avoid a potential race issue, it
will be used in xmit path in the next patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As rfc6951#section-5.4 says:
"After finding the SCTP association (which
includes checking the verification tag), the UDP source port MUST be
stored as the encapsulation port for the destination address the SCTP
packet is received from (see Section 5.1).
When a non-encapsulated SCTP packet is received by the SCTP stack,
the encapsulation of outgoing packets belonging to the same
association and the corresponding destination address MUST be
disabled."
transport encap_port should be updated by a validated incoming packet's
udp src port.
We save the udp src port in sctp_input_cb->encap_port, and then update
the transport in two places:
1. right after vtag is verified, which is required by RFC, and this
allows the existent transports to be updated by the chunks that
can only be processed on an asoc.
2. right before processing the 'init' where the transports are added,
and this allows building a sctp over udp connection by client with
the server not knowing the remote encap port.
3. when processing ootb_pkt and creating the temporary transport for
the reply pkt.
Note that sctp_input_cb->header is removed, as it's not used any more
in sctp.
v1->v2:
- Change encap_port as __be16 for sctp_input_cb.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch is to implement:
rfc6951#section-6.1: Get or Set the Remote UDP Encapsulation Port Number
with the param of the struct:
struct sctp_udpencaps {
sctp_assoc_t sue_assoc_id;
struct sockaddr_storage sue_address;
uint16_t sue_port;
};
the encap_port of sock, assoc or transport can be changed by users,
which also means it allows the different transports of the same asoc
to have different encap_port value.
v1->v2:
- no change.
v2->v3:
- fix the endian warning when setting values between encap_port and
sue_port.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
encap_port is added as per netns/sock/assoc/transport, and the
latter one's encap_port inherits the former one's by default.
The transport's encap_port value would mostly decide if one
packet should go out with udp encapsulated or not.
This patch also allows users to set netns' encap_port by sysctl.
v1->v2:
- Change to define encap_port as __be16 for sctp_sock, asoc and
transport.
v2->v3:
- No change.
v3->v4:
- Add 'encap_port' entry in ip-sysctl.rst.
v4->v5:
- Improve the description of encap_port in ip-sysctl.rst.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch is to add the udp6 sock part in sctp_udp_sock_start/stop().
udp_conf.use_udp6_rx_checksums is set to true, as:
"The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP
checksum SHOULD be computed for IPv4 and IPv6"
says in rfc6951#section-5.3.
v1->v2:
- Add pr_err() when fails to create udp v6 sock.
- Add #if IS_ENABLED(CONFIG_IPV6) not to create v6 sock when ipv6 is
disabled.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch is to add the functions to create/release udp4 sock,
and set the sock's encap_rcv to process the incoming udp encap
sctp packets. In sctp_udp_rcv(), as we can see, all we need to
do is fix the transport header for sctp_rcv(), then it would
implement the part of rfc6951#section-5.4:
"When an encapsulated packet is received, the UDP header is removed.
Then, the generic lookup is performed, as done by an SCTP stack
whenever a packet is received, to find the association for the
received SCTP packet"
Note that these functions will be called in the last patch of
this patchset when enabling this feature.
v1->v2:
- Add pr_err() when fails to create udp v4 sock.
v2->v3:
- Add 'select NET_UDP_TUNNEL' in sctp Kconfig.
v3->v4:
- No change.
v4->v5:
- Change to set udp_port to 0 by default.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann says:
====================
wimax: move to staging
After I sent a fix for what appeared to be a harmless warning in
the wimax user interface code, the conclusion was that the whole
thing has most likely not been used in a very long time, and the
user interface possibly been broken since b61a5eea59 ("wimax: use
genl_register_family_with_ops()").
Using a shared branch between net-next and staging should help
coordinate patches getting submitted against it.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the implementation of CFM netlink status
get information interface.
Add new nested netlink attributes. These attributes are used by the
user space to get status information.
GETLINK:
Request filter RTEXT_FILTER_CFM_STATUS:
Indicating that CFM status information must be delivered.
IFLA_BRIDGE_CFM:
Points to the CFM information.
IFLA_BRIDGE_CFM_MEP_STATUS_INFO:
This indicate that the MEP instance status are following.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO:
This indicate that the peer MEP status are following.
CFM nested attribute has the following attributes in next level.
GETLINK RTEXT_FILTER_CFM_STATUS:
IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE:
The MEP instance number of the delivered status.
The type is u32.
IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN:
The MEP instance received CFM PDU with unexpected Opcode.
The type is u32 (bool).
IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN:
The MEP instance received CFM PDU with unexpected version.
The type is u32 (bool).
IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN:
The MEP instance received CCM PDU with MD level lower than
configured level. This frame is discarded.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE:
The MEP instance number of the delivered status.
The type is u32.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID:
The added Peer MEP ID of the delivered status.
The type is u32.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT:
The CCM defect status.
The type is u32 (bool).
True means no CCM frame is received for 3.25 intervals.
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI:
The last received CCM PDU RDI.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE:
The last received CCM PDU Port Status TLV value field.
The type is u8.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE:
The last received CCM PDU Interface Status TLV value field.
The type is u8.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN:
A CCM frame has been received from Peer MEP.
The type is u32 (bool).
This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN:
A CCM frame with TLV has been received from Peer MEP.
The type is u32 (bool).
This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN:
A CCM frame with unexpected sequence number has been received
from Peer MEP.
The type is u32 (bool).
When a sequence number is not one higher than previously received
then it is unexpected.
This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the implementation of CFM netlink configuration
get information interface.
Add new nested netlink attributes. These attributes are used by the
user space to get configuration information.
GETLINK:
Request filter RTEXT_FILTER_CFM_CONFIG:
Indicating that CFM configuration information must be delivered.
IFLA_BRIDGE_CFM:
Points to the CFM information.
IFLA_BRIDGE_CFM_MEP_CREATE_INFO:
This indicate that MEP instance create parameters are following.
IFLA_BRIDGE_CFM_MEP_CONFIG_INFO:
This indicate that MEP instance config parameters are following.
IFLA_BRIDGE_CFM_CC_CONFIG_INFO:
This indicate that MEP instance CC functionality
parameters are following.
IFLA_BRIDGE_CFM_CC_RDI_INFO:
This indicate that CC transmitted CCM PDU RDI
parameters are following.
IFLA_BRIDGE_CFM_CC_CCM_TX_INFO:
This indicate that CC transmitted CCM PDU parameters are
following.
IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO:
This indicate that the added peer MEP IDs are following.
CFM nested attribute has the following attributes in next level.
GETLINK RTEXT_FILTER_CFM_CONFIG:
IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE:
The created MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN:
The created MEP domain.
The type is u32 (br_cfm_domain).
It must be BR_CFM_PORT.
This means that CFM frames are transmitted and received
directly on the port - untagged. Not in a VLAN.
IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION:
The created MEP direction.
The type is u32 (br_cfm_mep_direction).
It must be BR_CFM_MEP_DIRECTION_DOWN.
This means that CFM frames are transmitted and received on
the port. Not in the bridge.
IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX:
The created MEP residence port ifindex.
The type is u32 (ifindex).
IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE:
The deleted MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC:
The configured MEP unicast MAC address.
The type is 6*u8 (array).
This is used as SMAC in all transmitted CFM frames.
IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL:
The configured MEP unicast MD level.
The type is u32.
It must be in the range 1-7.
No CFM frames are passing through this MEP on lower levels.
IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID:
The configured MEP ID.
The type is u32.
It must be in the range 0-0x1FFF.
This MEP ID is inserted in any transmitted CCM frame.
IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE:
The Continuity Check (CC) functionality is enabled or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL:
The CC expected receive interval of CCM frames.
The type is u32 (br_cfm_ccm_interval).
This is also the transmission interval of CCM frames when enabled.
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID:
The CC expected receive MAID in CCM frames.
The type is CFM_MAID_LENGTH*u8.
This is MAID is also inserted in transmitted CCM frames.
IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_PEER_MEPID:
The CC Peer MEP ID added.
The type is u32.
When a Peer MEP ID is added and CC is enabled it is expected to
receive CCM frames from that Peer MEP.
IFLA_BRIDGE_CFM_CC_RDI_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_RDI_RDI:
The RDI that is inserted in transmitted CCM PDU.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC:
The transmitted CCM frame destination MAC address.
The type is 6*u8 (array).
This is used as DMAC in all transmitted CFM frames.
IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE:
The transmitted CCM frame update (increment) of sequence
number is enabled or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
The period of time where CCM frame are transmitted.
The type is u32.
The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
must be done before timeout to keep transmission alive.
When period is zero any ongoing CCM frame transmission
will be stopped.
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV:
The transmitted CCM frame update with Interface Status TLV
is enabled or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE:
The transmitted Interface Status TLV value field.
The type is u8.
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV:
The transmitted CCM frame update with Port Status TLV is enabled
or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE:
The transmitted Port Status TLV value field.
The type is u8.
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the implementation of CFM netlink configuration
set information interface.
Add new nested netlink attributes. These attributes are used by the
user space to create/delete/configure CFM instances.
SETLINK:
IFLA_BRIDGE_CFM:
Indicate that the following attributes are CFM.
IFLA_BRIDGE_CFM_MEP_CREATE:
This indicate that a MEP instance must be created.
IFLA_BRIDGE_CFM_MEP_DELETE:
This indicate that a MEP instance must be deleted.
IFLA_BRIDGE_CFM_MEP_CONFIG:
This indicate that a MEP instance must be configured.
IFLA_BRIDGE_CFM_CC_CONFIG:
This indicate that a MEP instance Continuity Check (CC)
functionality must be configured.
IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD:
This indicate that a CC Peer MEP must be added.
IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE:
This indicate that a CC Peer MEP must be removed.
IFLA_BRIDGE_CFM_CC_CCM_TX:
This indicate that the CC transmitted CCM PDU must be configured.
IFLA_BRIDGE_CFM_CC_RDI:
This indicate that the CC transmitted CCM PDU RDI must be
configured.
CFM nested attribute has the following attributes in next level.
SETLINK RTEXT_FILTER_CFM_CONFIG:
IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE:
The created MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN:
The created MEP domain.
The type is u32 (br_cfm_domain).
It must be BR_CFM_PORT.
This means that CFM frames are transmitted and received
directly on the port - untagged. Not in a VLAN.
IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION:
The created MEP direction.
The type is u32 (br_cfm_mep_direction).
It must be BR_CFM_MEP_DIRECTION_DOWN.
This means that CFM frames are transmitted and received on
the port. Not in the bridge.
IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX:
The created MEP residence port ifindex.
The type is u32 (ifindex).
IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE:
The deleted MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC:
The configured MEP unicast MAC address.
The type is 6*u8 (array).
This is used as SMAC in all transmitted CFM frames.
IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL:
The configured MEP unicast MD level.
The type is u32.
It must be in the range 1-7.
No CFM frames are passing through this MEP on lower levels.
IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID:
The configured MEP ID.
The type is u32.
It must be in the range 0-0x1FFF.
This MEP ID is inserted in any transmitted CCM frame.
IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE:
The Continuity Check (CC) functionality is enabled or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL:
The CC expected receive interval of CCM frames.
The type is u32 (br_cfm_ccm_interval).
This is also the transmission interval of CCM frames when enabled.
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID:
The CC expected receive MAID in CCM frames.
The type is CFM_MAID_LENGTH*u8.
This is MAID is also inserted in transmitted CCM frames.
IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_PEER_MEPID:
The CC Peer MEP ID added.
The type is u32.
When a Peer MEP ID is added and CC is enabled it is expected to
receive CCM frames from that Peer MEP.
IFLA_BRIDGE_CFM_CC_RDI_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_RDI_RDI:
The RDI that is inserted in transmitted CCM PDU.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE:
The configured MEP instance number.
The type is u32.
IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC:
The transmitted CCM frame destination MAC address.
The type is 6*u8 (array).
This is used as DMAC in all transmitted CFM frames.
IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE:
The transmitted CCM frame update (increment) of sequence
number is enabled or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
The period of time where CCM frame are transmitted.
The type is u32.
The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
must be done before timeout to keep transmission alive.
When period is zero any ongoing CCM frame transmission
will be stopped.
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV:
The transmitted CCM frame update with Interface Status TLV
is enabled or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE:
The transmitted Interface Status TLV value field.
The type is u8.
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV:
The transmitted CCM frame update with Port Status TLV is enabled
or disabled.
The type is u32 (bool).
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE:
The transmitted Port Status TLV value field.
The type is u8.
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the third commit of the implementation of the CFM protocol
according to 802.1Q section 12.14.
Functionality is extended with CCM frame reception.
The MEP instance now contains CCM based status information.
Most important is the CCM defect status indicating if correct
CCM frames are received with the expected interval.
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the second commit of the implementation of the CFM protocol
according to 802.1Q section 12.14.
Functionality is extended with CCM frame transmission.
Interface is extended with these functions:
br_cfm_cc_rdi_set()
br_cfm_cc_ccm_tx()
br_cfm_cc_config_set()
A MEP Continuity Check feature can be configured by
br_cfm_cc_config_set()
The Continuity Check parameters can be configured to be used when
transmitting CCM.
A MEP can be configured to start or stop transmission of CCM frames by
br_cfm_cc_ccm_tx()
The CCM will be transmitted for a selected period in seconds.
Must call this function before timeout to keep transmission alive.
A MEP transmitting CCM can be configured with inserted RDI in PDU by
br_cfm_cc_rdi_set()
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the first commit of the implementation of the CFM protocol
according to 802.1Q section 12.14.
It contains MEP instance create, delete and configuration.
Connectivity Fault Management (CFM) comprises capabilities for
detecting, verifying, and isolating connectivity failures in
Virtual Bridged Networks. These capabilities can be used in
networks operated by multiple independent organizations, each
with restricted management access to each others equipment.
CFM functions are partitioned as follows:
- Path discovery
- Fault detection
- Fault verification and isolation
- Fault notification
- Fault recovery
Interface consists of these functions:
br_cfm_mep_create()
br_cfm_mep_delete()
br_cfm_mep_config_set()
br_cfm_cc_config_set()
br_cfm_cc_peer_mep_add()
br_cfm_cc_peer_mep_remove()
A MEP instance is created by br_cfm_mep_create()
-It is the Maintenance association End Point
described in 802.1Q section 19.2.
-It is created on a specific level (1-7) and is assuring
that no CFM frames are passing through this MEP on lower levels.
-It initiates and validates CFM frames on its level.
-It can only exist on a port that is related to a bridge.
-Attributes given cannot be changed until the instance is
deleted.
A MEP instance can be deleted by br_cfm_mep_delete().
A created MEP instance has attributes that can be
configured by br_cfm_mep_config_set().
A MEP Continuity Check feature can be configured by
br_cfm_cc_config_set()
The Continuity Check Receiver state machine can be
enabled and disabled.
According to 802.1Q section 19.2.8
A MEP can have Peer MEPs added and removed by
br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove()
The Continuity Check feature can maintain connectivity
status on each added Peer MEP.
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull fallthrough fix from Gustavo A. R. Silva:
"This fixes a ton of fall-through warnings when building with Clang
12.0.0 and -Wimplicit-fallthrough"
* tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
include: jhash/signal: Fix fall-through warnings for Clang
Pull rdma fixes from Jason Gunthorpe:
"The good news is people are testing rc1 in the RDMA world - the bad
news is testing of the for-next area is not as good as I had hoped, as
we really should have caught at least the rdma_connect_locked() issue
before now.
Notable merge window regressions that didn't get caught/fixed in time
for rc1:
- Fix in kernel users of rxe, they were broken by the rapid fix to
undo the uABI breakage in rxe from another patch
- EFA userspace needs to read the GID table but was broken with the
new GID table logic
- Fix user triggerable deadlock in mlx5 using devlink reload
- Fix deadlock in several ULPs using rdma_connect from the CM handler
callbacks
- Memory leak in qedr"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/qedr: Fix memory leak in iWARP CM
RDMA: Add rdma_connect_locked()
RDMA/uverbs: Fix false error in query gid IOCTL
RDMA/mlx5: Fix devlink deadlock on net namespace deletion
RDMA/rxe: Fix small problem in network_type patch
There are no known users of this driver as of October 2020, and it will
be removed unless someone turns out to still need it in future releases.
According to https://en.wikipedia.org/wiki/List_of_WiMAX_networks, there
have been many public wimax networks, but it appears that many of these
have migrated to LTE or discontinued their service altogether.
As most PCs and phones lack WiMAX hardware support, the remaining
networks tend to use standalone routers. These almost certainly
run Linux, but not a modern kernel or the mainline wimax driver stack.
NetworkManager appears to have dropped userspace support in 2015
https://bugzilla.gnome.org/show_bug.cgi?id=747846, the
www.linuxwimax.org
site had already shut down earlier.
WiMax is apparently still being deployed on airport campus networks
("AeroMACS"), but in a frequency band that was not supported by the old
Intel 2400m (used in Sandy Bridge laptops and earlier), which is the
only driver using the kernel's wimax stack.
Move all files into drivers/staging/wimax, including the uapi header
files and documentation, to make it easier to remove it when it gets
to that. Only minimal changes are made to the source files, in order
to make it possible to port patches across the move.
Also remove the MAINTAINERS entry that refers to a broken mailing
list and website.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-By: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Suggested-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In preparation to enable -Wimplicit-fallthrough for Clang, explicitly
add break statements instead of letting the code fall through to the
next case.
This patch adds four break statements that, together, fix almost 40,000
warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change
reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang,
such change[1] is meant to be reverted at some point. So, this patch helps
to move in that direction.
Something important to mention is that there is currently a discrepancy
between GCC and Clang when dealing with switch fall-through to empty case
statements or to cases that only contain a break/continue/return
statement[2][3][4].
Now that the -Wimplicit-fallthrough option has been globally enabled[5],
any compiler should really warn on missing either a fallthrough annotation
or any of the other case-terminating statements (break/continue/return/
goto) when falling through to the next case statement. Making exceptions
to this introduces variation in case handling which may continue to lead
to bugs, misunderstandings, and a general lack of robustness. The point
of enabling options like -Wimplicit-fallthrough is to prevent human error
and aid developers in spotting bugs before their code is even built/
submitted/committed, therefore eliminating classes of bugs. So, in order
to really accomplish this, we should, and can, move in the direction of
addressing any error-prone scenarios and get rid of the unintentional
fallthrough bug-class in the kernel, entirely, even if there is some minor
redundancy. Better to have explicit case-ending statements than continue to
have exceptions where one must guess as to the right result. The compiler
will eliminate any actual redundancy.
[1] commit e2079e93f5 ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
[2] https://github.com/ClangBuiltLinux/linux/issues/636
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
[4] https://godbolt.org/z/xgkvIh
[5] commit a035d552a9 ("Makefile: Globally enable fall-through warning")
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Pull AFS fixes from David Howells:
- Fix copy_file_range() to an afs file now returning EINVAL if the
splice_write file op isn't supplied.
- Fix a deref-before-check in afs_unuse_cell().
- Fix a use-after-free in afs_xattr_get_acl().
- Fix afs to not try to clear PG_writeback when laundering a page.
- Fix afs to take a ref on a page that it sets PG_private on and to
drop that ref when clearing PG_private. This is done through recently
added helpers.
- Fix a page leak if write_begin() fails.
- Fix afs_write_begin() to not alter the dirty region info stored in
page->private, but rather do this in afs_write_end() instead when we
know what we actually changed.
- Fix afs_invalidatepage() to alter the dirty region info on a page
when partial page invalidation occurs so that we don't inadvertantly
include a span of zeros that will get written back if a page gets
laundered due to a remote 3rd-party induced invalidation.
We mustn't, however, reduce the dirty region if the page has been
seen to be mapped (ie. we got called through the page_mkwrite vector)
as the page might still be mapped and we might lose data if the file
is extended again.
- Fix the dirty region info to have a lower resolution if the size of
the page is too large for this to be encoded (e.g. powerpc32 with 64K
pages).
Note that this might not be the ideal way to handle this, since it
may allow some leakage of undirtied zero bytes to the server's copy
in the case of a 3rd-party conflict.
To aid the last two fixes, two additional changes:
- Wrap the manipulations of the dirty region info stored in
page->private into helper functions.
- Alter the encoding of the dirty region so that the region bounds can
be stored with one fewer bit, making a bit available for the
indication of mappedness.
* tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix dirty-region encoding on ppc32 with 64K pages
afs: Fix afs_invalidatepage to adjust the dirty region
afs: Alter dirty range encoding in page->private
afs: Wrap page->private manipulations in inline functions
afs: Fix where page->private is set during write
afs: Fix page leak on afs_write_begin() failure
afs: Fix to take ref on page when PG_private is set
afs: Fix afs_launder_page to not clear PG_writeback
afs: Fix a use after free in afs_xattr_get_acl()
afs: Fix tracing deref-before-check
afs: Fix copy_file_range()
Pull ext4 fixes from Ted Ts'o:
"Bug fixes for the new ext4 fast commit feature, plus a fix for the
'data=journal' bug fix.
Also use the generic casefolding support which has now landed in
fs/libfs.c for 5.10"
* tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
ext4: use generic casefolding support
ext4: do not use extent after put_bh
ext4: use IS_ERR() for error checking of path
ext4: fix mmap write protection for data=journal mode
jbd2: fix a kernel-doc markup
ext4: use s_mount_flags instead of s_mount_state for fast commit state
ext4: make num of fast commit blocks configurable
ext4: properly check for dirty state in ext4_inode_datasync_dirty()
ext4: fix double locking in ext4_fc_commit_dentry_updates()
Fix afs_invalidatepage() to adjust the dirty region recorded in
page->private when truncating a page. If the dirty region is entirely
removed, then the private data is cleared and the page dirty state is
cleared.
Without this, if the page is truncated and then expanded again by truncate,
zeros from the expanded, but no-longer dirty region may get written back to
the server if the page gets laundered due to a conflicting 3rd-party write.
It mustn't, however, shorten the dirty region of the page if that page is
still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is
stored in page->private to record this.
Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
The afs filesystem uses page->private to store the dirty range within a
page such that in the event of a conflicting 3rd-party write to the server,
we write back just the bits that got changed locally.
However, there are a couple of problems with this:
(1) I need a bit to note if the page might be mapped so that partial
invalidation doesn't shrink the range.
(2) There aren't necessarily sufficient bits to store the entire range of
data altered (say it's a 32-bit system with 64KiB pages or transparent
huge pages are in use).
So wrap the accesses in inline functions so that future commits can change
how this works.
Also move them out of the tracing header into the in-directory header.
There's not really any need for them to be in the tracing header.
Signed-off-by: David Howells <dhowells@redhat.com>
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
handler triggers a completion and another thread does rdma_connect() or
the handler directly calls rdma_connect().
In all cases rdma_connect() needs to hold the handler_mutex, but when
handler's are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.
Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.
Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec5381 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Under some circumstances, the compiler generates .ctors.* sections. This
is seen doing a cross compile of x86_64 from a powerpc64el host:
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/trace_clock.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ftrace.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ring_buffer.o' being
placed in section `.ctors.65435'
Include these orphans along with the regular .ctors section.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 83109d5d5f ("x86/build: Warn on orphan section placement")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20201005025720.2599682-1-keescook@chromium.org
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.
Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo
mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence. A below call graph
of the unload flow.
cleanup_net()
down_read(&pernet_ops_rwsem); <- first sem acquired
ops_pre_exit_list()
pre_exit()
devlink_pernet_pre_exit()
devlink_reload()
mlx5_devlink_reload_down()
mlx5_unload_one()
[...]
mlx5_ib_remove()
mlx5_ib_unbind_slave_port()
mlx5_remove_netdev_notifier()
unregister_netdevice_notifier()
down_write(&pernet_ops_rwsem);<- recurrsive lock
Hence, when net namespace is deleted, mlx5 reload results in deadlock.
When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.
Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Commit 453431a549 ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.
Since then a few more instances of kzfree() have slipped in.
Just get rid of them and remove the compatibility definition
once and for all.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull perf fix from Thomas Gleixner:
"A single fix to compute the field offset of the SNOOPX bit in the data
source bitmask of perf events correctly"
* tag 'perf-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: correct SNOOPX field offset
Pull locking fix from Thomas Gleixner:
"Just a trivial fix for kernel-doc warnings"
* tag 'locking-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/seqlocks: Fix kernel-doc warnings
Pull more parisc updates from Helge Deller:
- During this merge window O_NONBLOCK was changed to become 000200000,
but we missed that the syscalls timerfd_create(), signalfd4(),
eventfd2(), pipe2(), inotify_init1() and userfaultfd() do a strict
bit-wise check of the flags parameter.
To provide backward compatibility with existing userspace we
introduce parisc specific wrappers for those syscalls which filter
out the old O_NONBLOCK value and replaces it with the new one.
- Prevent HIL bus driver to get stuck when keyboard or mouse isn't
attached
- Improve error return codes when setting rtc time
- Minor documentation fix in pata_ns87415.c
* 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
ata: pata_ns87415.c: Document support on parisc with superio chip
parisc: Add wrapper syscalls to fix O_NONBLOCK flag usage
hil/parisc: Disable HIL driver when it gets stuck
parisc: Improve error return codes when setting rtc time
Pull more xen updates from Juergen Gross:
- a series for the Xen pv block drivers adding module parameters for
better control of resource usge
- a cleanup series for the Xen event driver
* tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
Documentation: add xen.fifo_events kernel parameter description
xen/events: unmask a fifo event channel only if it was masked
xen/events: only register debug interrupt for 2-level events
xen/events: make struct irq_info private to events_base.c
xen: remove no longer used functions
xen-blkfront: Apply changed parameter name to the document
xen-blkfront: add a parameter for disabling of persistent grants
xen-blkback: add a parameter for disabling of persistent grants
Pull random32 updates from Willy Tarreau:
"Make prandom_u32() less predictable.
This is the cleanup of the latest series of prandom_u32
experimentations consisting in using SipHash instead of Tausworthe to
produce the randoms used by the network stack.
The changes to the files were kept minimal, and the controversial
commit that used to take noise from the fast_pool (f227e3ec3b) was
reverted. Instead, a dedicated "net_rand_noise" per_cpu variable is
fed from various sources of activities (networking, scheduling) to
perturb the SipHash state using fast, non-trivially predictable data,
instead of keeping it fully deterministic. The goal is essentially to
make any occasional memory leakage or brute-force attempt useless.
The resulting code was verified to be very slightly faster on x86_64
than what is was with the controversial commit above, though this
remains barely above measurement noise. It was also tested on i386 and
arm, and build- tested only on arm64"
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
* tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom:
random32: add a selftest for the prandom32 code
random32: add noise from network and scheduling activity
random32: make prandom_u32() output unpredictable
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph
- rdma error handling fixes (Chao Leng)
- fc error handling and reconnect fixes (James Smart)
- fix the qid displace when tracing ioctl command (Keith Busch)
- don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
- fix MTDT for passthru (Logan Gunthorpe)
- blacklist Write Same on more devices (Kai-Heng Feng)
- fix an uninitialized work struct (zhenwei pi)"
- lightnvm out-of-bounds fix (Colin)
- SG allocation leak fix (Doug)
- rnbd fixes (Gioh, Guoqing, Jack)
- zone error translation fixes (Keith)
- kerneldoc markup fix (Mauro)
- zram lockdep fix (Peter)
- Kill unused io_context members (Yufen)
- NUMA memory allocation cleanup (Xianting)
- NBD config wakeup fix (Xiubo)
* tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
block: blk-mq: fix a kernel-doc markup
nvme-fc: shorten reconnect delay if possible for FC
nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
nvme-fc: fix error loop in create_hw_io_queues
nvme-fc: fix io timeout to abort I/O
null_blk: use zone status for max active/open
nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
nvmet: cleanup nvmet_passthru_map_sg()
nvmet: limit passthru MTDS by BIO_MAX_PAGES
nvmet: fix uninitialized work for zero kato
nvme-pci: disable Write Zeroes on Sandisk Skyhawk
nvme: use queuedata for nvme_req_qid
nvme-rdma: fix crash due to incorrect cqe
nvme-rdma: fix crash when connect rejected
block: remove unused members for io_context
blk-mq: remove the calling of local_memory_node()
zram: Fix __zram_bvec_{read,write}() locking order
skd_main: remove unused including <linux/version.h>
sgl_alloc_order: fix memory leak
lightnvm: fix out-of-bounds write to array devices->info[]
...
Pull io_uring fixes from Jens Axboe:
- fsize was missed in previous unification of work flags
- Few fixes cleaning up the flags unification creds cases (Pavel)
- Fix NUMA affinities for completely unplugged/replugged node for io-wq
- Two fallout fixes from the set_fs changes. One local to io_uring, one
for the splice entry point that io_uring uses.
- Linked timeout fixes (Pavel)
- Removal of ->flush() ->files work-around that we don't need anymore
with referenced files (Pavel)
- Various cleanups (Pavel)
* tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
splice: change exported internal do_splice() helper to take kernel offset
io_uring: make loop_rw_iter() use original user supplied pointers
io_uring: remove req cancel in ->flush()
io-wq: re-set NUMA node affinities if CPUs come online
io_uring: don't reuse linked_timeout
io_uring: unify fsize with def->work_flags
io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
io_uring: do poll's hash_node init in common code
io_uring: inline io_poll_task_handler()
io_uring: remove extra ->file check in poll prep
io_uring: make cached_cq_overflow non atomic_t
io_uring: inline io_fail_links()
io_uring: kill ref get/drop in personality init
io_uring: flags-based creds init in queue
Pull misc vfs updates from Al Viro:
"Assorted stuff all over the place (the largest group here is
Christoph's stat cleanups)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove KSTAT_QUERY_FLAGS
fs: remove vfs_stat_set_lookup_flags
fs: move vfs_fstatat out of line
fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
fs: remove vfs_statx_fd
fs: omfs: use kmemdup() rather than kmalloc+memcpy
[PATCH] reduce boilerplate in fsid handling
fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
selftests: mount: add nosymfollow tests
Add a "nosymfollow" mount option.
Pull dma-mapping fixes from Christoph Hellwig:
- document the new dma_{alloc,free}_pages() API
- two fixups for the dma-mapping.h split
* tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: document dma_{alloc,free}_pages
dma-mapping: move more functions to dma-map-ops.h
ARM/sa1111: add a missing include of dma-map-ops.h
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.
This patch restores the spirit of commit f227e3ec3b ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.
The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.
It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.
This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.
Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.
Commit f227e3ec3b ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.
Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Pull more RISC-V updates from Palmer Dabbelt:
"Just a single patch set: the remainder of Christoph's work to remove
set_fs, including the RISC-V portion"
* tag 'riscv-for-linus-5.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: remove address space overrides using set_fs()
riscv: implement __get_kernel_nofault and __put_user_nofault
riscv: refactor __get_user and __put_user
riscv: use memcpy based uaccess for nommu again
asm-generic: make the set_fs implementation optional
asm-generic: add nommu implementations of __{get,put}_kernel_nofault
asm-generic: improve the nommu {get,put}_user handling
uaccess: provide a generic TASK_SIZE_MAX definition
Pull ARM SoC-related driver updates from Olof Johansson:
"Various driver updates for platforms. A bulk of this is smaller fixes
or cleanups, but some of the new material this time around is:
- Support for Nvidia Tegra234 SoC
- Ring accelerator support for TI AM65x
- PRUSS driver for TI platforms
- Renesas support for R-Car V3U SoC
- Reset support for Cortex-M4 processor on i.MX8MQ
There are also new socinfo entries for a handful of different SoCs and
platforms"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (131 commits)
drm/mediatek: reduce clear event
soc: mediatek: cmdq: add clear option in cmdq_pkt_wfe api
soc: mediatek: cmdq: add jump function
soc: mediatek: cmdq: add write_s_mask value function
soc: mediatek: cmdq: add write_s value function
soc: mediatek: cmdq: add read_s function
soc: mediatek: cmdq: add write_s_mask function
soc: mediatek: cmdq: add write_s function
soc: mediatek: cmdq: add address shift in jump
soc: mediatek: mtk-infracfg: Fix kerneldoc
soc: amlogic: pm-domains: use always-on flag
reset: sti: reset-syscfg: fix struct description warnings
reset: imx7: add the cm4 reset for i.MX8MQ
dt-bindings: reset: imx8mq: add m4 reset
reset: Fix and extend kerneldoc
reset: reset-zynqmp: Added support for Versal platform
dt-bindings: reset: Updated binding for Versal reset driver
reset: imx7: Support module build
soc: fsl: qe: Remove unnessesary check in ucc_set_tdm_rxtx_clk
soc: fsl: qman: convert to use be32_add_cpu()
...
Pull ARM SoC platform updates from Olof Johansson:
"SoC changes, a substantial part of this is cleanup of some of the
older platforms that used to have a bunch of board files.
In particular:
- Remove non-DT i.MX platforms that haven't seen activity in years,
it's time to remove them.
- A bunch of cleanup and removal of platform data for TI/OMAP
platforms, moving over to genpd for power/reset control (yay!)
- Major cleanup of Samsung S3C24xx and S3C64xx platforms, moving them
closer to multiplatform support (not quite there yet, but getting
close).
There are a few other changes too, smaller fixlets, etc. For new
platform support, the primary ones are:
- New SoC: Hisilicon SD5203, ARM926EJ-S platform.
- Cpufreq support for i.MX7ULP"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (121 commits)
ARM: mstar: Select MStar intc
ARM: stm32: Replace HTTP links with HTTPS ones
ARM: debug: add UART early console support for SD5203
ARM: hisi: add support for SD5203 SoC
ARM: omap3: enable off mode automatically
clk: imx: imx35: Remove mx35_clocks_init()
clk: imx: imx31: Remove mx31_clocks_init()
clk: imx: imx27: Remove mx27_clocks_init()
ARM: imx: Remove unused definitions
ARM: imx35: Retrieve the IIM base address from devicetree
ARM: imx3: Retrieve the AVIC base address from devicetree
ARM: imx3: Retrieve the CCM base address from devicetree
ARM: imx31: Retrieve the IIM base address from devicetree
ARM: imx27: Retrieve the CCM base address from devicetree
ARM: imx27: Retrieve the SYSCTRL base address from devicetree
ARM: s3c64xx: bring back notes from removed debug-macro.S
ARM: s3c24xx: fix Wunused-variable warning on !MMU
ARM: samsung: fix PM debug build with DEBUG_LL but !MMU
MAINTAINERS: mark linux-samsung-soc list non-moderated
ARM: imx: Remove remnant board file support pieces
...