Commit Graph

1187393 Commits

Author SHA1 Message Date
Christophe JAILLET
978015f7ef net/mlx5e: Remove a useless function call
'handle' is known to be NULL here. There is no need to kfree() it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:53 -07:00
Shay Drory
e71383fb9c net/mlx5: Light probe local SFs
In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.

In order to save the time of the reload, local SFs will probe without
any auxiliary sub-device, so that the SFs can be configured prior to
its full probe.

The defaults of the enable_* devlink params of these SFs are set to
false.

Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:53 -07:00
Shay Drory
3f90840305 net/mlx5: Move esw multiport devlink param to eswitch code
Move the param registration and handling code into the eswitch
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:53 -07:00
Shay Drory
2059cf51f3 net/mlx5: Split function_setup() to enable and open functions
mlx5_cmd_init_hca() is taking ~0.2 seconds. In case of a user who
desire to disable some of the SF aux devices, and with large scale-1K
SFs for example, this user will waste more than 3 minutes on
mlx5_cmd_init_hca() which isn't needed at this stage.

Downstream patch will change SFs which are probe over the E-switch,
local SFs, to be probed without any aux dev. In order to support this,
split function_setup() to avoid executing mlx5_cmd_init_hca().

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:52 -07:00
Daniel Jurgens
7057fe5619 net/mlx5: Set max number of embedded CPU VFs
Set the maximum number of embedded cpu VF functions available.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:52 -07:00
Daniel Jurgens
6d98f314bf net/mlx5: Update SRIOV enable/disable to handle EC/VFs
Previously on the embedded CPU platform SRIOV was never enabled/disabled
via mlx5_core_sriov_configure. Host VF updates are provided by an event
handler. Now in the disable flow it must be known if this is a disable
due to driver unload or SRIOV detach, or if the user updated the number
of VFs. If due to change in the number of VFs only wait for the pages of
ECVFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:52 -07:00
Daniel Jurgens
42a84a4309 net/mlx5: Query correct caps for min msix vectors
The VFs on the host and the embedded CPU platform share function
numbers. Set the ec_vf_function field to query the caps for the correct
function.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:52 -07:00
Daniel Jurgens
2ee3db806e net/mlx5: Use correct vport when restoring GUIDs
Prior to enabling EC VF functionality the vport number and function ID
were always the same. That's not the case now. Use the correct vport
number to modify the HCA vport context.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:52 -07:00
Daniel Jurgens
395ccd6eb4 net/mlx5: Add new page type for EC VF pages
When the embedded cpu supports SRIOV it can be enabled and disabled
independently from the host SRIOV. Track the pages separately so we can
properly wait for returned VF pages.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
fa3c73eee6 net/mlx5: Add/remove peer miss rules for EC VFs
Add and remove the peer miss rules for EC VFs. It's possible that there
are different amounts of total VFs per function so only create rules for
the minimum number of max VFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
a7719b29a8 net/mlx5: Add management of EC VF vports
Add init, load, unload, and cleanup of the EC VF vports. This includes
changes in how eswitch SRIOV is managed. Previous on an embedded CPU
platform the number of VFs provided when enabling the eswitch was always
0, host VFs vports are handled in the eswitch functions change event
handler. Now track the number of EC VFs as well, so they can be handled
properly in the enable/disable flows.

There are only 3 marks available for use in xarrays, all 3 were already
in use for this use case. EC VF vports are in a known range so we can
access them by index instead of marks.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
9ac0b12824 net/mlx5: Update vport caps query/set for EC VFs
These functions are for query/set by vport, there was an underlying
assumption that vport was equal to function ID. That's not the case for
EC VF functions. Set the ec_vf_function bit accordingly.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
dc13180824 net/mlx5: Enable devlink port for embedded cpu VF vports
Enable creation of a devlink port for EC VF vports.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
93b36d0f28 net/mlx5: mlx5_ifc updates for embedded CPU SRIOV
Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport
of embedded CPU virtual functions that are connected to the eswitch.

Add ec_vf_function to query/set_hca_caps. If set this indicates
accessing a virtual function on the embedded CPU by function ID. This
should only be used with other_function set to 1.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:50 -07:00
Daniel Jurgens
18a92b0542 net/mlx5: Simplify unload all rep code
Instead of using type specific iterators which are only used in one place
just traverse the xarray. It will provide suitable ordering based on the
vport numbers. This will also eliminate the need for changes here when
new types are added.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:50 -07:00
Jakub Kicinski
ded5c1a16e Merge branch 'tools-ynl-gen-code-gen-improvements-before-ethtool'
Jakub Kicinski says:

====================
tools: ynl-gen: code gen improvements before ethtool

I was going to post ethtool but I couldn't stand the ugliness
of the if conditions which were previously generated.
So I cleaned that up and improved a number of other things
ethtool will benefit from.
====================

Link: https://lore.kernel.org/r/20230608211200.1247213-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:33 -07:00
Jakub Kicinski
76abff37f0 tools: ynl-gen: support / skip pads on the way to kernel
Kernel does not have padding requirements for 64b attrs.
We can ignore pad attrs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
6f96ec73cb tools: ynl-gen: don't pass op_name to RenderInfo
The op_name argument is barely used and identical to op.name
in all cases.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
6da3424fd6 tools: ynl-gen: support code gen for events
Netlink specs support both events and notifications (former can
define their own message contents). Plug in missing code to
generate types, parsers and include events into notification
tables.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
ced1568862 tools: ynl-gen: sanitize notification tracking
Don't modify the raw dicts (as loaded from YAML) to pretend
that the notify attributes also exist on the ops. This makes
the code easier to follow.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
d0915d64c3 tools: ynl: regen: stop generating common notification handlers
Remove unused notification handlers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
f2ba1e5e22 tools: ynl-gen: stop generating common notification handlers
Common notification handler was supposed to be a way for the user
to parse the notifications from a socket synchronously.
I don't think we'll end up using it, ynl_ntf_check() works for
all known use cases.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
7234415b8f tools: ynl: regen: regenerate the if ladders
Renegate the code to combine } and else and use tmp variable
to store type.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
e4ea3cc684 tools: ynl-gen: get attr type outside of if()
Reading attr type with mnl_attr_get_type() for each condition
leads to most conditions being longer than 80 chars.
Avoid this by reading the type to a variable on the stack.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
2c0f146686 tools: ynl-gen: combine else with closing bracket
Code gen currently prints:

  }
  else if (...

This is really ugly. Fix it by delaying printing of closing
brackets in anticipation of else coming along.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
820343ccbb tools: ynl-gen: complete the C keyword list
C keywords need to be avoided when naming things.
Complete the list (ethtool has at least one thing called "auto").

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
9b52fd4b63 tools: ynl: regen: cleanup user space header includes
Remove unnecessary includes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
30b5c720e1 tools: ynl-gen: cleanup user space header includes
Bots started screaming that we're including stdlib.h twice.
While at it move string.h into a common spot and drop stdio.h
which we don't need.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5464
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5466
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5467
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 14:40:31 -07:00
Jakub Kicinski
7ec5d48fdb Revert "tools: ynl: Remove duplicated include in handshake-user.c"
This reverts commit e7c5433c5a.

Commit e7c5433c5a ("tools: ynl: Remove duplicated include
in handshake-user.c") was applied too hastily. It changes
an auto-generated file, and there's already a proper fix
on the list.

Link: https://lore.kernel.org/all/ZIMPLYi%2FxRih+DlC@nanopsycho/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09 11:01:04 -07:00
Yang Li
e7c5433c5a tools: ynl: Remove duplicated include in handshake-user.c
./tools/net/ynl/generated/handshake-user.c: stdlib.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5464
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 11:36:41 +01:00
David S. Miller
56f7783ba4 Merge branch 'broadcom-phy-led-brightness'
Florian Fainelli says:

====================
LED brightness support for Broadcom PHYs

This patch series adds support for controlling the LED brightness on
Broadcom PHYs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:38:44 +01:00
Florian Fainelli
bd5736e146 net: phy: broadcom: Add support for setting LED brightness
Broadcom PHYs have two LEDs selector registers which allow us to control
the LED assignment, including how to turn them on/off.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:38:43 +01:00
Florian Fainelli
57fd7d59b1 net: phy: broadcom: Rename LED registers
These registers are common to most PHYs and are not specific to the
BCM5482, renamed the constants accordingly, no functional change.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:38:43 +01:00
David S. Miller
54a8c43f3b Merge branch 'net-ncsi-refactoring-for-GMA-cmd'
Ivan Mikhaylov says:

====================
net/ncsi: refactoring for GMA command

Make one GMA function for all manufacturers, change ndo_set_mac_address
to dev_set_mac_address for notifiying net layer about MAC change which
ndo_set_mac_address doesn't do.

Changes from v1:
	1. delete ftgmac100.txt changes about mac-address-increment
	2. add convert to yaml from ftgmac100.txt
	3. add mac-address-increment option for ethernet-controller.yaml

Changes from v2:
	1. remove DT changes from series, will be done in another one
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:32:51 +01:00
Ivan Mikhaylov
790071347a net/ncsi: change from ndo_set_mac_address to dev_set_mac_address
Change ndo_set_mac_address to dev_set_mac_address because
dev_set_mac_address provides a way to notify network layer about MAC
change. In other case, services may not aware about MAC change and keep
using old one which set from network adapter driver.

As example, DHCP client from systemd do not update MAC address without
notification from net subsystem which leads to the problem with acquiring
the right address from DHCP server.

Fixes: cb10c7c0df ("net/ncsi: Add NCSI Broadcom OEM command")
Cc: stable@vger.kernel.org # v6.0+ 2f38e84 net/ncsi: make one oem_gma function for all mfr id
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:32:51 +01:00
Ivan Mikhaylov
74b449b98d net/ncsi: make one oem_gma function for all mfr id
Make the one Get Mac Address function for all manufacturers and change
this call in handlers accordingly.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:32:51 +01:00
Foster Snowhill
0c6e9d32ef usbnet: ipheth: update Kconfig description
This module has for a long time not been limited to iPhone <= 3GS.
Update description to match the actual state of the driver.

Remove dead link from 2010, instead reference an existing userspace
iOS device pairing implementation as part of libimobiledevice.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:26:57 +01:00
Foster Snowhill
a2d274c62e usbnet: ipheth: add CDC NCM support
Recent iOS releases support CDC NCM encapsulation on RX. This mode is
the default on macOS and Windows. In this mode, an iOS device may include
one or more Ethernet frames inside a single URB.

Freshly booted iOS devices start in legacy mode, but are put into
NCM mode by the official Apple driver. When reconnecting such a device
from a macOS/Windows machine to a Linux host, the device stays in
NCM mode, making it unusable with the legacy ipheth driver code.

To correctly support such a device, the driver has to either support
the NCM mode too, or put the device back into legacy mode.

To match the behaviour of the macOS/Windows driver, and since there
is no documented control command to revert to legacy mode, implement
NCM support. The device is attempted to be put into NCM mode by default,
and falls back to legacy mode if the attempt fails.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Tested-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:26:57 +01:00
Foster Snowhill
3e65efcca8 usbnet: ipheth: transmit URBs without trailing padding
The behaviour of the official iOS tethering driver on macOS is to not
transmit any trailing padding at the end of URBs. This is applicable
to both NCM and legacy modes, including older devices.

Adapt the driver to not include trailing padding in TX URBs, matching
the behaviour of the official macOS driver.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Tested-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:26:57 +01:00
Georgi Valkov
2203718c2f usbnet: ipheth: fix risk of NULL pointer deallocation
The cleanup precedure in ipheth_probe will attempt to free a
NULL pointer in dev->ctrl_buf if the memory allocation for
this buffer is not successful. While kfree ignores NULL pointers,
and the existing code is safe, it is a better design to rearrange
the goto labels and avoid this.

Signed-off-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: Foster Snowhill <forst@pen.gy>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:26:57 +01:00
Jakub Kicinski
fd5f4d7da2 Merge branch 'splice-net-rewrite-splice-to-socket-fix-splice_f_more-and-handle-msg_splice_pages-in-af_tls'
David Howells says:

====================
splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS

Here are patches to do the following:

 (1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
     userspace, whilst allowing splice_to_socket() to pass them in.

 (2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg().  Until
     support is added, it will be ignored and a splice-driven sendmsg()
     will be treated like a normal sendmsg().  TCP, UDP, AF_UNIX and
     Chelsio-TLS already handle the flag in net-next.

 (3) Replace a chain of functions to splice-to-sendpage with a single
     function to splice via sendmsg() with MSG_SPLICE_PAGES.  This allows a
     bunch of pages to be spliced from a pipe in a single call using a
     bio_vec[] and pushes the main processing loop down into the bowels of
     the protocol driver rather than repeatedly calling in with a page at a
     time.

 (4) Provide a ->splice_eof() op[2] that allows splice to signal to its
     output that the input observed a premature EOF and that the caller
     didn't flag SPLICE_F_MORE, thereby allowing a corked socket to be
     flushed.  This attempts to maintain the current behaviour.  It is also
     not called if we didn't manage to read any data and so didn't called
     the actor function.

     This needs routing though several layers to get it down to the network
     protocol.

     [!] Note that I chose not to pass in any flags - I'm not sure it's
     	 particularly useful to pass in the splice flags; I also elected
     	 not to return any error code - though we might actually want to do
     	 that.

 (5) Provide tls_{device,sw}_splice_eof() to flush a pending TLS record if
     there is one.

 (6) Provide splice_eof() for UDP, TCP, Chelsio-TLS and AF_KCM.  AF_UNIX
     doesn't seem to pay attention to the MSG_MORE or MSG_SENDPAGE_NOTLAST
     flags.

 (7) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
     signalling[1] such SPLICE_F_MORE is always signalled until we have
     read sufficient data to finish the request.  If we get a zero-length
     before we've managed to splice sufficient data, we now leave the
     socket expecting more data and leave it to userspace to deal with it.

 (8) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
     MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
     should splice the pages supplied if it can.  Its sendpage
     implementations are then turned into wrappers around that.

Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/ # v1
====================

Link: https://lore.kernel.org/r/20230607181920.2294972-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:33 -07:00
David Howells
3dc8976c7a tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself.  With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer
and the zc_page argument to tls_push_data() can also be removed.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:31 -07:00
David Howells
24763c9c09 tls/device: Support MSG_SPLICE_PAGES
Make TLS's device sendmsg() support MSG_SPLICE_PAGES.  This causes pages to
be spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:31 -07:00
David Howells
45e5be844a tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.

[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
    upstream.  I think the caller will only hold the socket lock, but it
    should hold tls_ctx->tx_lock too.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:31 -07:00
David Howells
fe1e81d4f7 tls/sw: Support MSG_SPLICE_PAGES
Make TLS's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:31 -07:00
David Howells
219d92056b splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and,
as a result, it incorrectly signals/fails to signal MSG_MORE when splicing
to a socket.  The problem I'm seeing happens when a short splice occurs
because we got a short read due to hitting the EOF on a file: as the length
read (read_len) is less than the remaining size to be spliced (len),
SPLICE_F_MORE (and thus MSG_MORE) is set.

The issue is that, for the moment, we have no way to know *why* the short
read occurred and so can't make a good decision on whether we *should* keep
MSG_MORE set.

MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set
incorrectly under some circumstances - for example if a short read fills a
single pipe_buffer, but the next read would return more (seqfile can do
this).

This was observed with the multi_chunk_sendfile tests in the tls kselftest
program.  Some of those tests would hang and time out when the last chunk
of file was less than the sendfile request size:

	build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile

This has been observed before[2] and worked around in AF_TLS[3].

Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
we haven't yet hit the requested operation size.  SPLICE_F_MORE remains
signalled if the user passed it in to splice() but otherwise gets cleared
when we've read sufficient data to fulfill the request.

If, however, we get a premature EOF from ->splice_read(), have sent at
least one byte and SPLICE_F_MORE was not set by the caller, ->splice_eof()
will be invoked.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: linux-mm@kvack.org

Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:31 -07:00
David Howells
951ace9951 kcm: Use splice_eof() to flush
Allow splice to undo the effects of MSG_MORE after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:31 -07:00
David Howells
c289a1601a chelsio/chtls: Use splice_eof() to flush
Allow splice to end a Chelsio TLS record after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ayush Sawal <ayush.sawal@chelsio.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:31 -07:00
David Howells
1d7e4538a5 ipv4, ipv6: Use splice_eof() to flush
Allow splice to undo the effects of MSG_MORE after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.

For UDP, a pending packet will not be emitted if the socket is closed
before it is flushed; with this change, it be flushed by ->splice_eof().

For TCP, it's not clear that MSG_MORE is actually effective.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:30 -07:00
David Howells
d4c1e80b0d tls/device: Use splice_eof() to flush
Allow splice to end a TLS record after prematurely ending a splice/sendfile
due to getting an EOF condition (->splice_read() returned 0) after splice
had called TLS with a sendmsg() with MSG_MORE set when the user didn't set
MSG_MORE.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:40:30 -07:00