Store and handle ethtool link mode masks within the driver instead of
just a single u32. However, quite a significant amount of existing code
wants to manipulate the masks directly, and thus now uses the first
unsigned long (i.e. mask[0]) as though it were a legacy u32 mask. This
is ok because all the bits that code is interested in are in the first
32 bits of the mask; but it might be a good idea to change them in
future to use the proper bitmap API.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only handles direct speed setting, not autoneg, because the driver is
still trying to pretend it uses the legacy ethtool API which doesn't
have advertised/supported bits for 25/50/100G.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a qdisc is being replaced by another qdisc of the same type, it can
simply override over its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a generic qdisc replace function.
For that goal, add three functions to the qdisc ops struct:
* check_params: Checks if the given parameters are offloadable.
* replace: Offload the given parameters.
* clean_stats: clean the qdisc stats for the offloaded qdisc.
integrate RED offloading into using the new internal replace API.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a destroy function to the qdiscs ops struct.
Create a generic qdisc destroy function, that clears the qdisc metadata as
well as calling the specific qdisc destroy function.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qdisc struct have the Qdisc_class_ops struct.
This patch introduces the similar ops struct for the mlxsw_sp_qdisc_ops
struct. It allows better readability as well as code reusability for the
common parts of some functions like destroy.
The first operations to be added are the statistics getters.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every qdisc op gets the qdisc handle ID as well as its location. Each one
of them, beside replace, checks if the handle doesn't match the qdisc in
the given location, and if so, it returns without running the actual op.
Unite these checks to one comparison function and avoid sending the handle
id to these ops.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tclass number is needed for most of the operations related to the qdisc in
the driver. Create a field for it in the mlxsw_sp_qdisc instead of passing
it to every function as parameter.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve readability by changing the clean stats function to handle only
RED. Qdiscs that will be offloaded in the future will have a clean stats
function of their own.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean RED offloaded stats and make them more generic by breaking the
generic qdisc stats to a struct of their own.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the value of the xstats requested from the driver for offloaded RED
to be incremental, like the normal stats.
It increases consistency - if a qdisc stops being offloaded its xstats
don't change.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the name of the stats struct to be generic, so it could be used for
other qdisc offload, that will be added in the next patches.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move all the qdisc related data from the spectrum.h to spectrum_qdisc.c.
Create an init and fini functions for the qdiscs.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return a negative error code from the xdp_rxq_info_reg() error
handling case instead of 0, as done elsewhere in this function.
Fixes: 0ddf543226 ("xdp/mlx5: setup xdp_rxq_info")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and
the extant logic would flag that as an error. This was already fixed in
cxgb4 driver with "92ddcc7 cxgb4: Fix some small bugs in
t4_sge_init_soft() when our Page Size is 64KB".
Original Work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal PHYs in the mv88e6390 switch have a temperature sensor.
It uses a different register layout to other PHY currently supported.
It also has an errata, in that some reads of the sensor result in bad
values. So a number of reads need to be made, and the average taken.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.
This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupts events per ring. A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation. Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These functions were identified as ones that could be made generic and
used by multiple drivers. Most of the contents of en_rx_am.c are moved
to net_dim.c.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch tries to batched used ring update during RX. This is pretty
fit for the case when guest is much faster (e.g dpdk based
backend). In this case, used ring is almost empty:
- we may get serious cache line misses/contending on both used ring
and used idx.
- at most 1 packet could be dequeued at one time, batching in guest
does not make much effect.
Update used ring in a batch can help since guest won't access the used
ring until used idx was advanced for several descriptors and since we
advance used ring for every N packets, guest will only need to access
used idx for every N packet since it can cache the used idx. To have a
better interaction for both batch dequeuing and dpdk batching,
VHOST_RX_BATCH was used as the maximum number of descriptors that
could be batched.
Test were done between two machines with 2.40GHz Intel(R) Xeon(R) CPU
E5-2630 connected back to back through ixgbe. Traffic were generated
on one remote ixgbe through MoonGen and measure the RX pps through
testpmd in guest when do xdp_redirect_map from local ixgbe to
tap. RX pps were increased from 3.05 Mpps to 4.00 Mpps (about 31%
improvement).
One possible concern for this is the implications for TCP (especially
latency sensitive workload). Result[1] does not show obvious changes
for most of the netperf test (RR, TX, and RX). And we do get some
improvements for RX on some specific size.
Guest RX:
size/sessions/+thu%/+normalize%
64/ 1/ +2%/ +2%
64/ 2/ +2%/ -1%
64/ 4/ +1%/ +1%
64/ 8/ 0%/ 0%
256/ 1/ +6%/ -3%
256/ 2/ -3%/ +2%
256/ 4/ +11%/ +11%
256/ 8/ 0%/ 0%
512/ 1/ +4%/ 0%
512/ 2/ +2%/ +2%
512/ 4/ 0%/ -1%
512/ 8/ -8%/ -8%
1024/ 1/ -7%/ -17%
1024/ 2/ -8%/ -7%
1024/ 4/ +1%/ 0%
1024/ 8/ 0%/ 0%
2048/ 1/ +30%/ +14%
2048/ 2/ +46%/ +40%
2048/ 4/ 0%/ 0%
2048/ 8/ 0%/ 0%
4096/ 1/ +23%/ +22%
4096/ 2/ +26%/ +23%
4096/ 4/ 0%/ +1%
4096/ 8/ 0%/ 0%
16384/ 1/ -2%/ -3%
16384/ 2/ +1%/ -4%
16384/ 4/ -1%/ -3%
16384/ 8/ 0%/ -1%
65535/ 1/ +15%/ +7%
65535/ 2/ +4%/ +7%
65535/ 4/ 0%/ +1%
65535/ 8/ 0%/ 0%
TCP_RR:
size/sessions/+thu%/+normalize%
1/ 1/ 0%/ +1%
1/ 25/ +2%/ +1%
1/ 50/ +4%/ +1%
64/ 1/ 0%/ -4%
64/ 25/ +2%/ +1%
64/ 50/ 0%/ -1%
256/ 1/ 0%/ 0%
256/ 25/ 0%/ 0%
256/ 50/ +4%/ +2%
Guest TX:
size/sessions/+thu%/+normalize%
64/ 1/ +4%/ -2%
64/ 2/ -6%/ -5%
64/ 4/ +3%/ +6%
64/ 8/ 0%/ +3%
256/ 1/ +15%/ +16%
256/ 2/ +11%/ +12%
256/ 4/ +1%/ 0%
256/ 8/ +5%/ +5%
512/ 1/ -1%/ -6%
512/ 2/ 0%/ -8%
512/ 4/ -2%/ +4%
512/ 8/ +6%/ +9%
1024/ 1/ +3%/ +1%
1024/ 2/ +3%/ +9%
1024/ 4/ 0%/ +7%
1024/ 8/ 0%/ +7%
2048/ 1/ +8%/ +2%
2048/ 2/ +3%/ -1%
2048/ 4/ -1%/ +11%
2048/ 8/ +3%/ +9%
4096/ 1/ +8%/ +8%
4096/ 2/ 0%/ -7%
4096/ 4/ +4%/ +4%
4096/ 8/ +2%/ +5%
16384/ 1/ -3%/ +1%
16384/ 2/ -1%/ -12%
16384/ 4/ -1%/ +5%
16384/ 8/ 0%/ +1%
65535/ 1/ 0%/ -3%
65535/ 2/ +5%/ +16%
65535/ 4/ +1%/ +2%
65535/ 8/ +1%/ -1%
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5-updates-2018-01-08
Four patches from Or that add Hairpin support to mlx5:
===========================================================
From: Or Gerlitz <ogerlitz@mellanox.com>
We refer the ability of NIC HW to fwd packet received on one port to
the other port (also from a port to itself) as hairpin. The application API
is based
on ingress tc/flower rules set on the NIC with the mirred redirect
action. Other actions can apply to packets during the redirect.
Hairpin allows to offload the data-path of various SW DDoS gateways,
load-balancers, etc to HW. Packets go through all the required
processing in HW (header re-write, encap/decap, push/pop vlan) and
then forwarded, CPU stays at practically zero usage. HW Flow counters
are used by the control plane for monitoring and accounting.
Hairpin is implemented by pairing a receive queue (RQ) to send queue (SQ).
All the flows that share <recv NIC, mirred NIC> are redirected through
the same hairpin pair. Currently, only header-rewrite is supported as a
packet modification action.
I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing this
functionality
on HW simulator, before it was avail in the FW so the driver code could be
tested early.
===========================================================
From Feras three patches that provide very small changes that allow IPoIB
to support RX timestamping for child interfaces, simply by hooking the mlx5e
timestamping PTP ioctl to IPoIB child interface netdev profile.
One patch from Gal to fix a spilling mistake.
Two patches from Eugenia adds drop counters to VF statistics
to be reported as part of VF statistics in netlink (iproute2) and
implemented them in mlx5 eswitch.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-09
This series contains updates to ixgbe and ixgbevf only.
Emil fixes an issue with "wake on LAN"(WoL) where we need to ensure we
enable the reception of multicast packets so that WoL works for IPv6
magic packets. Cleaned up code no longer needed with the update to
adaptive ITR.
Paul update the driver to advertise the highest capable link speed
when a module gets inserted. Also extended the displaying of firmware
version to include the iSCSI and OEM block in the EEPROM to better
identify firmware versions/images.
Tonghao Zhang cleans up a code comment that no longer applies since
InterruptThrottleRate has been removed from the driver.
Alex fixes SR-IOV and MACVLAN offload interaction, where the MACVLAN
offload was incorrectly configuring several filters with the wrong
pool value which resulted in MACLVAN interfaces not being able to
receive traffic that had to pass over the physical interface. Fixed
transmit hangs and dropped receive frames when the number of VFs
changed. Added support for RSS on MACVLAN pools for X550 devices.
Fixed up the MACVLAN limitations so we can now support 63 offloaded
devices. Cleaned up MACVLAN code that is no longer needed with the
recent changes and fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
So far rpm doesn't cover cases like unused ports which are never
brought up. If they are active at probe time they remain in this state.
Included in this patch:
- Let the idle notification check whether we can suspend and let it
schedule the suspend. This way we don't need to have calls to
pm_schedule_suspend in different places.
- At the end of rtl_open and rtl_init_one send an idle notification
to allow suspending if the link is down. If a cable is plugged in
aneg is finished before the suspend timer expires and the suspend
request is cancelled.
- Change rtl8169_runtime_suspend to power down the chip if the
interface is down.
Successfully tested on a RTL8168evl (mac version 34).
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch partially reverts commit e4fbce740f "r8169: Fix runtime
power management" from 2010. At that time the suspend delay was 100ms
and therefore suspending happened during initial aneg. Currently
suspend delay is 5s, so suspend starts after aneg and the issue
doesn't exist any longer. On my system aneg takes almost 3s, to be on
the safe side let's increase the suspend delay to 10s.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts commit 2a15cd2ff4 "r8169: runtime resume before
shutdown" from 2012. Few months after this change the underlying issue
was solved in the PCI core with commit 3ff2de9ba1 "PCI/PM: Resume
device before shutdown".
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the checks are done in upper layer ethtool code,
checks in driver are not needed any more.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In current implementation, any requested RX/TX ring size value
that is less than minimum is silently casted to nearest valid value.
Update this behavior to align with mlx5 behavior by printing warning
in dmesg and remaining the size unchanged.
Kernel is responsible for verifying against the maximum.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The l2 acceleration private pointer isn't needed in the ring struct. It
isn't really used anywhere other than to test and see if we are supporting
an offloaded macvlan netdev, and it is much easier to test netdev for not
being ixgbe based to verify that.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch simplifies the check for Tx pending traffic and makes it more
holistic as there being any difference between next_to_use and
next_to_clean is much more informative than if head and tail are equal, as
it is possible for us to either not update tail, or not be notified of
completed work in which case next_to_clean would not be equal to head.
In addition the simplification makes it so that we don't have to read
hardware which allows us to drop a number of variables that were previously
being used in the call.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is a fix of the macvlan offload so that we correctly handle
macvlan offloaded devices. Specifically we were configuring our limits based
on the assumption that we were going to max out the RSS indices for every
mode. As a result when we went to 15 or more macvlan interfaces we were
forced into the 2 queue RSS mode on VFs even though they could have still
supported 4.
This change splits the logic up so that we limit either the total number of
macvlan instances if DCB is enabled, or limit the number of RSS queues used
per macvlan (instead of per pool) if SR-IOV is enabled. By doing this we
can make best use of the part.
In addition I have increased the maximum number of supported interfaces to
63 with one queue per offloaded interface as this more closely reflects the
actual values supported by the interface.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The num_rx_pools value is overwritten when we reinitialize the queue
configuration. In reality we shouldn't need to be updating the value since
it is redone every time we call into ixgbe_setup_tc so for now just drop
the spots where we were incrementing or decrementing the value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In order for RSS to work on the macvlan pools of the X550 we need to
populate the MRQC, RETA, and RSS key values for each pool. This patch makes
it so that we now take care of that.
In addition I have dropped the macvlan specific configuration of psrtype
since it is redundant with the code that already exists for configuring
this value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the number of VFs are changed we need to reinitialize the part since the
offset for the device and the number of pools will be incorrect. Without
this change we can end up seeing Tx hangs and dropped Rx frames for
incoming traffic.
In addition we should drop the code that is arbitrarily changing the
default pool and queue configuration. Instead we should wait until the port
is reset and reconfigured via ixgbe_sriov_reinit.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use the ARRAY_SIZE macro on array cmd_priv_map to determine size of the
array. Improvement suggested by coccinelle.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SR-IOV was enabled the macvlan offload was configuring several filters
with the wrong pool value. This would result in the macvlan interfaces not
being able to receive traffic that had to pass over the physical interface.
To fix it wrap the pool argument in the VMDQ_P macro which will add the
necessary offset to get to the actual VMDq pool
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The ability to set speed and duplex for virtio_net is useful in various
scenarios as described here:
16032be virtio_net: add ethtool support for set and get of settings
However, it would be nice to be able to set this from the hypervisor,
such that virtio_net doesn't require custom guest ethtool commands.
Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
the hypervisor to export a linkspeed and duplex setting. The user can
subsequently overwrite it later if desired via: 'ethtool -s'.
Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention
is that device feature bits are to grow down from bit 63, since the
transports are starting from bit 24 and growing up.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Signed-off-by: David S. Miller <davem@davemloft.net>