[ Upstream commit 57f6f99fda ]
Avoid exiting the function with a lingering sysfs file (if the first
call to device_create_file() fails while the second succeeds), and avoid
calling devlink_port_unregister() twice.
In other words, either mlx4_init_port_info() succeeds and returns zero, or
it fails, returns non-zero, and requires no cleanup.
Fixes: 096335b3f9 ("mlx4_core: Allow dynamic MTU configuration for IB ports")
Signed-off-by: Tarick Bedeir <tarick@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1ccef350db ]
For ICMPv4, the checksum is calculated from the ICMP headers and data.
Since the ICMPv4 checksum doesn't cover the IP header, we can allow to
do L3 header re-write for this protocol.
Fixes: bdd66ac0ae ('net/mlx5e: Disallow TC offloading of unsupported match/action combinations')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c26f5f89d ]
When we fail to initialize the RX root namespace, we need
to clean only that and not the entire flow steering.
Currently the code may try to clean the flow steering twice
on error witch leads to null pointer deference.
Make sure we clean correctly.
Fixes: fba53f7b57 ("net/mlx5: Introduce mlx5_flow_steering structure")
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d9a96ec362 ]
In case of a dma_mapping_error, do not use wi->num_dma
as a parameter for dma unmap function because it's yet
to be set, and holds an out-of-date value.
Use actual value (local variable num_dma) instead.
Fixes: 34802a42b3 ("net/mlx5e: Do not modify the TX SKB")
Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d89a2adb8b ]
tg3_free_consistent() calls dma_free_coherent() to free tp->hw_stats
under spinlock and can trigger BUG_ON() in vunmap() because vunmap()
may sleep. Fix it by removing the spinlock and relying on the
TG3_FLAG_INIT_COMPLETE flag to prevent race conditions between
tg3_get_stats64() and tg3_free_consistent(). TG3_FLAG_INIT_COMPLETE
is always cleared under tp->lock before tg3_free_consistent()
and therefore tg3_get_stats64() can safely access tp->hw_stats
under tp->lock if TG3_FLAG_INIT_COMPLETE is set.
Fixes: f5992b72eb ("tg3: Fix race condition in tg3_get_stats64().")
Reported-by: Zumeng Chen <zumeng.chen@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3148dedfe7 ]
Since commit a92a08499b "r8169: improve runtime pm in general and
suspend unused ports" interfaces w/o link are runtime-suspended after
10s. On systems where drivers take longer to load this can lead to the
situation that the interface is runtime-suspended already when it's
initially brought up.
This shouldn't be a problem because rtl_open() resumes MAC/PHY.
However with at least one chip version the interface doesn't properly
come up, as reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=199549
The vendor driver uses a delay to give certain chip versions some
time to resume before starting the PHY configuration. So let's do
the same. I don't know which chip versions may be affected,
therefore apply this delay always.
This patch was reported to fix the issue for RTL8168h.
I was able to reproduce the issue on an Asus H310I-Plus which also
uses a RTL8168h. Also in my case the patch fixed the issue.
Reported-by: Slava Kardakov <ojab@ojab.ru>
Tested-by: Slava Kardakov <ojab@ojab.ru>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f85900c3e1 ]
The HW doesn't support matching on frag first/later, return error if we are
asked to offload that.
Fixes: 3f7d0eb42d ("net/mlx5e: Offload TC matching on packets being IP fragments")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6ad4e91c6d ]
Add check of coalescing parameters received through ethtool are within
range of values supported by the HW.
Driver gets the coalescing rx/tx-usecs and rx/tx-frames as set by the
users through ethtool. The ethtool support up to 32 bit value for each.
However, mlx4 modify cq limits the coalescing time parameter and
coalescing frames parameters to 16 bits.
Return out of range error if user tries to set these parameters to
higher values.
Change type of sample-interval and adaptive_rx_coal parameters in mlx4
driver to u32 as the ethtool holds them as u32 and these parameters are
not limited due to mlx4 HW.
Fixes: c27a02cd94 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a577d868b7 ]
If an error occurs, 'mlx4_en_destroy_netdev()' is called.
It then calls 'mlx4_en_free_resources()' which does the needed resources
cleanup.
So, doing some explicit kfree in the error handling path would lead to
some double kfree.
Simplify code to avoid such a case.
Fixes: 67f8b1dcb9 ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5e5add172e ]
In dual_mac mode packets arrived on one port should not be forwarded by
switch hw to another port. Only Linux Host can forward packets between
ports. The below test case (reported in [1]) shows that packet arrived on
one port can be leaked to anoter (reproducible with dual port evms):
- connect port 1 (eth0) to linux Host 0 and run tcpdump or Wireshark
- connect port 2 (eth1) to linux Host 1 with vlan 1 configured
- ping <IPx> from Host 1 through vlan 1 interface.
ARP packets will be seen on Host 0.
Issue happens because dual_mac mode is implemnted using two vlans: 1 (Port
1+Port 0) and 2 (Port 2+Port 0), so there are vlan records created for for
each vlan. By default, the ALE will find valid vlan record in its table
when vlan 1 tagged packet arrived on Port 2 and so forwards packet to all
ports which are vlan 1 members (like Port.
To avoid such behaviorr the ALE VLAN ID Ingress Check need to be enabled
for each external CPSW port (ALE_PORTCTLn.VID_INGRESS_CHECK) so ALE will
drop ingress packets if Rx port is not VLAN member.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 14224923c3 ]
Currently, skb->len and skb->data_len are set to the page size, not
the packet size. This causes the frame check sequence to not be
located at the "end" of the packet resulting in ethernet frame check
errors. The driver does work currently, but stricter kernel facing
networking solutions like OpenVSwitch will drop these packets as
invalid.
These changes set the packet size correctly so that these errors no
longer occur. The length does not include the frame check sequence, so
that subtraction was removed.
Tested on Oracle/SUN Multithreaded 10-Gigabit Ethernet Network
Controller [108e:abcd] and validated in wireshark.
Signed-off-by: Rob Taglang <rob@taglang.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5e391dc5a8 ]
The CPDMA_TX_PRIORITY_MAP in real is vlan pcp field priority mapping
register and basically replaces vlan pcp field for tagged packets.
So, set it to be 1:1 mapping. Otherwise, it will cause unexpected
change of egress vlan tagged packets, like prio 2 -> prio 5.
Fixes: e05107e6b7 ("net: ethernet: ti: cpsw: add multi queue support")
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 117df655f8 ]
The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.)
that it supports. Update the driver to include checking the eeprom data
when deciding whether to use a transceiver signal.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 96f4d430c5 ]
Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks
to improve the ability to successfully complete Clause 73 AN when running
at 10gbps. Hardware can sometimes have issues with CDR lock when the
AN DME page exchange is being performed.
The AN and KR training hooks are used as follows:
- The pre AN hook is used to disable CDR tracking in the PHY so that the
DME page exchange can be successfully and consistently completed.
- The post KR training hook is used to re-enable the CDR tracking so that
KR training can successfully complete.
- The post AN hook is used to check for an unsuccessful AN which will
increase a CDR tracking enablement delay (up to a maximum value).
Add two debugfs entries to allow control over use of the CDR tracking
workaround. The debugfs entries allow the CDR tracking workaround to
be disabled and determine whether to re-enable CDR tracking before or
after link training has been initiated.
Also, with these changes the receiver reset cycle that is performed during
the link status check can be performed less often.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4d945663a6 ]
Add hooks to the driver auto-negotiation (AN) flow to allow the different
phy implementations to perform any steps necessary to improve AN.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 028daf8011 upstream.
Fix for "Resource temporarily unavailable" problem when virsh is
trying to attach a device to VM. When the VF driver is loaded on
host and virsh is trying to attach it to the VM and set a MAC
address, it ends with a race condition between i40e_reset_vf and
i40e_ndo_set_vf_mac functions. The bug is fixed by adding polling
in i40e_ndo_set_vf_mac function For when the VF is in Reset mode.
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b84ca1875 ]
The interrupt status register in both dwmac1000 and dwmac4 ignores
interrupt enable (for dwmac4) / interrupt mask (for dwmac1000).
Therefore, if we want to check only the bits that can actually trigger
an irq, we have to filter the interrupt status register manually.
Commit 0a764db103 ("stmmac: Discard masked flags in interrupt status
register") fixed this for dwmac1000. Fix the same issue for dwmac4.
Just like commit 0a764db103 ("stmmac: Discard masked flags in
interrupt status register"), this makes sure that we do not get
spurious link up/link down prints.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 202a0a70e4 ]
When the frame check sequence (FCS) is split across the last two frames
of a fragmented packet, part of the FCS gets counted twice, once when
subtracting the FCS, and again when subtracting the previously received
data.
For example, if 1602 bytes are received, and the first fragment contains
the first 1600 bytes (including the first two bytes of the FCS), and the
second fragment contains the last two bytes of the FCS:
'skb->len == 1600' from the first fragment
size = lstatus & BD_LENGTH_MASK; # 1602
size -= ETH_FCS_LEN; # 1598
size -= skb->len; # -2
Since the size is unsigned, it wraps around and causes a BUG later in
the packet handling, as shown below:
kernel BUG at ./include/linux/skbuff.h:2068!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c021ec60] skb_pull+0x24/0x44
LR [c01e2fbc] gfar_clean_rx_ring+0x498/0x690
Call Trace:
[df7edeb0] [c01e2c1c] gfar_clean_rx_ring+0xf8/0x690 (unreliable)
[df7edf20] [c01e33a8] gfar_poll_rx_sq+0x3c/0x9c
[df7edf40] [c023352c] net_rx_action+0x21c/0x274
[df7edf90] [c0329000] __do_softirq+0xd8/0x240
[df7edff0] [c000c108] call_do_irq+0x24/0x3c
[c0597e90] [c00041dc] do_IRQ+0x64/0xc4
[c0597eb0] [c000d920] ret_from_except+0x0/0x18
--- interrupt: 501 at arch_cpu_idle+0x24/0x5c
Change the size to a signed integer and then trim off any part of the
FCS that was received prior to the last fragment.
Fixes: 6c389fc931 ("gianfar: fix size of scatter-gathered frames")
Signed-off-by: Andy Spencer <aspencer@spacex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 40339af33c ]
In commit 36777d9fa2 ("i40e: check current configured input set when
adding ntuple filters") some code was added to report the input set
mask for a given filter when reporting it to the user.
This code is necessary so that the reported filter correctly displays
that it is or is not masking certain fields.
Unfortunately the code was incorrect. Development error accidentally
swapped the mask values for the IPv4 addresses with the L4 port numbers.
The port numbers are only 16bits wide while IPv4 addresses are 32 bits.
Unfortunately we assigned only 16 bits to the IPv4 address masks.
Additionally we assigned 32bit value 0xFFFFFFF to the TCP port numbers.
This second part does not matter as the value would be truncated to
16bits regardless, but it is unnecessary.
Fix the reported masks to properly report that the entire field is
masked.
Fixes: 36777d9fa2 ("i40e: check current configured input set when adding ntuple filters")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 02b4016bfe ]
When implementing support for IP_USER_FLOW filters, we correctly
programmed a filter for both the non fragmented IPv4/Other filter, as
well as the fragmented IPv4 filters. However, we did not properly
program the input set for fragmented IPv4 PCTYPE. This meant that the
filters would almost certainly not match, unless the user specified all
of the flow types.
Add support to program the fragmented IPv4 filter input set. Since we
always program these filters together, we'll assume that the two input
sets must match, and will thus always program the input sets to the same
value.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2bafa8fac1 ]
commit 2de6aa3a66 ("ixgbe: Add support for padding packet")
Uses RXDCTL.RLPML to limit the maximum frame size on Rx when using
build_skb. Unfortunately that register does not work on 82599.
Added an explicit check to avoid setting this register on 82599 MAC.
Extended the comment related to the setting of RXDCTL.RLPML to better
explain its purpose.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf315ea596 ]
When a VF is under PF VLAN assignment:
ip link set <pf> vf <#> vlan <vid>
This will remove all previous entries in the VLAN table including those
generated by VLAN interfaces created on the VF. The issue arises when
the VF is under PF VLAN assignment and one or more of these VLAN
interfaces of the VF are deleted. When deleting these VLAN interfaces,
the following message will be generated in "dmesg":
failed to kill vid 0081/<vid> for device <vf>
This is due to the fact that "ndo_vlan_rx_kill_vid" exits with an error.
The handler for this ndo is "fm10k_update_vid". Any calls to this
function while under PF VLAN management will exit prematurely and, thus,
it will generate the failure message.
Additionally, since "fm10k_update_vid" exits prematurely, none of the
VLAN update is performed. So, even though the actual VLAN interfaces of
the VF will be deleted, the active_vlans bitmask is not cleared. When
the VF is no longer under PF VLAN assignment, the driver mistakenly
restores the previous entries of the VLAN table based on an
unsynchronized list of active VLANs.
The solution to this issue involves checking the VLAN update action type
before exiting "fm10k_update_vid". If the VLAN update action type is to
"add", this action will not be permitted while the VF is under PF VLAN
assignment and the VLAN update is abandoned like before.
However, if the VLAN update action type is to "kill", then we need to
also clear the active_vlans bitmask. However, we don't need to actually
queue any messages to the PF, because the MAC and VLAN tables have
already been cleared, and the PF would silently ignore these requests
anyways.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3a53285228 ]
Problem description:
After ethernet cable connect and disconnect for several iterations on a
device with i210, tx timestamp will stop being put into the socket.
Steps to reproduce:
1. Setup a device with i210 and wire it to a 802.1AS capable switch (
Extreme Networks Summit x440 is used in our case)
2. Have the gptp daemon running on the device and make sure it is synced
with the switch
3. Have the switch disable and enable the port, wait for the device gets
resynced with the switch
4. Iterates step 3 until the device is not albe to get resynced
5. Review the log in dmesg and you will see warning message "igb : clearing
Tx timestamp hang"
Root cause:
If ptp_tx_work() gets scheduled just before the port gets disabled, a LINK
DOWN event will be processed before ptp_tx_work(), which may cause timeout
in ptp_tx_work(). In the timeout logic, the TSYNCTXCTL's TXTT bit (Transmit
timestamp valid bit) is not cleared, causing no new timestamp loaded to
TXSTMP register. Consequently therefore, no new interrupt is triggerred by
TSICR.TXTS bit and no more Tx timestamp send to the socket.
Signed-off-by: Daniel Hua <daniel.hua@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 177132df5e ]
Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
for use by a virtual machine (either using vfio device assignment or
macvtap passthru mode), it saves the current MAC address and vlan tag
so that it can reset them to their original value when the guest is
done. Libvirt can't leave the VF MAC set to the value used by the
now-defunct guest since it may be started again later using a
different VF, but it certainly shouldn't just pick any random value,
either. So it saves the state of everything prior to using the VF, and
resets it to that.
The igb driver initializes the MAC addresses of all VFs to
00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
netlink message, also visible in the list of VFs in the output of "ip
link show"). But when libvirt attempts to restore the MAC address back
to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
responds with "Invalid argument".
Forbidding a reset back to the original value leaves the VF MAC at the
value set for the now-defunct virtual machine. Especially on a system
with NetworkManager enabled, this has very bad consequences, since
NetworkManager forces all interfacess to be IFF_UP all the time - if
the same virtual machine is restarted using a different VF (or even on
a different host), there will be multiple interfaces watching for
traffic with the same MAC address.
To allow libvirt to revert to the original state, we need a way to
remove the administrative set MAC on a VF, to allow normal host
operation again, and to reset/overwrite the VF MAC via VF netdev.
This patch implements the outlined scenario by allowing to set the
VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
so it's possible to reset the VF MAC back to the original value via
the VF netdev.
Note: Recent patches to libvirt allow for a workaround if the NIC
isn't capable of resetting the administrative MAC back to all 0, but
in theory the NIC should allow resetting the MAC in the first place.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Aaron Brown <arron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 06aa040f03 ]
When a host disables and enables a PF device, all the associated
VFs are removed and added back in. It also generates a PFR which in turn
resets all the connected VFs. This behaviour is different from that of
Linux guest on Linux host. Hence we end up in a situation where there's
a PFR and device removal at the same time. And watchdog doesn't have a
clue about this and schedules a reset_task. This patch adds code to send
signal to reset_task that the device is currently being removed.
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fb7d38a70e ]
On Meson8b the only valid input clock is MPLL2. The bootloader
configures that to run at 500002394Hz which cannot be divided evenly
down to 125MHz using the m250_div clock. Currently the common clock
framework chooses a m250_div of 2 - with the internal fixed
"divide by 10" this results in a RGMII TX clock of 125001197Hz (120Hz
above the requested 125MHz).
Letting the common clock framework propagate the rate changes up to the
parent of m250_mux allows us to get the best possible clock rate. With
this patch the common clock framework calculates a rate of
very-close-to-250MHz (249999701Hz to be exact) for the MPLL2 clock
(which is the mux input). Dividing that by 2 (which is an internal,
fixed divider for the RGMII TX clock) gives us an RGMII TX clock of
124999850Hz (which is only 150Hz off the requested 125MHz, compared to
1197Hz based on the MPLL2 rate set by u-boot and the Amlogic GPL kernel
sources).
SoCs from the Meson GX series are not affected by this change because
the input clock is FCLK_DIV2 whose rate cannot be changed (which is fine
since it's running at 1GHz, so it's already a multiple of 250MHz and
125MHz).
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 433c6cab9d ]
Meson8b only supports MPLL2 as clock input. The rate of the MPLL2 clock
set by Odroid-C1's u-boot is close to (but not exactly) 500MHz. The
exact rate is 500002394Hz, which is calculated in
drivers/clk/meson/clk-mpll.c using the following formula:
DIV_ROUND_UP_ULL((u64)parent_rate * SDM_DEN, (SDM_DEN * n2) + sdm)
Odroid-C1's u-boot configures MPLL2 with the following values:
- SDM_DEN = 16384
- SDM = 1638
- N2 = 5
The 250MHz clock (m250_div) inside dwmac-meson8b driver is derived from
the MPLL2 clock. Due to MPLL2 running slightly faster than 500MHz the
common clock framework chooses a divider which is too big to generate
the 250MHz clock (a divider of 2 would be needed, but this is rounded up
to a divider of 3). This breaks the RTL8211F RGMII PHY on Odroid-C1
because it requires a (close to) 125MHz RGMII TX clock (on Gbit speeds,
the IP block internally divides that down to 25MHz on 100Mbit/s
connections and 2.5MHz on 10Mbit/s connections - we don't need any
special configuration for that).
Round the divider to the closest value to prevent this issue on Meson8b.
This means we'll now end up with a clock rate for the RGMII TX clock of
125001197Hz (= 125MHz plus 1197Hz), which is close-enough to 125MHz.
This has no effect on the Meson GX SoCs since there fclk_div2 is used as
input clock, which has a rate of 1000MHz (and thus is divisible cleanly
to 250MHz and 125MHz).
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 461d5f1b59 ]
mlx4_delete_all_resources_for_slave in resource tracker should free all
memory allocated for a slave.
While releasing memory of fs_rule, it misses releasing memory of
fs_rule->mirr_mbox.
Fixes: 78efed2751 ('net/mlx4_core: Support mirroring VF DMFS rules on both ports')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4246f698dd ]
Increase representor netdev RQ size to avoid dropped packets.
The current size (two) is just too small to keep up with
conventional slow path traffic patterns.
Also match the SQ size to the RQ size.
Fixes: cb67b83292 ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6e8814ceb7 ]
Global pause and PFC configuration should be mutually exclusive (i.e. only
one of them at most can be set). However, once PFC was turned off,
driver automatically turned Global pause on. This is a bug.
Fix the driver behaviour to turn off PFC/Global once the user turned the
other on.
This also fixed a weird behaviour that at a current time, the profile
had both PFC and global pause configuration turned on, which is
Hardware-wise impossible and caused returning false positive indication
to query tools.
In addition, fix error code when setting global pause or PFC to change
metadata only upon successful change.
Also, removed useless debug print.
Fixes: af7d518526 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a117f73dc2 ]
When mlx5_core is loaded it is expected to sync ports
with all vxlan devices so it can support vxlan encap/decap.
This is done via udp_tunnel_get_rx_info(). Currently this
call is set in mlx5e_nic_enable() and if the netdev is not in
NETREG_REGISTERED state it will not be called.
Normally on load the netdev state is not NETREG_REGISTERED
so udp_tunnel_get_rx_info() will not be called.
Moving udp_tunnel_get_rx_info() to mlx5e_open() so
it will be called on netdev UP event and allow encap/decap.
Fixes: 610e89e05c ("net/mlx5e: Don't sync netdev state when not registered")
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The vport admin original link state will be re-applied after returning
back to legacy mode, it is not right to change the admin link state value
when in switchdev mode.
Use direct vport commands to alter logical vport state in netdev
representor open/close flows rather than the administrative eswitch API.
Fixes: 20a1ea6747 ('net/mlx5e: Support VF vport link state control for SRIOV switchdev mode')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1489bbd10e ]
The NSP default buffer is a piece of NFP memory where additional
command data can be placed. Its format has been copied from
host buffer, but the PCIe selection bits do not make sense in
this case. If those get masked out from a NFP address - writes
to random place in the chip memory may be issued and crash the
device.
Even in the general NSP buffer case, it doesn't make sense to have the
PCIe selection bits there anymore. These are unused at the moment, and
when it becomes necessary, the PCIe selection bits should rather be
moved to another register to utilise more bits for the buffer address.
This has never been an issue because the buffer used to be
allocated in memory with less-than-38-bit-long address but that
is about to change.
Fixes: 1a64821c6a ("nfp: add support for service processor access")
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit af1607c37d ]
For NIC flows, the parsed attributes are not freed when we exit
successfully from mlx5e_configure_flower().
There is possible double free for eswitch flows. If error is returned
from rhashtable_insert_fast(), the parse attrs will be freed in
mlx5e_tc_del_flow(), but they will be freed again before exiting
mlx5e_configure_flower().
To fix both issues we do the following:
(1) change the condition that determines if to issue the free call to
check if this flow is NIC flow, or it does not have encap action.
(2) reorder the code such that that the check and free calls are done
before we attempt to add into the hash table.
Fixes: 232c001398 ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 423c9db299 ]
Currently we use the global ipv6_stub var to access the ipv6 global
nd table. This practice gets us to troubles when the stub is only partially
set e.g when ipv6 is loaded under the disabled policy. In this case, as of commit
343d60aada ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument")
the stub is not null, but stub->nd_tbl is and we crash.
As we can access the ipv6 nd_tbl directly, the fix is just to avoid the
reference through the stub. There is one place in the code where we
issue ipv6 route lookup and keep doing it through the stub, but that
mentioned commit makes sure we get -EAFNOSUPPORT from the stack.
Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 19c9ea363a ]
pci_set_drvdata() is called only after registering the net_device,
therefore we could run into a NPE if one of the functions using
driver_data is called before it's set.
Fix this by calling pci_set_drvdata() before registering the
net_device.
This fix is a candidate for stable. As far as I can see the
bug has been there in kernel version 3.2 already, therefore
I can't provide a reference which commit is fixed by it.
The fix may need small adjustments per kernel version because
due to other changes the label which is jumped to if
register_netdev() fails has changed over time.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 09fb35ead5 ]
Initiating a kdump via the command line can cause a pending interrupt
to be handled by the ibmvnic driver when initializing the sub-CRQ
irqs during driver initialization.
NIP [d000000000ca34f0] ibmvnic_interrupt_rx+0x40/0xd0 [ibmvnic]
LR [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
Call Trace:
[c000000047fcfde0] [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
[c000000047fcfea0] [c00000000813317c] handle_irq_event_percpu+0x3c/0x90
[c000000047fcfee0] [c00000000813323c] handle_irq_event+0x6c/0xd0
[c000000047fcff10] [c0000000081385e0] handle_fasteoi_irq+0xf0/0x250
[c000000047fcff40] [c0000000081320a0] generic_handle_irq+0x50/0x80
[c000000047fcff60] [c000000008014984] __do_irq+0x84/0x1d0
[c000000047fcff90] [c000000008027564] call_do_irq+0x14/0x24
[c00000003c92af00] [c000000008014b70] do_IRQ+0xa0/0x120
[c00000003c92af50] [c000000008002594] hardware_interrupt_common+0x114/0x180
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ea0a42109a ]
We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and
the extant logic would flag that as an error. This was already fixed in
cxgb4 driver with "92ddcc7 cxgb4: Fix some small bugs in
t4_sge_init_soft() when our Page Size is 64KB".
Original Work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 44b034b406 ]
In i40evf_reset_task we use netif_running() to determine whether or not
the device is currently up. This allows us to properly free queue memory
and shut down things before we request the hardware reset.
It turns out that we cannot be guaranteed of netif_running() returning
false until the device is fully up, as the kernel core code sets
__LINK_STATE_START prior to calling .ndo_open. Since we're not holding
the rtnl_lock(), it's possible that the driver's i40evf_open handler
function is currently being called while we're resetting.
We can't simply hold the rtnl_lock() while checking netif_running() as
this could cause a deadlock with the i40evf_open() function.
Additionally, we can't avoid the deadlock by holding the rtnl_lock()
over the whole reset path, as this essentially serializes all resets,
and can cause massive delays if we have multiple VFs on a system.
Instead, lets just check our own internal state __I40EVF_RUNNING state
field. This allows us to ensure that the state is correct and is only
set after we've finished bringing the device up.
Without this change we might free data structures about device queues
and other memory before they've been fully allocated.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 57ffee737b ]
The member "stats_offset" was designed to indicate the offset
of each member of struct ring_stats in struct hns3_enet_ring,
but forgot to add the offset of the member in struct ring_stats.
Fixes: 496d03e960 ("net: hns3: Add Ethtool support to HNS3 driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 734dc065fc ]
There are two potential problems with the existing implementation.
1. Enable and disable can race after the atomic operations.
2. If a command fails the refcount is left in an inconsistent state.
Introduce a lock and perform error checking.
Fixes: a6f7d2aff6 ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a42b63c1ac ]
Change the default mapping between TC and TCG as follows:
Prio | TC/TCG
| from to
| (set by FW) (set by SW)
---------+-----------------------------------
0 | 0/0 0/7
1 | 1/0 0/6
2 | 2/0 0/5
3 | 3/0 0/4
4 | 4/0 0/3
5 | 5/0 0/2
6 | 6/0 0/1
7 | 7/0 0/0
These new settings cause that a pause frame for any prio stops
traffic for all prios.
Fixes: 564c274c3d ("net/mlx4_en: DCB QoS support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>