KVM irqfd based emulation of level-triggered interrupts doesn't work
quite correctly in some cases, particularly in the case of interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.
Linux keeps such interrupt masked until its threaded handler finishes,
to prevent the EOI from re-asserting an unacknowledged interrupt.
However, with KVM + vfio (or whatever is listening on the resamplefd)
we always notify resamplefd at the EOI, so vfio prematurely unmasks the
host physical IRQ, thus a new physical interrupt is fired in the host.
This extra interrupt in the host is not a problem per se. The problem is
that it is unconditionally queued for injection into the guest, so the
guest sees an extra bogus interrupt. [*]
There are observed at least 2 user-visible issues caused by those
extra erroneous interrupts for a oneshot irq in the guest:
1. System suspend aborted due to a pending wakeup interrupt from
ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
(drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
every time the touchpad is touched.
The core issue here is that by the time when the guest unmasks the IRQ,
the physical IRQ line is no longer asserted (since the guest has
acked the interrupt to the device in the meantime), yet we
unconditionally inject the interrupt queued into the guest by the
previous resampling. So to fix the issue, we need a way to detect that
the IRQ is no longer pending, and cancel the queued interrupt in this
case.
With IOAPIC we are not able to probe the physical IRQ line state
directly (at least not if the underlying physical interrupt controller
is an IOAPIC too), so in this patch we use irqfd resampler for that.
Namely, instead of injecting the queued interrupt, we just notify the
resampler that this interrupt is done. If the IRQ line is actually
already deasserted, we are done. If it is still asserted, a new
interrupt will be shortly triggered through irqfd and injected into the
guest.
In the case if there is no irqfd resampler registered for this IRQ, we
cannot fix the issue, so we keep the existing behavior: immediately
unconditionally inject the queued interrupt.
This patch fixes the issue for x86 IOAPIC only. In the long run, we can
fix it for other irqchips and other architectures too, possibly taking
advantage of reading the physical state of the IRQ line, which is
possible with some other irqchips (e.g. with arm64 GIC, maybe even with
the legacy x86 PIC).
[*] In this description we assume that the interrupt is a physical host
interrupt forwarded to the guest e.g. by vfio. Potentially the same
issue may occur also with a purely virtual interrupt from an
emulated device, e.g. if the guest handles this interrupt, again, as
a oneshot interrupt.
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Link: https://lore.kernel.org/lkml/87o7wrug0w.wl-maz@kernel.org/
Message-Id: <20230322204344.50138-3-dmy@semihalf.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is useful to be able to do read-only traversal of the list of all the
registered irqfd resamplers without locking the resampler_lock mutex.
In particular, we are going to traverse it to search for a resampler
registered for the given irq of an irqchip, and that will be done with
an irqchip spinlock (ioapic->lock) held, so it is undesirable to lock a
mutex in this context. So turn this list into an RCU list.
For protecting the read side, reuse kvm->irq_srcu which is already used
for protecting a number of irq related things (kvm->irq_routing,
irqfd->resampler->list, kvm->irq_ack_notifier_list,
kvm->arch.mask_notifier_list).
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
Message-Id: <20230322204344.50138-2-dmy@semihalf.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM
is running on top of Hyper-V and Hyper-V exposes support for it (which
is always). On AMD CPUs this enlightenment results in ASID invalidations
not flushing TLB entries derived from the NPT. To force the underlying
(L0) hypervisor to rebuild its shadow page tables, an explicit hypercall
is needed.
The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM
only added remote TLB flush hooks. This worked out fine for a while, as
sufficient remote TLB flushes where being issued in KVM to mask the
problem. Since v5.17, changes in the TDP code reduced the number of
flushes and the out-of-sync TLB prevents guests from booting
successfully.
Split svm_flush_tlb_current() into separate callbacks for the 3 cases
(guest/all/current), and issue the required Hyper-V hypercall when a
Hyper-V TLB flush is needed. The most important case where the TLB flush
was missing is when loading a new PGD, which is followed by what is now
svm_flush_tlb_current().
Cc: stable@vger.kernel.org # v5.17+
Fixes: 1e0c7d4075 ("KVM: SVM: hyper-v: Remote TLB flush for SVM")
Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.microsoft.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20230324145233.4585-1-jpiotrowski@linux.microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm64 fixes for 6.3, part #2
Fixes for a rather interesting set of bugs relating to the MMU:
- Read the MMU notifier seq before dropping the mmap lock to guard
against reading a potentially stale VMA
- Disable interrupts when walking user page tables to protect against
the page table being freed
- Read the MTE permissions for the VMA within the mmap lock critical
section, avoiding the use of a potentally stale VMA pointer
Additionally, some fixes targeting the vPMU:
- Return the sum of the current perf event value and PMC snapshot for
reads from userspace
- Don't save the value of guest writes to PMCR_EL0.{C,P}, which could
otherwise lead to userspace erroneously resetting the vPMU during VM
save/restore
Syzkaller reported the following issue:
=====================================================
BUG: KMSAN: uninit-value in aio_rw_done fs/aio.c:1520 [inline]
BUG: KMSAN: uninit-value in aio_write+0x899/0x950 fs/aio.c:1600
aio_rw_done fs/aio.c:1520 [inline]
aio_write+0x899/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at:
slab_post_alloc_hook mm/slab.h:766 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
__kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc+0x11d/0x3b0 mm/slab_common.c:981
kmalloc_array include/linux/slab.h:636 [inline]
bcm_tx_setup+0x80e/0x29d0 net/can/bcm.c:930
bcm_sendmsg+0x3a2/0xce0 net/can/bcm.c:1351
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
sock_write_iter+0x495/0x5e0 net/socket.c:1108
call_write_iter include/linux/fs.h:2189 [inline]
aio_write+0x63a/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
CPU: 1 PID: 5034 Comm: syz-executor350 Not tainted 6.2.0-rc6-syzkaller-80422-geda666ff2276 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
=====================================================
We can follow the call chain and find that 'bcm_tx_setup' function
calls 'memcpy_from_msg' to copy some content to the newly allocated
frame of 'op->frames'. After that the 'len' field of copied structure
being compared with some constant value (64 or 8). However, if
'memcpy_from_msg' returns an error, we will compare some uninitialized
memory. This triggers 'uninit-value' issue.
This patch will add 'memcpy_from_msg' possible errors processing to
avoid uninit-value issue.
Tested via syzkaller
Reported-by: syzbot+c9bfd85eca611ebf5db1@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=47f897f8ad958bbde5790ebf389b5e7e0a345089
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Fixes: 6f3b911d5f ("can: bcm: add support for CAN FD frames")
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230314120445.12407-1-ivan.orlov0322@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
For platforms with Alder Lake PCH (Alder Lake S and Raptor Lake S) the
slp_s0_residency attribute has been reporting the wrong value. Unlike other
platforms, ADL PCH does not have a counter for the time that the SLP_S0
signal was asserted. Instead, firmware uses the aggregate of the Low Power
Mode (LPM) substate counters as the S0ix value. Since the LPM counters run
at a different frequency, this lead to misreporting of the S0ix time.
Add a check for Alder Lake PCH and adjust the frequency accordingly when
display slp_s0_residency.
Fixes: bbab31101f ("platform/x86/intel: pmc/core: Add Alderlake support to pmc core driver")
Signed-off-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20230320212029.3154407-1-david.e.box@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Currently i915_gem_object_is_framebuffer() doesn't treat the
BO containing the framebuffer's DPT as a framebuffer itself.
This means eg. that the shrinker can evict the DPT BO while
leaving the actual FB BO bound, when the DPT is allocated
from regular shmem.
That causes an immediate oops during hibernate as we
try to rewrite the PTEs inside the already evicted
DPT obj.
TODO: presumably this might also be the reason for the
DPT related display faults under heavy memory pressure,
but I'm still not sure how that would happen as the object
should be pinned by intel_dpt_pin() while in active use by
the display engine...
Cc: stable@vger.kernel.org
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Fixes: 0dc987b699 ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320090522.9909-2-ville.syrjala@linux.intel.com
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
(cherry picked from commit 779cb5ba64)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Keeping DC states enabled is incompatible with the _noarm()/_arm()
split we use for writing pipe/plane registers. When DC5 and PSR
are enabled, all pipe/plane registers effectively become self-arming
on account of DC5 exit arming the update, and PSR exit latching it.
What probably saves us most of the time is that (with PIPE_MISC[21]=0)
all pipe register writes themselves trigger PSR exit, and then
we don't re-enter PSR until the idle frame count has elapsed.
So it may be that the PSR exit happens already before we've
updated the state too much.
Also the PSR1 panel (at least on this KBL) seems to discard the first
frame we trasmit, presumably still scanning out from its internal
framebuffer at that point. So only the second frame we transmit is
actually visible. But I suppose that could also be panel specific
behaviour. I haven't checked out how other PSR panels behave, nor
did I bother to check what the eDP spec has to say about this.
And since this really is all about DC states, let's switch from
the MODESET domain to the DC_OFF domain. Functionally they are
100% identical. We should probably remove the MODESET domain...
And for good measure let's toss in an assert to the place where
we do the _noarm() register writes to make sure DC states are
in fact off.
v2: Just use intel_display_power_is_enabled() (Imre)
Cc: <stable@vger.kernel.org> #v5.17+
Cc: Manasi Navare <navaremanasi@google.com>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Fixes: d13dde4495 ("drm/i915: Split pipe+output CSC programming to noarm+arm pair")
Fixes: f8a005eb89 ("drm/i915: Optimize icl+ universal plane programming")
Fixes: 890b6ec4a5 ("drm/i915: Split skl+ plane update into noarm+arm pair")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320183532.17727-1-ville.syrjala@linux.intel.com
(cherry picked from commit 41b4c7fe72)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
SKL/GLK CSC unit suffers from a nasty issue where a CSC
coeff/offset register read or write between DC5 exit and
PSR exit will undo the CSC arming performed by DMC, and
then during PSR exit the hardware will latch zeroes into
the active CSC registers. This causes any plane going
through the CSC to output all black.
We can sidestep the issue by making sure the PSR exit has
already actually happened before we touch the CSC coeff/offset
registers. Easiest way to guarantee that is to just move the
CSC programming back into the .color_commir_arm() as we force
a PSR exit (and crucially wait for it to actually happen)
prior to touching the arming registers.
When PSR (and thus also DC states) are disabled we don't
have anything to worry about, so we can keep using the
more optional _noarm() hook for writing the CSC registers.
Cc: <stable@vger.kernel.org> #v5.19+
Cc: Manasi Navare <navaremanasi@google.com>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jouni Högander <jouni.hogander@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8283
Fixes: d13dde4495 ("drm/i915: Split pipe+output CSC programming to noarm+arm pair")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320095438.17328-3-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit 80a892a4c2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Expose intel_rps_read_actual_frequency_fw to read the actual freq without
taking forcewake for use by PMU. The code is refactored to use a common set
of functions across sysfs and PMU. Using common functions with sysfs in PMU
solves the issues of missing support for MTL and missing support for older
generations (prior to Gen6). It also future proofs the PMU where sometimes
code has been updated for sysfs and PMU has been missed.
v2: Remove runtime_pm_if_in_use from read_actual_frequency_fw (Tvrtko)
v3: (Tvrtko)
- Remove goto in __read_cagf
- Unexport intel_rps_get_cagf and intel_rps_read_punit_req
Fixes: 22009b6dad ("drm/i915/mtl: Modify CAGF functions for MTL")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8280
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316004800.2539753-1-ashutosh.dixit@intel.com
(cherry picked from commit 44df42e661)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The blamed commit has introduced the following tests to
dwmac4_add_hw_vlan_rx_fltr(), called from stmmac_vlan_rx_add_vid():
if (hw->promisc) {
netdev_err(dev,
"Adding VLAN in promisc mode not supported\n");
return -EPERM;
}
"VLAN promiscuous" mode is keyed in this driver to IFF_PROMISC, and so,
vlan_vid_add() and vlan_vid_del() calls cannot take place in IFF_PROMISC
mode. I have the following 2 arguments that this restriction is.... hm,
how shall I put it nicely... unproductive :)
First, take the case of a Linux bridge. If the kernel is compiled with
CONFIG_BRIDGE_VLAN_FILTERING=y, then this bridge shall have a VLAN
database. The bridge shall try to call vlan_add_vid() on its bridge
ports for each VLAN in the VLAN table. It will do this irrespectively of
whether that port is *currently* VLAN-aware or not. So it will do this
even when the bridge was created with vlan_filtering 0.
But the Linux bridge, in VLAN-unaware mode, configures its ports in
promiscuous (IFF_PROMISC) mode, so that they accept packets with any
MAC DA (a switch must do this in order to forward those packets which
are not directly targeted to its MAC address).
As a result, the stmmac driver does not work as a bridge port, when the
kernel is compiled with CONFIG_BRIDGE_VLAN_FILTERING=y.
$ ip link add br0 type bridge && ip link set br0 up
$ ip link set eth0 master br0 && ip link set eth0 up
[ 2333.943296] br0: port 1(eth0) entered blocking state
[ 2333.943381] br0: port 1(eth0) entered disabled state
[ 2333.943782] device eth0 entered promiscuous mode
[ 2333.944080] 4033c000.ethernet eth0: Adding VLAN in promisc mode not supported
[ 2333.976509] 4033c000.ethernet eth0: failed to initialize vlan filtering on this port
RTNETLINK answers: Operation not permitted
Secondly, take the case of stmmac as DSA master. Some switch tagging
protocols are based on 802.1Q VLANs (tag_sja1105.c), and as such,
tag_8021q.c uses vlan_vid_add() to work with VLAN-filtering DSA masters.
But also, when a DSA port becomes promiscuous (for example when it joins
a bridge), the DSA framework also makes the DSA master promiscuous.
Moreover, for every VLAN that a DSA switch sends to the CPU, DSA also
programs a VLAN filter on the DSA master, because if the the DSA switch
uses a tail tag, then the hardware frame parser of the DSA master will
see VLAN as VLAN, and might filter them out, for being unknown.
Due to the above 2 reasons, my belief is that the stmmac driver does not
get to choose to not accept vlan_vid_add() calls while IFF_PROMISC is
enabled, because the 2 are completely independent and there are code
paths in the network stack which directly lead to this situation
occurring, without the user's direct input.
In fact, my belief is that "VLAN promiscuous" mode should have never
been keyed on IFF_PROMISC in the first place, but rather, on the
NETIF_F_HW_VLAN_CTAG_FILTER feature flag which can be toggled by the
user through ethtool -k, when present in netdev->hw_features.
In the stmmac driver, NETIF_F_HW_VLAN_CTAG_FILTER is only present in
"features", making this feature "on [fixed]".
I have this belief because I am unaware of any definition of promiscuity
which implies having an effect on anything other than MAC DA (therefore
not VLAN). However, I seem to be rather alone in having this opinion,
looking back at the disagreements from this discussion:
https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/
In any case, to remove the vlan_vid_add() dependency on !IFF_PROMISC,
one would need to remove the check and see what fails. I guess the test
was there because of the way in which dwmac4_vlan_promisc_enable() is
implemented.
For context, the dwmac4 supports Perfect Filtering for a limited number
of VLANs - dwmac4_get_num_vlan(), priv->hw->num_vlan, with a fallback on
Hash Filtering - priv->dma_cap.vlhash - see stmmac_vlan_update(), also
visible in cat /sys/kernel/debug/stmmaceth/eth0/dma_cap | grep 'VLAN
Hash Filtering'.
The perfect filtering is based on MAC_VLAN_Tag_Filter/MAC_VLAN_Tag_Data
registers, accessed in the driver through dwmac4_write_vlan_filter().
The hash filtering is based on the MAC_VLAN_Hash_Table register, named
GMAC_VLAN_HASH_TABLE in the driver and accessed by dwmac4_update_vlan_hash().
The control bit for enabling hash filtering is GMAC_VLAN_VTHM
(MAC_VLAN_Tag_Ctrl bit VTHM: VLAN Tag Hash Table Match Enable).
Now, the description of dwmac4_vlan_promisc_enable() is that it iterates
through the driver's cache of perfect filter entries (hw->vlan_filter[i],
added by dwmac4_add_hw_vlan_rx_fltr()), and evicts them from hardware by
unsetting their GMAC_VLAN_TAG_DATA_VEN (MAC_VLAN_Tag_Data bit VEN - VLAN
Tag Enable) bit. Then it unsets the GMAC_VLAN_VTHM bit, which disables
hash matching.
This leaves the MAC, according to table "VLAN Match Status" from the
documentation, to always enter these data paths:
VID |VLAN Perfect Filter |VTHM Bit |VLAN Hash Filter |Final VLAN Match
|Match Result | |Match Result |Status
-------|--------------------|---------|-----------------|----------------
VID!=0 |Fail |0 |don't care |Pass
So, dwmac4_vlan_promisc_enable() does its job, but by unsetting
GMAC_VLAN_VTHM, it conflicts with the other code path which controls
this bit: dwmac4_update_vlan_hash(), called through stmmac_update_vlan_hash()
from stmmac_vlan_rx_add_vid() and from stmmac_vlan_rx_kill_vid().
This is, I guess, why dwmac4_add_hw_vlan_rx_fltr() is not allowed to run
after dwmac4_vlan_promisc_enable() has unset GMAC_VLAN_VTHM: because if
it did, then dwmac4_update_vlan_hash() would set GMAC_VLAN_VTHM again,
breaking the "VLAN promiscuity".
It turns out that dwmac4_vlan_promisc_enable() is way too complicated
for what needs to be done. The MAC_Packet_Filter register also has the
VTFE bit (VLAN Tag Filter Enable), which simply controls whether VLAN
tagged packets which don't match the filtering tables (either perfect or
hash) are dropped or not. At the moment, this driver unconditionally
sets GMAC_PACKET_FILTER_VTFE if NETIF_F_HW_VLAN_CTAG_FILTER was detected
through the priv->dma_cap.vlhash capability bits of the device, in
stmmac_dvr_probe().
I would suggest deleting the unnecessarily complex logic from
dwmac4_vlan_promisc_enable(), and simply unsetting GMAC_PACKET_FILTER_VTFE
when becoming IFF_PROMISC, which has the same effect of allowing packets
with any VLAN tags, but has the additional benefit of being able to run
concurrently with stmmac_vlan_rx_add_vid() and stmmac_vlan_rx_kill_vid().
As much as I believe that the VTFE bit should have been exclusively
controlled by NETIF_F_HW_VLAN_CTAG_FILTER through ethtool, and not by
IFF_PROMISC, changing that is not a punctual fix to the problem, and it
would probably break the VFFQ feature added by the later commit
e0f9956a38 ("net: stmmac: Add option for VLAN filter fail queue
enable"). From the commit description, VFFQ needs IFF_PROMISC=on and
VTFE=off in order to work (and this change respects that). But if VTFE
was changed to be controlled through ethtool -k, then a user-visible
change would have been introduced in Intel's scripts (a need to run
"ethtool -k eth0 rx-vlan-filter off" which did not exist before).
The patch was tested with this set of commands:
ip link set eth0 up
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 192.168.100.2/24 dev eth0.100 && ip link set eth0.100 up
ip link set eth0 promisc on
ip link add link eth0 name eth0.101 type vlan id 101
ip addr add 192.168.101.2/24 dev eth0.101 && ip link set eth0.101 up
ip link set eth0 promisc off
ping -c 5 192.168.100.1
ping -c 5 192.168.101.1
ip link set eth0 promisc on
ping -c 5 192.168.100.1
ping -c 5 192.168.101.1
ip link del eth0.100
ip link del eth0.101
# Wait for VLAN-tagged pings from the other end...
# Check with "tcpdump -i eth0 -e -n -p" and we should see them
ip link set eth0 promisc off
# Wait for VLAN-tagged pings from the other end...
# Check with "tcpdump -i eth0 -e -n -p" and we shouldn't see them
# anymore, but remove the "-p" argument from tcpdump and they're there.
Fixes: c89f44ff10 ("net: stmmac: Add support for VLAN promiscuous mode")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit addresses a deadlock situation that can occur in certain
scenarios, such as when running data TP/ETP transfer and subscribing to
the error queue while receiving a net down event. The deadlock involves
locks in the following order:
3
j1939_session_list_lock -> active_session_list_lock
j1939_session_activate
...
j1939_sk_queue_activate_next -> sk_session_queue_lock
...
j1939_xtp_rx_eoma_one
2
j1939_sk_queue_drop_all -> sk_session_queue_lock
...
j1939_sk_netdev_event_netdown -> j1939_socks_lock
j1939_netdev_notify
1
j1939_sk_errqueue -> j1939_socks_lock
__j1939_session_cancel -> active_session_list_lock
j1939_tp_rxtimer
CPU0 CPU1
---- ----
lock(&priv->active_session_list_lock);
lock(&jsk->sk_session_queue_lock);
lock(&priv->active_session_list_lock);
lock(&priv->j1939_socks_lock);
The solution implemented in this commit is to move the
j1939_sk_errqueue() call out of the active_session_list_lock context,
thus preventing the deadlock situation.
Reported-by: syzbot+ee1cd780f69483a8616b@syzkaller.appspotmail.com
Fixes: 5b9272e93f ("can: j1939: extend UAPI to notify about RX status")
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20230324130141.2132787-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This fixes the following warning when compiled with GCC 12.2.0 and W=1.
net/core/dev_ioctl.c:475: warning: Function parameter or member 'data'
not described in 'dev_ioctl'
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was a bit too optimistic in commit bf51d27704 ("tools: ynl: fix
get_mask utility routine"), not every mask we use is necessarily
coming from an enum of type "flags". We also allow flipping an
enum into flags on per-attribute basis. That's done by
the 'enum-as-flags' property of an attribute.
Restore this functionality, it's not currently used by any in-tree
family.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Other tests set up the connection fully on both ends before
communicating any data. Add a test which will queue up TLS
records to TCP before the TLS ULP is installed.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing the tool I noticed we miss the u16 type on payload create.
On the code inspection it turned out we miss also u64 - add them.
We also miss the decoding of u16 despite the fact `NlAttr` class
supports it - add it.
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson says:
====================
net: sunhme: Probe/IRQ cleanups
Well, I've had these patches kicking around in my tree since last
October, so I guess I had better get around to posting them. This
series is mainly a cleanup/consolidation of the probe process, with
some interrupt changes as well. Some of these changes are SBUS- (AKA
SPARC-) specific, so this should really get some testing there as well
to ensure nothing breaks. I've CC'd a few SPARC mailing lists in hopes
that someone there can try this out. I also have an SBUS card I
ordered by mistake if anyone has a SPARC computer but lacks this card.
Changes in v4:
- Tweak variable order for yuletide
- Move uninitialized return to its own commit
- Use correct SBUS/PCI accessors
- Rework hme_version to set the default in pci/sbus_probe and override it (if
necessary) in common_probe
Changes in v3:
- Incorperate a fix from another series into this commit
Changes in v2:
- Move happy_meal_begin_auto_negotiation earlier and remove forward declaration
- Make some more includes common
- Clean up mac address init
- Inline error returns
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the second half of the PCI/SBUS probe functions are the same.
Consolidate them into a common function.
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac address initialization is braodly the same between PCI and SBUS,
and one was clearly copied from the other. Consolidate them. We still have
to have some ifdefs because pci_(un)map_rom is only implemented for PCI,
and idprom is only implemented for SPARC.
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of registering one interrupt handler for all four SBUS Quattro
HMEs, let each HME register its own handler. To make this work, we don't
handle the IRQ if none of the status bits are set. This reduces the
complexity of the driver, and makes it easier to ensure things happen
before/after enabling IRQs.
I'm not really sure why we request IRQs in two different places (and leave
them running after removing the driver!). A lot of things in this driver
seem to just be crusty, and not necessarily intentional. I'm assuming
that's the case here as well.
This really needs to be tested by someone with an SBUS Quattro card.
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sunhme driver never used the hardware MII polling feature. Even the
if-def'd out happy_meal_poll_start was removed by 2002 [1]. Remove the
various places in the driver which needlessly guard against MII interrupts
which will never be enabled.
[1] https://lwn.net/2002/0411/a/2.5.8-pre3.php3
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we've tried regular autonegotiation and forcing the link mode, just
restart autonegotiation instead of reinitializing the whole NIC.
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix an uninitialized return code if we never found a qfe slot. It would be
a bug if we ever got into this situation, but it's good to return something
tracable.
Fixes: acb3f35f92 ("sunhme: forward the error code from pci_enable_device()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veerasenareddy Burru says:
====================
octeon_ep: deferred probe and mailbox
Implement Deferred probe, mailbox enhancements and heartbeat monitor.
v4 -> v5:
- addressed review comments
https://lore.kernel.org/all/20230323104703.GD36557@unreal/
replaced atomic_inc() + atomic_read() with atomic_inc_return().
v3 -> v4:
- addressed review comments on v3
https://lore.kernel.org/all/20230214051422.13705-1-vburru@marvell.com/
- 0004-xxx.patch v3 is split into 0004-xxx.patch and 0005-xxx.patch
in v4.
- API changes to accept function ID are moved to 0005-xxx.patch.
- fixed rct violations.
- reverted newly added changes that do not yet have use cases.
v2 -> v3:
- removed SRIOV VF support changes from v2, as new drivers which use
ndo_get_vf_xxx() and ndo_set_vf_xxx() are not accepted.
https://lore.kernel.org/all/20221207200204.6819575a@kernel.org/
Will implement VF representors and submit again.
- 0007-xxx.patch and 0008-xxx.patch from v2 are removed and
0009-xxx.patch in v2 is now 0007-xxx.patch in v3.
- accordingly, changed title for cover letter.
v1 -> v2:
- remove separate workqueue task to wait for firmware ready.
instead defer probe when firmware is not ready.
Reported-by: Leon Romanovsky <leon@kernel.org>
- This change has resulted in update of 0001-xxx.patch and
all other patches in the patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Monitor periodic heartbeat messages from device firmware.
Presence of heartbeat indicates the device is active and running.
If the heartbeat is missed for configured interval indicates
firmware has crashed and device is unusable; in this case, PF driver
stops and uninitialize the device.
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update control mailbox API to include function id in get stats and
link info.
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend control command structure to include vfid and
update APIs to accept VF ID.
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhance control mailbox protocol to support following
- separate command and response queues
* command queue to send control commands to firmware.
* response queue to receive responses and notifications from
firmware.
- variable size messages using scatter/gather
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add control mailbox support for multiple PFs.
Update control mbox base address calculation based on PF function link.
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Poll for control messages until interrupts are enabled.
All the interrupts are enabled in ndo_open().
Add ability to listen for notifications from firmware before ndo_open().
Once interrupts are enabled, this polling is disabled and all the
messages are processed by bottom half of interrupt handler.
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement phy_read16() and phy_write16() ops for B53 MMAP to avoid accessing
B53_PORT_MII_PAGE registers which hangs the device.
This access should be done through the MDIO Mux bus controller.
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Álvaro Fernández Rojas says:
====================
net: dsa: b53: mdio: add support for BCM53134
This is based on the initial work from Paul Geurts that was sent to the
incorrect linux development lists and recipients.
I've modified it by removing BCM53134_DEVICE_ID from is531x5() and therefore
adding is53134() where needed.
I also added a separate RGMII handling block for is53134() since according to
Paul, BCM53134 doesn't support RGMII_CTRL_TIMING_SEL as opposed to is531x5().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The KSZ9131RNX incorrectly shows EEE capabilities in its registers.
Although the "EEE control and capability 1" (Register 3.20) is set to 0,
indicating no EEE support, the "EEE advertisement 1" (Register 7.60) is
set to 0x6, advertising EEE support for 1000BaseT/Full and
100BaseT/Full.
This inconsistency causes PHYlib to assume there is no EEE support,
preventing control over EEE advertisement, which is enabled by default.
This patch resolves the issue by utilizing the ksz9477_get_features()
function to correctly set the EEE capabilities for the KSZ9131RNX. This
adjustment allows proper control over EEE advertisement and ensures
accurate representation of the device's capabilities.
Fixes: 8b68710a31 ("net: phy: start using genphy_c45_ethtool_get/set_eee()")
Reported-by: Marek Vasut <marex@denx.de>
Tested-by: Marek Vasut <marex@denx.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>