[ Upstream commit daa751444f ]
key might contain private part of the key, so better use
kfree_sensitive to free it.
Fixes: 38320c70d2 ("[IPSEC]: Use crypto_aead and authenc in ESP")
Signed-off-by: Wang Ming <machel@vivo.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0a8966e2b ]
When using IPv4/TCP, skb->hash comes from sk->sk_txhash except in
TIME_WAIT and SYN_RECV where it's not set in the reply skb from
ip_send_unicast_reply. Those packets will have a mismatched hash with
others from the same flow as their hashes will be 0. IPv6 does not have
the same issue as the hash is set from the socket txhash in those cases.
This commits sets the hash in the reply skb from ip_send_unicast_reply,
which makes the IPv4 code behaving like IPv6.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 5e5265522a ("tcp: annotate data-races around tcp_rsk(req)->txhash")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78adb4bcf9 ]
In normal operation, each populated queue item has
next_to_watch pointing to the last TX desc of the packet,
while each cleaned item has it set to 0. In particular,
next_to_use that points to the next (necessarily clean)
item to use has next_to_watch set to 0.
When the TX queue is used both by an application using
AF_XDP with ZEROCOPY as well as a second non-XDP application
generating high traffic, the queue pointers can get in
an invalid state where next_to_use points to an item
where next_to_watch is NOT set to 0.
However, the implementation assumes at several places
that this is never the case, so if it does hold,
bad things happen. In particular, within the loop inside
of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
Finally, this prevents any further transmission via
this queue and it never gets unblocked or signaled.
Secondly, if the queue is in this garbled state,
the inner loop of igc_clean_tx_ring() will never terminate,
completely hogging a CPU core.
The reason is that igc_xdp_xmit_zc() reads next_to_use
before acquiring the lock, and writing it back
(potentially unmodified) later. If it got modified
before locking, the outdated next_to_use is written
pointing to an item that was already used elsewhere
(and thus next_to_watch got written).
Fixes: 9acf59a752 ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95b6814855 ]
High XDP load triggers the netdev watchdog:
|NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
The reason is the Tx queue transmission start (txq->trans_start) is not updated
in XDP code path. Therefore, add it for all XDP transmission functions.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 78adb4bcf9 ("igc: Prevent garbled TX queue with XDP ZEROCOPY")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a3f25d614b ]
When running an freplace attached bpf program on an arm64 system w were
seeing the following issue:
Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI
After a bit of work to track it down I determined that what appeared to be
happening is that the 'bti c' at the start of the program was somehow being
reached after a 'br' instruction. Further digging pointed me toward the
fact that the function was attached via freplace. This in turn led me to
build_plt which I believe is invoking the long jump which is triggering
this error.
To resolve it we can replace the 'bti c' with 'bti jc' and add a comment
explaining why this has to be modified as such.
Fixes: b2ad54e153 ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5e9ad522c ]
While the check_max_stack_depth function explores call chains emanating
from the main prog, which is typically enough to cover all possible call
chains, it doesn't explore those rooted at async callbacks unless the
async callback will have been directly called, since unlike non-async
callbacks it skips their instruction exploration as they don't
contribute to stack depth.
It could be the case that the async callback leads to a callchain which
exceeds the stack depth, but this is never reachable while only
exploring the entry point from main subprog. Hence, repeat the check for
the main subprog *and* all async callbacks marked by the symbolic
execution pass of the verifier, as execution of the program may begin at
any of them.
Consider functions with following stack depths:
main: 256
async: 256
foo: 256
main:
rX = async
bpf_timer_set_callback(...)
async:
foo()
Here, async is not descended as it does not contribute to stack depth of
main (since it is referenced using bpf_pseudo_func and not
bpf_pseudo_call). However, when async is invoked asynchronously, it will
end up breaching the MAX_BPF_STACK limit by calling foo.
Hence, in addition to main, we also need to explore call chains
beginning at all async callback subprogs in a program.
Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba7b3e7d5f ]
The assignment to idx in check_max_stack_depth happens once we see a
bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
the code performs a few checks and then pushes the frame to the frame
stack, except the case of async callbacks. If the async callback case
causes the loop iteration to be skipped, the idx assignment will be
incorrect on the next iteration of the loop. The value stored in the
frame stack (as the subprogno of the current subprog) will be incorrect.
This leads to incorrect checks and incorrect tail_call_reachable
marking. Save the target subprog in a new variable and only assign to
idx once we are done with the is_async_cb check which may skip pushing
of frame to the frame stack and subsequent stack depth checks and tail
call markings.
Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c34743daca ]
The reset task is currently scheduled from the watchdog or adminq tasks.
First, all direct calls to schedule the reset task are replaced with the
iavf_schedule_reset(), which is modified to accept the flag showing the
type of reset.
To prevent the reset task from starting once iavf_remove() starts, we need
to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now
easily added to iavf_schedule_reset().
Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task.
It is redundant since all callers who set the flag immediately schedules
the reset task.
Fixes: 3ccd54ef44 ("iavf: Fix init state closure on remove")
Fixes: 14756b2ae2 ("iavf: Fix __IAVF_RESETTING state usage")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1639a1731 ]
A driver's lock (crit_lock) is used to serialize all the driver's tasks.
Lockdep, however, shows a circular dependency between rtnl and
crit_lock. This happens when an ndo that already holds the rtnl requests
the driver to reset, since the reset task (in some paths) tries to grab
rtnl to either change real number of queues of update netdev features.
[566.241851] ======================================================
[566.241893] WARNING: possible circular locking dependency detected
[566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE
[566.241984] ------------------------------------------------------
[566.242025] repro.sh/2604 is trying to acquire lock:
[566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
[566.242167]
but task is already holding lock:
[566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
[566.242300]
which lock already depends on the new lock.
[566.242353]
the existing dependency chain (in reverse order) is:
[566.242401]
-> #1 (rtnl_mutex){+.+.}-{3:3}:
[566.242451] __mutex_lock+0xc1/0xbb0
[566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf]
[566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf]
[566.242627] process_one_work+0x2b3/0x560
[566.242663] worker_thread+0x4f/0x3a0
[566.242696] kthread+0xf2/0x120
[566.242730] ret_from_fork+0x29/0x50
[566.242763]
-> #0 (&adapter->crit_lock){+.+.}-{3:3}:
[566.242815] __lock_acquire+0x15ff/0x22b0
[566.242869] lock_acquire+0xd2/0x2c0
[566.242901] __mutex_lock+0xc1/0xbb0
[566.242934] iavf_close+0x3c/0x240 [iavf]
[566.242997] __dev_close_many+0xac/0x120
[566.243036] dev_close_many+0x8b/0x140
[566.243071] unregister_netdevice_many_notify+0x165/0x7c0
[566.243116] unregister_netdevice_queue+0xd3/0x110
[566.243157] iavf_remove+0x6c1/0x730 [iavf]
[566.243217] pci_device_remove+0x33/0xa0
[566.243257] device_release_driver_internal+0x1bc/0x240
[566.243299] pci_stop_bus_device+0x6c/0x90
[566.243338] pci_stop_and_remove_bus_device+0xe/0x20
[566.243380] pci_iov_remove_virtfn+0xd1/0x130
[566.243417] sriov_disable+0x34/0xe0
[566.243448] ice_free_vfs+0x2da/0x330 [ice]
[566.244383] ice_sriov_configure+0x88/0xad0 [ice]
[566.245353] sriov_numvfs_store+0xde/0x1d0
[566.246156] kernfs_fop_write_iter+0x15e/0x210
[566.246921] vfs_write+0x288/0x530
[566.247671] ksys_write+0x74/0xf0
[566.248408] do_syscall_64+0x58/0x80
[566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[566.249886]
other info that might help us debug this:
[566.252014] Possible unsafe locking scenario:
[566.253432] CPU0 CPU1
[566.254118] ---- ----
[566.254800] lock(rtnl_mutex);
[566.255514] lock(&adapter->crit_lock);
[566.256233] lock(rtnl_mutex);
[566.256897] lock(&adapter->crit_lock);
[566.257388]
*** DEADLOCK ***
The deadlock can be triggered by a script that is continuously resetting
the VF adapter while doing other operations requiring RTNL, e.g:
while :; do
ip link set $VF up
ethtool --set-channels $VF combined 2
ip link set $VF down
ip link set $VF up
ethtool --set-channels $VF combined 4
ip link set $VF down
done
Any operation that triggers a reset can substitute "ethtool --set-channles"
As a fix, add a new task "finish_config" that do all the work which
needs rtnl lock. With the exception of iavf_remove(), all work that
require rtnl should be called from this task.
As for iavf_remove(), at the point where we need to call
unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
finish_config work cannot restart after that since the task is guarded
by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
Fixes: 5ac49f3c27 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2ed2403f1 ]
There was a fail when trying to add the interface to bonding
right after changing the MTU on the interface. It was caused
by bonding interface unable to open the interface due to
interface being in __RESETTING state because of MTU change.
Add new reset_waitqueue to indicate that reset has finished.
Add waiting for reset to finish in callbacks which trigger hw reset:
iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
We use a 5000ms timeout period because on Hyper-V based systems,
this operation takes around 3000-4000ms. In normal circumstances,
it doesn't take more than 500ms to complete.
Add a function iavf_wait_for_reset() to reuse waiting for reset code and
use it also in iavf_set_channels(), which already waits for reset.
We don't use error handling in iavf_set_channels() as this could
cause the device to be in incorrect state if the reset was scheduled
but hit timeout or the waitng function was interrupted by a signal.
Fixes: 4e5e6b5d9d ("iavf: Fix return of set the new channel count")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7dcbdf2928 ]
When the user disables rxvlan offloading and then changes the number of
channels, all VLAN ports are unable to receive traffic.
Changing the number of channels triggers a VFR reset. During re-init, when
VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is received, we do:
1 - set the IAVF_FLAG_SETUP_NETDEV_FEATURES flag
2 - call
iavf_set_vlan_offload_features(adapter, 0, netdev->features);
The second step sends to the PF the __default__ features, in this case
aq_required |= IAVF_FLAG_AQ_ENABLE_CTAG_VLAN_STRIPPING
While the first step forces the watchdog task to call
netdev_update_features() -> iavf_set_features() ->
iavf_set_vlan_offload_features(adapter, netdev->features, features).
Since the user disabled the "rxvlan", this sets:
aq_required |= IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_STRIPPING
When we start processing the AQ commands, both flags are enabled. Since we
process DISABLE_XTAG first then ENABLE_XTAG, this results in the PF
enabling the rxvlan offload. This breaks all communications on the VLAN
net devices.
Fix by removing the call to iavf_set_vlan_offload_features() (second
step). Calling netdev_update_features() from watchdog task is enough for
both init and reset paths.
Fixes: 7598f4b40b ("iavf: Move netdev_update_features() into watchdog task")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: c2ed2403f1 ("iavf: Wait for reset in callbacks which trigger it")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7598f4b40b ]
Remove netdev_update_features() from iavf_adminq_task(), as it can cause
deadlocks due to needing rtnl_lock. Instead use the
IAVF_FLAG_SETUP_NETDEV_FEATURES flag to indicate that netdev features need
to be updated in the watchdog task. iavf_set_vlan_offload_features()
and iavf_set_queue_vlan_tag_loc() can be called directly from
iavf_virtchnl_completion().
Suggested-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: c2ed2403f1 ("iavf: Wait for reset in callbacks which trigger it")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a77ed5c5b7 ]
If the system tries to close the netdev while iavf_reset_task() is
running, __LINK_STATE_START will be cleared and netif_running() will
return false in iavf_reinit_interrupt_scheme(). This will result in
iavf_free_traffic_irqs() not being called and a leak as follows:
[7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
[7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
is shown when pci_disable_msix() is later called. Fix by using the
internal adapter state. The traffic IRQs will always exist if
state == __IAVF_RUNNING.
Fixes: 5b36e8d04b ("i40evf: Enable VF to request an alternate queue allocation")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bdf79d686 ]
The KSZ8795 driver code was modified to use on KSZ8863/73, which has
different register definitions. Some of the new KSZ8795 register
information are wrong compared to previous code.
KSZ8795 also behaves differently in that the STATIC_MAC_TABLE_USE_FID
and STATIC_MAC_TABLE_FID bits are off by 1 when doing MAC table reading
than writing. To compensate that a special code was added to shift the
register value by 1 before applying those bits. This is wrong when the
code is running on KSZ8863, so this special code is only executed when
KSZ8795 is detected.
Fixes: 4b20a07e10 ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 559901b468 ]
Prepare for the next patch by ensuring that ksz8_r_sta_mac_table() does
not use error codes for empty entries. This change will enable better
handling of read/write errors in the upcoming patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 4bdf79d686 ("net: dsa: microchip: correct KSZ8795 static MAC table access")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5751cdd7d ]
As ksz8_r_sta_mac_table() is only used within ksz8795.c, there is no need
to export it. Make the function static for better encapsulation.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 4bdf79d686 ("net: dsa: microchip: correct KSZ8795 static MAC table access")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6636ff69e ]
Move static MAC table operations to separate functions in order to reuse
the code for add/del_fdb. This is needed to address kernel warnings
caused by the lack of fdb add function support in the current driver.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 4bdf79d686 ("net: dsa: microchip: correct KSZ8795 static MAC table access")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26a2219492 ]
If cls_bpf_offload errors out, we must also undo tcf_bind_filter that
was done before the error.
Fix that by calling tcf_unbind_filter in errout_parms.
Fixes: eadb41489f ("net: cls_bpf: add support for marking filters as hardware-only")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8d3d78c19 ]
In the case of an update, when TCA_U32_LINK is set, u32_set_parms will
decrement the refcount of the ht_down (struct tc_u_hnode) pointer
present in the older u32 filter which we are replacing. However, if
u32_replace_hw_knode errors out, the update command fails and that
ht_down pointer continues decremented. To fix that, when
u32_replace_hw_knode fails, check if ht_down's refcount was decremented
and undo the decrement.
Fixes: d34e3e1813 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3d0e04894 ]
In case an error occurred after mall_set_parms executed successfully, we
must undo the tcf_bind_filter call it issues.
Fix that by calling tcf_unbind_filter in err_replace_hw_filter label.
Fixes: ec2507d2a3 ("net/sched: cls_matchall: Fix error path")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 469e2f28c2 ]
This doesn't check how many bytes the simple_write_to_buffer() writes to
the buffer. The only thing that we know is that the first byte is
initialized and the last byte of the buffer is set to NUL. However
the middle bytes could be uninitialized.
There is no need to use simple_write_to_buffer(). This code does not
support partial writes but instead passes "pos = 0" as the starting
offset regardless of what the user passed as "*ppos". Just use the
copy_from_user() function and initialize the whole buffer.
Fixes: 671e0b9005 ("ASoC: SOF: Clone the trace code to ipc3-dtrace as fw_tracing implementation")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/74148292-ce4d-4e01-a1a7-921e6767da14@moroto.mountain
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69cba9d3c1 ]
When the number of responses with status of STATUS_IO_TIMEOUT
exceeds a specified threshold (NUM_STATUS_IO_TIMEOUT), we reconnect
the connection. But we do not return the mid, or the credits
returned for the mid, or reduce the number of in-flight requests.
This bug could result in the server->in_flight count to go bad,
and also cause a leak in the mids.
This change moves the check to a few lines below where the
response is decrypted, even of the response is read from the
transform header. This way, the code for returning the mids
can be reused.
Also, the cifs_reconnect was reconnecting just the transport
connection before. In case of multi-channel, this may not be
what we want to do after several timeouts. Changed that to
reconnect the session and the tree too.
Also renamed NUM_STATUS_IO_TIMEOUT to a more appropriate name
MAX_STATUS_IO_TIMEOUT.
Fixes: 8e670f77c4 ("Handle STATUS_IO_TIMEOUT gracefully")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d6d537dc5 ]
Move the call to of_get_ethdev_address to mtk_add_mac which is part of
the probe function and can hence itself return -EPROBE_DEFER should
of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely
get rid of the mtk_init function.
The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced
in situations in which the NVMEM provider holding the MAC address has
not yet be loaded at the time mtk_eth_soc is initially probed. In this
case probing of mtk_eth_soc should be deferred instead of falling back
to use a random MAC address, so once the NVMEM provider becomes
available probing can be repeated.
Fixes: 656e705243 ("net-next: mediatek: add support for MT7623 ethernet")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56a16035bb ]
When we create an L2 loop on a bridge in netns, we will see packets storm
even if STP is enabled.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
# ip link set veth1 master br0 up
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
# sleep 30
# ip -s link show br0
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
956553768 12861249 0 0 0 12861249 <-. Keep
TX: bytes packets errors dropped carrier collsns | increasing
1027834 11951 0 0 0 0 <-' rapidly
This is because llc_rcv() drops all packets in non-root netns and BPDU
is dropped.
Let's add extack warning when enabling STP in netns.
# unshare -n
# ip link add br0 type bridge
# ip link set br0 type bridge stp_state 1
Warning: bridge: STP does not work in non-root netns.
Note this commit will be reverted later when we namespacify the whole LLC
infra.
Fixes: e730c15519 ("[NET]: Make packet reception network namespace safe")
Suggested-by: Harry Coin <hcoin@quietfountain.com>
Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b685f1a589 ]
CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the
field will be strictly contained within one word. However, this is not
guaranteed to be the case and it is possible for ALE field entries to span
across up to two words at the most.
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes: db82173f23 ("netdev: driver: ethernet: add cpsw address lookup engine support")
Signed-off-by: Tanmay Patil <t-patil@ti.com>
[s-vadapalli@ti.com: rephrased commit message and added Fixes tag]
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95ce158b6c ]
I get sporadic timeouts from the driver when using the
MV88E6352. Reading the status again after the loop fixes the
problem: the operation is successful but goes undetected.
Some added prints show things like this:
[ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
[ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
ATU op 4000, fid 0001
(...)
[ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
[ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
for PHY command 1860 to complete
The reason is probably not the commands: I think those are
mostly fine with the 50+50ms timeout, but the problem
appears when OpenWrt brings up several interfaces in
parallel on a system with 7 populated ports: if one of
them take more than 50 ms and waits one or more of the
others can get stuck on the mutex for the switch and then
this can easily multiply.
As we sleep and wait, the function loop needs a final
check after exiting the loop if we were successful.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Fixes: 35da1dfd94 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230712223405.861899-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8cc32a9bbf ]
Commit 6eb4bd92c1 ("kallsyms: strip LTO suffixes from static functions")
stripped all function/variable suffixes started with '.' regardless
of whether those suffixes are generated at LTO mode or not. In fact,
as far as I know, in LTO mode, when a static function/variable is
promoted to the global scope, '.llvm.<...>' suffix is added.
The existing mechanism breaks live patch for a LTO kernel even if
no <symbol>.llvm.<...> symbols are involved. For example, for the following
kernel symbols:
$ grep bpf_verifier_vlog /proc/kallsyms
ffffffff81549f60 t bpf_verifier_vlog
ffffffff8268b430 d bpf_verifier_vlog._entry
ffffffff8282a958 d bpf_verifier_vlog._entry_ptr
ffffffff82e12a1f d bpf_verifier_vlog.__already_done
'bpf_verifier_vlog' is a static function. '_entry', '_entry_ptr' and
'__already_done' are static variables used inside 'bpf_verifier_vlog',
so llvm promotes them to file-level static with prefix 'bpf_verifier_vlog.'.
Note that the func-level to file-level static function promotion also
happens without LTO.
Given a symbol name 'bpf_verifier_vlog', with LTO kernel, current mechanism will
return 4 symbols to live patch subsystem which current live patching
subsystem cannot handle it. With non-LTO kernel, only one symbol
is returned.
In [1], we have a lengthy discussion, the suggestion is to separate two
cases:
(1). new symbols with suffix which are generated regardless of whether
LTO is enabled or not, and
(2). new symbols with suffix generated only when LTO is enabled.
The cleanup_symbol_name() should only remove suffixes for case (2).
Case (1) should not be changed so it can work uniformly with or without LTO.
This patch removed LTO-only suffix '.llvm.<...>' so live patching and
tracing should work the same way for non-LTO kernel.
The cleanup_symbol_name() in scripts/kallsyms.c is also changed to have the same
filtering pattern so both kernel and kallsyms tool have the same
expectation on the order of symbols.
[1] https://lore.kernel.org/live-patching/20230615170048.2382735-1-song@kernel.org/T/#u
Fixes: 6eb4bd92c1 ("kallsyms: strip LTO suffixes from static functions")
Reported-by: Song Liu <song@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628181926.4102448-1-yhs@fb.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 010a0aad39 ]
LLVM appends various suffixes for local functions and variables, suffixes
observed:
- foo.llvm.[0-9a-f]+
- foo.[0-9a-f]+
Therefore, when CONFIG_LTO_CLANG=y, kallsyms_lookup_name() needs to
truncate the suffix of the symbol name before comparing the local function
or variable name.
Old implementation code:
- if (strcmp(namebuf, name) == 0)
- return kallsyms_sym_address(i);
- if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
- return kallsyms_sym_address(i);
The preceding process is traversed by address from low to high. That is,
for those with the same name after the suffix is removed, the one with
the smallest address is returned first. Therefore, when sorting in the
tool, if the raw names are the same, they should be sorted by address in
ascending order.
ASCII[.] = 2e
ASCII[0-9] = 30,39
ASCII[A-Z] = 41,5a
ASCII[_] = 5f
ASCII[a-z] = 61,7a
According to the preceding ASCII code values, the following sorting result
is strictly followed.
---------------------------------
| main-key | sub-key |
|---------------------------------|
| | addr_lowest |
| <name> | ... |
| <name>.<suffix> | ... |
| | addr_highest |
|---------------------------------|
| <name>?<others> | | //? is [_A-Za-z0-9]
---------------------------------
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Stable-dep-of: 8cc32a9bbf ("kallsyms: strip LTO-only suffixes from promoted global functions")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60443c88f3 ]
Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one, and then use the expanded string for
comparison. It's O(n).
If we sort names in ascending order like addresses, we can also use
binary search. It's O(log(n)).
In order not to change the implementation of "/proc/kallsyms", the table
kallsyms_names[] is still stored in a one-to-one correspondence with the
address in ascending order.
Add array kallsyms_seqs_of_names[], it's indexed by the sequence number
of the sorted names, and the corresponding content is the sequence number
of the sorted addresses. For example:
Assume that the index of NameX in array kallsyms_seqs_of_names[] is 'i',
the content of kallsyms_seqs_of_names[i] is 'k', then the corresponding
address of NameX is kallsyms_addresses[k]. The offset in kallsyms_names[]
is get_symbol_offset(k).
Note that the memory usage will increase by (4 * kallsyms_num_syms)
bytes, the next two patches will reduce (1 * kallsyms_num_syms) bytes
and properly handle the case CONFIG_LTO_CLANG=y.
Performance test results: (x86)
Before:
min=234, max=10364402, avg=5206926
min=267, max=11168517, avg=5207587
After:
min=1016, max=90894, avg=7272
min=1014, max=93470, avg=7293
The average lookup performance of kallsyms_lookup_name() improved 715x.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Stable-dep-of: 8cc32a9bbf ("kallsyms: strip LTO-only suffixes from promoted global functions")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e7de35eb7 ]
The mirror_num_ret is allowed to be NULL, although it has to be set when
smap is set. Unfortunately that is not a well enough specifiable
invariant for static type checkers, so add a NULL check to make sure they
are fine.
Fixes: 03793cbbc8 ("btrfs: add fast path for single device io in __btrfs_map_block")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1feece2780 ]
-L only specifies the search path for libraries directly provided in the
link line with -l. Because -lopencsd isn't specified, it's only linked
because it's a dependency of -lopencsd_c_api. Dependencies like this are
resolved using the default system search paths or -rpath-link=... rather
than -L. This means that compilation only works if OpenCSD is installed
to the system rather than provided with the CSLIBS (-L) option.
This could be fixed by adding -Wl,-rpath-link=$(CSLIBS) but that is less
conventional than just adding -lopencsd to the link line so that it uses
-L. -lopencsd seems to have been removed in commit ed17b19149
("perf tools: Drop requirement for libstdc++.so for libopencsd check")
because it was thought that there was a chance compilation would work
even if it didn't exist, but I think that only applies to libstdc++ so
there is no harm to add it back. libopencsd.so and libopencsd_c_api.so
would always exist together.
Testing
=======
The following scenarios now all work:
* Cross build with OpenCSD installed
* Cross build using CSLIBS=...
* Native build with OpenCSD installed
* Native build using CSLIBS=...
* Static cross build with OpenCSD installed
* Static cross build with CSLIBS=...
Committer testing:
⬢[acme@toolbox perf-tools]$ alias m
alias m='make -k BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools -C tools/perf install-bin && git status && perf test python ; perf record -o /dev/null sleep 0.01 ; perf stat --null sleep 0.01'
⬢[acme@toolbox perf-tools]$ ldd ~/bin/perf | grep csd
libopencsd_c_api.so.1 => /lib64/libopencsd_c_api.so.1 (0x00007fd49c44e000)
libopencsd.so.1 => /lib64/libopencsd.so.1 (0x00007fd49bd56000)
⬢[acme@toolbox perf-tools]$ cat /etc/redhat-release
Fedora release 36 (Thirty Six)
⬢[acme@toolbox perf-tools]$
Fixes: ed17b19149 ("perf tools: Drop requirement for libstdc++.so for libopencsd check")
Reported-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Uwe Kleine-König <uwe@kleine-koenig.org>
Cc: coresight@lists.linaro.org
Closes: https://lore.kernel.org/linux-arm-kernel/56905d7a-a91e-883a-b707-9d5f686ba5f1@arm.com/
Link: https://lore.kernel.org/all/36cc4dc6-bf4b-1093-1c0a-876e368af183@kleine-koenig.org/
Link: https://lore.kernel.org/r/20230707154546.456720-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45fcc058a7 ]
Remove unnecessary release_mem_region from the error path to prevent
mem region from being released twice, which could avoid resource leak
or other unexpected issues.
Fixes: b083c22d51 ("video: fbdev: imxfb: Convert request_mem_region + ioremap to devm_ioremap_resource")
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e47382fbc ]
Warn about invalid var->left_margin or var->right_margin. Their values
are read from the device tree.
We store var->left_margin-3 and var->right_margin-1 in register
fields. These fields should be >= 0.
Fixes: 7e8549bcee ("imxfb: Fix margin settings")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5158814cbb ]
The command word is defined as following:
/* Command */
#define SPI_CMD_COMMAND_SHIFT 0
#define SPI_CMD_DEVICE_ID_SHIFT 4
#define SPI_CMD_PREPEND_BYTE_CNT_SHIFT 8
#define SPI_CMD_ONE_BYTE_SHIFT 11
#define SPI_CMD_ONE_WIRE_SHIFT 12
If the prepend byte count field starts at bit 8, and the next defined
bit is SPI_CMD_ONE_BYTE at bit 11, it can be at most 3 bits wide, and
thus the max value is 7, not 15.
Fixes: b17de07606 ("spi/bcm63xx: work around inability to keep CS up")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://lore.kernel.org/r/20230629071453.62024-1-jonas.gorski@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bfc374a145 ]
Currently, sd1 and sd0 have unique subnode names 'sd1_mux' and 'sd0_mux'.
If we change these to non-unique subnode names such as 'mux' this can
lead to the below conflict as the RZ/G2L pin control driver considers
only the names of the subnodes.
pinctrl-rzg2l 11030000.pinctrl: pin P47_0 already requested by 11c00000.mmc; cannot claim for 11c10000.mmc
pinctrl-rzg2l 11030000.pinctrl: pin-376 (11c10000.mmc) status -22
pinctrl-rzg2l 11030000.pinctrl: could not request pin 376 (P47_0) from group mux on device pinctrl-rzg2l
renesas_sdhi_internal_dmac 11c10000.mmc: Error applying setting, reverse things back
Fix this by constructing unique names from the node names of both the
pin control configuration node and its child node, where appropriate.
Based on the work done by Geert for the RZ/V2M pinctrl driver.
Fixes: c4c4637eb5 ("pinctrl: renesas: Add RZ/G2L pin and gpio controller driver")
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230704111858.215278-1-biju.das.jz@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f46a0b47cc ]
The eMMC and SDHI pin control configuration nodes in DT have subnodes
with the same names ("data" and "ctrl"). As the RZ/V2M pin control
driver considers only the names of the subnodes, this leads to
conflicts:
pinctrl-rzv2m b6250000.pinctrl: pin P8_2 already requested by 85000000.mmc; cannot claim for 85020000.mmc
pinctrl-rzv2m b6250000.pinctrl: pin-130 (85020000.mmc) status -22
renesas_sdhi_internal_dmac 85020000.mmc: Error applying setting, reverse things back
Fix this by constructing unique names from the node names of both the
pin control configuration node and its child node, where appropriate.
Reported by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Fixes: 92a9b82525 ("pinctrl: renesas: Add RZ/V2M pin and gpio controller driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Link: https://lore.kernel.org/r/607bd6ab4905b0b1b119a06ef953fa1184505777.1688396717.git.geert+renesas@glider.be
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aff037078e ]
Destroying psi trigger in cgroup_file_release causes UAF issues when
a cgroup is removed from under a polling process. This is happening
because cgroup removal causes a call to cgroup_file_release while the
actual file is still alive. Destroying the trigger at this point would
also destroy its waitqueue head and if there is still a polling process
on that file accessing the waitqueue, it will step on the freed pointer:
do_select
vfs_poll
do_rmdir
cgroup_rmdir
kernfs_drain_open_files
cgroup_file_release
cgroup_pressure_release
psi_trigger_destroy
wake_up_pollfree(&t->event_wait)
// vfs_poll is unblocked
synchronize_rcu
kfree(t)
poll_freewait -> UAF access to the trigger's waitqueue head
Patch [1] fixed this issue for epoll() case using wake_up_pollfree(),
however the same issue exists for synchronous poll() case.
The root cause of this issue is that the lifecycles of the psi trigger's
waitqueue and of the file associated with the trigger are different. Fix
this by using kernfs_generic_poll function when polling on cgroup-specific
psi triggers. It internally uses kernfs_open_node->poll waitqueue head
with its lifecycle tied to the file's lifecycle. This also renders the
fix in [1] obsolete, so revert it.
[1] commit c2dbe32d5d ("sched/psi: Fix use-after-free in ep_remove_wait_queue()")
Fixes: 0e94682b73 ("psi: introduce psi monitor")
Closes: https://lore.kernel.org/all/20230613062306.101831-1-lujialin4@huawei.com/
Reported-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230630005612.1014540-1-surenb@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d82caa2735 ]
PSI offers 2 mechanisms to get information about a specific resource
pressure. One is reading from /proc/pressure/<resource>, which gives
average pressures aggregated every 2s. The other is creating a pollable
fd for a specific resource and cgroup.
The trigger creation requires CAP_SYS_RESOURCE, and gives the
possibility to pick specific time window and threshold, spawing an RT
thread to aggregate the data.
Systemd would like to provide containers the option to monitor pressure
on their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should have
the ability to poll() for pressure in individual services. But neither
the container nor the services are privileged.
This patch implements a mechanism to allow unprivileged users to create
pressure triggers. The difference with privileged triggers creation is
that unprivileged ones must have a time window that's a multiple of 2s.
This is so that we can avoid unrestricted spawning of rt threads, and
use instead the same aggregation mechanism done for the averages, which
runs independently of any triggers.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20230330105418.77061-5-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078e ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4468fcae49 ]
This change moves update_total flag out of update_triggers function,
currently called only in psi_poll_work.
In the next patch, update_triggers will be called also in psi_avgs_work,
but the total update information is specific to psi_poll_work.
Returning update_total value to the caller let us avoid differentiating
the implementation of update_triggers for different aggregators.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20230330105418.77061-4-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078e ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>