Add a userspace file that describes which clock is used for hyp tracing.
Unlike the tracefs root instance, hyp tracing only supports the boot clock,
hence the trace_clock file is read-only.
Bug: 249050813
Change-Id: Ib9cc1f582699245ed94cf745dae0888eb7556ced
Signed-off-by: Nikita Ioffe <ioffe@google.com>
Like the common "trace" file introduced previously, this new common
file aggregates a pipe version for all CPUs, similar to the tracefs
root file of the same name.
Bug: 249050813
Change-Id: I1872bf3cfeef637902fcdfa5f589a903c0121d04
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Writing anything to the "trace" file will delete the content of the
buffer. When using the common "trace", the ring buffer will also be
unloaded from the hypervisor and all the memory will be freed.
At the same time, tracing_on will not reset the buffers anymore and
trace pipe interfaces will be able to set up the ring buffers, bringing
the hyp tracing interface a bit closer to the host behavior.
Bug: 249050813
Change-Id: I9d4ba7b18504440f3d03dbedf1186d384a53a990
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Extend the hyp tracing interface with a new hyp/trace file that merges
all per-CPU traces. This is similar to the "trace" file found in the
tracefs root.
At the same time, align the output of the files with the host:
[<CPU>] <timestamp>: <event>
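The line layout above can be sketched in userspace C. This is only an illustration: the helper name format_hyp_trace_line(), the nanosecond input, the seconds.microseconds rendering and the three-digit CPU padding are assumptions, not taken from this patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: render one event as "[<CPU>] <timestamp>: <event>".
 * Assumption: the timestamp is in nanoseconds and is shown as
 * seconds.microseconds, with the CPU number zero-padded to three digits.
 */
int format_hyp_trace_line(char *buf, size_t len, unsigned int cpu,
			  uint64_t ts_ns, const char *event)
{
	uint64_t secs = ts_ns / 1000000000ULL;
	uint64_t usecs = (ts_ns % 1000000000ULL) / 1000ULL;

	return snprintf(buf, len, "[%03u] %" PRIu64 ".%06" PRIu64 ": %s",
			cpu, secs, usecs, event);
}
```

Called with cpu 2, a timestamp of 1500000000 ns and an event string, the helper writes a single line in the layout shown above.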
Bug: 249050813
Change-Id: I816f8504b14480b13d40f8689f9b9f63706a4daf
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
This newly introduced hypercall allows the host to disable tracing on
all CPUs, while keeping the tracing buffers loaded into the hypervisor.
This is intended to later improve the userspace interface, which will be
able to turn tracing on and off and reset (tear down, for the hyp) the
tracing buffers.
As disabling buffers will switch the buffer status, rename those statuses
to nonwritable - writable - writing. Another way of identifying buffers
which have not been loaded is needed. See rb_cpu_loaded().
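A minimal sketch of the idea, keeping "loaded" separate from the three statuses. All names here (enum rb_status, struct rb_cpu, rb_cpu_is_loaded(), rb_cpu_disable()) and the use of a NULL pointer to mean "not loaded" are hypothetical and do not reflect the hypervisor's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names mirroring the three statuses named above. */
enum rb_status {
	RB_NONWRITABLE,	/* tracing disabled; buffer may still be loaded */
	RB_WRITABLE,	/* tracing enabled, no writer active */
	RB_WRITING,	/* a writer currently owns the buffer */
};

struct rb_cpu {
	enum rb_status status;
	void *pages;	/* assumption: NULL until the buffer is loaded */
};

/* Loaded-ness is tracked separately from the status. */
bool rb_cpu_is_loaded(const struct rb_cpu *rb)
{
	return rb->pages != NULL;
}

/* Disabling flips the status only; the buffer stays loaded. */
bool rb_cpu_disable(struct rb_cpu *rb)
{
	if (rb->status == RB_WRITING)
		return false;	/* caller retries once the writer is done */
	rb->status = RB_NONWRITABLE;
	return true;
}
```

The point of the split is that a disabled buffer and a never-loaded buffer can no longer be confused, since the status alone does not say whether memory is attached.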
Bug: 249050813
Change-Id: I6080aafe71d5628e94b37c432bcd8616e68ddfe8
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Previously, hyp/per_cpu/cpu*/trace files would return an error when no
buffer had been allocated (i.e. when no tracing had ever started).
Return an empty header instead.
Bug: 249050813
Change-Id: Ic88bbdf8c876b8f26101ce2b33d3aca26fb88c94
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
The ELF sections and delimiters used by the hyp events were not
following the convention used by other hyp sections. Align them all.
Bug: 249050813
Change-Id: I7b3ee4915c8904cd531911df59c1fd1853bbbe9f
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
This reverts commit 76d62f24db.
So while priority inversion on the pmsg_lock is an occasional
problem that an rt_mutex would help with, in uses where logging
is writing to pmsg heavily from multiple threads, the pmsg_lock
can be heavily contended.
After this change landed, it was reported that cases where the
mutex locking overhead was commonly adding on the order of 10s
of usecs delay had suddenly jumped to ~msec delay with rtmutex.
It seems the slight differences in the locks under this level
of contention cause the normal mutexes to utilize the spinning
optimizations, while the rtmutexes end up in the sleeping
slowpath (which allows additional threads to pile on trying
to take the lock).
In this case, it devolves to a worst-case scenario where the lock
acquisition and scheduling overhead dominates, and each thread
is waiting on the order of ~ms to do ~us of work.
Obviously, having tons of threads all contending on a single
lock for logging is non-optimal, so the proper fix is probably
reworking pstore pmsg to have per-cpu buffers so we don't have
contention.
Additionally, Steven Rostedt has provided some further
optimizations for rtmutexes that improve the rtmutex spinning
path, but at least in my testing, I still see the test tripping
into the sleeping path on rtmutexes while utilizing the spinning
path with mutexes.
But in the short term, let's revert the change to the rt_mutex
and go back to normal mutexes to avoid a potentially major
performance regression. And we can work on optimizations to both
rtmutexes and finer-grained locking for pstore pmsg in the
future.
Cc: Wei Wang <wvw@google.com>
Cc: Midas Chien <midaschieh@google.com>
Cc: "Chunhui Li (李春辉)" <chunhui.li@mediatek.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: kernel-team@android.com
Fixes: 76d62f24db ("pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion")
Reported-by: "Chunhui Li (李春辉)" <chunhui.li@mediatek.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230308204043.2061631-1-jstultz@google.com
Bug: 271041816
Bug: 272453930
(cherry picked from commit 5239a89b06 https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore)
Change-Id: Iadf30bcbf5ba3895dd4af8c15c3a8aecf4301acb
Signed-off-by: John Stultz <jstultz@google.com>
The printed reserved memory information uses the non-standard "K"
prefix, while all other printed values use proper binary prefixes.
Fix this by using "Ki" instead.
While at it, drop the superfluous spaces inside the parentheses, to
reduce printed line length.
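As a rough userspace illustration of the formatting change, assuming a hypothetical helper format_reserved_size() and a "KiB" unit string (the exact format string used by the kernel is not quoted in this message):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper; the "KiB" unit string and layout are assumptions. */
int format_reserved_size(char *buf, size_t len, uint64_t size_bytes)
{
	/* 1 KiB = 1024 bytes; no spaces directly inside the parentheses */
	return snprintf(buf, len, "(%" PRIu64 " KiB)", size_bytes / 1024);
}
```

Using the binary prefix keeps the unit consistent with a divisor of 1024 rather than 1000.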
Bug: 254441685
Fixes: aeb9267eb6 ("of: reserved-mem: print out reserved-mem details during boot")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230216083725.1244817-1-geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
(cherry picked from commit 6ee7afbabc)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ide373aecb11b08df071a9f7633af3ae21a677799
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it in __cfg80211_connect_result() when it is not already set.
When using a userspace configuration that does not call
cfg80211_connect() (can be checked with breakpoints in the kernel),
this patch should allow `networkctl status device_name` to output the
SSID instead of null.
Bug: 254441685
Cc: stable@vger.kernel.org
Reported-by: Yohan Prod'homme <kernel@zoddo.fr>
Fixes: 7b0a0e3c3a ("wifi: cfg80211: do some rework towards MLO link APIs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand@systemb.ch>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit c38c701851)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Idc15d5f35fc93a5f48848b462b19e8b18774fcbc
It turns out the optimisation implemented by commit 4f2c3872dd is
totally broken, since all the places that consume hw->dtcs_used for
events other than cycle count are still not expecting it to be sparsely
populated, and fail to read all the relevant DTC counters correctly if
so.
If implemented correctly, the optimisation potentially saves up to 3
register reads per event update, which is reasonably significant for
events targeting a single node, but still not worth a massive amount of
additional code complexity overall. Getting it right within the current
design looks a fair bit more involved than it was ever intended to be,
so let's just make a functional revert which restores the old behaviour
while still backporting easily.
Bug: 254441685
Fixes: 4f2c3872dd ("perf/arm-cmn: Optimise DTC counter accesses")
Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b41bb4ed7283c3d8400ce5cf5e6ec94915e6750f.1674498637.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit a428eb4b99)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I3bc5b2e6a8fc483a051862ddba084f59142cab3d
Fix the following sparse endianness warning:
"sparse warnings: drivers/ufs/core/ufs_bsg.c:91:25: sparse: sparse: cast to
restricted __be16."
For consistency with endianness annotations of other UFS data structures,
change __u16/32 to __be16/32 in UFS ARPMB data structures.
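The reason the annotation matters can be sketched in plain C: a big-endian field must be converted before its value is used. The typedefs and helper names below are hypothetical stand-ins; a userspace compiler does not enforce them the way sparse enforces __be16/__be32.

```c
#include <stdint.h>

/*
 * Assumption: plain typedefs stand in for the kernel's sparse-checked
 * __be16/__be32; userspace compilers do not enforce the annotation.
 */
typedef uint16_t be16_t;
typedef uint32_t be32_t;

/* Convert big-endian storage to host order, independent of host endianness. */
uint16_t be16_to_host(be16_t v)
{
	const uint8_t *p = (const uint8_t *)&v;

	return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t be32_to_host(be32_t v)
{
	const uint8_t *p = (const uint8_t *)&v;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

With the fields declared as big-endian types, a static checker can flag any place where the raw value is used without such a conversion.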
Bug: 254441685
Fixes: 6ff265fc5e ("scsi: ufs: core: bsg: Add advanced RPMB support in ufs_bsg")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e2cb6e8db6)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I78195aa5c606a766c7414b256c9e23b1a16434bd
Currently we only allocate space for SVE signal frames on systems that
support SVE, meaning that SME only systems do not allocate a signal frame
for streaming mode SVE state. Change the check so space is allocated if
either feature is supported.
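The changed condition can be sketched as follows. struct cpu_features and sve_sigframe_space() are hypothetical stand-ins for the kernel's feature checks and signal frame layout code, used only to show the shape of the check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's feature checks. */
struct cpu_features {
	bool sve;
	bool sme;
};

/* Returns the bytes to reserve for the SVE signal frame, or 0 if none. */
size_t sve_sigframe_space(const struct cpu_features *f, size_t frame_size)
{
	/* Checking only f->sve would skip SME-only systems. */
	if (f->sve || f->sme)
		return frame_size;
	return 0;
}
```

On an SME-only system (sve false, sme true) the sketch reserves space, which is the case the original check missed.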
Bug: 254441685
Fixes: 85ed24dad2 ("arm64/sme: Implement streaming SVE signal handling")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-fix-sme-only-v1-3-938d663f69e5@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit f26cd73721)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I74e735b2fba9e055acb1d43881eec814f7eba91d
We currently guard REGSET_{SSVE, ZA} using ARM64_SVE for no good reason.
Both enumerations would be pointless without ARM64_SME and create two empty
entries in aarch64_regsets[] which would then become part of a process's
native regset view (they should be ignored though).
Switch to use ARM64_SME instead.
Bug: 254441685
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221214135943.379-1-yuzenghui@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit eb9a85261e)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I1aca02e58dfe67db7eb45efc8e9ad08a8c1f9392
lru_gen_add_mm() has been added within an IRQ-off region in the commit
mentioned below. The other invocations of lru_gen_add_mm() are not within
an IRQ-off region.
The invocation within IRQ-off region is problematic on PREEMPT_RT because
the function is using a spin_lock_t which must not be used within
IRQ-disabled regions.
The other invocations of lru_gen_add_mm() occur while
task_struct::alloc_lock is acquired. Move lru_gen_add_mm() after
interrupts are enabled and before task_unlock().
Bug: 254441685
Link: https://lkml.kernel.org/r/20221026134830.711887-1-bigeasy@linutronix.de
Fixes: bd74fdaea1 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit dda1c41a07)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If85c9bf03c4ffa47cd0e79db2f75fdb0ff92ce0a
The subelements obviously start after the common data, including
the common multi-link element structure definition itself. This
bug was possibly just hidden by the higher bits of the control
being set to 0, so the iteration just found one bogus element
and most of the code could continue anyway.
Bug: 254441685
Fixes: 0f48b8b88a ("wifi: ieee80211: add definitions for multi-link element")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit 1177aaa7fe)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I068a4a16eaad463ada5ba976fc065d0ddb058195
CMN-600 uses bits [27:0] for child node address offset while bits [30:28]
are required to be zero.
For CMN-650, the child node address offset field has been increased
to include bits [29:0] while leaving only bit 30 set to zero.
Let's include the missing two bits and assume older implementations
comply with the spec and set bits [29:28] to 0.
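The effect of widening the field can be illustrated with two masks. The macro and function names below are hypothetical; only the bit ranges come from the description above.

```c
#include <stdint.h>

/* Build a mask covering bits [hi:lo], similar in spirit to GENMASK_ULL(). */
#define BITS_ULL(hi, lo) \
	((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))

/* Hypothetical names; widths follow the description above. */
#define CHILD_OFFSET_MASK_OLD	BITS_ULL(27, 0)
#define CHILD_OFFSET_MASK_NEW	BITS_ULL(29, 0)

/* Extract the child node address offset using the wider field. */
uint64_t child_node_offset(uint64_t reg)
{
	return reg & CHILD_OFFSET_MASK_NEW;
}
```

A register value with bits [29:28] set keeps them under the new mask but loses them under the old one, while a compliant older implementation that leaves those bits at 0 yields the same offset either way.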
Bug: 254441685
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Fixes: 60d1504070 ("perf/arm-cmn: Support new IP features")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220808195455.79277-1-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 05d6f6d346)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8d4048e6bca6498c04b10f31bc188bebdf3f716b
Use READ_ONCE_NOCHECK() when reading the stack to prevent KASAN splats
when dump_stack() is used.
Bug: 254441685
Fixes: 5b301409e8 ("UML: add support for KASAN under x86_64")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
(cherry picked from commit 2975e4a282)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I592c51099761e9eef0b24a40ea427d5d2ab0bacf
We should set the STA deflink addresses in case no
link is really added.
Bug: 254441685
Fixes: 046d2e7c50 ("mac80211: prepare sta handling for MLO support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit 630c7e4621)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I709a549c3394926c67e003ed4b923dfafd18b4df
From CMN-650 onwards, some of the fields in the watchpoint config
registers moved subtly enough to easily overlook. Watchpoint events are
still only partially supported on newer IPs - which in itself deserves
noting - but were not intended to become any *less* functional than on
CMN-600.
Bug: 254441685
Fixes: 60d1504070 ("perf/arm-cmn: Support new IP features")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/e1ce4c2f1e4f73ab1c60c3a85e4037cd62dd6352.1645727871.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 31fac56577)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I09f3740c63f90e49d84bc2d0ddd6176cd5869db3
MGLRU has been tested and edge cases addressed on Android workloads,
after which it showed good results across various performance
metrics. Enable MGLRU as the default memory reclaim algorithm.
Bug: 261619133
Change-Id: I7ed7fbfd6ef9ce10053347528125dd98c39e50bf
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Using a per-cpu thread pool we can reduce the scheduling latency compared
to the workqueue implementation. With this patch scheduling latency and
variation are reduced as per-cpu threads are high priority kthread_workers.
The results were evaluated on arm64 Android devices running 5.10 kernel.
The table below shows resulting improvements of total scheduling latency
for the same app launch benchmark runs with 50 iterations. Scheduling
latency is the latency between when the task (workqueue kworker vs
kthread_worker) became eligible to run to when it actually started
running.
+-------------------------+-----------+----------------+---------+
|                         | workqueue | kthread_worker |  diff   |
+-------------------------+-----------+----------------+---------+
| Average (us)            |     15253 |           2914 | -80.89% |
| Median (us)             |     14001 |           2912 | -79.20% |
| Minimum (us)            |      3117 |           1027 | -67.05% |
| Maximum (us)            |     30170 |           3805 | -87.39% |
| Standard deviation (us) |      7166 |            359 |         |
+-------------------------+-----------+----------------+---------+
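The diff column is the relative change from the workqueue value to the kthread_worker value. A small sketch of that arithmetic, with percent_change() as a hypothetical helper name:

```c
/* Relative change from baseline to candidate, in percent. */
double percent_change(double baseline, double candidate)
{
	return (candidate - baseline) / baseline * 100.0;
}
```

For the average row, (2914 - 15253) / 15253 * 100 comes out at roughly -80.9%, in line with the table.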
Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem as they directly translate to
responsiveness from the user's point of view. While EROFS provides
a lot of important features like space savings, we saw some
performance penalty in cold app launch benchmarks in a few scenarios.
Analysis showed that the significant variance was coming from the
scheduling cost while decompression cost was more or less the same.
With a per-cpu thread pool we can see from the above table that this
variation is reduced by ~80% on average. This problem was discussed
at LPC 2022. Link to LPC 2022 slides and talk at [1].
[1] https://lpc.events/event/16/contributions/1338/
[ Gao Xiang: At least, we have to add this until WQ_UNBOUND workqueue
issue [2] on many arm64 devices is resolved. ]
[2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com
Bug: 271635890
Test: launch_cvd
Change-Id: I9dce2bfd6f40ec6a210161b80cee7c0417b4edb3
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com
(cherry picked from commit 3fffb589b9)
[dhavale: Fixed minor conflict as upstream now has zdata.h folded in
zdata.c]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
(cherry picked from commit 566a7f6c6b)
[dhavale: Fixed minor conflicts in Kconfig and zdata.c]
For AOA re-connection, since the string ID of the accessory has been
changed into a non-zero value, f_accessory fails to call `usb_string_id`
to increment `next_string_id`. This makes the ADB interface display a
wrong name.
Bug: 270044830
Test: CTS Verifier: USB Accessory Test
Test: manual test
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Change-Id: I807164588e80b28065e8715591a100392b04d3de
Changes in 5.15.98
io_uring: ensure that io_init_req() passes in the right issue_flags
Linux 5.15.98
Change-Id: I3d843bbf562cf5da5fc71adef802990dd2841add
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
We can't use 0 here, as io_init_req() is always invoked with the
ctx uring_lock held. Newer kernels have IO_URING_F_UNLOCKED for this,
but previously we used IO_URING_F_NONBLOCK to indicate this as well.
Fixes: cf7f9cd500 ("io_uring: add missing lock in io_get_file_fixed")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.97
ionic: refactor use of ionic_rx_fill()
Fix XFRM-I support for nested ESP tunnels
arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
ARM: dts: rockchip: add power-domains property to dp node on rk3288
HID: elecom: add support for TrackBall 056E:011C
ACPI: NFIT: fix a potential deadlock during NFIT teardown
btrfs: send: limit number of clones and allocated memory size
ASoC: rt715-sdca: fix clock stop prepare timeout issue
IB/hfi1: Assign npages earlier
neigh: make sure used and confirmed times are valid
HID: core: Fix deadloop in hid_apply_multiplier.
x86/cpu: Add Lunar Lake M
staging: mt7621-dts: change palmbus address to lower case
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues().
vc_screen: don't clobber return value in vcs_read
scripts/tags.sh: Invoke 'realpath' via 'xargs'
scripts/tags.sh: fix incompatibility with PCRE2
usb: dwc3: pci: add support for the Intel Meteor Lake-M
USB: serial: option: add support for VW/Skoda "Carstick LTE"
usb: gadget: u_serial: Add null pointer check in gserial_resume
USB: core: Don't hold device lock while reading the "descriptors" sysfs file
io_uring: add missing lock in io_get_file_fixed
Linux 5.15.97
Change-Id: I7e043d6a6dce3cdedde819bebe654689b644de3c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
io_get_file_fixed will access io_uring's context. Lock it if it is
invoked unlocked (eg via io-wq) to avoid a race condition with fixed
files getting unregistered.
No single upstream patch exists for this issue; it was fixed as part
of the file assignment changes that went into the 5.18 cycle.
Signed-off-by: Jheng, Bing-Jhong Billy <billy@starlabs.sg>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45bf39f8df upstream.
Ever since commit 83e83ecb79 ("usb: core: get config and string
descriptors for unauthorized devices") was merged in 2013, there has
been no mechanism for reallocating the rawdescriptors buffers in
struct usb_device after the initial enumeration. Before that commit,
the buffers would be deallocated when a device was deauthorized and
reallocated when it was authorized and enumerated.
This means that the locking in the read_descriptors() routine is not
needed, since the buffers it reads will never be reallocated while the
routine is running. This locking can interfere with user programs
trying to read a hub's descriptors via sysfs while new child devices
of the hub are being initialized, since the hub is locked during this
procedure.
Since the locking in read_descriptors() hasn't been needed for over
nine years, we can remove it.
Reported-and-tested-by: Troels Liebe Bentsen <troels@connectedcars.dk>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/r/Y9l+wDTRbuZABzsE@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7394d2ebb6 upstream.
When COMPILED_SOURCE is set, running
  make ARCH=x86_64 COMPILED_SOURCE=1 cscope tags
could throw the following errors:
  scripts/tags.sh: line 98: /usr/bin/realpath: Argument list too long
  cscope: no source files found
  scripts/tags.sh: line 98: /usr/bin/realpath: Argument list too long
  ctags: No files specified. Try "ctags --help".
This is most likely to happen when the kernel is configured to build a
large number of modules, which has the consequence of passing too many
arguments when calling 'realpath' in 'all_compiled_sources()'.
Let's improve this by invoking 'realpath' through 'xargs', which takes
care of properly limiting the argument list.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20220516234646.531208-1-cristian.ciocaltea@collabora.com
Cc: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae3419fbac upstream.
Commit 226fae124b ("vc_screen: move load of struct vc_data pointer in
vcs_read() to avoid UAF") moved the call to vcs_vc() into the loop.
While doing this it also moved the unconditional assignment of
  ret = -ENXIO;
This unconditional assignment was valid outside the loop but within it
it clobbers the actual value of ret.
To avoid this only assign "ret = -ENXIO" when actually needed.
[ Also, the "goto unlock_out" needs to be just a "break", so that it
does the right thing when it exits on later iterations when partial
success has happened - Linus ]
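The control flow described above can be sketched with a self-contained loop. chunked_read() and its parameters are hypothetical and only model the pattern (assign the error where it is needed, break, and report partial progress); this is not the vcs_read() code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical reader: processes up to `want` bytes in chunks. `avail`
 * models how many bytes can be read before the device goes away.
 */
long chunked_read(size_t want, size_t chunk, size_t avail)
{
	long ret = 0;
	size_t done = 0;

	while (done < want) {
		size_t n = want - done < chunk ? want - done : chunk;

		if (done + n > avail) {
			/* Only set the error when it is actually needed ... */
			ret = -ENXIO;
			/* ... and break so partial progress is still reported. */
			break;
		}
		done += n;
	}
	/* Bytes already read take precedence over a late error. */
	return done ? (long)done : ret;
}
```

With an unconditional "ret = -ENXIO" at the top of the loop body, a fully successful pass would still leave ret holding the error, which is the clobbering described above.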
Reported-by: Storm Dragon <stormdragon2976@gmail.com>
Link: https://lore.kernel.org/lkml/Y%2FKS6vdql2pIsCiI@hotmail.com/
Fixes: 226fae124b ("vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/64981d94-d00c-4b31-9063-43ad0a384bde@t-8ch.de/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fe4850b34 upstream.
The bpf_fib_lookup() helper does not only look up the fib (ie. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling the eth header with all zeros.
This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.
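The added check can be sketched in userspace as below. The EX_-prefixed flag values are assumed to mirror the kernel's NUD_* bits in include/uapi/linux/neighbour.h, and enum fib_result and neigh_result() are hypothetical names for illustration, not the helper's real return codes.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's NUD_* neighbour state bits. */
#define EX_NUD_INCOMPLETE	0x01
#define EX_NUD_REACHABLE	0x02
#define EX_NUD_STALE		0x04
#define EX_NUD_DELAY		0x08
#define EX_NUD_PROBE		0x10
#define EX_NUD_FAILED		0x20
#define EX_NUD_NOARP		0x40
#define EX_NUD_PERMANENT	0x80

/* States in which the neighbour's hardware address can be trusted. */
#define EX_NUD_VALID (EX_NUD_PERMANENT | EX_NUD_NOARP | EX_NUD_REACHABLE | \
		      EX_NUD_PROBE | EX_NUD_STALE | EX_NUD_DELAY)

/* Hypothetical result codes for this sketch. */
enum fib_result { FIB_SUCCESS, FIB_NO_NEIGH };

/* Only report success when the neighbour entry is in a valid state. */
enum fib_result neigh_result(uint8_t nud_state)
{
	return (nud_state & EX_NUD_VALID) ? FIB_SUCCESS : FIB_NO_NEIGH;
}
```

A neighbour in the failed state therefore maps to the "no neighbour" result, so the caller does not copy an all-zero MAC address into the frame.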
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>