Using a per-cpu thread pool, we can reduce the scheduling latency compared
to the workqueue implementation. With this patch, scheduling latency and
its variation are reduced, as the per-cpu threads are high-priority
kthread_workers.
The results were evaluated on arm64 Android devices running 5.10 kernel.
The table below shows resulting improvements of total scheduling latency
for the same app launch benchmark runs with 50 iterations. Scheduling
latency is the latency between when the task (workqueue kworker vs
kthread_worker) became eligible to run to when it actually started
running.
+-------------------------+-----------+----------------+---------+
|                         | workqueue | kthread_worker |  diff   |
+-------------------------+-----------+----------------+---------+
| Average (us)            |     15253 |           2914 | -80.89% |
| Median (us)             |     14001 |           2912 | -79.20% |
| Minimum (us)            |      3117 |           1027 | -67.05% |
| Maximum (us)            |     30170 |           3805 | -87.39% |
| Standard deviation (us) |      7166 |            359 |         |
+-------------------------+-----------+----------------+---------+
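The "diff" column can be re-derived from the two measurement columns. The snippet below is only an illustrative sketch: the numbers are copied from the table, and the helper name `percent_change` is made up for this example.

```python
# Values copied from the table (microseconds): (workqueue, kthread_worker).
RESULTS = {
    "Average": (15253, 2914),
    "Median": (14001, 2912),
    "Minimum": (3117, 1027),
    "Maximum": (30170, 3805),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for name, (workqueue_us, kthread_us) in RESULTS.items():
    # Negative values mean the kthread_worker latency is lower.
    print(f"{name:8s} {percent_change(workqueue_us, kthread_us):8.2f}%")
```

Small differences in the last digit against the table are possible, since the table inputs are themselves rounded.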
Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem as they directly translate to
responsiveness from the user's point of view. While EROFS provides
a lot of important features like space savings, we saw some
performance penalty in cold app launch benchmarks in a few scenarios.
Analysis showed that the significant variance was coming from the
scheduling cost, while the decompression cost was more or less the same.
With a per-cpu thread pool, we can see from the above table that this
variation is reduced by ~80% on average. This problem was discussed
at LPC 2022. The LPC 2022 slides and talk are linked at [1].
[1] https://lpc.events/event/16/contributions/1338/
[ Gao Xiang: At least, we have to add this until WQ_UNBOUND workqueue
issue [2] on many arm64 devices is resolved. ]
[2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com
Bug: 271635890
Test: launch_cvd
Change-Id: I9dce2bfd6f40ec6a210161b80cee7c0417b4edb3
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com
(cherry picked from commit 3fffb589b9)
[dhavale: Fixed minor conflict as upstream now has zdata.h folded in
zdata.c]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
(cherry picked from commit 566a7f6c6b)
[dhavale: Fixed minor conflicts in Kconfig and zdata.c]
For AOA re-connection, since the string ID of the accessory has already
been changed to a non-zero value, f_accessory fails to call
`usb_string_id` to increment `next_string_id`. This makes the ADB
interface display the wrong name.
Bug: 270044830
Test: CTS Verifier: USB Accessory Test
Test: manual test
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Change-Id: I807164588e80b28065e8715591a100392b04d3de
Changes in 5.15.98
io_uring: ensure that io_init_req() passes in the right issue_flags
Linux 5.15.98
Change-Id: I3d843bbf562cf5da5fc71adef802990dd2841add
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
We can't use 0 here, as io_init_req() is always invoked with the
ctx uring_lock held. Newer kernels have IO_URING_F_UNLOCKED for this,
but previously we used IO_URING_F_NONBLOCK to indicate this as well.
Fixes: cf7f9cd500 ("io_uring: add missing lock in io_get_file_fixed")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.97
ionic: refactor use of ionic_rx_fill()
Fix XFRM-I support for nested ESP tunnels
arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
ARM: dts: rockchip: add power-domains property to dp node on rk3288
HID: elecom: add support for TrackBall 056E:011C
ACPI: NFIT: fix a potential deadlock during NFIT teardown
btrfs: send: limit number of clones and allocated memory size
ASoC: rt715-sdca: fix clock stop prepare timeout issue
IB/hfi1: Assign npages earlier
neigh: make sure used and confirmed times are valid
HID: core: Fix deadloop in hid_apply_multiplier.
x86/cpu: Add Lunar Lake M
staging: mt7621-dts: change palmbus address to lower case
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues().
vc_screen: don't clobber return value in vcs_read
scripts/tags.sh: Invoke 'realpath' via 'xargs'
scripts/tags.sh: fix incompatibility with PCRE2
usb: dwc3: pci: add support for the Intel Meteor Lake-M
USB: serial: option: add support for VW/Skoda "Carstick LTE"
usb: gadget: u_serial: Add null pointer check in gserial_resume
USB: core: Don't hold device lock while reading the "descriptors" sysfs file
io_uring: add missing lock in io_get_file_fixed
Linux 5.15.97
Change-Id: I7e043d6a6dce3cdedde819bebe654689b644de3c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
io_get_file_fixed will access io_uring's context. Lock it if it is
invoked unlocked (eg via io-wq) to avoid a race condition with fixed
files getting unregistered.
No single upstream patch exists for this issue; it was fixed as part
of the file assignment changes that went into the 5.18 cycle.
Signed-off-by: Jheng, Bing-Jhong Billy <billy@starlabs.sg>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45bf39f8df upstream.
Ever since commit 83e83ecb79 ("usb: core: get config and string
descriptors for unauthorized devices") was merged in 2013, there has
been no mechanism for reallocating the rawdescriptors buffers in
struct usb_device after the initial enumeration. Before that commit,
the buffers would be deallocated when a device was deauthorized and
reallocated when it was authorized and enumerated.
This means that the locking in the read_descriptors() routine is not
needed, since the buffers it reads will never be reallocated while the
routine is running. This locking can interfere with user programs
trying to read a hub's descriptors via sysfs while new child devices
of the hub are being initialized, since the hub is locked during this
procedure.
Since the locking in read_descriptors() hasn't been needed for over
nine years, we can remove it.
Reported-and-tested-by: Troels Liebe Bentsen <troels@connectedcars.dk>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/r/Y9l+wDTRbuZABzsE@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7394d2ebb6 upstream.
When COMPILED_SOURCE is set, running
make ARCH=x86_64 COMPILED_SOURCE=1 cscope tags
could throw the following errors:
scripts/tags.sh: line 98: /usr/bin/realpath: Argument list too long
cscope: no source files found
scripts/tags.sh: line 98: /usr/bin/realpath: Argument list too long
ctags: No files specified. Try "ctags --help".
This is most likely to happen when the kernel is configured to build a
large number of modules, which has the consequence of passing too many
arguments when calling 'realpath' in 'all_compiled_sources()'.
Let's improve this by invoking 'realpath' through 'xargs', which takes
care of properly limiting the argument list.
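The batching idea behind this fix can be sketched as follows. This is only a rough model of what a tool like xargs does, not its implementation: the function name `batch_arguments`, the byte limit, and the sample paths are all hypothetical.

```python
def batch_arguments(paths, max_bytes):
    """Split `paths` into batches whose combined size stays within `max_bytes`.

    Each argument is charged its encoded length plus one byte for the
    terminator. `max_bytes` is a hypothetical stand-in for the system limit.
    """
    batches, current, used = [], [], 0
    for path in paths:
        cost = len(path.encode()) + 1
        # Start a new batch when adding this path would exceed the limit.
        if current and used + cost > max_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches


# Made-up file names standing in for a large list of compiled sources.
paths = [f"drivers/mod{i}/file{i}.c" for i in range(1000)]
batches = batch_arguments(paths, max_bytes=4096)
print(len(batches), "invocations instead of one oversized call")
```

Each batch would then be passed to a separate 'realpath' invocation, so no single call receives the whole list.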
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20220516234646.531208-1-cristian.ciocaltea@collabora.com
Cc: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae3419fbac upstream.
Commit 226fae124b ("vc_screen: move load of struct vc_data pointer in
vcs_read() to avoid UAF") moved the call to vcs_vc() into the loop.
While doing this it also moved the unconditional assignment of
ret = -ENXIO;
This unconditional assignment was valid outside the loop but within it
it clobbers the actual value of ret.
To avoid this only assign "ret = -ENXIO" when actually needed.
[ Also, the "goto unlock_out" needs to be just a "break", so that it
does the right thing when it exits on later iterations when partial
success has happened - Linus ]
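The effect of the clobbering assignment can be shown with a simplified model. This is not the kernel's vcs_read(); the function `vcs_read_model` and its parameters are invented for illustration, and only the control flow around `ret` is mirrored.

```python
ENXIO = 6  # errno value on Linux


def vcs_read_model(size, pos, count, vc_present=True, clobber=False):
    """Return bytes read, 0 at end of data, or -ENXIO if the console is gone.

    clobber=True mimics the unconditional "ret = -ENXIO" inside the loop;
    clobber=False assigns it only when the lookup actually fails.
    """
    ret = 0
    read = 0
    while count:
        if clobber:
            ret = -ENXIO  # overwrites whatever ret held before
        if not vc_present:
            if not clobber:
                ret = -ENXIO  # assigned only when actually needed
            break  # a plain break keeps any partial success
        if pos >= size:
            break  # end of data: ret must not have been touched
        n = min(count, size - pos)
        pos += n
        read += n
        count -= n
    if read:
        ret = read
    return ret


# Reading at end of data: the clobbering variant reports an error.
print(vcs_read_model(10, 10, 4, clobber=True))
print(vcs_read_model(10, 10, 4, clobber=False))
```

In this model, a read that starts at the end of the buffer returns an error in the clobbering variant and a clean end-of-data result otherwise.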
Reported-by: Storm Dragon <stormdragon2976@gmail.com>
Link: https://lore.kernel.org/lkml/Y%2FKS6vdql2pIsCiI@hotmail.com/
Fixes: 226fae124b ("vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/64981d94-d00c-4b31-9063-43ad0a384bde@t-8ch.de/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fe4850b34 upstream.
The bpf_fib_lookup() helper does not only look up the fib (ie. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zero in the eth header.
This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.
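A minimal sketch of the added check, assuming the neighbour state bits from include/uapi/linux/neighbour.h. The function `lookup_result` and its string return codes are illustrative stand-ins, not the helper's real interface.

```python
# Neighbour state bits as defined in include/uapi/linux/neighbour.h.
NUD_INCOMPLETE = 0x01
NUD_REACHABLE = 0x02
NUD_STALE = 0x04
NUD_DELAY = 0x08
NUD_PROBE = 0x10
NUD_FAILED = 0x20
NUD_NOARP = 0x40
NUD_PERMANENT = 0x80

# States in which the neighbour's link-layer address can be trusted.
NUD_VALID = (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE |
             NUD_PROBE | NUD_STALE | NUD_DELAY)


def lookup_result(nud_state, dmac):
    """Hypothetical model: hand out dmac only for a valid neighbour."""
    if not (nud_state & NUD_VALID):
        return ("NO_NEIGH", None)
    return ("SUCCESS", dmac)


# A failed neighbour may carry an all-zero address; it must not be returned.
print(lookup_result(NUD_FAILED, "00:00:00:00:00:00"))
print(lookup_result(NUD_REACHABLE, "0a:0e:0f:01:12:01"))
```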
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ea427a222d ]
The initial value of hid->collection[].parent_idx is 0. When the
Report descriptor doesn't contain a "HID Collection", the value
remains 0.
Meanwhile, when the Report descriptor fulfills all of the
following conditions, it will trigger a call to
hid_apply_multiplier():
1. Usage page is Generic Desktop Ctrls (0x01)
2. Usage is RESOLUTION_MULTIPLIER (0x48)
3. Contains any FEATURE items
The while loop in hid_apply_multiplier will search for the top-most
collection by looking for parent_idx == -1. Because every parent_idx
is 0, the loop will run forever.
The following Report Descriptor triggers the deadloop:
0x05, 0x01, // Usage Page (Generic Desktop Ctrls)
0x09, 0x48, // Usage (0x48)
0x95, 0x01, // Report Count (1)
0x75, 0x08, // Report Size (8)
0xB1, 0x01, // Feature
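The non-terminating walk can be modelled in a few lines. This sketch is a simplification: it follows only the parent_idx links described above, and the step limit `max_steps` is a hypothetical guard added so the example terminates (the loop described in the text has no such limit).

```python
def find_top_collection(parent_idx, start, max_steps=1000):
    """Follow parent links until one equals -1 and return that index.

    Returns None when no root is reached within `max_steps`, which stands
    in for the endless loop.
    """
    idx = start
    for _ in range(max_steps):
        if parent_idx[idx] == -1:
            return idx
        idx = parent_idx[idx]
    return None


# Well-formed nesting: collection 2 -> 1 -> 0, and 0 is the root.
print(find_top_collection([-1, 0, 1], start=2))

# No "HID Collection" in the descriptor: parent_idx stays 0 and the
# walk keeps revisiting index 0 without ever seeing -1.
print(find_top_collection([0], start=0))
```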
Signed-off-by: Xin Zhao <xnzhao@google.com>
Link: https://lore.kernel.org/r/20230130212947.1315941-1-xnzhao@google.com
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1d2ecdf5e ]
Entries can linger in the cache without a timer for days, thanks to
the gc_thresh1 limit. As a result, without traffic, the confirmed
time can be outdated and appear to be in the future. Later,
on traffic, NUD_STALE entries can switch to NUD_DELAY and start
the timer, which can see the invalid confirmed time and wrongly
switch to the NUD_REACHABLE state instead of NUD_PROBE. As a result,
the timer is set many days in the future. This is more visible on
32-bit platforms, with higher HZ values.
Why is this a problem? While we expect unused entries to expire,
such entries stay in the REACHABLE state for too long, locked in
the cache. They are not expired normally, only when the cache is full.
Problem and the wrong state change reported by Zhang Changzhong:
172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE
350520 seconds have elapsed since this entry was last updated, but it is
still in the REACHABLE state (base_reachable_time_ms is 30000),
preventing lladdr from being updated through probe.
Fix it by ensuring timer is started with valid used/confirmed
times. Considering the valid time range is LONG_MAX jiffies,
we try not to go too much in the past while we are in
DELAY/PROBE state. There are also places that need
used/updated times to be validated while timer is not running.
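How an old timestamp can appear to lie in the future follows from wrapping jiffies comparisons. The sketch below emulates a 32-bit `long` in Python and uses the usual wrapping comparison `(long)(b - a) < 0`; the HZ value and the timestamps are assumptions chosen for illustration.

```python
BITS = 32  # emulate a 32-bit platform
MASK = (1 << BITS) - 1


def to_signed(value):
    """Interpret the low 32 bits of `value` as a signed integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def time_after(a, b):
    """True if timestamp a is after timestamp b, with wrapping arithmetic."""
    return to_signed(b - a) < 0


HZ = 1000  # assumed tick rate
confirmed = 0  # entry confirmed at jiffy 0, then left untouched

now_recent = 30 * HZ  # 30 seconds later
now_late = (1 << 31) + 30 * HZ  # a bit more than 2^31 jiffies later

print(time_after(confirmed, now_recent))  # stale entry is in the past
print(time_after(confirmed, now_late))  # same entry now looks "future"
print((1 << 31) / HZ / 86400, "days until the comparison flips")
```

With these assumed values, the comparison flips after roughly 25 days without an update, which is why validating the used/confirmed times before starting the timer matters.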
Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33e17b3f5a ]
The arg->clone_sources_count is u64 and can trigger a warning when a
huge value is passed from user space and a huge array is allocated.
Limit the allocated memory to 8MiB (can be increased if needed), which
in turn limits the number of clone sources to 8M / sizeof(struct
clone_root) = 8M / 40 = 209715. Real world number of clones is from
tens to hundreds, so this is future proof.
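The limit works out as below. The structure size of 40 bytes is taken from the text; the function name `clone_sources_allowed` is made up for this sketch.

```python
SZ_8M = 8 * 1024 * 1024  # 8 MiB allocation cap
CLONE_ROOT_SIZE = 40  # sizeof(struct clone_root), as stated in the text

# Largest number of clone sources that still fits in the cap.
max_clone_sources = SZ_8M // CLONE_ROOT_SIZE


def clone_sources_allowed(count):
    """Hypothetical check: reject counts whose array would exceed the cap."""
    return count <= max_clone_sources


print(max_clone_sources)
print(clone_sources_allowed(500), clone_sources_allowed(2**40))
```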
Reported-by: syzbot+4376a9a073770c173269@syzkaller.appspotmail.com
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fb6df4366f ]
Lockdep reports that acpi_nfit_shutdown() may deadlock against an
opportune acpi_nfit_scrub(). acpi_nfit_scrub() is run from inside a
'work' and therefore has already acquired workqueue-internal locks. It
also acquires acpi_desc->init_mutex. acpi_nfit_shutdown() first
acquires init_mutex, and was subsequently attempting to cancel any
pending workqueue items. This reversed locking order causes a potential
deadlock:
======================================================
WARNING: possible circular locking dependency detected
6.2.0-rc3 #116 Tainted: G O N
------------------------------------------------------
libndctl/1958 is trying to acquire lock:
ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450
but task is already holding lock:
ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit]
which lock already depends on the new lock.
...
 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&acpi_desc->init_mutex);
                               lock((work_completion)(&(&acpi_desc->dwork)->work));
                               lock(&acpi_desc->init_mutex);
  lock((work_completion)(&(&acpi_desc->dwork)->work));

 *** DEADLOCK ***
Since the workqueue manipulation is protected by its own internal locking,
the cancellation of pending work doesn't need to be done under
acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the
init_mutex to fix the deadlock. Any work that starts after
acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the
cancel_delayed_work_sync() will safely flush it out.
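The reversed ordering can be viewed as a cycle in a "held, then acquired" graph. The sketch below is a toy cycle check, not how lockdep is implemented; the edge lists are a hand-written reading of the description above.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed (held -> acquired) graph."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)
        graph.setdefault(acquired, [])
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


# Scrub runs inside the work item and then takes init_mutex; shutdown
# used to hold init_mutex while waiting for the work item to finish.
before_fix = [("work_completion", "init_mutex"),
              ("init_mutex", "work_completion")]
# After the fix, the cancellation happens outside init_mutex.
after_fix = [("work_completion", "init_mutex")]

print(has_cycle(before_fix), has_cycle(after_fix))
```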
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20230112-acpi_nfit_lockdep-v1-1-660be4dd10be@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29f316a1d7 ]
Make function buttons on the ELECOM M-HT1DRBK trackball mouse work. This
model has two devices with different device IDs (010D and 011C). Both of
them misreport the number of buttons as 5 in the report descriptor, even
though they have 8 buttons. hid-elecom overwrites the report to fix them,
but supports only 010D and does not work on 011C. This patch fixes
011C in a similar way but with specialized position parameters.
In fact, it is sufficient to rewrite only the 17th byte (05 -> 08). However,
I followed the existing way.
Signed-off-by: Takahiro Fujii <fujii@xaxxi.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0355dbbf1 ]
This change adds support for nested IPsec tunnels by ensuring that
XFRM-I verifies existing policies before decapsulating subsequent
policies. Additionally, this clears the secpath entries after policies
are verified, ensuring that previous tunnels with no-longer-valid
do not pollute subsequent policy checks.
This is necessary especially for nested tunnels, as the IP addresses,
protocol and ports may all change, thus not matching the previous
policies. In order to ensure that packets match the relevant inbound
templates, the xfrm_policy_check should be done before handing off to
the inner XFRM protocol to decrypt and decapsulate.
Notably, raw ESP/AH packets did not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.
Test: Verified with additional Android Kernel Unit tests
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e55f0f5bef ]
The same pre-work code is used before each call to
ionic_rx_fill(), so bring it in and make it a part of
the routine.
Signed-off-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Setting WQ_UNBOUND increases scheduler latency on ARM64. This is
likely due to the asymmetric architecture of ARM64 processors.
I've been unable to reproduce the results that claim WQ_UNBOUND gives
a performance boost on x86-64.
This flag is causing performance issues for multiple subsystems within
Android. Notably, the same slowdown exists for decompression with
EROFS.
| open-prebuilt-camera  | WQ_UNBOUND | ~WQ_UNBOUND   |
|-----------------------|------------|---------------|
| verity wait time (us) | 11746      | 119 (-98%)    |
| erofs wait time (us)  | 357805     | 174205 (-51%) |

| sha256 ramdisk random read | WQ_UNBOUND   | ~WQ_UNBOUND |
|----------------------------|--------------|-------------|
| arm64 (accelerated)        | bw=42.4MiB/s | bw=212MiB/s |
| arm64 (generic)            | bw=16.5MiB/s | bw=48MiB/s  |
| x86_64 (generic)           | bw=233MiB/s  | bw=230MiB/s |
Using an alloc_workqueue() @max_active arg of num_online_cpus() only
made sense with WQ_UNBOUND. Switch the @max_active arg to 0 (aka
default, which is 256 per-cpu).
Also, eliminate 'wq_flags' since it really doesn't serve a purpose.
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Bug: 233247259
Change-Id: Iea437fcfaa978a1389a57ef4d4adcb976d89089c
(cherry picked from commit c25da5b7ba)
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
WQ_HIGHPRI increases throughput and decreases disk latency when using
dm-verity. This is important in Android for camera startup speed.
The following tests were run by doing 60 seconds of random reads using
a dm-verity device backed by two ramdisks.
Without WQ_HIGHPRI:
lat (usec): min=13, max=3947, avg=69.53, stdev=50.55
READ: bw=51.1MiB/s (53.6MB/s), 51.1MiB/s-51.1MiB/s (53.6MB/s-53.6MB/s)
With WQ_HIGHPRI:
lat (usec): min=13, max=7854, avg=31.15, stdev=30.42
READ: bw=116MiB/s (121MB/s), 116MiB/s-116MiB/s (121MB/s-121MB/s)
Further testing was done by measuring how long it takes to open a
camera on an Android device.
Without WQ_HIGHPRI:
Total verity work queue wait times (ms):
880.960, 789.517, 898.852
With WQ_HIGHPRI:
Total verity work queue wait times (ms):
528.824, 439.191, 433.300
The average time to open the camera is reduced by 350ms (or 40-50%).
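The work queue wait times listed above can be averaged directly. Note that this sketch covers only the total verity work queue wait times, which is not the same quantity as the overall camera open time quoted in the text; variable names are chosen for this example.

```python
# Total verity work queue wait times (ms), copied from the runs above.
without_highpri_ms = [880.960, 789.517, 898.852]
with_highpri_ms = [528.824, 439.191, 433.300]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


baseline_ms = mean(without_highpri_ms)
improved_ms = mean(with_highpri_ms)
saved_ms = baseline_ms - improved_ms
saved_pct = saved_ms / baseline_ms * 100.0

print(f"mean wait without WQ_HIGHPRI: {baseline_ms:.3f} ms")
print(f"mean wait with WQ_HIGHPRI:    {improved_ms:.3f} ms")
print(f"reduction: {saved_ms:.3f} ms ({saved_pct:.1f}%)")
```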
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Bug: 233247259
Change-Id: I7d600c924b4a3e793b9a26c2852139683061a831
(cherry picked from commit afd41fff9c)
[nhuck: Resolved minor conflict in drivers/md/dm-verity-target.c ]
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Define protected modules and exports list for the aarch64 target.
This enables support to update and/or create protected
exports list at common/android/abi_gki_protected_exports based on
the modules listed in common/android/gki_protected_modules.
Command:
bazel run --config=fast //common:kernel_aarch64_abi_update_protected_exports
Bug: 268679215
Test: bazel run //common:kernel_aarch64_abi_update_protected_exports
Test: TH
Change-Id: I36a18162c2ef253fcf691b016b4da861d9c61e4a
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
Set KMI_GENERATION=1 for 3/1 KMI update
Location: "Kernel ABI difference"
Analyzer Description: Report STG ABI changes between builds (stgdiff on .stg).
Owner: gki-eng-team@google.com
ABI for 'kernel_abi_aarch64' on 'aosp_kernel-common-android14-5.15' has changed.
[old=ab/9668063/kernel_abi_aarch64; new=ab/P51085744/kernel_abi_aarch64; file=abi.stg; tool=stgdiff_stg; version=AyeAye - CL 512944641 android-build.ayeaye-python-dispatcher_20230228.05_p0; rc=4]
For more information, please visit go/kernel-abi-monitoring.
function symbol 'int aead_register_instance(struct crypto_template*, struct aead_instance*)' changed
CRC changed from 0x4842bcc to 0x523040da
function symbol 'int ahash_register_instance(struct crypto_template*, struct ahash_instance*)' changed
CRC changed from 0xa4ff2220 to 0x1ea3dc79
function symbol 'int crypto_aead_decrypt(struct aead_request*)' changed
CRC changed from 0xd5621133 to 0x40117c2
... 65 omitted; 68 symbols have only CRC changes
type 'struct aead_instance' changed
byte size changed from 768 to 576
member changed from 'union { struct { char head[128]; struct crypto_instance base; } s; struct aead_alg alg; }' to 'union { struct { char head[64]; struct crypto_instance base; } s; struct aead_alg alg; }'
offset changed from 1024 to 512
type changed from 'union { struct { char head[128]; struct crypto_instance base; } s; struct aead_alg alg; }' to 'union { struct { char head[64]; struct crypto_instance base; } s; struct aead_alg alg; }'
byte size changed from 640 to 512
member changed from 'struct { char head[128]; struct crypto_instance base; } s' to 'struct { char head[64]; struct crypto_instance base; } s'
type changed from 'struct { char head[128]; struct crypto_instance base; }' to 'struct { char head[64]; struct crypto_instance base; }'
byte size changed from 640 to 512
member changed from 'char head[128]' to 'char head[64]'
type changed from 'char[128]' to 'char[64]'
number of elements changed from 128 to 64
member 'struct crypto_instance base' changed
offset changed by -512
type 'struct ahash_instance' changed
byte size changed from 896 to 704
member changed from 'union { struct { char head[256]; struct crypto_instance base; } s; struct ahash_alg alg; }' to 'union { struct { char head[192]; struct crypto_instance base; } s; struct ahash_alg alg; }'
offset changed from 1024 to 512
type changed from 'union { struct { char head[256]; struct crypto_instance base; } s; struct ahash_alg alg; }' to 'union { struct { char head[192]; struct crypto_instance base; } s; struct ahash_alg alg; }'
byte size changed from 768 to 640
member changed from 'struct { char head[256]; struct crypto_instance base; } s' to 'struct { char head[192]; struct crypto_instance base; } s'
type changed from 'struct { char head[256]; struct crypto_instance base; }' to 'struct { char head[192]; struct crypto_instance base; }'
byte size changed from 768 to 640
member changed from 'char head[256]' to 'char head[192]'
type changed from 'char[256]' to 'char[192]'
number of elements changed from 256 to 192
member 'struct crypto_instance base' changed
offset changed by -512
type 'struct crypto_aead' changed
byte size changed from 256 to 128
member 'struct crypto_tfm base' changed
offset changed by -512
type 'struct crypto_ahash' changed
byte size changed from 256 to 192
type 'struct crypto_tfm' changed
byte size changed from 128 to 64
member 'void* __crt_ctx[0]' changed
offset changed by -512
type 'struct crypto_rng' changed
byte size changed from 128 to 64
type 'struct crypto_shash' changed
byte size changed from 256 to 128
member 'struct crypto_tfm base' changed
offset changed by -512
type 'struct crypto_skcipher' changed
byte size changed from 256 to 128
member 'struct crypto_tfm base' changed
offset changed by -512
type 'struct crypto_sync_skcipher' changed
byte size changed from 256 to 128
type 'struct crypto_cipher' changed
byte size changed from 128 to 64
type 'struct crypto_comp' changed
byte size changed from 128 to 64
type 'struct crypto_instance' changed
byte size changed from 512 to 448
member 'void* __ctx[0]' changed
offset changed by -512
type 'struct aead_alg' changed
byte size changed from 512 to 448
member 'struct crypto_alg base' changed
offset changed by -512
type 'struct ahash_alg' changed
byte size changed from 640 to 576
type 'struct rng_alg' changed
byte size changed from 512 to 448
member 'struct crypto_alg base' changed
offset changed by -512
type 'struct shash_alg' changed
byte size changed from 640 to 576
member 'struct crypto_alg base' changed
offset changed by -512
type 'struct skcipher_alg' changed
byte size changed from 512 to 448
member 'struct crypto_alg base' changed
offset changed by -512
type 'struct shash_instance' changed
byte size changed from 896 to 704
member changed from 'union { struct { char head[256]; struct crypto_instance base; } s; struct shash_alg alg; }' to 'union { struct { char head[192]; struct crypto_instance base; } s; struct shash_alg alg; }'
offset changed from 1024 to 512
type changed from 'union { struct { char head[256]; struct crypto_instance base; } s; struct shash_alg alg; }' to 'union { struct { char head[192]; struct crypto_instance base; } s; struct shash_alg alg; }'
byte size changed from 768 to 640
member changed from 'struct { char head[256]; struct crypto_instance base; } s' to 'struct { char head[192]; struct crypto_instance base; } s'
type changed from 'struct { char head[256]; struct crypto_instance base; }' to 'struct { char head[192]; struct crypto_instance base; }'
byte size changed from 768 to 640
member changed from 'char head[256]' to 'char head[192]'
type changed from 'char[256]' to 'char[192]'
number of elements changed from 256 to 192
member 'struct crypto_instance base' changed
offset changed by -512
type 'struct skcipher_instance' changed
byte size changed from 768 to 576
member changed from 'union { struct { char head[128]; struct crypto_instance base; } s; struct skcipher_alg alg; }' to 'union { struct { char head[64]; struct crypto_instance base; } s; struct skcipher_alg alg; }'
offset changed from 1024 to 512
type changed from 'union { struct { char head[128]; struct crypto_instance base; } s; struct skcipher_alg alg; }' to 'union { struct { char head[64]; struct crypto_instance base; } s; struct skcipher_alg alg; }'
byte size changed from 640 to 512
member changed from 'struct { char head[128]; struct crypto_instance base; } s' to 'struct { char head[64]; struct crypto_instance base; } s'
type changed from 'struct { char head[128]; struct crypto_instance base; }' to 'struct { char head[64]; struct crypto_instance base; }'
byte size changed from 640 to 512
member changed from 'char head[128]' to 'char head[64]'
type changed from 'char[128]' to 'char[64]'
number of elements changed from 128 to 64
member 'struct crypto_instance base' changed
offset changed by -512
type 'struct hash_alg_common' changed
byte size changed from 512 to 448
member 'struct crypto_alg base' changed
offset changed by -512
Bug: 271188187
Change-Id: I0f5a5967bda2df567d26d9bb5acef4c16b31cfc9
Signed-off-by: Todd Kjos <tkjos@google.com>
Currently, ARCH_DMA_MINALIGN is set to 128 bytes for ARM64, which
means that the minimum size for kmalloc objects is 128 bytes.
ARCH_DMA_MINALIGN is required to be 128 bytes to be able to use a
single kernel image to support non-coherent DMA on systems with
cachelines up to 128 bytes in size.
However, the current value of 128 bytes leads to a large amount of
wasted memory for slab allocations on systems that have 64 byte
cachelines and only need a minimum alignment of 64 bytes for
DMA buffers. If these systems are allowed to use a smaller
ARCH_DMA_MINALIGN value of 64, the memory footprint of slab
allocations can be reduced by redirecting some allocations from
the kmalloc-128 and kmalloc-256 caches to the kmalloc-64 and
kmalloc-192 slab caches.
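The redirection between caches can be illustrated with a simplified model. It assumes that a size class is usable only if it is a multiple of the minimum alignment; the class list is cut off at 512 bytes and the function name `bucket_for` is invented for this sketch, so it is not the allocator's actual selection logic.

```python
# Generic kmalloc size classes up to 512 bytes (larger ones omitted).
SIZE_CLASSES = [8, 16, 32, 64, 96, 128, 192, 256, 512]


def bucket_for(size, min_align):
    """Smallest class that fits `size` and is a multiple of `min_align`."""
    for cls in SIZE_CLASSES:
        if cls >= size and cls % min_align == 0:
            return cls
    raise ValueError("size is larger than the modelled classes")


for size in (40, 100, 160):
    print(size, "bytes ->",
          f"kmalloc-{bucket_for(size, 128)} with 128-byte alignment,",
          f"kmalloc-{bucket_for(size, 64)} with 64-byte alignment")
```

In this model, small objects move from kmalloc-128 to kmalloc-64, and objects between 129 and 192 bytes move from kmalloc-256 to kmalloc-192.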
The following output from the slabinfo tool from a device running
Linux 6.1-rc5 reveals that lowering ARCH_DMA_MINALIGN from 128 bytes
to 64 bytes reduces the memory footprint of slab allocations by
16.6 MB--almost 5%.
ARCH_DMA_MINALIGN == 128:
Name                 Objects Objsize     Space
kmalloc-128           236973     128     33.0M
kmalloc-rcl-128         5541     128    724.9K
kmalloc-cg-128         10367     128      1.5M
kmalloc-256            12986     256      3.5M
kmalloc-rcl-256          256     256     65.5K
kmalloc-cg-256           544     256    139.2K
Total:                266667             38.9M
ARCH_DMA_MINALIGN == 64:
Name                 Objects Objsize     Space
kmalloc-64            216525      64     14.9M
kmalloc-rcl-64          3663      64    249.8K
kmalloc-cg-64          10269      64    864.2K
kmalloc-128            22797     128      3.5M
kmalloc-rcl-128         2016     128    258.0K
kmalloc-cg-128           288     128     36.8K
kmalloc-192             5532     192      1.1M
kmalloc-rcl-192          147     192     28.6K
kmalloc-cg-192           462     192     90.1K
kmalloc-256             5110     256      1.3M
kmalloc-rcl-256            0     256        0K
kmalloc-cg-256           224     256     57.3K
Total:                267033             22.3M
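The totals in the two listings can be cross-checked with a short script. The object counts and space totals are copied from the listings; the percentage computed here is relative to the listed caches only, which is a different base from the overall figure quoted in the text.

```python
# Object counts copied from the two slabinfo listings.
objects_align_128 = [236973, 5541, 10367, 12986, 256, 544]
objects_align_64 = [216525, 3663, 10269, 22797, 2016, 288,
                    5532, 147, 462, 5110, 0, 224]

# "Total" space figures from the listings, in MB.
space_align_128_mb = 38.9
space_align_64_mb = 22.3

saved_mb = round(space_align_128_mb - space_align_64_mb, 1)
saved_pct_of_listed = saved_mb / space_align_128_mb * 100.0

print("objects:", sum(objects_align_128), "vs", sum(objects_align_64))
print(f"saved: {saved_mb} MB ({saved_pct_of_listed:.1f}% of the listed caches)")
```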
Thus, given the amount of memory saved by lowering ARCH_DMA_MINALIGN,
and that we are not aware of systems that have 128 byte cachelines
that will launch with newer kernels, lower the value of
ARCH_DMA_MINALIGN to 64 bytes for ARM64.
This is meant to serve as an intermediate solution while the series
in [1] is finalized.
[1] https://lore.kernel.org/linux-iommu/20221106220143.2129263-1-catalin.marinas@arm.com/
Bug: 241844128
Bug: 267786731
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Change-Id: Idde8d1c682865582382766acc0443dda1a8a4f12
If the value of ARCH_DMA_MINALIGN is less than the cache line size of
the system, it is not possible to safely perform non-coherent DMA
transactions. For example, if a DMA buffer is used for non-coherent
DMA from a device, and that buffer shares a cacheline with another
buffer that the CPU operated on in the past, the data from the device
will be overwritten if the cacheline is evicted.
These sorts of DMA corruptions are non-trivial to find, so instead of
allowing a system to continue booting and potentially initiate an
unsafe DMA transaction, trigger a kernel panic if the minimum DMA
alignment is smaller than the cache line size of the system.
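The condition being checked can be sketched as follows. This is a toy model rather than the kernel code: `RuntimeError` stands in for the kernel panic, and both function names and the example addresses are made up.

```python
def shares_cacheline(addr_a, addr_b, line_size):
    """True if both addresses fall into the same cache line."""
    return addr_a // line_size == addr_b // line_size


def check_dma_minalign(min_align, cache_line_size):
    """Refuse to continue when buffers could share a cache line."""
    if min_align < cache_line_size:
        # Stand-in for the kernel panic described above.
        raise RuntimeError(
            f"DMA min alignment {min_align} < cache line {cache_line_size}")


# Two buffers placed 64 bytes apart (the smaller minimum alignment).
buf_a, buf_b = 0, 64
print(shares_cacheline(buf_a, buf_b, line_size=64))   # separate lines
print(shares_cacheline(buf_a, buf_b, line_size=128))  # same line: unsafe

check_dma_minalign(min_align=64, cache_line_size=64)  # passes silently
```

With 64-byte alignment on a system with 128-byte cache lines, the two buffers share a line, which is exactly the case the check rejects at boot.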
Bug: 241844128
Bug: 267786731
Change-Id: I97998a4b3eea25d0956416c020ac0a6aa6950fb8
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Disable zoned write pipelining until it is clear how the zoned write
command order can be preserved.
Bug: 270741871
Change-Id: Ic43e75aab0a13193394a362a5b1d22052c4a7d05
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Decrement the command retry counter before retrying a zoned write.
This patch has not been posted on any Linux kernel mailing list since
there is agreement that another approach will be taken to preserve the
order of WRITE commands for zoned devices. However, the implementation
of that new approach is not yet available.
Bug: 197782466
Fixes: 33aea9741e ("ANDROID: scsi: Retry unaligned zoned writes")
Fixes: If89c1f0b4d382978c52382dd3634f39fc15bcaf0
Change-Id: Idec66e2f5a8ee7ab218cdbcdb308f0bf2e9802fe
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Changes in 5.15.96
drm/etnaviv: don't truncate physical page address
wifi: rtl8xxxu: gen2: Turn on the rate control
drm/edid: Fix minimum bpc supported with DSC1.2 for HDMI sink
clk: mxl: Switch from direct readl/writel based IO to regmap based IO
clk: mxl: Remove redundant spinlocks
clk: mxl: Add option to override gate clks
clk: mxl: Fix a clk entry by adding relevant flags
powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G
clk: mxl: syscon_node_to_regmap() returns error pointers
random: always mix cycle counter in add_latent_entropy()
KVM: x86: Fail emulation during EMULTYPE_SKIP on any exception
KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid
KVM: VMX: Execute IBPB on emulated VM-exit when guest has IBRS
can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
powerpc: dts: t208x: Disable 10G on MAC1 and MAC2
powerpc: use generic version of arch_is_kernel_initmem_freed()
powerpc/vmlinux.lds: Ensure STRICT_ALIGN_SIZE is at least page aligned
powerpc/vmlinux.lds: Add an explicit symbol for the SRWX boundary
powerpc/64s/radix: Fix crash with unaligned relocated kernel
powerpc/64s/radix: Fix RWX mapping with relocated kernel
drm/i915/gvt: fix double free bug in split_2MB_gtt_entry
uaccess: Add speculation barrier to copy_from_user()
binder: read pre-translated fds from sender buffer
binder: defer copies of pre-patched txn data
binder: fix pointer cast warning
binder: Address corner cases in deferred copy and fixup
binder: Gracefully handle BINDER_TYPE_FDA objects with num_fds=0
nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
wifi: mwifiex: Add missing compatible string for SD8787
audit: update the mailing list in MAINTAINERS
ext4: Fix function prototype mismatch for ext4_feat_ktype
kbuild: Add CONFIG_PAHOLE_VERSION
scripts/pahole-flags.sh: Use pahole-version.sh
lib/Kconfig.debug: Use CONFIG_PAHOLE_VERSION
lib/Kconfig.debug: Allow BTF + DWARF5 with pahole 1.21+
Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"
bpf: add missing header file include
Linux 5.15.96
Change-Id: Ifd0d066cee65d011049ee351d1307317d65135ea
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit f3dd0c5337 upstream.
Commit 74e19ef0ff ("uaccess: Add speculation barrier to
copy_from_user()") built fine on x86-64 and arm64, and that's the extent
of my local build testing.
It turns out those got the <linux/nospec.h> include incidentally through
other header files (<linux/kvm_host.h> in particular), but that was not
true of other architectures, resulting in build errors:
kernel/bpf/core.c: In function ‘___bpf_prog_run’:
kernel/bpf/core.c:1913:3: error: implicit declaration of function ‘barrier_nospec’
so just make sure to explicitly include the proper <linux/nospec.h>
header file to make everybody see it.
Fixes: 74e19ef0ff ("uaccess: Add speculation barrier to copy_from_user()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Huacai Chen <chenhuacai@loongson.cn>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af7b29b1de upstream.
taprio_attach() has this logic at the end, which should have been
removed with the blamed patch (which is now being reverted):
        /* access to the child qdiscs is not needed in offload mode */
        if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
                kfree(q->qdiscs);
                q->qdiscs = NULL;
        }
because otherwise, we make use of q->qdiscs[] even after this array was
deallocated, namely in taprio_leaf(). Therefore, whenever one would try
to attach a valid child qdisc to a fully offloaded taprio root, one
would immediately dereference a NULL pointer.
$ tc qdisc replace dev eno0 handle 8001: parent root taprio \
        num_tc 8 \
        map 0 1 2 3 4 5 6 7 \
        queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
        max-sdu 0 0 0 0 0 200 0 0 \
        base-time 200 \
        sched-entry S 80 20000 \
        sched-entry S a0 20000 \
        sched-entry S 5f 60000 \
        flags 2
$ max_frame_size=1500
$ data_rate_kbps=20000
$ port_transmit_rate_kbps=1000000
$ idleslope=$data_rate_kbps
$ sendslope=$(($idleslope - $port_transmit_rate_kbps))
$ locredit=$(($max_frame_size * $sendslope / $port_transmit_rate_kbps))
$ hicredit=$(($max_frame_size * $idleslope / $port_transmit_rate_kbps))
$ tc qdisc replace dev eno0 parent 8001:7 cbs \
        idleslope $idleslope \
        sendslope $sendslope \
        hicredit $hicredit \
        locredit $locredit \
        offload 0
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
pc : taprio_leaf+0x28/0x40
lr : qdisc_leaf+0x3c/0x60
Call trace:
taprio_leaf+0x28/0x40
tc_modify_qdisc+0xf0/0x72c
rtnetlink_rcv_msg+0x12c/0x390
netlink_rcv_skb+0x5c/0x130
rtnetlink_rcv+0x1c/0x2c
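The crash path can be modelled in userspace. The following is only a
sketch: struct model_sched and the model_*() helpers are hypothetical
stand-ins, not kernel code. It assumes that the leaf lookup indexes
q->qdiscs[] with the class minor minus one, and that pointers are 8 bytes
wide. The commit message states neither, but under those assumptions
class 8001:7 lands at offset 0x30 from a NULL array, which is consistent
with the reported fault address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the taprio private data. */
struct model_sched {
	bool full_offload;
	size_t num_txq;
	int **qdiscs;		/* per-TX-queue child pointers, like q->qdiscs[] */
};

/* Allocate the per-queue child array; returns 0 on success, -1 on failure. */
int model_init(struct model_sched *q, size_t num_txq, bool full_offload)
{
	q->full_offload = full_offload;
	q->num_txq = num_txq;
	q->qdiscs = calloc(num_txq, sizeof(*q->qdiscs));
	return q->qdiscs ? 0 : -1;
}

/* Mirrors the quoted tail of taprio_attach(): drop the array in offload mode. */
void model_attach(struct model_sched *q)
{
	if (q->full_offload) {
		free(q->qdiscs);
		q->qdiscs = NULL;
	}
}

/* True when a leaf lookup for class minor `cl` would read through NULL. */
bool model_leaf_would_crash(const struct model_sched *q, unsigned long cl)
{
	size_t ntx;

	assert(cl >= 1);
	ntx = cl - 1;		/* assumed indexing: class minor minus one */
	if (ntx >= q->num_txq)
		return false;	/* out-of-range class, array is not touched */
	return q->qdiscs == NULL;
}

/* Byte offset of the array slot for class minor `cl` (assumed indexing). */
size_t model_fault_offset(unsigned long cl, size_t ptr_size)
{
	assert(cl >= 1);
	return (cl - 1) * ptr_size;
}
```

With eight queues and full offload, model_attach() leaves the array NULL,
so a lookup for class 7 is flagged as a crash, while the same lookup in
software mode is not.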
The solution is not as obvious as the problem. The code which deallocates
q->qdiscs[] is in fact copied and pasted from mqprio, which also
deallocates the array in mqprio_attach() and never uses it afterwards.
Therefore, the cleanup logic for priv->qdiscs[] in mqprio_destroy(),
which looks identical, is deceptive: it never takes place at
qdisc_destroy() time, only at raw ops->destroy() time. In other words,
priv->qdiscs[] does not last for the entire lifetime of the mqprio root;
the cleanup exists only because of the twisted way in which the Qdisc API
understands error path cleanup should be done (Qdisc_ops :: destroy() is
called even when Qdisc_ops :: init() never succeeded).
Side note, in fact this is also what the comment in mqprio_init() says:
/* pre-allocate qdisc, attachment can't fail */
Or reworded, mqprio's priv->qdiscs[] scheme is only meant to serve as
data passing between Qdisc_ops :: init() and Qdisc_ops :: attach().
[ this comment was also copied and pasted into the initial taprio
commit, even though taprio_attach() came way later ]
The problem is that taprio also makes extensive use of the q->qdiscs[]
array in the software fast path (taprio_enqueue() and taprio_dequeue()),
but it does not keep a reference of its own on q->qdiscs[i] (you'd think
that since it creates these Qdiscs, it holds the reference, but nope,
this is not completely true).
To understand the difference between taprio_destroy() and mqprio_destroy()
one must look at the code before commit 13511704f8 ("net: taprio offload:
enforce qdisc to netdev queue mapping"), because that commit just muddied
the waters.
In the "original" taprio design, taprio always attached itself (the root
Qdisc) to all netdev TX queues, so that dev_qdisc_enqueue() would go
through taprio_enqueue().
It also called qdisc_refcount_inc() on itself for as many times as there
were netdev TX queues, in order to counter-balance what tc_get_qdisc()
does when destroying a Qdisc (simplified for brevity below):
  if (n->nlmsg_type == RTM_DELQDISC)
          err = qdisc_graft(dev, parent=NULL, new=NULL, q, extack);

  qdisc_graft(where "new" is NULL so this deletes the Qdisc):

          for (i = 0; i < num_q; i++) {
                  struct netdev_queue *dev_queue;

                  dev_queue = netdev_get_tx_queue(dev, i);

                  old = dev_graft_qdisc(dev_queue, new);
                  if (new && i > 0)
                          qdisc_refcount_inc(new);

                  qdisc_put(old);
                  ~~~~~~~~~~~~~~
                  this decrements taprio's refcount once for each TX queue
          }

          notify_and_destroy(net, skb, n, classid,
                             rtnl_dereference(dev->qdisc), new);
                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                             and this finally decrements it to zero,
                             making qdisc_put() call qdisc_destroy()
The q->qdiscs[] created using qdisc_create_dflt() (or their
replacements, if taprio_graft() was ever to get called) were then
privately freed by taprio_destroy().
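The reference counting in this original design can be checked with a few
lines of arithmetic. This is a sketch only: the function name is
hypothetical, and it assumes the root Qdisc starts with a refcount of 1
when it is created.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the root Qdisc's refcount just before the final
 * put issued by notify_and_destroy(). A result of 1 means that put takes
 * the count to zero, as intended. A result of 0 or less means the root
 * would already have been destroyed by the per-queue puts.
 */
int root_refcnt_before_final_put(int num_txq, bool attach_takes_refs)
{
	int refcnt = 1;		/* assumed initial reference from creation */

	assert(num_txq > 0);

	/* attach: one qdisc_refcount_inc() on the root per TX queue */
	if (attach_takes_refs)
		refcnt += num_txq;

	/* qdisc_graft() with new == NULL: one qdisc_put(old) per TX queue */
	refcnt -= num_txq;

	return refcnt;
}
```

With the extra references the count is back at 1 for any number of
queues, so only the last put destroys the root. Without them, the first
per-queue put would already bring it to zero.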
This is still what is happening after commit 13511704f8 ("net: taprio
offload: enforce qdisc to netdev queue mapping"), but only for software
mode.
In full offload mode, the per-txq "qdisc_put(old)" calls from
qdisc_graft() now deallocate the child Qdiscs rather than decrement
taprio's refcount. So when notify_and_destroy(taprio) finally calls
taprio_destroy(), the difference is that the child Qdiscs were already
deallocated.
And this is exactly why the taprio_attach() comment "access to the child
qdiscs is not needed in offload mode" is deceptive too. Not only is the
q->qdiscs[] array not needed, it must also be freed as soon as possible,
because otherwise we will also call qdisc_put() on the child Qdiscs in
qdisc_destroy() -> taprio_destroy(), and this will cause a nasty
use-after-free/refcount-saturate/whatever.
In short, the problem is that since the blamed commit, taprio_leaf()
needs q->qdiscs[] to not be freed by taprio_attach(), while qdisc_destroy()
-> taprio_destroy() does need q->qdiscs[] to be freed by taprio_attach()
for full offload. Fixing one problem triggers the other.
All of this can be solved by making taprio keep its q->qdiscs[i] with a
refcount elevated at 2 (in offloaded mode where they are attached to the
netdev TX queues), both in taprio_attach() and in taprio_graft(). The
generic qdisc_graft() would just decrement the child qdiscs' refcounts
to 1, and taprio_destroy() would give them the final coup de grace.
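The conflict, and the refcount-at-2 idea, can be sketched as a small
model of a single child Qdisc. The function and its parameters are
hypothetical. It only counts the puts described above and is not how the
kernel tracks references.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: returns a child Qdisc's refcount after the root has
 * been torn down. Zero is a balanced teardown. A negative value means one
 * put too many (the use-after-free / refcount-saturate case).
 */
int child_refcnt_after_teardown(int initial_refcnt, bool full_offload,
				bool array_kept)
{
	int refcnt = initial_refcnt;

	assert(initial_refcnt > 0);

	/* qdisc_graft(): in full offload, the per-txq qdisc_put(old) hits the child */
	if (full_offload)
		refcnt--;

	/* taprio_destroy(): puts every child still reachable via q->qdiscs[] */
	if (array_kept)
		refcnt--;

	return refcnt;
}
```

Software mode with the array kept, and full offload with the array
freed, both end at zero from a refcount of 1. Full offload with the array
kept ends at -1 from a refcount of 1, and at zero only when the child
starts at 2.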
However the rabbit hole of changes is getting quite deep, and the
complexity increases. The blamed commit was supposed to be a bug fix in
the first place, and the bug it addressed is not significant enough to
justify further rework in stable trees. So I'd rather just revert it.
I don't know enough about multi-queue Qdisc design to make a proper
judgement right now regarding what is/isn't idiomatic use of Qdisc
concepts in taprio. I will try to study the problem more and come up with
a different solution in net-next.
Fixes: 1461d212ab ("net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs")
Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20221004220100.1650558-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>