mirror of
https://github.com/hardkernel/linux.git
synced 2026-06-06 02:50:49 +09:00
d86f55a38edfefc37fedd356d10979d697c42eb5
1074866 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d86f55a38e |
ANDROID: Pixel: Add missing symbol to symbol list
Pixel MM Metrics:
add the missing symbol 'seq_put_decimal_ll' and re-do update list
Bug: 299190787
Test: local build
Change-Id: I005ccfa15cee8252bc51242460bbab9b7d0eb2ab
Signed-off-by: Robin Hsu <robinhsu@google.com>
|
||
|
|
699d87745a |
ANDROID: GKI: update mtktv symbol
11 function symbol(s) added 'void usbnet_cdc_unbind(struct usbnet*, struct usb_interface*)' 'void usbnet_disconnect(struct usb_interface*)' 'int usbnet_generic_cdc_bind(struct usbnet*, struct usb_interface*)' 'int usbnet_open(struct net_device*)' 'int usbnet_probe(struct usb_interface*, const struct usb_device_id*)' 'int usbnet_resume(struct usb_interface*)' 'void usbnet_skb_return(struct usbnet*, struct sk_buff*)' 'netdev_tx_t usbnet_start_xmit(struct sk_buff*, struct net_device*)' 'int usbnet_stop(struct net_device*)' 'int usbnet_suspend(struct usb_interface*, pm_message_t)' 'void usbnet_tx_timeout(struct net_device*, unsigned int)' Bug: 344792547 Change-Id: I5954a9e53e2a4687ac39d51378d4e519c74e9685 Signed-off-by: yenchia.chen <yenchia.chen@mediatek.com> |
||
|
|
7f18f2a32a |
ANDROID: set rewrite_absolute_paths_in_config for GKI aarch64.
For GKI targets with kmi_symbol_list set,
also set rewrite_absolute_paths_in_config to True
so that CONFIG_UNUSED_KSYMS_WHITELIST is not an
absolute path any more, increasing reproducibility
on different machines.
This is made possible with
|
||
|
|
0f96f3a53c |
BACKPORT: USB: gadget: core: create sysfs link between udc and gadget
udc device and gadget device are tightly coupled, yet there's no good way to corelate the two. Add a sysfs link in udc that points to the corresponding gadget device. An example use case: userspace configures a f_midi configfs driver and bind the udc device, then it tries to locate the corresponding midi device, which is a child device of the gadget device. The gadget device that's associated to the udc device has to be identified in order to index the midi device. Having a sysfs link would make things much easier. Signed-off-by: Roy Luo <royluo@google.com> Link: https://lore.kernel.org/r/20240307030922.3573161-1-royluo@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Bug: 333778731 Change-Id: I9e3f782543eba5e026a65031aaae754daafb69ab (cherry picked from commit 0ef40f399aa2be8c04aee9b7430705612c104ce5) [royluo: Resolved conflict in drivers/usb/gadget/udc/core.c ] Signed-off-by: Roy Luo <royluo@google.com> |
||
|
|
eef06c5488 |
ANDROID: Add __nocfi return for swsusp_arch_resume
Resolve the CFI failure problem encountered during the restoration of the hibernation snapshot image. Bug: 340049585 Signed-off-by: Mukesh Pilaniya <quic_mpilaniy@quicinc.com> Signed-off-by: Auditya Bhattaram <quic_audityab@quicinc.com> Signed-off-by: Kamati Srinivas <quic_kamasrin@quicinc.com> (cherry picked from https://android-review.googlesource.com/q/commit:93f0348ea15b65656e1c2f909de3dd3bbf26599f) Merged-In: I1f8f2c38e9d02a177c0cadb066419bf7edd66085 Change-Id: I1f8f2c38e9d02a177c0cadb066419bf7edd66085 |
||
|
|
9a84d60e35 |
ANDROID: ABI fixup for abi break in struct dst_ops
In commit 92f1655aa2b2 ("net: fix __dst_negative_advice() race") the
struct dst_ops callback negative_advice is callback changes function
parameters. But as this pointer is part of a structure that is tracked
in the ABI checker, the tool triggers when this is changed.
However, the callback pointer is internal to the networking stack, so
changing the function type is safe, so needing to preserve this is not
required. To do so, switch the function pointer type back to the old
one so that the checking tools pass, AND then do a hard cast of the
function pointer to the new type when assigning and calling the
function.
Bug: 343727534
Fixes: 92f1655aa2b2 ("net: fix __dst_negative_advice() race")
Change-Id: I48d4ab4bbd29f8edc8fbd7923828b7f78a23e12e
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
9b0dadc811 |
BACKPORT: net: fix __dst_negative_advice() race
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.
RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).
Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.
Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.
Many thanks to Clement Lecigne for tracking this issue.
This old bug became visible after the blamed commit, using UDP sockets.
Bug: 343727534
Fixes:
|
||
|
|
42528b83af |
ANDROID: GKI: Update symbol list for Pixel Watch
Bug: 332548598 Change-Id: I0843ed59b917daeb04b179e5ed26be52fbe52909 Signed-off-by: Andrei Ciubotariu <aciubotariu@google.com> |
||
|
|
d94e16579b |
Revert^2 "BACKPORT: FROMGIT: module: allow UNUSED_KSYMS_WHITELIST ..."
This reverts commit
|
||
|
|
81d41dd61b |
Merge tag 'android14-5.15.149_r00' into branch android14-5.15
This catches the android14-5.15 branch up to date with the 5.15.149 LTS release. Included in here are the following commits: * |
||
|
|
73bef21507 |
UPSTREAM: selftests: timers: Fix valid-adjtimex signed left-shift undefined behavior
[ Upstream commit 076361362122a6d8a4c45f172ced5576b2d4a50d ]
The struct adjtimex freq field takes a signed value who's units are in
shifted (<<16) parts-per-million.
Unfortunately for negative adjustments, the straightforward use of:
freq = ppm << 16 trips undefined behavior warnings with clang:
valid-adjtimex.c:66:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
-499<<16,
~~~~^
valid-adjtimex.c:67:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
-450<<16,
~~~~^
..
Fix it by using a multiply by (1 << 16) instead of shifting negative values
in the valid-adjtimex test case. Align the values for better readability.
Bug: 339526723
Reported-by: Lee Jones <joneslee@google.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Change-Id: Ied611c13a802acf9c7a2427f0a61eb358b571a3d
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240409202222.2830476-1-jstultz@google.com
Link: https://lore.kernel.org/lkml/0c6d4f0d-2064-4444-986b-1d1ed782135f@collabora.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit
|
||
|
|
58719dc95f |
UPSTREAM: netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
__netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again. This seems dangerous, even if KASAN did not bother yet. Add a @lock_taken parameter to netlink_dump() to let it grab the mutex if called from netlink_recvmsg() only. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit b5590270068c4324dac4a2b5a4a156e02e21339f) Bug: 339546075 Change-Id: I29a711ea804794b556674011cbd23c5bf9a03ab6 Signed-off-by: yenchia.chen <yenchia.chen@mediatek.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> |
||
|
|
e94d70c73a |
UPSTREAM: netlink: annotate data-races around sk->sk_err
commit |
||
|
|
c0e59dab58 |
UPSTREAM: netlink: annotate lockless accesses to nlk->max_recvmsg_len
commit |
||
|
|
8d122b23ed |
ANDROID: incremental-fs: Make work with 16k pages
Bug: 260919895 Test: incfs_test passes Signed-off-by: Paul Lawrence <paullawrence@google.com> Change-Id: Ia4fbb6011930b085bc00a36851e9b0e8559d3dc5 (cherry picked from commit 5ac10739bcf2dae9220a7a39392aa41235bc64c2) |
||
|
|
b7d4cf60e7 |
Revert "BACKPORT: FROMGIT: module: allow UNUSED_KSYMS_WHITELIST ..."
Revert submission 3101887-android14-ksyms-wl Reason for revert: Restore green in release builds Reverted changes: /q/submissionid:3101887-android14-ksyms-wl Change-Id: Ia6c694be720c7a07abdfee6a4720a4518ea59141 |
||
|
|
acacba8bfa |
BACKPORT: FROMGIT: module: allow UNUSED_KSYMS_WHITELIST to be relative against objtree.
If UNUSED_KSYMS_WHITELIST is a file generated before Kbuild runs, and the source tree is in a read-only filesystem, the developer must put the file somewhere and specify an absolute path to UNUSED_KSYMS_WHITELIST. This worked, but if IKCONFIG=y, an absolute path is embedded into .config and eventually into vmlinux, causing the build to be less reproducible when building on a different machine. This patch makes the handling of UNUSED_KSYMS_WHITELIST to be similar to MODULE_SIG_KEY. First, check if UNUSED_KSYMS_WHITELIST is an absolute path, just as before this patch. If so, use the path as is. If it is a relative path, use wildcard to check the existence of the file below objtree first. If it does not exist, fall back to the original behavior of adding $(srctree)/ before the value. After this patch, the developer can put the generated file in objtree, then use a relative path against objtree in .config, eradicating any absolute paths that may be evaluated differently on different machines. Signed-off-by: Yifan Hong <elsk@google.com> Reviewed-by: Elliot Berman <quic_eberman@quicinc.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> (cherry picked from commit a2e3c811938b4902725e259c03b2d6c539613992 https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git modules-next) Bug: 333769605 Change-Id: I0696ac8f686329795034ada5a4587af4ecbb774f [elsk: apply change to gen_autoksyms.sh instead because CONFIG_UNUSED_KSYMS_WHITELIST is parsed there. Revert change to Makefile.modpost. Edit init/Kconfig instead of kernel/module/Kconfig because UNUSED_KSYMS_WHITELIST is defined there.] Bug: 342390208 Signed-off-by: Yifan Hong <elsk@google.com> |
||
|
|
d19f3c553a |
ANDROID: export one function for mm metrics
export function for sysfs node formating Bug: 299190787 Change-Id: I71e6a0815efa8df99d036bf457b8a0081999f3de Signed-off-by: Robin Hsu <robinhsu@google.com> |
||
|
|
de30e28ba8 |
FROMLIST: usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state
Similar to what fixed in Commit a6fe37f428c1 ("usb: typec: tcpm: Skip
hard reset when in error recovery"), the handling of the received Hard
Reset has to be skipped during TOGGLING state.
[ 4086.021288] VBUS off
[ 4086.021295] pending state change SNK_READY -> SNK_UNATTACHED @ 650 ms [rev2 NONE_AMS]
[ 4086.022113] VBUS VSAFE0V
[ 4086.022117] state change SNK_READY -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.022447] VBUS off
[ 4086.022450] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.023060] VBUS VSAFE0V
[ 4086.023064] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.023070] disable BIST MODE TESTDATA
[ 4086.023766] disable vbus discharge ret:0
[ 4086.023911] Setting usb_comm capable false
[ 4086.028874] Setting voltage/current limit 0 mV 0 mA
[ 4086.028888] polarity 0
[ 4086.030305] Requesting mux state 0, usb-role 0, orientation 0
[ 4086.033539] Start toggling
[ 4086.038496] state change SNK_UNATTACHED -> TOGGLING [rev2 NONE_AMS]
// This Hard Reset is unexpected
[ 4086.038499] Received hard reset
[ 4086.038501] state change TOGGLING -> HARD_RESET_START [rev2 HARD_RESET]
Fixes:
|
||
|
|
fb512eac3c |
ANDROID: binder: fix ptrdiff_t printk-format issue
The correct printk format specifier when calculating buffer offsets
should be "%tx" as it is a pointer difference (a.k.a ptrdiff_t). This
fixes some W=1 build warnings reported by the kernel test robot.
Bug: 329799092
Fixes:
|
||
|
|
2526c59f3f |
ANDROID: usb: Optimize the problem of slow transfer rate in USB accessory mode
The data transfer rate using Google Restore in USB3.2 mode is slower,
only about 140MB/s at 5Gbps.
The bMaxBurst is not set, and num_fifos in
dwc3_gadget_resize_tx_fifosis 1, which results
in only 131btye of dwc3 ram space being allocated to ep.
Modify bMaxBurst to 6.
The 5Gbps rate increases from 140MB/s to 350MB/s.
The 10Gbps rate is increased from 220MB/s to 500MB/s.
Bug: 340049583
BUG: 341178033
Change-Id: I5710af32c72d0b57afaecc00c4f0909af4b9a299
Signed-off-by: Lianqin Hu <hulianqin@vivo.corp-partner.google.com>
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
(cherry picked from commit
|
||
|
|
5981994114 |
UPSTREAM: usb: typec: tcpm: clear pd_event queue in PORT_RESET
When a Fast Role Swap control message attempt results in a transition
to ERROR_RECOVERY, the TCPC can still queue a TCPM_SOURCING_VBUS event.
If the event is queued but processed after the tcpm_reset_port() call
in the PORT_RESET state, then the following occurs:
1. tcpm_reset_port() calls tcpm_init_vbus() to reset the vbus sourcing and
sinking state
2. tcpm_pd_event_handler() turns VBUS on before the port is in the default
state.
3. The port resolves as a sink. In the SNK_DISCOVERY state,
tcpm_set_charge() cannot set vbus to charge.
Clear pd events within PORT_RESET to get rid of non-applicable events.
Fixes:
|
||
|
|
1167a6a15c |
BACKPORT: usb: typec: tcpm: enforce ready state when queueing alt mode vdm
Before sending Enter Mode for an Alt Mode, there is a gap between Discover
Modes and the Alt Mode driver queueing the Enter Mode VDM for the port
partner to send a message to the port.
If this message results in unregistering Alt Modes such as in a DR_SWAP,
then the following deadlock can occur with respect to the DisplayPort Alt
Mode driver:
1. The DR_SWAP state holds port->lock. Unregistering the Alt Mode driver
results in a cancel_work_sync() that waits for the current dp_altmode_work
to finish.
2. dp_altmode_work makes a call to tcpm_altmode_enter. The deadlock occurs
because tcpm_queue_vdm_unlock attempts to hold port->lock.
Before attempting to grab the lock, ensure that the port is in a state
vdm_run_state_machine can run in. Alt Mode unregistration will not occur
in these states.
Fixes:
|
||
|
|
ab87c12aa2 |
BACKPORT: f2fs: sysfs: support discard_io_aware
It gives a way to enable/disable IO aware feature for background discard, so that we can tune background discard more precisely based on undiscard condition. e.g. force to disable IO aware if there are large number of discard extents, and discard IO may always be interrupted by frequent common IO. Bug:340148900 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (cherry picked from commit d346fa09abff46988de9267b67b6900d9913d5a2) Change-Id: Icff98ecc1f9e3a294afa08cfe2e8ba4e9f7fdaf3 |
||
|
|
1f9a0ccd86 |
ANDROID: GKI: Update symbol list for Amlogic
1 function symbol(s) added 'int thermal_zone_bind_cooling_device(struct thermal_zone_device*, int, struct thermal_cooling_device*, unsigned long, unsigned long, unsigned int)' Bug: 339356479 Change-Id: Ic83d54a0be3156194b4bebf78ca4cfcc2ef30290 Signed-off-by: Qinglin Li <qinglin.li@amlogic.com> |
||
|
|
62000bb037 |
ANDROID: KVM: arm64: wait_for_initramfs for pKVM module loading procfs
Of course, the initramfs needs to be ready before procfs can be mounted... in the initramfs. While at it, only mount if a pKVM module must be loaded and only print a warning in case of failure. Bug: 278749606 Bug: 301483379 Bug: 331152809 Change-Id: Ie56bd26d4575f69cb1f06ba6317a098649f6da44 Reported-by: Mankyum Kim <mankyum.kim@samsung-slsi.corp-partner.google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> (cherry picked from commit 7d5843b59548672c23c977b4666c3779d31695fb) |
||
|
|
d5eef65564 |
BACKPORT: FROMGIT: usb: typec: tcpm: Check for port partner validity before consuming it
typec_register_partner() does not guarantee partner registration
to always succeed. In the event of failure, port->partner is set
to the error value or NULL. Given that port->partner validity is
not checked, this results in the following crash:
Unable to handle kernel NULL pointer dereference at virtual address 00000000000003c0
pc : run_state_machine+0x1bc8/0x1c08
lr : run_state_machine+0x1b90/0x1c08
..
Call trace:
run_state_machine+0x1bc8/0x1c08
tcpm_state_machine_work+0x94/0xe4
kthread_worker_fn+0x118/0x328
kthread+0x1d0/0x23c
ret_from_fork+0x10/0x20
To prevent the crash, check for port->partner validity before
derefencing it in all the call sites.
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
4dd58977f2 |
BACKPORT: FROMGIT: kbuild: Remove support for Clang's ThinLTO caching
There is an issue in clang's ThinLTO caching (enabled for the kernel via
'--thinlto-cache-dir') with .incbin, which the kernel occasionally uses
to include data within the kernel, such as the .config file for
/proc/config.gz. For example, when changing the .config and rebuilding
vmlinux, the copy of .config in vmlinux does not match the copy of
.config in the build folder:
$ echo 'CONFIG_LTO_NONE=n
CONFIG_LTO_CLANG_THIN=y
CONFIG_IKCONFIG=y
CONFIG_HEADERS_INSTALL=y' >kernel/configs/repro.config
$ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config vmlinux
...
$ grep CONFIG_HEADERS_INSTALL .config
CONFIG_HEADERS_INSTALL=y
$ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
CONFIG_HEADERS_INSTALL=y
$ scripts/config -d HEADERS_INSTALL
$ make -kj"$(nproc)" ARCH=x86_64 LLVM=1 vmlinux
...
UPD kernel/config_data
GZIP kernel/config_data.gz
CC kernel/configs.o
...
LD vmlinux
...
$ grep CONFIG_HEADERS_INSTALL .config
# CONFIG_HEADERS_INSTALL is not set
$ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
CONFIG_HEADERS_INSTALL=y
Without '--thinlto-cache-dir' or when using full LTO, this issue does
not occur.
Benchmarking incremental builds on a few different machines with and
without the cache shows a 20% increase in incremental build time without
the cache when measured by touching init/main.c and running 'make all'.
ARCH=arm64 defconfig + CONFIG_LTO_CLANG_THIN=y on an arm64 host:
Benchmark 1: With ThinLTO cache
Time (mean ± σ): 56.347 s ± 0.163 s [User: 83.768 s, System: 24.661 s]
Range (min … max): 56.109 s … 56.594 s 10 runs
Benchmark 2: Without ThinLTO cache
Time (mean ± σ): 67.740 s ± 0.479 s [User: 718.458 s, System: 31.797 s]
Range (min … max): 67.059 s … 68.556 s 10 runs
Summary
With ThinLTO cache ran
1.20 ± 0.01 times faster than Without ThinLTO cache
ARCH=x86_64 defconfig + CONFIG_LTO_CLANG_THIN=y on an x86_64 host:
Benchmark 1: With ThinLTO cache
Time (mean ± σ): 85.772 s ± 0.252 s [User: 91.505 s, System: 8.408 s]
Range (min … max): 85.447 s … 86.244 s 10 runs
Benchmark 2: Without ThinLTO cache
Time (mean ± σ): 103.833 s ± 0.288 s [User: 232.058 s, System: 8.569 s]
Range (min … max): 103.286 s … 104.124 s 10 runs
Summary
With ThinLTO cache ran
1.21 ± 0.00 times faster than Without ThinLTO cache
While it is unfortunate to take this performance improvement off the
table, correctness is more important. If/when this is fixed in LLVM, it
can potentially be brought back in a conditional manner. Alternatively,
a developer can just disable LTO if doing incremental compiles quickly
is important, as a full compile cycle can still take over a minute even
with the cache and it is unlikely that LTO will result in functional
differences for a kernel change.
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
2a64b7fe17 |
ANDROID: 16K: Fix call to show_pad_maps_fn()
The call should be for printing maps not smaps; fix this. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I8356b9e93fa2a300cb8bcd66fed857d42e8bfdca Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
|
|
596b6507c3 |
ANDROID: 16K: Fix show maps CFI failure
If the kernel is built CONFIG_CFI_CLANG=y, reading smaps may cause a panic. This is due to a failed CFI check; which is triggered becuase the signature of the function pointer for printing smaps padding VMAs does not match exactly with that for show_smap(). Fix this by casting the function pointer to the expected type based on whether printing maps or smaps padding. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I65564a547dacbc4131f8557344c8c96e51f90cd5 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
|
|
d83231efe4 |
ANDROID: 16K: Handle pad VMA splits and merges
In some cases a VMA with padding representation may be split, and
therefore the padding flags must be updated accordingly.
There are 3 cases to handle:
Given:
| DDDDPPPP |
where:
- D represents 1 page of data;
- P represents 1 page of padding;
- | represents the boundaries (start/end) of the VMA
1) Split exactly at the padding boundary
| DDDDPPPP | --> | DDDD | PPPP |
- Remove padding flags from the first VMA.
- The second VMA is all padding
2) Split within the padding area
| DDDDPPPP | --> | DDDDPP | PP |
- Subtract the length of the second VMA from the first VMA's
padding.
- The second VMA is all padding, adjust its padding length (flags)
3) Split within the data area
| DDDDPPPP | --> | DD | DDPPPP |
- Remove padding flags from the first VMA.
- The second VMA is has the same padding as from before the split.
To simplify the semantics merging of padding VMAs is not allowed.
If a split produces a VMA that is entirely padding, show_[s]maps()
only outputs the padding VMA entry (as the data entry is of length 0).
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Ie2628ced5512e2c7f8af25fabae1f38730c8bb1a
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
|
||
|
|
19d6e7eb47 |
ANDROID: 16K: madvise_vma_pad_pages: Remove filemap_fault check
Some file systems like F2FS use a custom filemap_fault ops. Remove this check, as checking vm_file is sufficient. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: Id6a584d934f06650c0a95afd1823669fc77ba2c2 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
|
|
ae44e8dac8 |
ANDROID: 16K: Only madvise padding from dynamic linker context
Only preform padding advise from the execution context on bionic's dynamic linker. This ensures that madvise() doesn't have unwanted side effects. Also rearrange the order of fail checks in madvise_vma_pad_pages() in order of ascending cost. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I3e05b8780c6eda78007f86b613f8c11dd18ac28f Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
|
|
ae67f18944 |
ANDROID: Enable CONFIG_LAZY_RCU in x86 gki_defconfig
It is still disabled by default. Must specify rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter to actually enable it. Bug: 258241771 Change-Id: Ic9e15b846d58ffa3d5dd81842c568da79352ff2d Signed-off-by: Qais Yousef <qyousef@google.com> |
||
|
|
d38091b4ff |
ANDROID: Enable CONFIG_LAZY_RCU in arm64 gki_defconfig
It is still disabled by default. Must specify rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter to actually enable it. Bug: 258241771 Change-Id: I11c920aa5edde2fc42ab54245cd198eb8cb47616 Signed-off-by: Qais Yousef <qyousef@google.com> |
||
|
|
37b02c190c |
FROMLIST: rcu: Provide a boot time parameter to control lazy RCU
To allow more flexible arrangements while still provide a single kernel for distros, provide a boot time parameter to enable/disable lazy RCU. Specify: rcutree.enable_rcu_lazy=[y|1|n|0] Which also requires rcu_nocbs=all at boot time to enable/disable lazy RCU. To disable it by default at build time when CONFIG_RCU_LAZY=y, the new CONFIG_RCU_LAZY_DEFAULT_OFF can be used. Bug: 258241771 Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Tested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/lkml/20231203011252.233748-1-qyousef@layalina.io/ [Fix trivial conflicts rejecting newer code that doesn't exist on 5.15] Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: Ib5585ae717a2ba7749f2802101b785c4e5de8a90 |
||
|
|
4adb60810c |
ANDROID: rcu: Add a minimum time for marking boot as completed
On many systems, a great deal of boot (in userspace) happens after the
kernel thinks the boot has completed. It is difficult to determine if
the system has really booted from the kernel side. Some features like
lazy-RCU can risk slowing down boot time if, say, a callback has been
added that the boot synchronously depends on. Further expedited callbacks
can get unexpedited way earlier than it should be, thus slowing down
boot (as shown in the data below).
For these reasons, this commit adds a config option
'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay.
Userspace can also make RCU's view of the system as booted, by writing the
time in milliseconds to: /sys/module/rcupdate/parameters/android_rcu_boot_end_delay
Or even just writing a value of 0 to this sysfs node.
However, under no circumstance will the boot be allowed to end earlier
than just before init is launched.
The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This
suites ChromeOS and also a PREEMPT_RT system below very well, which need
no config or parameter changes, and just a simple application of this
patch. A system designer can also choose a specific value here to keep
RCU from marking boot completion. As noted earlier, RCU's perspective
of the system as booted will not be marker until at least
android_rcu_boot_end_delay milliseconds have passed or an update is made
via writing a small value (or 0) in milliseconds to:
/sys/module/rcupdate/parameters/android_rcu_boot_end_delay.
One side-effect of this patch is, there is a risk that a real-time workload
launched just after the kernel boots will suffer interruptions due to expedited
RCU, which previous ended just before init was launched. However, to mitigate
such an issue (however unlikely), the user should either tune
CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value
of 0 to /sys/module/rcupdate/parameters/android_rcu_boot_end_delay, once userspace
boots, and before launching the real-time workload.
Qiuxu also noted impressive boot-time improvements with earlier version
of patch. An excerpt from the data he shared:
1) Testing environment:
OS : CentOS Stream 8 (non-RT OS)
Kernel : v6.2
Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads)
Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, …
2) OS boot time definition:
The time from the start of the kernel boot to the shell command line
prompt is shown from the console. [ Different people may have
different OS boot time definitions. ]
3) Measurement method (very rough method):
A timer in the kernel periodically prints the boot time every 100ms.
As soon as the shell command line prompt is shown from the console,
we record the boot time printed by the timer, then the printed boot
time is the OS boot time.
4) Measured OS boot time (in seconds)
a) Measured 10 times w/o this patch:
8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s
The average OS boot time was: ~8.7s
b) Measure 10 times w/ this patch:
8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s
The average OS boot time was: ~8.3s.
(CHROMIUM tag rationale: Submitted upstream but got lots of pushback as
it may harm a PREEMPT_RT system -- the concern is VERY theoretical and
this improves things for ChromeOS. Plus we are not a PREEMPT_RT system.
So I am strongly suggesting this mostly simple change for ChromeOS.)
Bug: 258241771
Bug: 268129466
Test: boot
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Change-Id: Ibd262189d7f92dbcc57f1508efe90fcfba95a6cc
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4350228
Commit-Queue: Joel Fernandes <joelaf@google.com>
Commit-Queue: Vineeth Pillai <vineethrp@google.com>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit 7968079ec77b320ee9d4115fe13048a8f7afbc02)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style. Prefix boot param with android_]
Signed-off-by: Qais Yousef <qyousef@google.com>
|
||
|
|
16ea06fe44 |
UPSTREAM: rcu/kvfree: Move need_offload_krc() out of krcp->lock
The need_offload_krc() function currently holds the krcp->lock in order
to safely check krcp->head. This commit removes the need for this lock
in that function by updating the krcp->head pointer using WRITE_ONCE()
macro so that readers can carry out lockless loads of that pointer.
Bug: 258241771
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit
|
||
|
|
5d1a3986c2 |
UPSTREAM: rcu/kfree: Fix kfree_rcu_shrink_count() return value
As per the comments in include/linux/shrinker.h, .count_objects callback
should return the number of freeable items, but if there are no objects
to free, SHRINK_EMPTY should be returned. The only time 0 is returned
should be when we are unable to determine the number of objects, or the
cache should be skipped for another reason.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit
|
||
|
|
88587c1838 |
UPSTREAM: rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval
Currently the monitor work is scheduled with a fixed interval of HZ/20, which is roughly 50 milliseconds. The drawback of this approach is low utilization of the 512 page slots in scenarios with infrequence kvfree_rcu() calls. For example on an Android system: <snip> kworker/3:3-507 [003] .... 470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6 kworker/6:1-76 [006] .... 470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1 kworker/6:1-76 [006] .... 470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9 kworker/3:3-507 [003] .... 471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48 kworker/1:1-73 [001] .... 471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3 kworker/1:1-73 [001] .... 471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76 kworker/0:4-1411 [000] .... 472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1 kworker/0:4-1411 [000] .... 472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5 kworker/6:1-76 [006] .... 472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102 <snip> In many cases, out of 512 slots, fewer than 10 were actually used. In order to improve batching and make utilization more efficient this commit sets a drain interval to a fixed 5-seconds interval. Floods are detected when a page fills quickly, and in that case, the reclaim work is re-scheduled for the next scheduling-clock tick (jiffy). After this change: <snip> kworker/7:1-371 [007] .... 5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121 kworker/7:1-371 [007] .... 5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47 kworker/7:1-371 [007] .... 5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510 kworker/7:1-371 [007] .... 5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169 kworker/7:1-371 [007] .... 5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510 kworker/5:6-9428 [005] .... 5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123 kworker/4:7-9434 [004] .... 5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322 kworker/4:7-9434 [004] .... 5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185 kworker/7:1-371 [007] .... 5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510 <snip> Here, all but one of the cases, more than one hundreds slots were used, representing an order-of-magnitude improvement. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit |
||
|
|
5b47d8411d |
UPSTREAM: rcu/kvfree: Remove useless monitor_todo flag
monitor_todo is not needed as the work struct already tracks if work is pending. Just use that to know if work is pending using schedule_delayed_work() helper. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> (cherry picked from commit |
||
|
|
84828604c7 |
UPSTREAM: scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them. This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime which
can be a very good thing. This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.
This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory. If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order. Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.
However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked. It would not be a good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU. After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.
Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.
And another call_rcu() instance that cannot be lazy is the one in the
scsi_eh_scmd_add() function. Leaving this instance lazy results in
unacceptably slow boot times.
Therefore, make scsi_eh_scmd_add() use call_rcu_hurry() in order to
revert to the old behavior.
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
Bug: 258241771
Bug: 222463781
Test: CQ
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Change-Id: I95bba865e582b0a12b1c09ba1f0bd4f897401c07
Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: <linux-scsi@vger.kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit
|
||
|
|
a4124a21b1 |
ANDROID: rxrpc: Use call_rcu_hurry() instead of call_rcu()
call_rcu() changes to save power may cause slowness. Use the call_rcu_hurry() API instead which reverts to the old behavior. We find this via inspection that the RCU callback does a wakeup of a thread. This usually indicates that something is waiting on it. To be safe, let us use call_rcu_hurry() here instead. [ joel: Upstream is rewriting this code, so I am merging this as a CHROMIUM patch. There is no harm in including it. Link: https://lore.kernel.org/rcu/658624.1669849522@warthog.procyon.org.uk/#t ] Bug: 258241771 Bug: 222463781 Test: CQ Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Change-Id: Iaadfe2f9db189489915828c6f2f74522f4b90ea3 Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3965078 Reviewed-by: Ross Zwisler <zwisler@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318055 Reviewed-by: Vineeth Pillai <vineethrp@google.com> (cherry picked from commit 1f98f32393f83d14bc290fef06d5b3132bee23e0) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com> |
||
|
|
930bdc0924 |
UPSTREAM: net: devinet: Reduce refcount before grace period
Currently, the inetdev_destroy() function waits for an RCU grace period
before decrementing the refcount and freeing memory. This causes a delay
with a new RCU configuration that tries to save power, which results in the
network interface disappearing later than expected. The resulting delay
causes test failures on ChromeOS.
Refactor the code such that the refcount is freed before the grace period
and memory is freed after. With this a ChromeOS network test passes that
does 'ip netns del' and polls for an interface disappearing, now passes.
Bug: 258241771
Bug: 222463781
Test: CQ
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Change-Id: I98b13c5a8fb9696c1111219d774cf91c8b14b4c5
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit
|
||
|
|
706e751b33 |
UPSTREAM: rcu: Disable laziness if lazy-tracking says so
During suspend, we see failures to suspend 1 in 300-500 suspends. Looking closer, it appears that asynchronous RCU callbacks are being queued as lazy even though synchronous callbacks are expedited. These delays appear to not be very welcome by the suspend/resume code as evidenced by these occasional suspend failures. This commit modifies call_rcu() to check if rcu_async_should_hurry(), which will return true if we are in suspend or in-kernel boot. [ paulmck: Alphabetize local variables. ] Ignoring the lazy hint makes the 3000 suspend/resume cycles pass reliably on a 12th gen 12-core Intel CPU, and there is some evidence that it also slightly speeds up boot performance. Bug: 258241771 Bug: 222463781 Test: CQ Fixes: |
||
|
|
8568593719 |
UPSTREAM: rcu: Track laziness during boot and suspend
Boot and suspend/resume should not be slowed down in kernels built with CONFIG_RCU_LAZY=y. In particular, suspend can sometimes fail in such kernels. This commit therefore adds rcu_async_hurry(), rcu_async_relax(), and rcu_async_should_hurry() functions that track whether or not either a boot or a suspend/resume operation is in progress. This will enable a later commit to refrain from laziness during those times. Export rcu_async_should_hurry(), rcu_async_hurry(), and rcu_async_relax() for later use by rcutorture. [ paulmck: Apply feedback from Steve Rostedt. ] Bug: 258241771 Bug: 222463781 Test: CQ Fixes: |
||
|
|
f12c162eac |
UPSTREAM: net: Use call_rcu_hurry() for dst_release()
In a networking test on ChromeOS, kernels built with the new
CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown
phase.
This failure may be reproduced as follows: ip netns del <name>
The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
in this series for the benefit of certain battery-powered systems.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them. This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime which
can be a very good thing. This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.
This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory. If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order. Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.
However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked. It would not be a good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU. After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.
Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.
Returning to the test failure, use of ftrace showed that this failure
cause caused by the aadded delays due to this new lazy behavior of
call_rcu() in kernels built with CONFIG_RCU_LAZY=y.
Therefore, make dst_release() use call_rcu_hurry() in order to revert
to the old test-failure-free behavior.
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: Ifd64083bd210a9dfe94c179152f27d310c179507
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit
|
||
|
|
ff22b562f0 |
UPSTREAM: percpu-refcount: Use call_rcu_hurry() for atomic switch
Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order to
batch callbacks. This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime which
can be a very good thing. This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.
This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory. If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order. Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.
However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked. It would not be a good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU. After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.
Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.
And another call_rcu() instance that cannot be lazy is the one on the
percpu refcounter's "per-CPU to atomic switch" code path, which
uses RCU when switching to atomic mode. The enqueued callback
wakes up waiters waiting in the percpu_ref_switch_waitq. Allowing
this callback to be lazy would result in unacceptable slowdowns for
users of per-CPU refcounts, such as blk_pre_runtime_suspend().
Therefore, make __percpu_ref_switch_to_atomic() use call_rcu_hurry()
in order to revert to the old behavior.
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: Icc325f69d0df1a37b6f1de02a284e1fabf20e366
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: <linux-mm@kvack.org>
(cherry picked from commit
|
||
|
|
a4cc1aa22d |
UPSTREAM: rcu/sync: Use call_rcu_hurry() instead of call_rcu
call_rcu() changes to save power will slow down rcu sync. Use the
call_rcu_hurry() API instead which reverts to the old behavior.
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I5123ba52f47676305dbcfa1233bf3b41f140766c
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit
|
||
|
|
222a4cd66c |
UPSTREAM: rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
This consolidates the code a bit and makes it cleaner. Functionally it
is the same.
Bug: 258241771
Bug: 222463781
Test: CQ
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit
|