Commit Graph

1074380 Commits

Author SHA1 Message Date
Eric Dumazet
e94d70c73a UPSTREAM: netlink: annotate data-races around sk->sk_err
commit d0f95894fd upstream.

syzbot caught another data-race in netlink when
setting sk->sk_err.

Annotate all of them for good measure.

BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00000000 -> 0x00000016

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: yenchia.chen <yenchia.chen@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2750d7641d0825aeb1e4d3b40c627ff3e3670d24)
Bug: 339546075
Change-Id: I66f0690e8ab0d2e88f0cb0ce3b4be92907a0dcc1
Signed-off-by: yenchia.chen <yenchia.chen@mediatek.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2024-05-29 03:58:42 +00:00
Eric Dumazet
c0e59dab58 UPSTREAM: netlink: annotate lockless accesses to nlk->max_recvmsg_len
commit a1865f2e7d upstream.

syzbot reported a data-race in data-race in netlink_recvmsg() [1]

Indeed, netlink_recvmsg() can be run concurrently,
and netlink_dump() also needs protection.

[1]
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
__sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
__do_sys_recvfrom net/socket.c:2212 [inline]
__se_sys_recvfrom net/socket.c:2208 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2720
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x0000000000000000 -> 0x0000000000001000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023

Fixes: 9063e21fb0 ("netlink: autosize skb lengthes")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: yenchia.chen <yenchia.chen@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7cff4103be7c402ecc3e7bf8f95a64089e3c91b8)
Bug: 339546075
Change-Id: I13f81332f05f2c5bc14de00bfa3de0925162af2b
Signed-off-by: yenchia.chen <yenchia.chen@mediatek.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2024-05-29 03:57:45 +00:00
Paul Lawrence
8d122b23ed ANDROID: incremental-fs: Make work with 16k pages
Bug: 260919895
Test: incfs_test passes
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: Ia4fbb6011930b085bc00a36851e9b0e8559d3dc5
(cherry picked from commit 5ac10739bcf2dae9220a7a39392aa41235bc64c2)
2024-05-28 22:30:52 +00:00
Yifan Hong
b7d4cf60e7 Revert "BACKPORT: FROMGIT: module: allow UNUSED_KSYMS_WHITELIST ..."
Revert submission 3101887-android14-ksyms-wl

Reason for revert: Restore green in release builds

Reverted changes: /q/submissionid:3101887-android14-ksyms-wl

Change-Id: Ia6c694be720c7a07abdfee6a4720a4518ea59141
2024-05-28 17:13:06 +00:00
Yifan Hong
acacba8bfa BACKPORT: FROMGIT: module: allow UNUSED_KSYMS_WHITELIST to be relative against objtree.
If UNUSED_KSYMS_WHITELIST is a file generated
before Kbuild runs, and the source tree is in
a read-only filesystem, the developer must put
the file somewhere and specify an absolute
path to UNUSED_KSYMS_WHITELIST. This worked,
but if IKCONFIG=y, an absolute path is embedded
into .config and eventually into vmlinux, causing
the build to be less reproducible when building
on a different machine.

This patch makes the handling of
UNUSED_KSYMS_WHITELIST to be similar to
MODULE_SIG_KEY.

First, check if UNUSED_KSYMS_WHITELIST is an
absolute path, just as before this patch. If so,
use the path as is.

If it is a relative path, use wildcard to check
the existence of the file below objtree first.
If it does not exist, fall back to the original
behavior of adding $(srctree)/ before the value.

After this patch, the developer can put the generated
file in objtree, then use a relative path against
objtree in .config, eradicating any absolute paths
that may be evaluated differently on different machines.

Signed-off-by: Yifan Hong <elsk@google.com>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

(cherry picked from commit a2e3c811938b4902725e259c03b2d6c539613992
 https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git modules-next)
Bug: 333769605

Change-Id: I0696ac8f686329795034ada5a4587af4ecbb774f
[elsk: apply change to gen_autoksyms.sh instead because
 CONFIG_UNUSED_KSYMS_WHITELIST is parsed there. Revert change
 to Makefile.modpost. Edit init/Kconfig instead of kernel/module/Kconfig
 because UNUSED_KSYMS_WHITELIST is defined there.]
Bug: 342390208
Signed-off-by: Yifan Hong <elsk@google.com>
2024-05-28 16:18:05 +00:00
Robin Hsu
d19f3c553a ANDROID: export one function for mm metrics
export function for sysfs node formating

Bug: 299190787
Change-Id: I71e6a0815efa8df99d036bf457b8a0081999f3de
Signed-off-by: Robin Hsu <robinhsu@google.com>
2024-05-28 12:17:47 +00:00
Kyle Tso
de30e28ba8 FROMLIST: usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state
Similar to what fixed in Commit a6fe37f428c1 ("usb: typec: tcpm: Skip
hard reset when in error recovery"), the handling of the received Hard
Reset has to be skipped during TOGGLING state.

[ 4086.021288] VBUS off
[ 4086.021295] pending state change SNK_READY -> SNK_UNATTACHED @ 650 ms [rev2 NONE_AMS]
[ 4086.022113] VBUS VSAFE0V
[ 4086.022117] state change SNK_READY -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.022447] VBUS off
[ 4086.022450] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.023060] VBUS VSAFE0V
[ 4086.023064] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.023070] disable BIST MODE TESTDATA
[ 4086.023766] disable vbus discharge ret:0
[ 4086.023911] Setting usb_comm capable false
[ 4086.028874] Setting voltage/current limit 0 mV 0 mA
[ 4086.028888] polarity 0
[ 4086.030305] Requesting mux state 0, usb-role 0, orientation 0
[ 4086.033539] Start toggling
[ 4086.038496] state change SNK_UNATTACHED -> TOGGLING [rev2 NONE_AMS]

// This Hard Reset is unexpected
[ 4086.038499] Received hard reset
[ 4086.038501] state change TOGGLING -> HARD_RESET_START [rev2 HARD_RESET]

Fixes: f0690a25a1 ("staging: typec: USB Type-C Port Manager (tcpm)")
Cc: stable@vger.kernel.org
Signed-off-by: Kyle Tso <kyletso@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Change-Id: Icfa144f370bd87670df1cd71f247a3528ab4c591
Bug: 331356545
Link: https://lore.kernel.org/all/20240520154858.1072347-1-kyletso@google.com/
2024-05-22 12:10:42 +00:00
Carlos Llamas
fb512eac3c ANDROID: binder: fix ptrdiff_t printk-format issue
The correct printk format specifier when calculating buffer offsets
should be "%tx" as it is a pointer difference (a.k.a ptrdiff_t). This
fixes some W=1 build warnings reported by the kernel test robot.

Bug: 329799092
Fixes: 63f7ddea2e ("ANDROID: binder: fix KMI-break due to address type change")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401100511.A4BKMwoq-lkp@intel.com/
Change-Id: Iaa87433897b507c47fe8601464445cb6de4b61db
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2024-05-17 17:32:17 +00:00
hulianqin
2526c59f3f ANDROID: usb: Optimize the problem of slow transfer rate in USB accessory mode
The data transfer rate using Google Restore in USB3.2 mode is slower,
only about 140MB/s at 5Gbps.

The bMaxBurst is not set, and num_fifos in
dwc3_gadget_resize_tx_fifosis 1, which results
in only 131btye of dwc3 ram space being allocated to ep.

Modify bMaxBurst to 6.
The 5Gbps rate increases from 140MB/s to 350MB/s.
The 10Gbps rate is increased from 220MB/s to 500MB/s.

Bug: 340049583
BUG: 341178033

Change-Id: I5710af32c72d0b57afaecc00c4f0909af4b9a299
Signed-off-by: Lianqin Hu <hulianqin@vivo.corp-partner.google.com>
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
(cherry picked from commit 23f2a9f5f1)
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
2024-05-17 02:31:30 +00:00
RD Babiera
5981994114 UPSTREAM: usb: typec: tcpm: clear pd_event queue in PORT_RESET
When a Fast Role Swap control message attempt results in a transition
to ERROR_RECOVERY, the TCPC can still queue a TCPM_SOURCING_VBUS event.

If the event is queued but processed after the tcpm_reset_port() call
in the PORT_RESET state, then the following occurs:
1. tcpm_reset_port() calls tcpm_init_vbus() to reset the vbus sourcing and
sinking state
2. tcpm_pd_event_handler() turns VBUS on before the port is in the default
state.
3. The port resolves as a sink. In the SNK_DISCOVERY state,
tcpm_set_charge() cannot set vbus to charge.

Clear pd events within PORT_RESET to get rid of non-applicable events.

Fixes: b17dd57118 ("staging: typec: tcpm: Improve role swap with non PD capable partners")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240423202715.3375827-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 311127232
(cherry picked from commit bf20c69cf3cf9c6445c4925dd9a8a6ca1b78bfdf)
Signed-off-by: RD Babiera <rdbabiera@google.com>
Change-Id: I9b27d040d0acdeb2af74fd3fe90d246b864b5141
2024-05-14 10:59:17 +00:00
RD Babiera
1167a6a15c BACKPORT: usb: typec: tcpm: enforce ready state when queueing alt mode vdm
Before sending Enter Mode for an Alt Mode, there is a gap between Discover
Modes and the Alt Mode driver queueing the Enter Mode VDM for the port
partner to send a message to the port.

If this message results in unregistering Alt Modes such as in a DR_SWAP,
then the following deadlock can occur with respect to the DisplayPort Alt
Mode driver:
1. The DR_SWAP state holds port->lock. Unregistering the Alt Mode driver
results in a cancel_work_sync() that waits for the current dp_altmode_work
to finish.
2. dp_altmode_work makes a call to tcpm_altmode_enter. The deadlock occurs
because tcpm_queue_vdm_unlock attempts to hold port->lock.

Before attempting to grab the lock, ensure that the port is in a state
vdm_run_state_machine can run in. Alt Mode unregistration will not occur
in these states.

Fixes: 03eafcfb60 ("usb: typec: tcpm: Add tcpm_queue_vdm_unlocked() helper")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240423202356.3372314-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 333787869
(cherry picked from commit cdc9946ea6377e8e214b135ccc308c5e514ba25f)
[rd: removed SRC_VDM_IDENTITY_REQUEST check, state not defined in branch]
Signed-off-by: RD Babiera <rdbabiera@google.com>
Change-Id: I8018d1fdc294885ae609b6e45e9bf6ab190897b9
2024-05-14 09:28:40 +00:00
Chao Yu
ab87c12aa2 BACKPORT: f2fs: sysfs: support discard_io_aware
It gives a way to enable/disable IO aware feature for background
discard, so that we can tune background discard more precisely
based on undiscard condition. e.g. force to disable IO aware if
there are large number of discard extents, and discard IO may
always be interrupted by frequent common IO.

Bug:340148900
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit d346fa09abff46988de9267b67b6900d9913d5a2)
Change-Id: Icff98ecc1f9e3a294afa08cfe2e8ba4e9f7fdaf3
2024-05-14 02:00:08 +00:00
qinglin.li
1f9a0ccd86 ANDROID: GKI: Update symbol list for Amlogic
1 function symbol(s) added
  'int thermal_zone_bind_cooling_device(struct thermal_zone_device*, int, struct thermal_cooling_device*, unsigned long, unsigned long, unsigned int)'

Bug: 339356479
Change-Id: Ic83d54a0be3156194b4bebf78ca4cfcc2ef30290
Signed-off-by: Qinglin Li <qinglin.li@amlogic.com>
2024-05-08 17:33:02 +00:00
Vincent Donnefort
62000bb037 ANDROID: KVM: arm64: wait_for_initramfs for pKVM module loading procfs
Of course, the initramfs needs to be ready before procfs can be
mounted... in the initramfs. While at it, only mount if a pKVM module
must be loaded and only print a warning in case of failure.

Bug: 278749606
Bug: 301483379
Bug: 331152809
Change-Id: Ie56bd26d4575f69cb1f06ba6317a098649f6da44
Reported-by: Mankyum Kim <mankyum.kim@samsung-slsi.corp-partner.google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
(cherry picked from commit 7d5843b59548672c23c977b4666c3779d31695fb)
2024-05-08 13:50:22 +00:00
Badhri Jagan Sridharan
d5eef65564 BACKPORT: FROMGIT: usb: typec: tcpm: Check for port partner validity before consuming it
typec_register_partner() does not guarantee partner registration
to always succeed. In the event of failure, port->partner is set
to the error value or NULL. Given that port->partner validity is
not checked, this results in the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 00000000000003c0
 pc : run_state_machine+0x1bc8/0x1c08
 lr : run_state_machine+0x1b90/0x1c08
..
 Call trace:
   run_state_machine+0x1bc8/0x1c08
   tcpm_state_machine_work+0x94/0xe4
   kthread_worker_fn+0x118/0x328
   kthread+0x1d0/0x23c
   ret_from_fork+0x10/0x20

To prevent the crash, check for port->partner validity before
derefencing it in all the call sites.

Cc: stable@vger.kernel.org
Fixes: c97cd0b4b5 ("usb: typec: tcpm: set initial svdm version based on pd revision")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240427202812.3435268-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 321849121
(cherry picked from commit ae11f04b452b5205536e1c02d31f8045eba249dd
https://kernel.googlesource.com/pub/scm/linux/kernel/git/gregkh/usb
usb-linus)
Change-Id: I01510c86e147b3011afc5d475fc1dc38d2636a60
Signed-off-by: Zheng Pan <zhengpan@google.com>
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
2024-05-07 14:53:15 +00:00
Nathan Chancellor
4dd58977f2 BACKPORT: FROMGIT: kbuild: Remove support for Clang's ThinLTO caching
There is an issue in clang's ThinLTO caching (enabled for the kernel via
'--thinlto-cache-dir') with .incbin, which the kernel occasionally uses
to include data within the kernel, such as the .config file for
/proc/config.gz. For example, when changing the .config and rebuilding
vmlinux, the copy of .config in vmlinux does not match the copy of
.config in the build folder:

  $ echo 'CONFIG_LTO_NONE=n
  CONFIG_LTO_CLANG_THIN=y
  CONFIG_IKCONFIG=y
  CONFIG_HEADERS_INSTALL=y' >kernel/configs/repro.config

  $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config vmlinux
  ...

  $ grep CONFIG_HEADERS_INSTALL .config
  CONFIG_HEADERS_INSTALL=y

  $ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
  CONFIG_HEADERS_INSTALL=y

  $ scripts/config -d HEADERS_INSTALL

  $ make -kj"$(nproc)" ARCH=x86_64 LLVM=1 vmlinux
  ...
    UPD     kernel/config_data
    GZIP    kernel/config_data.gz
    CC      kernel/configs.o
  ...
    LD      vmlinux
  ...

  $ grep CONFIG_HEADERS_INSTALL .config
  # CONFIG_HEADERS_INSTALL is not set

  $ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
  CONFIG_HEADERS_INSTALL=y

Without '--thinlto-cache-dir' or when using full LTO, this issue does
not occur.

Benchmarking incremental builds on a few different machines with and
without the cache shows a 20% increase in incremental build time without
the cache when measured by touching init/main.c and running 'make all'.

ARCH=arm64 defconfig + CONFIG_LTO_CLANG_THIN=y on an arm64 host:

  Benchmark 1: With ThinLTO cache
    Time (mean ± σ):     56.347 s ±  0.163 s    [User: 83.768 s, System: 24.661 s]
    Range (min … max):   56.109 s … 56.594 s    10 runs

  Benchmark 2: Without ThinLTO cache
    Time (mean ± σ):     67.740 s ±  0.479 s    [User: 718.458 s, System: 31.797 s]
    Range (min … max):   67.059 s … 68.556 s    10 runs

  Summary
    With ThinLTO cache ran
      1.20 ± 0.01 times faster than Without ThinLTO cache

ARCH=x86_64 defconfig + CONFIG_LTO_CLANG_THIN=y on an x86_64 host:

  Benchmark 1: With ThinLTO cache
    Time (mean ± σ):     85.772 s ±  0.252 s    [User: 91.505 s, System: 8.408 s]
    Range (min … max):   85.447 s … 86.244 s    10 runs

  Benchmark 2: Without ThinLTO cache
    Time (mean ± σ):     103.833 s ±  0.288 s    [User: 232.058 s, System: 8.569 s]
    Range (min … max):   103.286 s … 104.124 s    10 runs

  Summary
    With ThinLTO cache ran
      1.21 ± 0.00 times faster than Without ThinLTO cache

While it is unfortunate to take this performance improvement off the
table, correctness is more important. If/when this is fixed in LLVM, it
can potentially be brought back in a conditional manner. Alternatively,
a developer can just disable LTO if doing incremental compiles quickly
is important, as a full compile cycle can still take over a minute even
with the cache and it is unlikely that LTO will result in functional
differences for a kernel change.

Cc: stable@vger.kernel.org
Fixes: dc5723b02e ("kbuild: add support for Clang LTO")
Reported-by: Yifan Hong <elsk@google.com>
Closes: https://github.com/ClangBuiltLinux/linux/issues/2021
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Closes: https://lore.kernel.org/r/20220327115526.cc4b0ff55fc53c97683c3e4d@kernel.org/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Bug: 312268956
Bug: 335301039
Change-Id: Iace492db67f28e172427669b1b7eb6a8c44dd3aa
(cherry picked from commit aba091547ef6159d52471f42a3ef531b7b660ed8
 https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git kbuild)
[elsk: Resolved minor conflict in Makefile]
Signed-off-by: Yifan Hong <elsk@google.com>
2024-05-06 22:21:09 +00:00
Kalesh Singh
2a64b7fe17 ANDROID: 16K: Fix call to show_pad_maps_fn()
The call should be for printing maps not smaps; fix this.

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I8356b9e93fa2a300cb8bcd66fed857d42e8bfdca
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-01 19:55:54 +00:00
Kalesh Singh
596b6507c3 ANDROID: 16K: Fix show maps CFI failure
If the kernel is built CONFIG_CFI_CLANG=y, reading smaps
may cause a panic. This is due to a failed CFI check; which
is triggered becuase the signature of the function pointer for
printing smaps padding VMAs does not match exactly with that
for show_smap().

Fix this by casting the function pointer to the expected type
based on whether printing maps or smaps padding.

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I65564a547dacbc4131f8557344c8c96e51f90cd5
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-01 16:35:25 +00:00
Kalesh Singh
d83231efe4 ANDROID: 16K: Handle pad VMA splits and merges
In some cases a VMA with padding representation may be split, and
therefore the padding flags must be updated accordingly.

There are 3 cases to handle:

Given:
    | DDDDPPPP |

where:
    - D represents 1 page of data;
    - P represents 1 page of padding;
    - | represents the boundaries (start/end) of the VMA

1) Split exactly at the padding boundary

    | DDDDPPPP | --> | DDDD | PPPP |

    - Remove padding flags from the first VMA.
    - The second VMA is all padding

2) Split within the padding area

    | DDDDPPPP | --> | DDDDPP | PP |

    - Subtract the length of the second VMA from the first VMA's
      padding.
    - The second VMA is all padding, adjust its padding length (flags)

3) Split within the data area

    | DDDDPPPP | --> | DD | DDPPPP |

    - Remove padding flags from the first VMA.
    - The second VMA is has the same padding as from before the split.

To simplify the semantics merging of padding VMAs is not allowed.

If a split produces a VMA that is entirely padding, show_[s]maps()
only outputs the padding VMA entry (as the data entry is of length 0).

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Ie2628ced5512e2c7f8af25fabae1f38730c8bb1a
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-04-29 23:44:57 +00:00
Kalesh Singh
19d6e7eb47 ANDROID: 16K: madvise_vma_pad_pages: Remove filemap_fault check
Some file systems like F2FS use a custom filemap_fault ops. Remove this
check, as checking vm_file is sufficient.

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Id6a584d934f06650c0a95afd1823669fc77ba2c2
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-04-29 23:44:57 +00:00
Kalesh Singh
ae44e8dac8 ANDROID: 16K: Only madvise padding from dynamic linker context
Only preform padding advise from the execution context on bionic's
dynamic linker. This ensures that madvise() doesn't have unwanted
side effects.

Also rearrange the order of fail checks in madvise_vma_pad_pages()
in order of ascending cost.

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I3e05b8780c6eda78007f86b613f8c11dd18ac28f
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-04-29 23:44:57 +00:00
Qais Yousef
ae67f18944 ANDROID: Enable CONFIG_LAZY_RCU in x86 gki_defconfig
It is still disabled by default. Must specify
rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter
to actually enable it.

Bug: 258241771
Change-Id: Ic9e15b846d58ffa3d5dd81842c568da79352ff2d
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Qais Yousef
d38091b4ff ANDROID: Enable CONFIG_LAZY_RCU in arm64 gki_defconfig
It is still disabled by default. Must specify
rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter
to actually enable it.

Bug: 258241771
Change-Id: I11c920aa5edde2fc42ab54245cd198eb8cb47616
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Qais Yousef
37b02c190c FROMLIST: rcu: Provide a boot time parameter to control lazy RCU
To allow more flexible arrangements while still provide a single kernel
for distros, provide a boot time parameter to enable/disable lazy RCU.

Specify:

	rcutree.enable_rcu_lazy=[y|1|n|0]

Which also requires

	rcu_nocbs=all

at boot time to enable/disable lazy RCU.

To disable it by default at build time when CONFIG_RCU_LAZY=y, the new
CONFIG_RCU_LAZY_DEFAULT_OFF can be used.

Bug: 258241771
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/20231203011252.233748-1-qyousef@layalina.io/
[Fix trivial conflicts rejecting newer code that doesn't exist on 5.15]
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Ib5585ae717a2ba7749f2802101b785c4e5de8a90
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
4adb60810c ANDROID: rcu: Add a minimum time for marking boot as completed
On many systems, a great deal of boot (in userspace) happens after the
kernel thinks the boot has completed. It is difficult to determine if
the system has really booted from the kernel side. Some features like
lazy-RCU can risk slowing down boot time if, say, a callback has been
added that the boot synchronously depends on. Further expedited callbacks
can get unexpedited way earlier than it should be, thus slowing down
boot (as shown in the data below).

For these reasons, this commit adds a config option
'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay.
Userspace can also make RCU's view of the system as booted, by writing the
time in milliseconds to: /sys/module/rcupdate/parameters/android_rcu_boot_end_delay
Or even just writing a value of 0 to this sysfs node.
However, under no circumstance will the boot be allowed to end earlier
than just before init is launched.

The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This
suites ChromeOS and also a PREEMPT_RT system below very well, which need
no config or parameter changes, and just a simple application of this
patch. A system designer can also choose a specific value here to keep
RCU from marking boot completion.  As noted earlier, RCU's perspective
of the system as booted will not be marker until at least
android_rcu_boot_end_delay milliseconds have passed or an update is made
via writing a small value (or 0) in milliseconds to:
/sys/module/rcupdate/parameters/android_rcu_boot_end_delay.

One side-effect of this patch is, there is a risk that a real-time workload
launched just after the kernel boots will suffer interruptions due to expedited
RCU, which previous ended just before init was launched. However, to mitigate
such an issue (however unlikely), the user should either tune
CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value
of 0 to /sys/module/rcupdate/parameters/android_rcu_boot_end_delay, once userspace
boots, and before launching the real-time workload.

Qiuxu also noted impressive boot-time improvements with earlier version
of patch. An excerpt from the data he shared:

1) Testing environment:
    OS            : CentOS Stream 8 (non-RT OS)
    Kernel     : v6.2
    Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads)
    Qemu  args  : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, …

2) OS boot time definition:
    The time from the start of the kernel boot to the shell command line
    prompt is shown from the console. [ Different people may have
    different OS boot time definitions. ]

3) Measurement method (very rough method):
    A timer in the kernel periodically prints the boot time every 100ms.
    As soon as the shell command line prompt is shown from the console,
    we record the boot time printed by the timer, then the printed boot
    time is the OS boot time.

4) Measured OS boot time (in seconds)
   a) Measured 10 times w/o this patch:
        8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s
        The average OS boot time was: ~8.7s

   b) Measure 10 times w/ this patch:
        8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s
        The average OS boot time was: ~8.3s.

(CHROMIUM tag rationale: Submitted upstream but got lots of pushback as
it may harm a PREEMPT_RT system -- the concern is VERY theoretical and
this improves things for ChromeOS. Plus we are not a PREEMPT_RT system.
So I am strongly suggesting this mostly simple change for ChromeOS.)

Bug: 258241771
Bug: 268129466
Test: boot
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Change-Id: Ibd262189d7f92dbcc57f1508efe90fcfba95a6cc
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4350228
Commit-Queue: Joel Fernandes <joelaf@google.com>
Commit-Queue: Vineeth Pillai <vineethrp@google.com>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit 7968079ec77b320ee9d4115fe13048a8f7afbc02)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style. Prefix boot param with android_]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Uladzislau Rezki (Sony)
16ea06fe44 UPSTREAM: rcu/kvfree: Move need_offload_krc() out of krcp->lock
The need_offload_krc() function currently holds the krcp->lock in order
to safely check krcp->head.  This commit removes the need for this lock
in that function by updating the krcp->head pointer using WRITE_ONCE()
macro so that readers can carry out lockless loads of that pointer.

Bug: 258241771
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 8fc5494ad5)
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Iddde5ec15e8574216abc95d8c64efa5c66868508
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
5d1a3986c2 UPSTREAM: rcu/kfree: Fix kfree_rcu_shrink_count() return value
As per the comments in include/linux/shrinker.h, .count_objects callback
should return the number of freeable items, but if there are no objects
to free, SHRINK_EMPTY should be returned. The only time 0 is returned
should be when we are unable to determine the number of objects, or the
cache should be skipped for another reason.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 3826909635)

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I5cb380fceaccc85971a47773d9058f0ea044c6dd
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332178
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit 3243f1e22bf915c9b805a96cc4a8cbc03ed5d7a8)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Uladzislau Rezki (Sony)
88587c1838 UPSTREAM: rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval
Currently the monitor work is scheduled with a fixed interval of HZ/20,
which is roughly 50 milliseconds. The drawback of this approach is
low utilization of the 512 page slots in scenarios with infrequence
kvfree_rcu() calls.  For example on an Android system:

<snip>
  kworker/3:3-507     [003] ....   470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6
  kworker/6:1-76      [006] ....   470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1
  kworker/6:1-76      [006] ....   470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9
  kworker/3:3-507     [003] ....   471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48
  kworker/1:1-73      [001] ....   471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3
  kworker/1:1-73      [001] ....   471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76
  kworker/0:4-1411    [000] ....   472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1
  kworker/0:4-1411    [000] ....   472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5
  kworker/6:1-76      [006] ....   472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102
<snip>

In many cases, out of 512 slots, fewer than 10 were actually used.
In order to improve batching and make utilization more efficient this
commit sets a drain interval to a fixed 5-seconds interval. Floods are
detected when a page fills quickly, and in that case, the reclaim work
is re-scheduled for the next scheduling-clock tick (jiffy).

After this change:

<snip>
  kworker/7:1-371     [007] ....  5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121
  kworker/7:1-371     [007] ....  5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47
  kworker/7:1-371     [007] ....  5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510
  kworker/7:1-371     [007] ....  5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169
  kworker/7:1-371     [007] ....  5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510
  kworker/5:6-9428    [005] ....  5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123
  kworker/4:7-9434    [004] ....  5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322
  kworker/4:7-9434    [004] ....  5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185
  kworker/7:1-371     [007] ....  5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510
<snip>

Here, all but one of the cases, more than one hundreds slots were used,
representing an order-of-magnitude improvement.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 51824b780b)

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I4635ba0dbece4e029d5271ef3950b8eaa1ae5e81
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332177
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit b1bf359877e084383be107bf0008d58d0a6b15e3)
[Conflict due to 71cf9c9835 adding a new
function in the same location.
Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
5b47d8411d UPSTREAM: rcu/kvfree: Remove useless monitor_todo flag
monitor_todo is not needed as the work struct already tracks
if work is pending. Just use that to know if work is pending
using schedule_delayed_work() helper.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
(cherry picked from commit 82d26c36cc)

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I4c13f89da735a628a5030ab55a13e338b97da4b8
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332176
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit bb867be28d6a70b36ff1d6563f794c489072ab7e)
[Minor conflict with 71cf9c9835 where it
added a new function in the same location.
Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Uladzislau Rezki
84828604c7 UPSTREAM: scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be a good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one in the
scsi_eh_scmd_add() function.  Leaving this instance lazy results in
unacceptably slow boot times.

Therefore, make scsi_eh_scmd_add() use call_rcu_hurry() in order to
revert to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Bug: 258241771
Bug: 222463781
Test: CQ
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Change-Id: I95bba865e582b0a12b1c09ba1f0bd4f897401c07
Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: <linux-scsi@vger.kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 54d87b0a0c)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318056
Commit-Queue: Joel Fernandes <joelaf@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
(cherry picked from commit 5578f9ac27d25e3e57a5b9c4cf0346cfc5162994)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
a4124a21b1 ANDROID: rxrpc: Use call_rcu_hurry() instead of call_rcu()
call_rcu() changes to save power may cause slowness. Use the
call_rcu_hurry() API instead which reverts to the old behavior.

We find this via inspection that the RCU callback does a wakeup of a
thread. This usually indicates that something is waiting on it. To be
safe, let us use call_rcu_hurry() here instead.

[ joel: Upstream is rewriting this code, so I am merging this as a CHROMIUM
  patch. There is no harm in including it.
  Link: https://lore.kernel.org/rcu/658624.1669849522@warthog.procyon.org.uk/#t ]

Bug: 258241771
Bug: 222463781
Test: CQ
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Change-Id: Iaadfe2f9db189489915828c6f2f74522f4b90ea3
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3965078
Reviewed-by: Ross Zwisler <zwisler@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318055
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit 1f98f32393f83d14bc290fef06d5b3132bee23e0)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Eric Dumazet
930bdc0924 UPSTREAM: net: devinet: Reduce refcount before grace period
Currently, the inetdev_destroy() function waits for an RCU grace period
before decrementing the refcount and freeing memory. This causes a delay
with a new RCU configuration that tries to save power, which results in the
network interface disappearing later than expected. The resulting delay
causes test failures on ChromeOS.

Refactor the code such that the refcount is freed before the grace period
and memory is freed after. With this a ChromeOS network test passes that
does 'ip netns del' and polls for an interface disappearing, now passes.

Bug: 258241771
Bug: 222463781
Test: CQ
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Change-Id: I98b13c5a8fb9696c1111219d774cf91c8b14b4c5
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 9d40c84cf5)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318054
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Commit-Queue: Joel Fernandes <joelaf@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit 3c0f4bb182d6b0be5424947b53019e92bea8b38c)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
706e751b33 UPSTREAM: rcu: Disable laziness if lazy-tracking says so
During suspend, we see failures to suspend 1 in 300-500 suspends.
Looking closer, it appears that asynchronous RCU callbacks are being
queued as lazy even though synchronous callbacks are expedited. These
delays appear to not be very welcome by the suspend/resume code as
evidenced by these occasional suspend failures.

This commit modifies call_rcu() to check if rcu_async_should_hurry(),
which will return true if we are in suspend or in-kernel boot.

[ paulmck: Alphabetize local variables. ]

Ignoring the lazy hint makes the 3000 suspend/resume cycles pass
reliably on a 12th gen 12-core Intel CPU, and there is some evidence
that it also slightly speeds up boot performance.

Bug: 258241771
Bug: 222463781
Test: CQ
Fixes: 3cb278e73b ("rcu: Make call_rcu() lazy to save power")
Change-Id: I4cfe6f43de8bae9a6c034831c79d9773199d6d29
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit cf7066b97e)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318052
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
Commit-Queue: Joel Fernandes <joelaf@google.com>
(cherry picked from commit e59686da91b689d3771a09f3eae37db5f40d3f75)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
8568593719 UPSTREAM: rcu: Track laziness during boot and suspend
Boot and suspend/resume should not be slowed down in kernels built with
CONFIG_RCU_LAZY=y.  In particular, suspend can sometimes fail in such
kernels.

This commit therefore adds rcu_async_hurry(), rcu_async_relax(), and
rcu_async_should_hurry() functions that track whether or not either
a boot or a suspend/resume operation is in progress.  This will
enable a later commit to refrain from laziness during those times.

Export rcu_async_should_hurry(), rcu_async_hurry(), and rcu_async_relax()
for later use by rcutorture.

[ paulmck: Apply feedback from Steve Rostedt. ]

Bug: 258241771
Bug: 222463781
Test: CQ
Fixes: 3cb278e73b ("rcu: Make call_rcu() lazy to save power")
Change-Id: Ieb2f2d484a33cfbd71f71c8e3dbcfc05cd7efe8c
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 6efdda8bec)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318051
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Tested-by: Joel Fernandes <joelaf@google.com>
Commit-Queue: Joel Fernandes <joelaf@google.com>
(cherry picked from commit 8bc7efc64c84da753f2174a7071c8f1a7823d2bb)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
f12c162eac UPSTREAM: net: Use call_rcu_hurry() for dst_release()
In a networking test on ChromeOS, kernels built with the new
CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown
phase.

This failure may be reproduced as follows: ip netns del <name>

The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
in this series for the benefit of certain battery-powered systems.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be a good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

Returning to the test failure, use of ftrace showed that this failure
cause caused by the aadded delays due to this new lazy behavior of
call_rcu() in kernels built with CONFIG_RCU_LAZY=y.

Therefore, make dst_release() use call_rcu_hurry() in order to revert
to the old test-failure-free behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: Ifd64083bd210a9dfe94c179152f27d310c179507
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 483c26ff63)
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318050
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit e0886387489fed8a60e7e0f107b95fb9c0241930)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
ff22b562f0 UPSTREAM: percpu-refcount: Use call_rcu_hurry() for atomic switch
Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order to
batch callbacks.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be a good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one on the
percpu refcounter's "per-CPU to atomic switch" code path, which
uses RCU when switching to atomic mode.  The enqueued callback
wakes up waiters waiting in the percpu_ref_switch_waitq.  Allowing
this callback to be lazy would result in unacceptable slowdowns for
users of per-CPU refcounts, such as blk_pre_runtime_suspend().

Therefore, make __percpu_ref_switch_to_atomic() use call_rcu_hurry()
in order to revert to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: Icc325f69d0df1a37b6f1de02a284e1fabf20e366
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: <linux-mm@kvack.org>
(cherry picked from commit 343a72e5e3)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318049
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Tested-by: Joel Fernandes <joelaf@google.com>
Commit-Queue: Joel Fernandes <joelaf@google.com>
(cherry picked from commit dfd536f499642cd18679cc64c79a8fb275137f45)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
a4cc1aa22d UPSTREAM: rcu/sync: Use call_rcu_hurry() instead of call_rcu
call_rcu() changes to save power will slow down rcu sync. Use the
call_rcu_hurry() API instead which reverts to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I5123ba52f47676305dbcfa1233bf3b41f140766c
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 7651d6b250)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318048
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
(cherry picked from commit 183fce4e1bfbbae1266ec90c6bb871b51d7af81c)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
222a4cd66c UPSTREAM: rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
This consolidates the code a bit and makes it cleaner. Functionally it
is the same.

Bug: 258241771
Bug: 222463781
Test: CQ
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

(cherry picked from commit 3d222a0c0c)
Change-Id: I8422c7138edd6a476fc46374beefdf46dd76b8b0
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318047
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Commit-Queue: Joel Fernandes <joelaf@google.com>
(cherry picked from commit 58cb433d445d2416ba26645e8df63d86afa15f8c)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Vineeth Pillai
f4abe7bb5f BACKPORT: rcu: Shrinker for lazy rcu
The shrinker is used to speed up the free'ing of memory potentially held
by RCU lazy callbacks. RCU kernel module test cases show this to be
effective. Test is introduced in a later patch.

[Joel: register_shrinker() argument list change.]

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I6a73a9dae79ff35feca37abe2663e55a0f46dda8
Signed-off-by: Vineeth Pillai <vineeth@bitbyteword.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit c945b4da7a)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318046
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Commit-Queue: Joel Fernandes <joelaf@google.com>
(cherry picked from commit 2cf50ca2e7c3bc08f5182fc517a89a65e8dca7e3)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
e0297c38a5 BACKPORT: rcu: Make call_rcu() lazy to save power
Implement timer-based RCU callback batching (also known as lazy
callbacks). With this we save about 5-10% of power consumed due
to RCU requests that happen when system is lightly loaded or idle.

By default, all async callbacks (queued via call_rcu) are marked
lazy. An alternate API call_rcu_hurry() is provided for the few users,
for example synchronize_rcu(), that need the old behavior.

The batch is flushed whenever a certain amount of time has passed, or
the batch on a particular CPU grows too big. Also memory pressure will
flush it in a future patch.

To handle several corner cases automagically (such as rcu_barrier() and
hotplug), we re-use bypass lists which were originally introduced to
address lock contention, to handle lazy CBs as well. The bypass list
length has the lazy CB length included in it. A separate lazy CB length
counter is also introduced to keep track of the number of lazy CBs.

[ paulmck: Fix formatting of inline call_rcu_lazy() definition. ]
[ paulmck: Apply Zqiang feedback. ]
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

[ joelaf: Small changes for 5.15 backport. ]

Suggested-by: Paul McKenney <paulmck@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Bug: 258241771
Bug: 222463781
Test: CQ
(cherry picked from commit 3cb278e73b
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master)
Change-Id: I557d5af2a5d317bd66e9ec55ed40822bb5c54390
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318045
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Commit-Queue: Joel Fernandes <joelaf@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
(cherry picked from commit b30e520b9da88a5de115ed5b2c1b2aa89de9e214)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)
276d33f21a UPSTREAM: rcu: Fix late wakeup when flush of bypass cblist happens
When the bypass cblist gets too big or its timeout has occurred, it is
flushed into the main cblist. However, the bypass timer is still running
and the behavior is that it would eventually expire and wake the GP
thread.

Since we are going to use the bypass cblist for lazy CBs, do the wakeup
soon as the flush for "too big or too long" bypass list happens.
Otherwise, long delays can happen for callbacks which get promoted from
lazy to non-lazy.

This is a good thing to do anyway (regardless of future lazy patches),
since it makes the behavior consistent with behavior of other code paths
where flushing into the ->cblist makes the GP kthread into a
non-sleeping state quickly.

[ Frederic Weisbecker: Changes to avoid unnecessary GP-thread wakeups plus
		    comment changes. ]

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit b50606f35f)

Bug: 258241771
Bug: 222463781
Test: powerIdle lab tests.
Change-Id: If8da96d7ba6ed90a2a70f7d56f7bb03af44fd649
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4065239
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit 75db04e1eed1756a4ee5fb87ef8dd494d19bf53f)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Frederic Weisbecker
24e6758060 BACKPORT: rcu: Fix missing nocb gp wake on rcu_barrier()
In preparation for RCU lazy changes, wake up the RCU nocb gp thread if
needed after an entrain.  This change prevents the RCU barrier callback
from waiting in the queue for several seconds before the lazy callbacks
in front of it are serviced.

Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit b8f7aca3f0
 https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next)

(Backport:
Conflicts:
   kernel/rcu/tree.c
Due to missing 'rcu: Rework rcu_barrier() and callback-migration logic'
Chose not to backport that.)

Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: Ib55c5886764b74df22531eca35f076ef7acc08dd
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4062165
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit fc6e55ea65dca9cc52bda6081341f3fcc87f6ee7)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
2024-04-29 22:43:39 +00:00
Florian Westphal
fb310d468a UPSTREAM: netfilter: nft_set_pipapo: do not free live element
[ Upstream commit 3cfc9ec039af60dbd8965ae085b2c2ccdcfbe1cc ]

Pablo reports a crash with large batches of elements with a
back-to-back add/remove pattern.  Quoting Pablo:

  add_elem("00000000") timeout 100 ms
  ...
  add_elem("0000000X") timeout 100 ms
  del_elem("0000000X") <---------------- delete one that was just added
  ...
  add_elem("00005000") timeout 100 ms

  1) nft_pipapo_remove() removes element 0000000X
  Then, KASAN shows a splat.

Looking at the remove function there is a chance that we will drop a
rule that maps to a non-deactivated element.

Removal happens in two steps, first we do a lookup for key k and return the
to-be-removed element and mark it as inactive in the next generation.
Then, in a second step, the element gets removed from the set/map.

The _remove function does not work correctly if we have more than one
element that share the same key.

This can happen if we insert an element into a set when the set already
holds an element with same key, but the element mapping to the existing
key has timed out or is not active in the next generation.

In such case its possible that removal will unmap the wrong element.
If this happens, we will leak the non-deactivated element, it becomes
unreachable.

The element that got deactivated (and will be freed later) will
remain reachable in the set data structure, this can result in
a crash when such an element is retrieved during lookup (stale
pointer).

Add a check that the fully matching key does in fact map to the element
that we have marked as inactive in the deactivation step.
If not, we need to continue searching.

Add a bug/warn trap at the end of the function as well, the remove
function must not ever be called with an invisible/unreachable/non-existent
element.

v2: avoid uneeded temporary variable (Stefano)

Bug: 336735501
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ebf7c9746f)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ic9a48ac9ac0f9960fea9e066d9a0a9fb93f7b633
2024-04-29 15:19:22 +00:00
seanwang1
444a497469 ANDROID: GKI: Update lenovo symbol list
3 function symbols added
  'void css_task_iter_end(struct css_task_iter*)'
  'struct task_struct* css_task_iter_next(struct css_task_iter*)'
  'void css_task_iter_start(struct cgroup_subsys_state*, unsigned int, struct css_task_iter*)'

Bug: 336967294
Change-Id: I7258e06fe9f1e21d73481d47a5cc54bb95e40646
Signed-off-by: seanwang1 <seanwang1@lenovo.com>
2024-04-29 15:17:00 +00:00
seanwang1
978f805a2d ANDROID: GKI: Export css_task_iter_start()
Export css_task_iter_start() and css_task_iter_next() and
css_task_iter_end() inorder to support task iteration in a cgroup in
vendor modules.

Bug: 336967294

Change-Id: Id93963ddd30ab02c7a4d5086f19d15310e4eda14
Signed-off-by: seanwang1 <seanwang1@lenovo.com>
2024-04-29 15:17:00 +00:00
Suzuki K Poulose
0ae4f32634 FROMGIT: coresight: etm4x: Fix access to resource selector registers
Resource selector pair 0 is always implemented and reserved. We must not
touch it, even during save/restore for CPU Idle. Rest of the driver is
well behaved. Fix the offending ones.

Reported-by: Yabin Cui <yabinc@google.com>
Fixes: f188b5e76a ("coresight: etm4x: Save/restore state across CPU low power states")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Link: https://lore.kernel.org/r/20240412142702.2882478-5-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit d6fc00d0f640d6010b51054aa8b0fd191177dbc9
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: I5f3385cb269969a299402fa258b30ab43e95805f
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:23:30 -07:00
Suzuki K Poulose
8ba1802287 BACKPORT: FROMGIT: coresight: etm4x: Safe access for TRCQCLTR
ETM4x implements TRCQCLTR only when the Q elements are supported
and the Q element filtering is supported (TRCIDR0.QFILT). Access
to the register otherwise could be fatal. Fix this by tracking the
availability, like the others.

Fixes: f188b5e76a ("coresight: etm4x: Save/restore state across CPU low power states")
Reported-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Link: https://lore.kernel.org/r/20240412142702.2882478-4-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit 46bf8d7cd8530eca607379033b9bc4ac5590a0cd
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: Id848fa14ba8003149f76b5ca54562593f6164150
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:23:14 -07:00
Suzuki K Poulose
6a08c9fb9d FROMGIT: coresight: etm4x: Do not save/restore Data trace control registers
ETM4x doesn't support Data trace on A class CPUs. As such do not access the
Data trace control registers during CPU idle. This could cause problems for
ETE. While at it, remove all references to the Data trace control registers.

Fixes: f188b5e76a ("coresight: etm4x: Save/restore state across CPU low power states")
Reported-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Link: https://lore.kernel.org/r/20240412142702.2882478-3-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit 5eb3a0c2c52368cb9902e9a6ea04888e093c487d
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: I06977d86aa2d876d166db0fac8fbccf48fd07229
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:23:02 -07:00
Suzuki K Poulose
a02278f990 FROMGIT: coresight: etm4x: Do not hardcode IOMEM access for register restore
When we restore the register state for ETM4x, while coming back
from CPU idle, we hardcode IOMEM access. This is wrong and could
blow up for an ETM with system instructions access (and for ETE).

Fixes: f5bd523690 ("coresight: etm4x: Convert all register accesses")
Reported-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Link: https://lore.kernel.org/r/20240412142702.2882478-2-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit 1e7ba33fa591de1cf60afffcabb45600b3607025
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: Id2ea066374933de51a90f1fca8304338b741845d
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:22:54 -07:00
Michal Luczaj
e8e652b8c8 UPSTREAM: af_unix: Fix garbage collector racing against connect()
[ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ]

Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children. Leading to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.

sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped

connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
----------------	-------------------------	-----------

NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
			// V count=1 inflight=0

 			NS = unix_peer(S)
 			skb2 = sock_alloc()
			skb_queue_tail(NS, skb2[V])

			// V became in-flight
			// V count=2 inflight=1

			close(V)

			// V count=1 inflight=1
			// GC candidate condition met

						for u in gc_inflight_list:
						  if (total_refs == inflight_refs)
						    add u to gc_candidates

						// gc_candidates={L, V}

						for u in gc_candidates:
						  scan_children(u, dec_inflight)

						// embryo (skb1) was not
						// reachable from L yet, so V's
						// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
						for u in gc_candidates:
						  if (u.inflight)
						    scan_children(u, inc_inflight_move_tail)

						// V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. After
flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
this point, unix_inflight() can not happen because unix_gc_lock is already
taken. Inflight graph remains unaffected.

Bug: 336226035
Fixes: 1fd05ba5a2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 507cc232ff)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If321f78b8b3220f5a1caea4b5e9450f1235b0770
2024-04-22 16:24:10 -07:00