linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-05 10:31:46 +09:00

Author	SHA1	Message	Date
Nathan Chancellor	4dd58977f2	BACKPORT: FROMGIT: kbuild: Remove support for Clang's ThinLTO caching There is an issue in clang's ThinLTO caching (enabled for the kernel via '--thinlto-cache-dir') with .incbin, which the kernel occasionally uses to include data within the kernel, such as the .config file for /proc/config.gz. For example, when changing the .config and rebuilding vmlinux, the copy of .config in vmlinux does not match the copy of .config in the build folder: $ echo 'CONFIG_LTO_NONE=n CONFIG_LTO_CLANG_THIN=y CONFIG_IKCONFIG=y CONFIG_HEADERS_INSTALL=y' >kernel/configs/repro.config $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config vmlinux ... $ grep CONFIG_HEADERS_INSTALL .config CONFIG_HEADERS_INSTALL=y $ scripts/extract-ikconfig vmlinux \| grep CONFIG_HEADERS_INSTALL CONFIG_HEADERS_INSTALL=y $ scripts/config -d HEADERS_INSTALL $ make -kj"$(nproc)" ARCH=x86_64 LLVM=1 vmlinux ... UPD kernel/config_data GZIP kernel/config_data.gz CC kernel/configs.o ... LD vmlinux ... $ grep CONFIG_HEADERS_INSTALL .config # CONFIG_HEADERS_INSTALL is not set $ scripts/extract-ikconfig vmlinux \| grep CONFIG_HEADERS_INSTALL CONFIG_HEADERS_INSTALL=y Without '--thinlto-cache-dir' or when using full LTO, this issue does not occur. Benchmarking incremental builds on a few different machines with and without the cache shows a 20% increase in incremental build time without the cache when measured by touching init/main.c and running 'make all'. ARCH=arm64 defconfig + CONFIG_LTO_CLANG_THIN=y on an arm64 host: Benchmark 1: With ThinLTO cache Time (mean ± σ): 56.347 s ± 0.163 s [User: 83.768 s, System: 24.661 s] Range (min … max): 56.109 s … 56.594 s 10 runs Benchmark 2: Without ThinLTO cache Time (mean ± σ): 67.740 s ± 0.479 s [User: 718.458 s, System: 31.797 s] Range (min … max): 67.059 s … 68.556 s 10 runs Summary With ThinLTO cache ran 1.20 ± 0.01 times faster than Without ThinLTO cache ARCH=x86_64 defconfig + CONFIG_LTO_CLANG_THIN=y on an x86_64 host: Benchmark 1: With ThinLTO cache Time (mean ± σ): 85.772 s ± 0.252 s [User: 91.505 s, System: 8.408 s] Range (min … max): 85.447 s … 86.244 s 10 runs Benchmark 2: Without ThinLTO cache Time (mean ± σ): 103.833 s ± 0.288 s [User: 232.058 s, System: 8.569 s] Range (min … max): 103.286 s … 104.124 s 10 runs Summary With ThinLTO cache ran 1.21 ± 0.00 times faster than Without ThinLTO cache While it is unfortunate to take this performance improvement off the table, correctness is more important. If/when this is fixed in LLVM, it can potentially be brought back in a conditional manner. Alternatively, a developer can just disable LTO if doing incremental compiles quickly is important, as a full compile cycle can still take over a minute even with the cache and it is unlikely that LTO will result in functional differences for a kernel change. Cc: stable@vger.kernel.org Fixes: `dc5723b02e` ("kbuild: add support for Clang LTO") Reported-by: Yifan Hong <elsk@google.com> Closes: https://github.com/ClangBuiltLinux/linux/issues/2021 Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Closes: https://lore.kernel.org/r/20220327115526.cc4b0ff55fc53c97683c3e4d@kernel.org/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Bug: 312268956 Bug: 335301039 Change-Id: Iace492db67f28e172427669b1b7eb6a8c44dd3aa (cherry picked from commit aba091547ef6159d52471f42a3ef531b7b660ed8 https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git kbuild) [elsk: Resolved minor conflict in Makefile] Signed-off-by: Yifan Hong <elsk@google.com>	2024-05-06 22:21:09 +00:00
Kalesh Singh	2a64b7fe17	ANDROID: 16K: Fix call to show_pad_maps_fn() The call should be for printing maps not smaps; fix this. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I8356b9e93fa2a300cb8bcd66fed857d42e8bfdca Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-05-01 19:55:54 +00:00
Kalesh Singh	596b6507c3	ANDROID: 16K: Fix show maps CFI failure If the kernel is built CONFIG_CFI_CLANG=y, reading smaps may cause a panic. This is due to a failed CFI check; which is triggered becuase the signature of the function pointer for printing smaps padding VMAs does not match exactly with that for show_smap(). Fix this by casting the function pointer to the expected type based on whether printing maps or smaps padding. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I65564a547dacbc4131f8557344c8c96e51f90cd5 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-05-01 16:35:25 +00:00
Kalesh Singh	d83231efe4	ANDROID: 16K: Handle pad VMA splits and merges In some cases a VMA with padding representation may be split, and therefore the padding flags must be updated accordingly. There are 3 cases to handle: Given: \| DDDDPPPP \| where: - D represents 1 page of data; - P represents 1 page of padding; - \| represents the boundaries (start/end) of the VMA 1) Split exactly at the padding boundary \| DDDDPPPP \| --> \| DDDD \| PPPP \| - Remove padding flags from the first VMA. - The second VMA is all padding 2) Split within the padding area \| DDDDPPPP \| --> \| DDDDPP \| PP \| - Subtract the length of the second VMA from the first VMA's padding. - The second VMA is all padding, adjust its padding length (flags) 3) Split within the data area \| DDDDPPPP \| --> \| DD \| DDPPPP \| - Remove padding flags from the first VMA. - The second VMA is has the same padding as from before the split. To simplify the semantics merging of padding VMAs is not allowed. If a split produces a VMA that is entirely padding, show_[s]maps() only outputs the padding VMA entry (as the data entry is of length 0). Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: Ie2628ced5512e2c7f8af25fabae1f38730c8bb1a Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-29 23:44:57 +00:00
Kalesh Singh	19d6e7eb47	ANDROID: 16K: madvise_vma_pad_pages: Remove filemap_fault check Some file systems like F2FS use a custom filemap_fault ops. Remove this check, as checking vm_file is sufficient. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: Id6a584d934f06650c0a95afd1823669fc77ba2c2 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-29 23:44:57 +00:00
Kalesh Singh	ae44e8dac8	ANDROID: 16K: Only madvise padding from dynamic linker context Only preform padding advise from the execution context on bionic's dynamic linker. This ensures that madvise() doesn't have unwanted side effects. Also rearrange the order of fail checks in madvise_vma_pad_pages() in order of ascending cost. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I3e05b8780c6eda78007f86b613f8c11dd18ac28f Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-29 23:44:57 +00:00
Qais Yousef	ae67f18944	ANDROID: Enable CONFIG_LAZY_RCU in x86 gki_defconfig It is still disabled by default. Must specify rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter to actually enable it. Bug: 258241771 Change-Id: Ic9e15b846d58ffa3d5dd81842c568da79352ff2d Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Qais Yousef	d38091b4ff	ANDROID: Enable CONFIG_LAZY_RCU in arm64 gki_defconfig It is still disabled by default. Must specify rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter to actually enable it. Bug: 258241771 Change-Id: I11c920aa5edde2fc42ab54245cd198eb8cb47616 Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Qais Yousef	37b02c190c	FROMLIST: rcu: Provide a boot time parameter to control lazy RCU To allow more flexible arrangements while still provide a single kernel for distros, provide a boot time parameter to enable/disable lazy RCU. Specify: rcutree.enable_rcu_lazy=[y\|1\|n\|0] Which also requires rcu_nocbs=all at boot time to enable/disable lazy RCU. To disable it by default at build time when CONFIG_RCU_LAZY=y, the new CONFIG_RCU_LAZY_DEFAULT_OFF can be used. Bug: 258241771 Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Tested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/lkml/20231203011252.233748-1-qyousef@layalina.io/ [Fix trivial conflicts rejecting newer code that doesn't exist on 5.15] Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: Ib5585ae717a2ba7749f2802101b785c4e5de8a90	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	4adb60810c	ANDROID: rcu: Add a minimum time for marking boot as completed On many systems, a great deal of boot (in userspace) happens after the kernel thinks the boot has completed. It is difficult to determine if the system has really booted from the kernel side. Some features like lazy-RCU can risk slowing down boot time if, say, a callback has been added that the boot synchronously depends on. Further expedited callbacks can get unexpedited way earlier than it should be, thus slowing down boot (as shown in the data below). For these reasons, this commit adds a config option 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay. Userspace can also make RCU's view of the system as booted, by writing the time in milliseconds to: /sys/module/rcupdate/parameters/android_rcu_boot_end_delay Or even just writing a value of 0 to this sysfs node. However, under no circumstance will the boot be allowed to end earlier than just before init is launched. The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This suites ChromeOS and also a PREEMPT_RT system below very well, which need no config or parameter changes, and just a simple application of this patch. A system designer can also choose a specific value here to keep RCU from marking boot completion. As noted earlier, RCU's perspective of the system as booted will not be marker until at least android_rcu_boot_end_delay milliseconds have passed or an update is made via writing a small value (or 0) in milliseconds to: /sys/module/rcupdate/parameters/android_rcu_boot_end_delay. One side-effect of this patch is, there is a risk that a real-time workload launched just after the kernel boots will suffer interruptions due to expedited RCU, which previous ended just before init was launched. However, to mitigate such an issue (however unlikely), the user should either tune CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value of 0 to /sys/module/rcupdate/parameters/android_rcu_boot_end_delay, once userspace boots, and before launching the real-time workload. Qiuxu also noted impressive boot-time improvements with earlier version of patch. An excerpt from the data he shared: 1) Testing environment: OS : CentOS Stream 8 (non-RT OS) Kernel : v6.2 Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads) Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, … 2) OS boot time definition: The time from the start of the kernel boot to the shell command line prompt is shown from the console. [ Different people may have different OS boot time definitions. ] 3) Measurement method (very rough method): A timer in the kernel periodically prints the boot time every 100ms. As soon as the shell command line prompt is shown from the console, we record the boot time printed by the timer, then the printed boot time is the OS boot time. 4) Measured OS boot time (in seconds) a) Measured 10 times w/o this patch: 8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s The average OS boot time was: ~8.7s b) Measure 10 times w/ this patch: 8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s The average OS boot time was: ~8.3s. (CHROMIUM tag rationale: Submitted upstream but got lots of pushback as it may harm a PREEMPT_RT system -- the concern is VERY theoretical and this improves things for ChromeOS. Plus we are not a PREEMPT_RT system. So I am strongly suggesting this mostly simple change for ChromeOS.) Bug: 258241771 Bug: 268129466 Test: boot Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Change-Id: Ibd262189d7f92dbcc57f1508efe90fcfba95a6cc Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4350228 Commit-Queue: Joel Fernandes <joelaf@google.com> Commit-Queue: Vineeth Pillai <vineethrp@google.com> Tested-by: Vineeth Pillai <vineethrp@google.com> Tested-by: Joel Fernandes <joelaf@google.com> Reviewed-by: Vineeth Pillai <vineethrp@google.com> (cherry picked from commit 7968079ec77b320ee9d4115fe13048a8f7afbc02) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style. Prefix boot param with android_] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Uladzislau Rezki (Sony)	16ea06fe44	UPSTREAM: rcu/kvfree: Move need_offload_krc() out of krcp->lock The need_offload_krc() function currently holds the krcp->lock in order to safely check krcp->head. This commit removes the need for this lock in that function by updating the krcp->head pointer using WRITE_ONCE() macro so that readers can carry out lockless loads of that pointer. Bug: 258241771 Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `8fc5494ad5`) Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: Iddde5ec15e8574216abc95d8c64efa5c66868508	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	5d1a3986c2	UPSTREAM: rcu/kfree: Fix kfree_rcu_shrink_count() return value As per the comments in include/linux/shrinker.h, .count_objects callback should return the number of freeable items, but if there are no objects to free, SHRINK_EMPTY should be returned. The only time 0 is returned should be when we are unable to determine the number of objects, or the cache should be skipped for another reason. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `3826909635`) Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: I5cb380fceaccc85971a47773d9058f0ea044c6dd Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332178 Reviewed-by: Vineeth Pillai <vineethrp@google.com> Reviewed-by: Sean Paul <sean@poorly.run> (cherry picked from commit 3243f1e22bf915c9b805a96cc4a8cbc03ed5d7a8) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Uladzislau Rezki (Sony)	88587c1838	UPSTREAM: rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval Currently the monitor work is scheduled with a fixed interval of HZ/20, which is roughly 50 milliseconds. The drawback of this approach is low utilization of the 512 page slots in scenarios with infrequence kvfree_rcu() calls. For example on an Android system: <snip> kworker/3:3-507 [003] .... 470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6 kworker/6:1-76 [006] .... 470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1 kworker/6:1-76 [006] .... 470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9 kworker/3:3-507 [003] .... 471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48 kworker/1:1-73 [001] .... 471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3 kworker/1:1-73 [001] .... 471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76 kworker/0:4-1411 [000] .... 472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1 kworker/0:4-1411 [000] .... 472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5 kworker/6:1-76 [006] .... 472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102 <snip> In many cases, out of 512 slots, fewer than 10 were actually used. In order to improve batching and make utilization more efficient this commit sets a drain interval to a fixed 5-seconds interval. Floods are detected when a page fills quickly, and in that case, the reclaim work is re-scheduled for the next scheduling-clock tick (jiffy). After this change: <snip> kworker/7:1-371 [007] .... 5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121 kworker/7:1-371 [007] .... 5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47 kworker/7:1-371 [007] .... 5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510 kworker/7:1-371 [007] .... 5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169 kworker/7:1-371 [007] .... 5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510 kworker/5:6-9428 [005] .... 5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123 kworker/4:7-9434 [004] .... 5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322 kworker/4:7-9434 [004] .... 5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185 kworker/7:1-371 [007] .... 5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510 <snip> Here, all but one of the cases, more than one hundreds slots were used, representing an order-of-magnitude improvement. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `51824b780b`) Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: I4635ba0dbece4e029d5271ef3950b8eaa1ae5e81 Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332177 Reviewed-by: Vineeth Pillai <vineethrp@google.com> Reviewed-by: Sean Paul <sean@poorly.run> (cherry picked from commit b1bf359877e084383be107bf0008d58d0a6b15e3) [Conflict due to `71cf9c9835` adding a new function in the same location. Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	5b47d8411d	UPSTREAM: rcu/kvfree: Remove useless monitor_todo flag monitor_todo is not needed as the work struct already tracks if work is pending. Just use that to know if work is pending using schedule_delayed_work() helper. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> (cherry picked from commit `82d26c36cc`) Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: I4c13f89da735a628a5030ab55a13e338b97da4b8 Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332176 Reviewed-by: Sean Paul <sean@poorly.run> Reviewed-by: Vineeth Pillai <vineethrp@google.com> (cherry picked from commit bb867be28d6a70b36ff1d6563f794c489072ab7e) [Minor conflict with `71cf9c9835` where it added a new function in the same location. Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Uladzislau Rezki	84828604c7	UPSTREAM: scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu() Earlier commits in this series allow battery-powered systems to build their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option. This Kconfig option causes call_rcu() to delay its callbacks in order to batch them. This means that a given RCU grace period covers more callbacks, thus reducing the number of grace periods, in turn reducing the amount of energy consumed, which increases battery lifetime which can be a very good thing. This is not a subtle effect: In some important use cases, the battery lifetime is increased by more than 10%. This CONFIG_RCU_LAZY=y option is available only for CPUs that offload callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y. Delaying callbacks is normally not a problem because most callbacks do nothing but free memory. If the system is short on memory, a shrinker will kick all currently queued lazy callbacks out of their laziness, thus freeing their memory in short order. Similarly, the rcu_barrier() function, which blocks until all currently queued callbacks are invoked, will also kick lazy callbacks, thus enabling rcu_barrier() to complete in a timely manner. However, there are some cases where laziness is not a good option. For example, synchronize_rcu() invokes call_rcu(), and blocks until the newly queued callback is invoked. It would not be a good for synchronize_rcu() to block for ten seconds, even on an idle system. Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a given CPU kicks any lazy callbacks that might be already queued on that CPU. After all, if there is going to be a grace period, all callbacks might as well get full benefit from it. Yes, this could be done the other way around by creating a call_rcu_lazy(), but earlier experience with this approach and feedback at the 2022 Linux Plumbers Conference shifted the approach to call_rcu() being lazy with call_rcu_hurry() for the few places where laziness is inappropriate. And another call_rcu() instance that cannot be lazy is the one in the scsi_eh_scmd_add() function. Leaving this instance lazy results in unacceptably slow boot times. Therefore, make scsi_eh_scmd_add() use call_rcu_hurry() in order to revert to the old behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Bug: 258241771 Bug: 222463781 Test: CQ Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Change-Id: I95bba865e582b0a12b1c09ba1f0bd4f897401c07 Signed-off-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: <linux-scsi@vger.kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `54d87b0a0c`) Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318056 Commit-Queue: Joel Fernandes <joelaf@google.com> Reviewed-by: Sean Paul <sean@poorly.run> Reviewed-by: Vineeth Pillai <vineethrp@google.com> Tested-by: Joel Fernandes <joelaf@google.com> (cherry picked from commit 5578f9ac27d25e3e57a5b9c4cf0346cfc5162994) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	a4124a21b1	ANDROID: rxrpc: Use call_rcu_hurry() instead of call_rcu() call_rcu() changes to save power may cause slowness. Use the call_rcu_hurry() API instead which reverts to the old behavior. We find this via inspection that the RCU callback does a wakeup of a thread. This usually indicates that something is waiting on it. To be safe, let us use call_rcu_hurry() here instead. [ joel: Upstream is rewriting this code, so I am merging this as a CHROMIUM patch. There is no harm in including it. Link: https://lore.kernel.org/rcu/658624.1669849522@warthog.procyon.org.uk/#t ] Bug: 258241771 Bug: 222463781 Test: CQ Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Change-Id: Iaadfe2f9db189489915828c6f2f74522f4b90ea3 Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3965078 Reviewed-by: Ross Zwisler <zwisler@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318055 Reviewed-by: Vineeth Pillai <vineethrp@google.com> (cherry picked from commit 1f98f32393f83d14bc290fef06d5b3132bee23e0) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Eric Dumazet	930bdc0924	UPSTREAM: net: devinet: Reduce refcount before grace period Currently, the inetdev_destroy() function waits for an RCU grace period before decrementing the refcount and freeing memory. This causes a delay with a new RCU configuration that tries to save power, which results in the network interface disappearing later than expected. The resulting delay causes test failures on ChromeOS. Refactor the code such that the refcount is freed before the grace period and memory is freed after. With this a ChromeOS network test passes that does 'ip netns del' and polls for an interface disappearing, now passes. Bug: 258241771 Bug: 222463781 Test: CQ Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Change-Id: I98b13c5a8fb9696c1111219d774cf91c8b14b4c5 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: David Ahern <dsahern@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: <netdev@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `9d40c84cf5`) Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318054 Tested-by: Joel Fernandes <joelaf@google.com> Reviewed-by: Vineeth Pillai <vineethrp@google.com> Commit-Queue: Joel Fernandes <joelaf@google.com> Reviewed-by: Sean Paul <sean@poorly.run> (cherry picked from commit 3c0f4bb182d6b0be5424947b53019e92bea8b38c) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	706e751b33	UPSTREAM: rcu: Disable laziness if lazy-tracking says so During suspend, we see failures to suspend 1 in 300-500 suspends. Looking closer, it appears that asynchronous RCU callbacks are being queued as lazy even though synchronous callbacks are expedited. These delays appear to not be very welcome by the suspend/resume code as evidenced by these occasional suspend failures. This commit modifies call_rcu() to check if rcu_async_should_hurry(), which will return true if we are in suspend or in-kernel boot. [ paulmck: Alphabetize local variables. ] Ignoring the lazy hint makes the 3000 suspend/resume cycles pass reliably on a 12th gen 12-core Intel CPU, and there is some evidence that it also slightly speeds up boot performance. Bug: 258241771 Bug: 222463781 Test: CQ Fixes: `3cb278e73b` ("rcu: Make call_rcu() lazy to save power") Change-Id: I4cfe6f43de8bae9a6c034831c79d9773199d6d29 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `cf7066b97e`) Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318052 Reviewed-by: Sean Paul <sean@poorly.run> Reviewed-by: Vineeth Pillai <vineethrp@google.com> Tested-by: Joel Fernandes <joelaf@google.com> Commit-Queue: Joel Fernandes <joelaf@google.com> (cherry picked from commit e59686da91b689d3771a09f3eae37db5f40d3f75) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	8568593719	UPSTREAM: rcu: Track laziness during boot and suspend Boot and suspend/resume should not be slowed down in kernels built with CONFIG_RCU_LAZY=y. In particular, suspend can sometimes fail in such kernels. This commit therefore adds rcu_async_hurry(), rcu_async_relax(), and rcu_async_should_hurry() functions that track whether or not either a boot or a suspend/resume operation is in progress. This will enable a later commit to refrain from laziness during those times. Export rcu_async_should_hurry(), rcu_async_hurry(), and rcu_async_relax() for later use by rcutorture. [ paulmck: Apply feedback from Steve Rostedt. ] Bug: 258241771 Bug: 222463781 Test: CQ Fixes: `3cb278e73b` ("rcu: Make call_rcu() lazy to save power") Change-Id: Ieb2f2d484a33cfbd71f71c8e3dbcfc05cd7efe8c Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `6efdda8bec`) Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318051 Reviewed-by: Vineeth Pillai <vineethrp@google.com> Reviewed-by: Sean Paul <sean@poorly.run> Tested-by: Joel Fernandes <joelaf@google.com> Commit-Queue: Joel Fernandes <joelaf@google.com> (cherry picked from commit 8bc7efc64c84da753f2174a7071c8f1a7823d2bb) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	f12c162eac	UPSTREAM: net: Use call_rcu_hurry() for dst_release() In a networking test on ChromeOS, kernels built with the new CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown phase. This failure may be reproduced as follows: ip netns del <name> The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits in this series for the benefit of certain battery-powered systems. This Kconfig option causes call_rcu() to delay its callbacks in order to batch them. This means that a given RCU grace period covers more callbacks, thus reducing the number of grace periods, in turn reducing the amount of energy consumed, which increases battery lifetime which can be a very good thing. This is not a subtle effect: In some important use cases, the battery lifetime is increased by more than 10%. This CONFIG_RCU_LAZY=y option is available only for CPUs that offload callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y. Delaying callbacks is normally not a problem because most callbacks do nothing but free memory. If the system is short on memory, a shrinker will kick all currently queued lazy callbacks out of their laziness, thus freeing their memory in short order. Similarly, the rcu_barrier() function, which blocks until all currently queued callbacks are invoked, will also kick lazy callbacks, thus enabling rcu_barrier() to complete in a timely manner. However, there are some cases where laziness is not a good option. For example, synchronize_rcu() invokes call_rcu(), and blocks until the newly queued callback is invoked. It would not be a good for synchronize_rcu() to block for ten seconds, even on an idle system. Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a given CPU kicks any lazy callbacks that might be already queued on that CPU. After all, if there is going to be a grace period, all callbacks might as well get full benefit from it. Yes, this could be done the other way around by creating a call_rcu_lazy(), but earlier experience with this approach and feedback at the 2022 Linux Plumbers Conference shifted the approach to call_rcu() being lazy with call_rcu_hurry() for the few places where laziness is inappropriate. Returning to the test failure, use of ftrace showed that this failure cause caused by the aadded delays due to this new lazy behavior of call_rcu() in kernels built with CONFIG_RCU_LAZY=y. Therefore, make dst_release() use call_rcu_hurry() in order to revert to the old test-failure-free behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: Ifd64083bd210a9dfe94c179152f27d310c179507 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: David Ahern <dsahern@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: <netdev@vger.kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `483c26ff63`) Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318050 Reviewed-by: Sean Paul <sean@poorly.run> Reviewed-by: Vineeth Pillai <vineethrp@google.com> (cherry picked from commit e0886387489fed8a60e7e0f107b95fb9c0241930) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	ff22b562f0	UPSTREAM: percpu-refcount: Use call_rcu_hurry() for atomic switch Earlier commits in this series allow battery-powered systems to build their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option. This Kconfig option causes call_rcu() to delay its callbacks in order to batch callbacks. This means that a given RCU grace period covers more callbacks, thus reducing the number of grace periods, in turn reducing the amount of energy consumed, which increases battery lifetime which can be a very good thing. This is not a subtle effect: In some important use cases, the battery lifetime is increased by more than 10%. This CONFIG_RCU_LAZY=y option is available only for CPUs that offload callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y. Delaying callbacks is normally not a problem because most callbacks do nothing but free memory. If the system is short on memory, a shrinker will kick all currently queued lazy callbacks out of their laziness, thus freeing their memory in short order. Similarly, the rcu_barrier() function, which blocks until all currently queued callbacks are invoked, will also kick lazy callbacks, thus enabling rcu_barrier() to complete in a timely manner. However, there are some cases where laziness is not a good option. For example, synchronize_rcu() invokes call_rcu(), and blocks until the newly queued callback is invoked. It would not be a good for synchronize_rcu() to block for ten seconds, even on an idle system. Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a given CPU kicks any lazy callbacks that might be already queued on that CPU. After all, if there is going to be a grace period, all callbacks might as well get full benefit from it. Yes, this could be done the other way around by creating a call_rcu_lazy(), but earlier experience with this approach and feedback at the 2022 Linux Plumbers Conference shifted the approach to call_rcu() being lazy with call_rcu_hurry() for the few places where laziness is inappropriate. And another call_rcu() instance that cannot be lazy is the one on the percpu refcounter's "per-CPU to atomic switch" code path, which uses RCU when switching to atomic mode. The enqueued callback wakes up waiters waiting in the percpu_ref_switch_waitq. Allowing this callback to be lazy would result in unacceptable slowdowns for users of per-CPU refcounts, such as blk_pre_runtime_suspend(). Therefore, make __percpu_ref_switch_to_atomic() use call_rcu_hurry() in order to revert to the old behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: Icc325f69d0df1a37b6f1de02a284e1fabf20e366 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: <linux-mm@kvack.org> (cherry picked from commit `343a72e5e3`) Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318049 Reviewed-by: Vineeth Pillai <vineethrp@google.com> Reviewed-by: Sean Paul <sean@poorly.run> Tested-by: Joel Fernandes <joelaf@google.com> Commit-Queue: Joel Fernandes <joelaf@google.com> (cherry picked from commit dfd536f499642cd18679cc64c79a8fb275137f45) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	a4cc1aa22d	UPSTREAM: rcu/sync: Use call_rcu_hurry() instead of call_rcu call_rcu() changes to save power will slow down rcu sync. Use the call_rcu_hurry() API instead which reverts to the old behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: I5123ba52f47676305dbcfa1233bf3b41f140766c Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `7651d6b250`) Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318048 Reviewed-by: Sean Paul <sean@poorly.run> Commit-Queue: Joel Fernandes <joelaf@google.com> Reviewed-by: Vineeth Pillai <vineethrp@google.com> Tested-by: Joel Fernandes <joelaf@google.com> (cherry picked from commit 183fce4e1bfbbae1266ec90c6bb871b51d7af81c) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	222a4cd66c	UPSTREAM: rcu: Refactor code a bit in rcu_nocb_do_flush_bypass() This consolidates the code a bit and makes it cleaner. Functionally it is the same. Bug: 258241771 Bug: 222463781 Test: CQ Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `3d222a0c0c`) Change-Id: I8422c7138edd6a476fc46374beefdf46dd76b8b0 Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318047 Tested-by: Joel Fernandes <joelaf@google.com> Reviewed-by: Sean Paul <sean@poorly.run> Reviewed-by: Vineeth Pillai <vineethrp@google.com> Commit-Queue: Joel Fernandes <joelaf@google.com> (cherry picked from commit 58cb433d445d2416ba26645e8df63d86afa15f8c) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Vineeth Pillai	f4abe7bb5f	BACKPORT: rcu: Shrinker for lazy rcu The shrinker is used to speed up the free'ing of memory potentially held by RCU lazy callbacks. RCU kernel module test cases show this to be effective. Test is introduced in a later patch. [Joel: register_shrinker() argument list change.] Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: I6a73a9dae79ff35feca37abe2663e55a0f46dda8 Signed-off-by: Vineeth Pillai <vineeth@bitbyteword.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `c945b4da7a`) Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318046 Tested-by: Joel Fernandes <joelaf@google.com> Reviewed-by: Vineeth Pillai <vineethrp@google.com> Commit-Queue: Joel Fernandes <joelaf@google.com> (cherry picked from commit 2cf50ca2e7c3bc08f5182fc517a89a65e8dca7e3) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	e0297c38a5	BACKPORT: rcu: Make call_rcu() lazy to save power Implement timer-based RCU callback batching (also known as lazy callbacks). With this we save about 5-10% of power consumed due to RCU requests that happen when system is lightly loaded or idle. By default, all async callbacks (queued via call_rcu) are marked lazy. An alternate API call_rcu_hurry() is provided for the few users, for example synchronize_rcu(), that need the old behavior. The batch is flushed whenever a certain amount of time has passed, or the batch on a particular CPU grows too big. Also memory pressure will flush it in a future patch. To handle several corner cases automagically (such as rcu_barrier() and hotplug), we re-use bypass lists which were originally introduced to address lock contention, to handle lazy CBs as well. The bypass list length has the lazy CB length included in it. A separate lazy CB length counter is also introduced to keep track of the number of lazy CBs. [ paulmck: Fix formatting of inline call_rcu_lazy() definition. ] [ paulmck: Apply Zqiang feedback. ] [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] [ joelaf: Small changes for 5.15 backport. ] Suggested-by: Paul McKenney <paulmck@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Bug: 258241771 Bug: 222463781 Test: CQ (cherry picked from commit `3cb278e73b` https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master) Change-Id: I557d5af2a5d317bd66e9ec55ed40822bb5c54390 Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4318045 Reviewed-by: Vineeth Pillai <vineethrp@google.com> Commit-Queue: Joel Fernandes <joelaf@google.com> Tested-by: Joel Fernandes <joelaf@google.com> (cherry picked from commit b30e520b9da88a5de115ed5b2c1b2aa89de9e214) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Joel Fernandes (Google)	276d33f21a	UPSTREAM: rcu: Fix late wakeup when flush of bypass cblist happens When the bypass cblist gets too big or its timeout has occurred, it is flushed into the main cblist. However, the bypass timer is still running and the behavior is that it would eventually expire and wake the GP thread. Since we are going to use the bypass cblist for lazy CBs, do the wakeup soon as the flush for "too big or too long" bypass list happens. Otherwise, long delays can happen for callbacks which get promoted from lazy to non-lazy. This is a good thing to do anyway (regardless of future lazy patches), since it makes the behavior consistent with behavior of other code paths where flushing into the ->cblist makes the GP kthread into a non-sleeping state quickly. [ Frederic Weisbecker: Changes to avoid unnecessary GP-thread wakeups plus comment changes. ] Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `b50606f35f`) Bug: 258241771 Bug: 222463781 Test: powerIdle lab tests. Change-Id: If8da96d7ba6ed90a2a70f7d56f7bb03af44fd649 Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4065239 Reviewed-by: Vineeth Pillai <vineethrp@google.com> (cherry picked from commit 75db04e1eed1756a4ee5fb87ef8dd494d19bf53f) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Frederic Weisbecker	24e6758060	BACKPORT: rcu: Fix missing nocb gp wake on rcu_barrier() In preparation for RCU lazy changes, wake up the RCU nocb gp thread if needed after an entrain. This change prevents the RCU barrier callback from waiting in the queue for several seconds before the lazy callbacks in front of it are serviced. Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit `b8f7aca3f0` https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next) (Backport: Conflicts: kernel/rcu/tree.c Due to missing 'rcu: Rework rcu_barrier() and callback-migration logic' Chose not to backport that.) Bug: 258241771 Bug: 222463781 Test: CQ Change-Id: Ib55c5886764b74df22531eca35f076ef7acc08dd Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4062165 Reviewed-by: Vineeth Pillai <vineethrp@google.com> (cherry picked from commit fc6e55ea65dca9cc52bda6081341f3fcc87f6ee7) [Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message to match Android style] Signed-off-by: Qais Yousef <qyousef@google.com>	2024-04-29 22:43:39 +00:00
Florian Westphal	fb310d468a	UPSTREAM: netfilter: nft_set_pipapo: do not free live element [ Upstream commit 3cfc9ec039af60dbd8965ae085b2c2ccdcfbe1cc ] Pablo reports a crash with large batches of elements with a back-to-back add/remove pattern. Quoting Pablo: add_elem("00000000") timeout 100 ms ... add_elem("0000000X") timeout 100 ms del_elem("0000000X") <---------------- delete one that was just added ... add_elem("00005000") timeout 100 ms 1) nft_pipapo_remove() removes element 0000000X Then, KASAN shows a splat. Looking at the remove function there is a chance that we will drop a rule that maps to a non-deactivated element. Removal happens in two steps, first we do a lookup for key k and return the to-be-removed element and mark it as inactive in the next generation. Then, in a second step, the element gets removed from the set/map. The _remove function does not work correctly if we have more than one element that share the same key. This can happen if we insert an element into a set when the set already holds an element with same key, but the element mapping to the existing key has timed out or is not active in the next generation. In such case its possible that removal will unmap the wrong element. If this happens, we will leak the non-deactivated element, it becomes unreachable. The element that got deactivated (and will be freed later) will remain reachable in the set data structure, this can result in a crash when such an element is retrieved during lookup (stale pointer). Add a check that the fully matching key does in fact map to the element that we have marked as inactive in the deactivation step. If not, we need to continue searching. Add a bug/warn trap at the end of the function as well, the remove function must not ever be called with an invisible/unreachable/non-existent element. v2: avoid uneeded temporary variable (Stefano) Bug: 336735501 Fixes: `3c4287f620` ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ebf7c9746f`) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: Ic9a48ac9ac0f9960fea9e066d9a0a9fb93f7b633	2024-04-29 15:19:22 +00:00
seanwang1	444a497469	ANDROID: GKI: Update lenovo symbol list 3 function symbols added 'void css_task_iter_end(struct css_task_iter)' 'struct task_struct css_task_iter_next(struct css_task_iter)' 'void css_task_iter_start(struct cgroup_subsys_state, unsigned int, struct css_task_iter*)' Bug: 336967294 Change-Id: I7258e06fe9f1e21d73481d47a5cc54bb95e40646 Signed-off-by: seanwang1 <seanwang1@lenovo.com>	2024-04-29 15:17:00 +00:00
seanwang1	978f805a2d	ANDROID: GKI: Export css_task_iter_start() Export css_task_iter_start() and css_task_iter_next() and css_task_iter_end() inorder to support task iteration in a cgroup in vendor modules. Bug: 336967294 Change-Id: Id93963ddd30ab02c7a4d5086f19d15310e4eda14 Signed-off-by: seanwang1 <seanwang1@lenovo.com>	2024-04-29 15:17:00 +00:00
Suzuki K Poulose	0ae4f32634	FROMGIT: coresight: etm4x: Fix access to resource selector registers Resource selector pair 0 is always implemented and reserved. We must not touch it, even during save/restore for CPU Idle. Rest of the driver is well behaved. Fix the offending ones. Reported-by: Yabin Cui <yabinc@google.com> Fixes: `f188b5e76a` ("coresight: etm4x: Save/restore state across CPU low power states") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Yabin Cui <yabinc@google.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20240412142702.2882478-5-suzuki.poulose@arm.com Bug: 335234033 (cherry picked from commit d6fc00d0f640d6010b51054aa8b0fd191177dbc9 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git next) Change-Id: I5f3385cb269969a299402fa258b30ab43e95805f Signed-off-by: Yabin Cui <yabinc@google.com>	2024-04-26 12:23:30 -07:00
Suzuki K Poulose	8ba1802287	BACKPORT: FROMGIT: coresight: etm4x: Safe access for TRCQCLTR ETM4x implements TRCQCLTR only when the Q elements are supported and the Q element filtering is supported (TRCIDR0.QFILT). Access to the register otherwise could be fatal. Fix this by tracking the availability, like the others. Fixes: `f188b5e76a` ("coresight: etm4x: Save/restore state across CPU low power states") Reported-by: Yabin Cui <yabinc@google.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Yabin Cui <yabinc@google.com> Link: https://lore.kernel.org/r/20240412142702.2882478-4-suzuki.poulose@arm.com Bug: 335234033 (cherry picked from commit 46bf8d7cd8530eca607379033b9bc4ac5590a0cd https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git next) Change-Id: Id848fa14ba8003149f76b5ca54562593f6164150 Signed-off-by: Yabin Cui <yabinc@google.com>	2024-04-26 12:23:14 -07:00
Suzuki K Poulose	6a08c9fb9d	FROMGIT: coresight: etm4x: Do not save/restore Data trace control registers ETM4x doesn't support Data trace on A class CPUs. As such do not access the Data trace control registers during CPU idle. This could cause problems for ETE. While at it, remove all references to the Data trace control registers. Fixes: `f188b5e76a` ("coresight: etm4x: Save/restore state across CPU low power states") Reported-by: Yabin Cui <yabinc@google.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Yabin Cui <yabinc@google.com> Link: https://lore.kernel.org/r/20240412142702.2882478-3-suzuki.poulose@arm.com Bug: 335234033 (cherry picked from commit 5eb3a0c2c52368cb9902e9a6ea04888e093c487d https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git next) Change-Id: I06977d86aa2d876d166db0fac8fbccf48fd07229 Signed-off-by: Yabin Cui <yabinc@google.com>	2024-04-26 12:23:02 -07:00
Suzuki K Poulose	a02278f990	FROMGIT: coresight: etm4x: Do not hardcode IOMEM access for register restore When we restore the register state for ETM4x, while coming back from CPU idle, we hardcode IOMEM access. This is wrong and could blow up for an ETM with system instructions access (and for ETE). Fixes: `f5bd523690` ("coresight: etm4x: Convert all register accesses") Reported-by: Yabin Cui <yabinc@google.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Yabin Cui <yabinc@google.com> Link: https://lore.kernel.org/r/20240412142702.2882478-2-suzuki.poulose@arm.com Bug: 335234033 (cherry picked from commit 1e7ba33fa591de1cf60afffcabb45600b3607025 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git next) Change-Id: Id2ea066374933de51a90f1fca8304338b741845d Signed-off-by: Yabin Cui <yabinc@google.com>	2024-04-26 12:22:54 -07:00
Michal Luczaj	e8e652b8c8	UPSTREAM: af_unix: Fix garbage collector racing against connect() [ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ] Garbage collector does not take into account the risk of embryo getting enqueued during the garbage collection. If such embryo has a peer that carries SCM_RIGHTS, two consecutive passes of scan_children() may see a different set of children. Leading to an incorrectly elevated inflight count, and then a dangling pointer within the gc_inflight_list. sockets are AF_UNIX/SOCK_STREAM S is an unconnected socket L is a listening in-flight socket bound to addr, not in fdtable V's fd will be passed via sendmsg(), gets inflight count bumped connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc() ---------------- ------------------------- ----------- NS = unix_create1() skb1 = sock_wmalloc(NS) L = unix_find_other(addr) unix_state_lock(L) unix_peer(S) = NS // V count=1 inflight=0 NS = unix_peer(S) skb2 = sock_alloc() skb_queue_tail(NS, skb2[V]) // V became in-flight // V count=2 inflight=1 close(V) // V count=1 inflight=1 // GC candidate condition met for u in gc_inflight_list: if (total_refs == inflight_refs) add u to gc_candidates // gc_candidates={L, V} for u in gc_candidates: scan_children(u, dec_inflight) // embryo (skb1) was not // reachable from L yet, so V's // inflight remains unchanged __skb_queue_tail(L, skb1) unix_state_unlock(L) for u in gc_candidates: if (u.inflight) scan_children(u, inc_inflight_move_tail) // V count=1 inflight=2 (!) If there is a GC-candidate listening socket, lock/unlock its state. This makes GC wait until the end of any ongoing connect() to that socket. After flipping the lock, a possibly SCM-laden embryo is already enqueued. And if there is another embryo coming, it can not possibly carry SCM_RIGHTS. At this point, unix_inflight() can not happen because unix_gc_lock is already taken. Inflight graph remains unaffected. Bug: 336226035 Fixes: `1fd05ba5a2` ("[AF_UNIX]: Rewrite garbage collector, fixes race.") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `507cc232ff`) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: If321f78b8b3220f5a1caea4b5e9450f1235b0770	2024-04-22 16:24:10 -07:00
Kuniyuki Iwashima	65e0a92c6d	UPSTREAM: af_unix: Do not use atomic ops for unix_sk(sk)->inflight. [ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ] When touching unix_sk(sk)->inflight, we are always under spin_lock(&unix_gc_lock). Let's convert unix_sk(sk)->inflight to the normal unsigned long. Bug: 336226035 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `301fdbaa0b`) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: I0d965d5f2a863d798c06de9f21d0467f256b538e	2024-04-22 16:23:24 -07:00
Bart Van Assche	5725caa296	FROMLIST: scsi: ufs: Check for completion from the timeout handler If ufshcd_abort() returns SUCCESS for an already completed command then that command is completed twice. This results in a crash. Prevent this by checking whether a command has completed without completion interrupt from the timeout handler. This CL fixes the following kernel crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: dma_direct_map_sg+0x70/0x274 scsi_dma_map+0x84/0x124 ufshcd_queuecommand+0x3fc/0x880 scsi_queue_rq+0x7d0/0x111c blk_mq_dispatch_rq_list+0x440/0xebc blk_mq_do_dispatch_sched+0x5a4/0x6b8 __blk_mq_sched_dispatch_requests+0x150/0x220 __blk_mq_run_hw_queue+0xf0/0x218 __blk_mq_delay_run_hw_queue+0x8c/0x18c blk_mq_run_hw_queue+0x1a4/0x360 blk_mq_sched_insert_requests+0x130/0x334 blk_mq_flush_plug_list+0x138/0x234 blk_flush_plug_list+0x118/0x164 blk_finish_plug() read_pages+0x38c/0x408 page_cache_ra_unbounded+0x230/0x2f8 do_sync_mmap_readahead+0x1a4/0x208 filemap_fault+0x27c/0x8f4 f2fs_filemap_fault+0x28/0xfc __do_fault+0xc4/0x208 handle_pte_fault+0x290/0xe04 do_handle_mm_fault+0x52c/0x858 do_page_fault+0x5dc/0x798 do_translation_fault+0x40/0x54 do_mem_abort+0x60/0x134 el0_da+0x40/0xb8 el0t_64_sync_handler+0xc4/0xe4 el0t_64_sync+0x1b4/0x1b8 Bug: 312786487 Bug: 326329246 Bug: 333069246 Bug: 333317508 Link: https://lore.kernel.org/linux-scsi/20240416171357.1062583-1-bvanassche@acm.org/T/#mbfa6b7a56e07c792ddca7801fb8900f8370d4731 Change-Id: I48e93516d2aae3b2ad62b0b51144e8e2e39d7476 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bart Van Assche <bvanassche@google.com>	2024-04-16 11:35:04 -07:00
Bart Van Assche	8563ce5895	BACKPORT: FROMLIST: scsi: ufs: Make the polling code report which command has been completed Prepare for introducing a new __ufshcd_poll() caller that will need to know whether or not a specific command has been completed. Bug: 312786487 Bug: 326329246 Bug: 333069246 Bug: 333317508 Link: https://lore.kernel.org/linux-scsi/20240416171357.1062583-1-bvanassche@acm.org/T/#m68901e4f4e2437e7d0cb747049006ab19f57e038 Change-Id: I1b25b095b4bf9fbf175aa963ec85fcbbcb2be0ed Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bart Van Assche <bvanassche@google.com>	2024-04-16 11:35:04 -07:00
Bart Van Assche	0fcd7a1c7c	BACKPORT: FROMLIST: scsi: ufs: Make ufshcd_poll() complain about unsupported arguments The ufshcd_poll() implementation does not support queue_num == UFSHCD_POLL_FROM_INTERRUPT_CONTEXT in MCQ mode. Hence complain if queue_num == UFSHCD_POLL_FROM_INTERRUPT_CONTEXT in MCQ mode. Bug: 312786487 Bug: 326329246 Bug: 333069246 Bug: 333317508 Link: https://lore.kernel.org/linux-scsi/20240416171357.1062583-1-bvanassche@acm.org/T/#mf141ffd0528e062eccaceb98f326abae709da3c1 Change-Id: I4182872aa86ed84f074a3f11364138cfde19e74b Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bart Van Assche <bvanassche@google.com>	2024-04-16 11:35:04 -07:00
Bart Van Assche	aa07d6b28d	ANDROID: scsi: ufs: Unexport ufshcd_mcq_poll_cqe_nolock() Unexport this function because it is not used outside the UFSHCI core driver and because it is not possible to use this function from outside the UFSHCI core driver without triggering a race condition. Bug: 312786487 Bug: 326329246 Bug: 333069246 Bug: 333317508 Change-Id: I1bb504b0310c3618db94e9401ff4f7e13633d6a0 Signed-off-by: Bart Van Assche <bvanassche@google.com>	2024-04-16 11:35:04 -07:00
Oven	25ebc09178	ANDROID: mm: fix incorrect unlock mmap_lock for speculative swap fault In a20b68c396127cd6387f37845c5bc05e44e2fd0e, SPF is supported for swap fault. But in __lock_page_or_retry(), it will unlock mmap_lock unconditionally. That will cause unpaired lock release in handling SPF. Bug: 333508035 Change-Id: Ia1da66c85e0d58883cf518f10cd33fc5cad387b8 Signed-off-by: Oven <liyangouwen1@oppo.com> (cherry picked from commit 63070883166ae63620a87d958319deba86f236ae)	2024-04-16 16:44:22 +00:00
Varad Gautam	264477e0d8	ANDROID: Update the ABI symbol list Adding the following symbols: - iov_iter_kvec - seq_read_iter 1 function symbol(s) added 'ssize_t seq_read_iter(struct kiocb, struct iov_iter)' Bug: 332885803 Change-Id: I4068f8a28395deee9a7bcd1cccf786cdd169f0c1 Signed-off-by: Varad Gautam <varadgautam@google.com>	2024-04-16 14:36:45 +00:00
Kalesh Singh	084d22016c	ANDROID: 16K: Separate padding from ELF LOAD segment mappings In has been found that some in-field apps depend on the output of /proc//maps to determine the address ranges of other operations. With the extension of LOAD segments VMAs to be contiguous in memory, the apps may perform operations on an area that is not backed by the underlying file, which results in a SIGBUS. Other apps have crashed with yet unindentified reasons. To avoid breaking in-field apps, maintain the output of /proc//[s]maps with PROT_NONE VMAs for the padding pages of LOAD segmetns instead of showing the segment extensions. NOTE: This does not allocate actual backing VMAs for the shown PROT_NONE mappings. This approach maintains 2 possible assumptions that userspace (apps) could be depending on: 1) That LOAD segment mappings are "contiguous" (not speparated by unrelated mappings) in memory. 2) That no virtual address space is available between mappings of consecutive LOAD segments for the same ELF. For example the output of /proc/*/[s]maps before and after this change is shown below. Segments maintain PROT_NONE gaps ("[page size compat]") for app compatiblity but these are not backed by actual slab VMA memory. Maps Before: 7fb03604d000-7fb036051000 r--p 00000000 fe:09 21935719 /system/lib64/libnetd_client.so 7fb036051000-7fb036055000 r-xp 00004000 fe:09 21935719 /system/lib64/libnetd_client.so 7fb036055000-7fb036059000 r--p 00008000 fe:09 21935719 /system/lib64/libnetd_client.so 7fb036059000-7fb03605a000 rw-p 0000c000 fe:09 21935719 /system/lib64/libnetd_client.so Maps After: 7fc707390000-7fc707393000 r--p 00000000 fe:09 21935719 /system/lib64/libnetd_client.so 7fc707393000-7fc707394000 ---p 00000000 00:00 0 [page size compat] 7fc707394000-7fc707398000 r-xp 00004000 fe:09 21935719 /system/lib64/libnetd_client.so 7fc707398000-7fc707399000 r--p 00008000 fe:09 21935719 /system/lib64/libnetd_client.so 7fc707399000-7fc70739c000 ---p 00000000 00:00 0 [page size compat] 7fc70739c000-7fc70739d000 rw-p 0000c000 fe:09 21935719 /system/lib64/libnetd_client.so Smaps Before: 7fb03604d000-7fb036051000 r--p 00000000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 16 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 16 kB Pss: 0 kB Pss_Dirty: 0 kB Shared_Clean: 16 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 16 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd mr mw me 7fb036051000-7fb036055000 r-xp 00004000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 16 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 16 kB Pss: 0 kB Pss_Dirty: 0 kB Shared_Clean: 16 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 16 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd ex mr mw me 7fb036055000-7fb036059000 r--p 00008000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 16 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 4 kB Pss: 4 kB Pss_Dirty: 4 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 4 kB Referenced: 4 kB Anonymous: 4 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd mr mw me ac 7fb036059000-7fb03605a000 rw-p 0000c000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 4 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 4 kB Pss: 4 kB Pss_Dirty: 4 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 4 kB Referenced: 4 kB Anonymous: 4 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd wr mr mw me ac Smaps After: 7fc707390000-7fc707393000 r--p 00000000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 12 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 12 kB Pss: 0 kB Shared_Clean: 12 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 12 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd mr mw me ?? 7fc707393000-7fc707394000 ---p 00000000 00:00 0 [page size compat] Size: 4 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 0 kB Pss: 0 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 0 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: mr mw me 7fc707394000-7fc707398000 r-xp 00004000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 16 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 16 kB Pss: 0 kB Shared_Clean: 16 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 16 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd ex mr mw me 7fc707398000-7fc707399000 r--p 00008000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 4 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 4 kB Pss: 4 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 4 kB Referenced: 4 kB Anonymous: 4 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd mr mw me ac ?? ?? 7fc707399000-7fc70739c000 ---p 00000000 00:00 0 [page size compat] Size: 12 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 0 kB Pss: 0 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 0 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: mr mw me ac 7fc70739c000-7fc70739d000 rw-p 0000c000 fe:09 21935719 /system/lib64/libnetd_client.so Size: 4 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 4 kB Pss: 4 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 4 kB Referenced: 4 kB Anonymous: 4 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd wr mr mw me ac Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I12bf2c106fafc74a500d79155b81dde5db42661e Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-15 20:38:16 +00:00
Kalesh Singh	37ea0e8485	ANDROID: 16K: Exclude ELF padding for fault around range Userspace apps often analyze memory consumption by the use of mm rss_stat counters -- via the kmem/rss_stat trace event or from /proc/<pid>/statm. rss_stat counters are only updated when the PTEs are updated. What this means is that pages can be present in the page cache from readahead but not visible to userspace (not attributed to the app) as there is no corresponding VMA (PTEs) for the respective page cache pages. A side effect of the loader now extending ELF LOAD segments to be contiguously mapped in the virtual address space, means that the VMA is extended to cover the padding pages. When filesystems, such as f2fs and ext4, that implement vm_ops->map_pages() attempt to perform a do_fault_around() the extent of the fault around is restricted by the area of the enclosing VMA. Since the loader extends LOAD segment VMAs to be contiguously mapped, the extent of the fault around is also increased. The result of which, is that the PTEs corresponding to the padding pages are updated and reflected in the rss_stat counters. It is not common that userspace application developers be aware of this nuance in the kernel's memory accounting. To avoid apparent regressions in memory usage to userspace, restrict the fault around range to only valid data pages (i.e. exclude the padding pages at the end of the VMA). Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I2c7a39ec1b040be2b9fb47801f95042f5dbf869d Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-15 20:38:16 +00:00
Kalesh Singh	e7bff50b22	ANDROID: 16K: Use MADV_DONTNEED to save VMA padding pages. When performing LOAD segment extension, the dynamic linker knows what portion of the VMA is padding. In order for the kernel to implement mitigations that ensure app compatibility, the extent of the padding must be made available to the kernel. To achieve this, reuse MADV_DONTNEED on single VMAs to hint the padding range to the kernel. This information is then stored in vm_flag bits. This allows userspace (dynamic linker) to set the padding pages on the VMA without a need for new out-of-tree UAPI. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I3421de32ab38ad3cb0fbce73ecbd8f7314287cde Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-15 20:38:16 +00:00
Kalesh Singh	38cccb9154	ANDROID: 16K: Introduce ELF padding representation for VMAs The dynamic linker may extend ELF LOAD segment mappings to be contiguous in memory when loading a 16kB compatible ELF on a 4kB page-size system. This is done to reduce the use of unreclaimable VMA slab memory for the otherwise necessary "gap" VMAs. The extended portion of the mapping (VMA) can be viewed as "padding", meaning that the mapping in that range corresponds to an area of the file that does not contain contents of the respective segments (maybe zero's depending on how the ELF is built). For some compatibility mitigations, the region of a VMA corresponding to these padding sections need to be known. In order to represent such regions without adding addtional overhead or breaking ABI, some upper bits of vm_flags are used. Add the VMA padding pages representation and the necessary APIs to manipulate it. Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: Ieb9fa98e30ec9b0bec62256624f14e3ed6062a75 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-15 20:38:16 +00:00
Kalesh Singh	9274c308d8	ANDROID: 16K: Introduce /sys/kernel/mm/pgsize_miration/enabled Migrating from 4kB to 16kB page-size in Android requires first making the platform page-agnostic, which involves increasing Android-ELFs' max-page-size (p_align) from 4kB to 16kB. Increasing the ELF max-page-size was found to cause compatibility issues in apps that use obfuscation or depend on the ELF segments being mapped based on 4kB-alignment. Working around these compatibility issues involves both kernel and userspace (dynamic linker) changes. Introduce a knob for userspace (dynamic linker) to determine whether the kernel supports the mitigations needed for page-size migration compatibility. The knob also allows for userspace to turn on or off these mitigations by writing 1 or 0 to /sys/kernel/mm/pgsize_miration/enabled: echo 1 > /sys/kernel/mm//pgsize_miration/enabled # Enable echo 0 > /sys/kernel/mm//pgsize_miration/enabled # Disable Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I9ac1d15d397b8226b27827ecffa30502da91e10e Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2024-04-15 20:38:16 +00:00
Pablo Neira Ayuso	ceb8c595f8	UPSTREAM: netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path commit 0d459e2ffb541841714839e8228b845458ed3b27 upstream. The commit mutex should not be released during the critical section between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC worker could collect expired objects and get the released commit lock within the same GC sequence. nf_tables_module_autoload() temporarily releases the mutex to load module dependencies, then it goes back to replay the transaction again. Move it at the end of the abort phase after nft_gc_seq_end() is called. Bug: 332996726 Cc: stable@vger.kernel.org Fixes: `720344340f` ("netfilter: nf_tables: GC transaction race with abort path") Reported-by: Kuan-Ting Chen <hexrabbit@devco.re> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `8038ee3c3e`) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: I637389421d8eca5ab59a41bd1a4b70432440034c	2024-04-15 10:32:30 +00:00
Pablo Neira Ayuso	ea419cda5c	UPSTREAM: netfilter: nf_tables: release batch on table validation from abort path commit a45e6889575c2067d3c0212b6bc1022891e65b91 upstream. Unlike early commit path stage which triggers a call to abort, an explicit release of the batch is required on abort, otherwise mutex is released and commit_list remains in place. Add WARN_ON_ONCE to ensure commit_list is empty from the abort path before releasing the mutex. After this patch, commit_list is always assumed to be empty before grabbing the mutex, therefore `03c1f1ef15` ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()") only needs to release the pending modules for registration. Bug: 332996726 Cc: stable@vger.kernel.org Fixes: `c0391b6ab8` ("netfilter: nf_tables: missing validation from the abort path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `b0b36dcbe0`) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: I38f9b05ac4eadd1d2b7b306cccaf0aeacb61b57a	2024-04-15 10:32:28 +00:00
Pablo Neira Ayuso	6b883cdac2	UPSTREAM: netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout commit 552705a3650bbf46a22b1adedc1b04181490fc36 upstream. While the rhashtable set gc runs asynchronously, a race allows it to collect elements from anonymous sets with timeouts while it is being released from the commit path. Mingi Cho originally reported this issue in a different path in 6.1.x with a pipapo set with low timeouts which is not possible upstream since 7395dfacfff6 ("netfilter: nf_tables: use timestamp to check for set element timeout"). Fix this by setting on the dead flag for anonymous sets to skip async gc in this case. According to 08e4c8c5919f ("netfilter: nf_tables: mark newset as dead on transaction abort"), Florian plans to accelerate abort path by releasing objects via workqueue, therefore, this sets on the dead flag for abort path too. Bug: 329205787 Cc: stable@vger.kernel.org Fixes: `5f68718b34` ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Reported-by: Mingi Cho <mgcho.minic@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `406b0241d0`) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: I6170493c267e020c50a739150f8c421deb635b35	2024-04-15 09:45:39 +01:00

1 2 3 4 5 ...

1074365 Commits