mirror of
https://github.com/hardkernel/linux.git
synced 2026-06-05 10:31:46 +09:00
b6fa3795ff19ca15d3750b0fbad0268cebccb73e
1170182 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
b6fa3795ff |
ANDROID: GKI: net: add vendor hooks net qos for gki purpose
Add vendor hooks to support the net qos policy feature:
1. android_rvh_tcp_select_window
We want to modify the tcp_select_window return value.
2. android_rvh_inet_sock_create; android_rvh_inet_sock_release
We want to add a field when an inet sock is created.
3. android_vh_tcp_rtt_estimator
To record the rtt of tcp connections for specified uids.
4. android_vh_build_skb_around
To initialize the oem data field in the skb_shared_info structure.
Bug: 335081123
Bug: 424394849
Change-Id: Ibb22813c5004464416346d2c4c526d6cc5531fcc
Signed-off-by: jujiang <jyu.jiang@vivo.corp-partner.google.com>
|
||
|
|
ba45069401 |
Revert "ANDROID: mm: Set PAGE_BLOCK_ORDER to 8 when ARM64_16K_PAGES"
This reverts commit 45afa562802ae9a253ef437dd0010c8c2ec17806.
Reason for revert: This was a workaround due to the kernel build tools
preserving the # nocheck comment as part of the config option for
CONFIG_PAGE_BLOCK_ORDER, which is problematic, since it is supposed to
be an int. The build tools have been patched to not do that anymore, so
this can be removed.
Bug: 424212284
Bug: 375647879
Bug: 355449177
Bug: 418282543
Change-Id: Ib999aea150c1c5f7f22ea6bdd81de0ec75f8efaf
[isaacmanjarres: resolved merge conflicts from PAGE_BLOCK_ORDER being
hardcoded to 8 on android14 kernels.]
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
|
||
|
|
6246d345f5 |
ANDROID: mm: Set PAGE_BLOCK_ORDER to 8 when ARM64_16K_PAGES
This config will allow the 16kb page size kernels to have the
same CMA_MIN_ALIGNMENT_BYTES as 4k kernels. This means that
the CMA configs for the drivers won't have to change.
Note: This change is needed to avoid breaking old kernel
builds.
Bug: 424212284
Bug: 375647879
Bug: 355449177
Bug: 418282543
Test: tools/bazel run //common:kernel_aarch64_dist
tools/bazel run //common:kernel_aarch64_16k_dist
tools/bazel run //common:kernel_x86_64_dist
Change-Id: Icbfcab0d7e5ba18b3fc35c1186ef79e82f3e7ab1
Signed-off-by: Juan Yescas <jyescas@google.com>
|
||
|
|
74db64dcc8 |
ANDROID: GKI: Update symbol list for vivo
Update vivo symbol list for adding hook to retry mempool allocation
without delay.
1 function symbol(s) added
'int __traceiter_android_vh_mempool_alloc_skip_wait(void*, gfp_t *, bool *)'
1 variable symbol(s) added
'struct tracepoint __tracepoint_android_vh_mempool_alloc_skip_wait'
Bug: 423832910
Change-Id: I869cfce91993628c05ddefd01e67a655f53ee48a
Signed-off-by: Justin Jiang <justinjiang@vivo.corp-partner.google.com>
|
||
|
|
0c59801101 |
ANDROID: vendor_hooks: add hook to retry mempool allocation without delay
Allow important priority threads to retry mempool allocation, achieving fast memory allocation and solving lagging problems caused by delaying 5 seconds. Bug: 423832910 Change-Id: I80e6b1c55652f5a62ac36bbf0091d22ec7fb6189 Signed-off-by: Justin Jiang <justinjiang@vivo.corp-partner.google.com> |
||
|
|
312cb0bda6 |
BACKPORT: FROMGIT: mm: Add CONFIG_PAGE_BLOCK_ORDER to select page block order
Problem: On large page size configurations (16KiB, 64KiB), the CMA
alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
and this causes the CMA reservations to be larger than necessary.
This means that the system will have fewer available MIGRATE_UNMOVABLE and
MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fall back to them.
The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
For example, in ARM, the CMA alignment requirement when:
- CONFIG_ARCH_FORCE_MAX_ORDER default value is used
- CONFIG_TRANSPARENT_HUGEPAGE is set:
PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
-----------------------------------------------------------------------
4KiB | 10 | 9 | 4KiB * (2 ^ 9) = 2MiB
16KiB | 11 | 11 | 16KiB * (2 ^ 11) = 32MiB
64KiB | 13 | 13 | 64KiB * (2 ^ 13) = 512MiB
There are some extreme cases for the CMA alignment requirement when:
- CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
- CONFIG_TRANSPARENT_HUGEPAGE is NOT set:
- CONFIG_HUGETLB_PAGE is NOT set
PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
------------------------------------------------------------------------
4KiB | 15 | 15 | 4KiB * (2 ^ 15) = 128MiB
16KiB | 13 | 13 | 16KiB * (2 ^ 13) = 128MiB
64KiB | 13 | 13 | 64KiB * (2 ^ 13) = 512MiB
This affects the CMA reservations for the drivers. If a driver in a
4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel, the minimal
reservation has to be 32MiB due to the alignment requirements:
reserved-memory {
...
cma_test_reserve: cma_test_reserve {
compatible = "shared-dma-pool";
size = <0x0 0x400000>; /* 4 MiB */
...
};
};
reserved-memory {
...
cma_test_reserve: cma_test_reserve {
compatible = "shared-dma-pool";
size = <0x0 0x2000000>; /* 32 MiB */
...
};
};
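The alignment figures in the tables above follow from CMA_MIN_ALIGNMENT_BYTES = PAGE_SIZE * 2^pageblock_order. A minimal Python sketch of that arithmetic, including the rounding of a driver's reservation up to the alignment, is shown below. The helper names are illustrative only and are not kernel symbols.

```python
KIB = 1024
MIB = 1024 * KIB


def cma_min_alignment_bytes(page_size: int, pageblock_order: int) -> int:
    """PAGE_SIZE * 2^pageblock_order, the formula used in the tables above."""
    return page_size << pageblock_order


def min_cma_reservation(requested: int, page_size: int, pageblock_order: int) -> int:
    """Round a requested CMA size up to the next multiple of the alignment."""
    align = cma_min_alignment_bytes(page_size, pageblock_order)
    return ((requested + align - 1) // align) * align


if __name__ == "__main__":
    # Default-config rows from the first table: (page size in KiB, pageblock_order).
    for page_kib, order in [(4, 9), (16, 11), (64, 13)]:
        align = cma_min_alignment_bytes(page_kib * KIB, order)
        print(f"{page_kib}KiB pages, order {order}: {align // MIB}MiB alignment")

    # The 4 MiB reservation from the devicetree example on a 16KiB kernel.
    print(min_cma_reservation(4 * MIB, 16 * KIB, 11) // MIB, "MiB")
```

With these inputs the 4 MiB request stays at 4 MiB on a 4KiB kernel (order 9) and grows to 32 MiB on a 16KiB kernel (order 11), which matches the two reserved-memory nodes above.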
Solution: Add a new config CONFIG_PAGE_BLOCK_ORDER that
allows setting the page block order in all the architectures.
The maximum page block order will be given by
ARCH_FORCE_MAX_ORDER.
By default, CONFIG_PAGE_BLOCK_ORDER will have the same
value as ARCH_FORCE_MAX_ORDER. This will make sure that
current kernel configurations won't be affected by this
change. It is an opt-in change.
This patch makes it possible to have the same CMA alignment
requirements for large page sizes (16KiB, 64KiB) as
in 4KiB kernels by setting a lower pageblock_order.
Tests:
- Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
on 4k and 16k kernels.
- Verified that Transparent Huge Pages work when pageblock_order
is 1, 7, 10 on 4k and 16k kernels.
- Verified that dma-buf heaps allocations work when pageblock_order
is 1, 7, 10 on 4k and 16k kernels.
Benchmarks:
The benchmarks compare 16kb kernels with pageblock_order 10 and 7.
pageblock_order 7 was chosen because this value makes the min
CMA alignment requirement the same as that in 4kb kernels (2MB).
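The choice of order 7 can be checked by inverting the alignment formula: the order is log2(target alignment / page size). A small sketch, assuming the target is a power-of-two multiple of the page size. The function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def pageblock_order_for_alignment(page_size: int, target_alignment: int) -> int:
    """Return the pageblock_order whose alignment equals target_alignment."""
    pages = target_alignment // page_size
    # The target must be a power-of-two number of whole pages.
    if pages <= 0 or pages * page_size != target_alignment or pages & (pages - 1):
        raise ValueError("target must be a power-of-two multiple of page_size")
    return pages.bit_length() - 1


if __name__ == "__main__":
    for page_kib in (4, 16, 64):
        order = pageblock_order_for_alignment(page_kib * KIB, 2 * MIB)
        print(f"{page_kib}KiB pages need order {order} for a 2MiB alignment")
```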
- Perform 100K dma-buf heaps (/dev/dma_heap/system) allocations of
SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Use simpleperf
(https://developer.android.com/ndk/guides/simpleperf) to measure
the # of instructions and page-faults on 16k kernels.
The benchmark was executed 10 times. The averages are below:
# instructions (order 10) | # instructions (order 7) | # page-faults (order 10) | # page-faults (order 7)
-------------------------------------------------------------------
13,891,765,770 | 11,425,777,314 | 220 | 217
14,456,293,487 | 12,660,819,302 | 224 | 219
13,924,261,018 | 13,243,970,736 | 217 | 221
13,910,886,504 | 13,845,519,630 | 217 | 221
14,388,071,190 | 13,498,583,098 | 223 | 224
13,656,442,167 | 12,915,831,681 | 216 | 218
13,300,268,343 | 12,930,484,776 | 222 | 218
13,625,470,223 | 14,234,092,777 | 219 | 218
13,508,964,965 | 13,432,689,094 | 225 | 219
13,368,950,667 | 13,683,587,37 | 219 | 225
-------------------------------------------------------------------
13,803,137,433 | 13,131,974,268 | 220 | 220 | Averages
There were 4.86% fewer instructions when the order was 7, in comparison
with order 10.
13,131,974,268 - 13,803,137,433 = -671,163,165 (-4.86%)
The average number of page faults with order 7 and order 10 was the same.
These results didn't show any significant regression when the
pageblock_order is set to 7 on 16kb kernels.
- Run speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
on the 16k kernels with pageblock_order 7 and 10.
order 10 | order 7 | order 7 - order 10 | (order 7 - order 10) %
-------------------------------------------------------------------
15.8 | 16.4 | 0.6 | 3.80%
16.4 | 16.2 | -0.2 | -1.22%
16.6 | 16.3 | -0.3 | -1.81%
16.8 | 16.3 | -0.5 | -2.98%
16.6 | 16.8 | 0.2 | 1.20%
-------------------------------------------------------------------
16.44 | 16.4 | -0.04 | -0.24% | Averages
The results didn't show any significant regression when the
pageblock_order is set to 7 on 16kb kernels.
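The percentages quoted in both benchmarks are relative changes of order 7 against the order 10 baseline. A short sketch that recomputes them from the averages and the Speedometer runs listed above. The function name is illustrative.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# dma-buf heap benchmark: average instruction counts from the table above.
instr_change = percent_change(13_803_137_433, 13_131_974_268)

# Speedometer 3.1 scores for the five runs (higher is better).
order10 = [15.8, 16.4, 16.6, 16.8, 16.6]
order7 = [16.4, 16.2, 16.3, 16.3, 16.8]
mean10 = sum(order10) / len(order10)
mean7 = sum(order7) / len(order7)
speed_change = percent_change(mean10, mean7)

if __name__ == "__main__":
    print(f"instructions: {instr_change:.2f}%")
    print(f"speedometer:  {mean10:.2f} -> {mean7:.2f} ({speed_change:.2f}%)")
```

A negative instruction delta means order 7 executed fewer instructions, while the Speedometer delta is small enough to be within run-to-run variation of the five runs shown.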
Signed-off-by: Juan Yescas <jyescas@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit e13e7922d03439e374c263049af5f740ceae6346 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/ mm-stable)
Bug: 375647879
Bug: 355449177
Bug: 418282543
[jyescas: Use MAX_ORDER instead of MAX_PAGE_ORDER. Update the file
mm/page_alloc.c instead of mm/mm_init.c because the function
set_pageblock_order is there.]
Test: Built and ran kernel
Link: https://lkml.kernel.org/r/20250521215807.1860663-1-jyescas@google.com
Change-Id: Id7132b6848e5deb97a7531cf546060de2accffac
|
||
|
|
5b2e204a7b |
BACKPORT: binder: Create safe versions of binder log files
Binder defines several seq_files that can be accessed via debugfs or
binderfs. Some of these files (e.g., 'state' and 'transactions')
contain more granular information about binder's internal state that
is helpful for debugging, but they also leak userspace address data
through user-defined 'cookie' or 'ptr' values. Consequently, access to
these files must be heavily restricted.
Add two new files, 'state_hashed' and 'transactions_hashed', that
reproduce the information in the original files but use the kernel's
raw pointer obfuscation to hash any potential user addresses. This
approach allows systems to grant broader access to the new files
without having to change the security policy around the existing ones.
In practice, userspace populates these fields with user addresses, but
within the driver, these values only serve as unique identifiers for
their associated binder objects. Consequently, binder logs can
obfuscate these values and still retain meaning. While this strategy
prevents leaking information about the userspace memory layout in the
existing log files, it also decouples log messages about binder
objects from their user-defined identifiers.
Acked-by: Carlos Llamas <cmllamas@google.com>
Tested-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250510013435.1520671-7-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 316970771
(cherry picked from commit 57483a362741e4f0f3f4d2fc82d48f82fd0986d9)
[Resolve conflicts from node prio]
Change-Id: I6a01048c0105a1d6061e95f386e7ee55e2fdc898
Signed-off-by: "Tiffany Yang" <ynaffit@google.com>
|
||
|
|
1fee46d076 |
UPSTREAM: binder: Refactor binder_node print synchronization
The binder driver outputs information about each dead binder node by
iterating over the dead nodes list, and it prints the state of each live
node in the system by traversing each binder_proc's proc->nodes tree.
Both cases require similar logic to maintain the global lock ordering
while accessing each node.
Create a helper function to synchronize around printing binder nodes in
a list. Opportunistically make minor cosmetic changes to binder print
functions.
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250510013435.1520671-5-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 316970771
(cherry picked from commit 91f1bbaa783d26b379d65ef7b4b2b947c338c749)
Change-Id: Iae546a847ca138ddfffcdc24faf075b325a54640
Signed-off-by: "Tiffany Yang" <ynaffit@google.com>
|
||
|
|
47e5d7e917 |
FROMGIT: scsi: core: ufs: Fix a hang in the error handler
ufshcd_err_handling_prepare() calls ufshcd_rpm_get_sync(). The latter
function can only succeed if UFSHCD_EH_IN_PROGRESS is not set because
resuming involves submitting a SCSI command and ufshcd_queuecommand()
returns SCSI_MLQUEUE_HOST_BUSY if UFSHCD_EH_IN_PROGRESS is set. Fix this
hang by setting UFSHCD_EH_IN_PROGRESS after ufshcd_rpm_get_sync() has
been called instead of before.
Backtrace:
__switch_to+0x174/0x338
__schedule+0x600/0x9e4
schedule+0x7c/0xe8
schedule_timeout+0xa4/0x1c8
io_schedule_timeout+0x48/0x70
wait_for_common_io+0xa8/0x160 //waiting on START_STOP
wait_for_completion_io_timeout+0x10/0x20
blk_execute_rq+0xe4/0x1e4
scsi_execute_cmd+0x108/0x244
ufshcd_set_dev_pwr_mode+0xe8/0x250
__ufshcd_wl_resume+0x94/0x354
ufshcd_wl_runtime_resume+0x3c/0x174
scsi_runtime_resume+0x64/0xa4
rpm_resume+0x15c/0xa1c
__pm_runtime_resume+0x4c/0x90 // Runtime resume ongoing
ufshcd_err_handler+0x1a0/0xd08
process_one_work+0x174/0x808
worker_thread+0x15c/0x490
kthread+0xf4/0x1ec
ret_from_fork+0x10/0x20
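The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyHost` and its methods are hypothetical stand-ins, not the real ufshcd code, and where the real driver would block forever the model simply returns False.

```python
class ToyHost:
    """Toy stand-in for a UFS host; not the real ufshcd data structures."""

    def __init__(self) -> None:
        self.eh_in_progress = False      # models the UFSHCD_EH_IN_PROGRESS flag
        self.runtime_suspended = True    # device needs a runtime resume

    def queuecommand(self) -> str:
        # Commands are rejected while error handling is marked in progress.
        return "HOST_BUSY" if self.eh_in_progress else "OK"

    def rpm_get_sync(self) -> bool:
        # Resuming requires submitting a command to the device.
        if not self.runtime_suspended:
            return True
        if self.queuecommand() != "OK":
            return False                 # the real code would hang here
        self.runtime_suspended = False
        return True


def prepare_before_fix(host: ToyHost) -> bool:
    host.eh_in_progress = True           # flag set first: resume cannot succeed
    return host.rpm_get_sync()


def prepare_after_fix(host: ToyHost) -> bool:
    resumed = host.rpm_get_sync()        # resume first, while commands are accepted
    host.eh_in_progress = True
    return resumed


if __name__ == "__main__":
    print("before fix:", prepare_before_fix(ToyHost()))
    print("after fix: ", prepare_after_fix(ToyHost()))
```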
Signed-off-by: Sanjeev Yadav <sanjeev.y@mediatek.com>
[ bvanassche: rewrote patch description ]
Fixes:
|
||
|
|
13ff1300ee |
BACKPORT: erofs: allocate more short-lived pages from reserved pool first
This patch aims to allocate bvpages and short-lived compressed pages
from the reserved pool first.
After applying this patch, there are three benefits.
1. It reduces the page allocation time.
The bvpages and short-lived compressed pages account for about 4% of
the pages allocated from the system in the multi-app launch benchmarks
[1]. It reduces the page allocation time accordingly and lowers the
likelihood of blockage by page allocation in low memory scenarios.
2. The pages in the reserved pool will be allocated on demand.
Currently, bvpages and short-lived compressed pages are short-lived
pages allocated from the system, and the pages in the reserved pool all
originate from short-lived pages. Consequently, the number of reserved
pool pages will increase to z_erofs_rsv_nrpages over time.
With this patch, all short-lived pages are allocated from the reserved
pool first, so the number of reserved pool pages will only increase when
there are not enough pages. Thus, even if z_erofs_rsv_nrpages is set to
a large number for specific reasons, the actual number of reserved pool
pages may remain low as per demand. In the multi-app launch benchmarks
[1], z_erofs_rsv_nrpages is set at 256, while the number of reserved
pool pages remains below 64.
3. When erofs cache decompression is disabled
(EROFS_ZIP_CACHE_DISABLED), all pages will *only* be allocated from
the reserved pool for erofs. This will significantly reduce the memory
pressure from erofs.
[1] For additional details on the multi-app launch benchmarks, please
refer to commit 0f6273ab4637 ("erofs: add a reserved buffer pool for lz4
decompression").
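The second benefit can be illustrated with a toy model of the reserved pool. This is a sketch under simplifying assumptions, not erofs code: the class and counters are hypothetical, pages are just counted, and every freed short-lived page is returned to the pool while it is below its cap.

```python
class ToyReservedPool:
    """Counts pages only; models a pool capped at `cap` pages."""

    def __init__(self, cap: int, pool_first: bool) -> None:
        self.cap = cap                # stands in for z_erofs_rsv_nrpages
        self.pool_first = pool_first  # True models the behaviour after the patch
        self.pooled = 0               # pages currently sitting in the pool
        self.system_allocs = 0        # pages taken from the system allocator

    def alloc(self) -> None:
        if self.pool_first and self.pooled > 0:
            self.pooled -= 1          # reuse a reserved page
        else:
            self.system_allocs += 1   # fall back to the system

    def free(self) -> None:
        if self.pooled < self.cap:
            self.pooled += 1          # freed short-lived page refills the pool
        # otherwise the page goes back to the system


def run(pool_first: bool, rounds: int = 100, burst: int = 4, cap: int = 256):
    pool = ToyReservedPool(cap, pool_first)
    for _ in range(rounds):
        for _ in range(burst):
            pool.alloc()
        for _ in range(burst):
            pool.free()
    return pool.pooled, pool.system_allocs


if __name__ == "__main__":
    print("system first:", run(pool_first=False))
    print("pool first:  ", run(pool_first=True))
```

In this model the pool grows to its cap when short-lived pages always come from the system, and stays at the peak concurrent demand when the pool is tried first.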
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240906121110.3701889-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Bug: 422867180
Bug: 387202250
Change-Id: Ife45adcb4c22c9d73952db1de956e1b9cda1b8c2
(cherry picked from commit 79f504a2cd3c0b7d953d0015618a2a41559a2cfd)
Signed-off-by: liujinbao1 <liujinbao1@xiaomi.corp-partner.google.com>
(cherry picked from commit 6e7af99d68e309a0a1a14e7674406d9462e1b0bb)
|
||
|
|
753068e2ae |
ANDROID: GKI: Add zebra KMI symbol list
These symbols are required by the scanner driver.
INFO: 1 function symbol(s) added
'void vb2_video_unregister_device(struct video_device*)'
Bug: 423519221
Change-Id: I9826d7ca5b1e8d286fad1ec80efc674cc676540e
Signed-off-by: rajat.suri <rajat.suri@zebra.com>
|
||
|
|
7af56ffc91 |
UPSTREAM: f2fs: compress: fix error path of inc_valid_block_count()
If inc_valid_block_count() cannot allocate all requested blocks,
it needs to release the block count in .total_valid_block_count and
the reservation blocks in the inode.
Bug: 419862398
Change-Id: I3d05f5ced5a8e9e4af6879d8e35ef9aef148dd95
(cherry picked from commit 043c832371cd9023fbd725138ddc6c7f288dc469)
Fixes: 54607494875e ("f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao@kernel.org>
|
||
|
|
d8780220fd |
UPSTREAM: scripts/unifdef: avoid constexpr keyword
Starting with c23, 'constexpr' is a keyword in C like in C++ and cannot
be used as an identifier:
scripts/unifdef.c:206:25: error: 'constexpr' can only be used in variable declarations
206 | static bool constexpr; /* constant #if expression */
| ^
scripts/unifdef.c:880:13: error: expected identifier or '('
880 | constexpr = false;
| ^
Rename this instance to allow changing to C23 at some point in the future.
Bug: 401172689
Bug: 422603167
Change-Id: I19e1c13f5dcffe98b8189d3317100f20774f1d4c
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-By: Tony Finch <dot@dotat.at>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
(cherry picked from commit 10f94d8fcc0880c93d7697184fe199022792a61c)
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>
(cherry picked from commit b7fd55a297a77e72154def86b46c3d1cf1ffa1e1)
|
||
|
|
3048ff6925 |
ANDROID: GKI: Update symbol list file for xiaomi
INFO: 3 function symbol(s) added
'struct page* read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct*, unsigned long, struct swap_iocb**)'
'int unuse_swap_pte(struct vm_area_struct*, pmd_t*, unsigned long, swp_entry_t, struct folio*)'
'int vfs_fadvise(struct file*, loff_t, loff_t, int)'
Bug: 415852480
Change-Id: I5c4d50e042dd8dc8ba4df430614b0ac79ffb41cd
Signed-off-by: jianhua hao <haojianhua1@xiaomi.com>
|
||
|
|
7f2f532bd0 |
ANDROID: mm: export __pte_offset_map/unuse_swap_pte/read_swap_cache_async
Export __pte_offset_map to facilitate retrieving the corresponding PTE
using a PMD and an address.
Add and export unuse_swap_pte to facilitate releasing the PTE resources
corresponding to pages preloaded via swapin.
Export read_swap_cache_async to facilitate asynchronously reading pages
from the swap partition using PTE-prefetch scanning.
Bug: 415852480
Change-Id: Ie200656ec97b087936ca98c06b0a370f547d5d0a
Signed-off-by: jianhua hao <haojianhua1@xiaomi.com>
(cherry picked from commit 88cb3505ebf4d9eb1dd0d3c63403727eb4b239bd)
(cherry picked from commit c5defcb638906800d4ab6b50e79e9f25538aefbd)
|
||
|
|
4cf22d9783 |
FROMGIT: f2fs: sysfs: export linear_lookup in features directory
cat /sys/fs/f2fs/features/linear_lookup
supported
Bug: 410768629
(cherry picked from commit 617e0491abe4d8d45c5110ca474c0feb428e6828
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Link: https://lore.kernel.org/linux-f2fs-devel/20250416054805.1416834-2-chao@kernel.org
Change-Id: I9762dc9d918f96e716a2e0b76a9fe6d168a6b56a
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[Chao: Resolved conflicts in fs/f2fs/sysfs.c ]
|
||
|
|
1619abb40c |
FROMGIT: f2fs: sysfs: add encoding_flags entry
This patch adds a new sysfs entry /sys/fs/f2fs/<disk>/encoding_flags,
it is a read-only entry to show the value of sb.s_encoding_flags, the
value is hexadecimal.
============================ ==========
Flag_Name                    Flag_Value
============================ ==========
SB_ENC_STRICT_MODE_FL        0x00000001
SB_ENC_NO_COMPAT_FALLBACK_FL 0x00000002
============================ ==========
case#1
mkfs.f2fs -f -O casefold -C utf8:strict /dev/vda
mount /dev/vda /mnt/f2fs
cat /sys/fs/f2fs/vda/encoding_flags
1
case#2
mkfs.f2fs -f -O casefold -C utf8 /dev/vda
fsck.f2fs --nolinear-lookup=1 /dev/vda
mount /dev/vda /mnt/f2fs
cat /sys/fs/f2fs/vda/encoding_flags
2
Bug: 410768629
(cherry picked from commit 3fea0641b06ff4e53d95d07a96764d8951d4ced6
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Link: https://lore.kernel.org/linux-f2fs-devel/20250506074725.12315-1-chao@kernel.org
Change-Id: I8a06c81d74278f148c0619fa905989aa75da8719
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[Chao: Resolved conflicts in sysfs-fs-f2fs ]
|
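A userspace reader could decode the hexadecimal value from encoding_flags into the two flag names listed in this commit message. The following is a minimal sketch, assuming the file content is a bare hexadecimal number as in the two cases shown. The function name is hypothetical and no file I/O is performed.

```python
# Flag values as listed in the commit message.
ENCODING_FLAGS = {
    0x00000001: "SB_ENC_STRICT_MODE_FL",
    0x00000002: "SB_ENC_NO_COMPAT_FALLBACK_FL",
}


def decode_encoding_flags(text: str) -> list:
    """Return the names of the flags set in a hexadecimal sysfs value."""
    value = int(text.strip(), 16)
    return [name for bit, name in ENCODING_FLAGS.items() if value & bit]


if __name__ == "__main__":
    for sample in ("1", "2"):
        print(sample, "->", decode_encoding_flags(sample))
```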
||
|
|
78be72f696 |
FROMGIT: f2fs: support to disable linear lookup fallback
After commit 91b587ba79e1 ("f2fs: Introduce linear search for
dentries"), f2fs is forced to use linear lookup whenever a hash-based
lookup fails on a casefolded directory. This may hurt performance in two
scenarios: a) creating a new file whose filename doesn't exist in the
directory, b) looking up a file which may have been removed.
This patch adds support for disabling the linear lookup fallback, so once
there is a solution for commit 5c26d2f1d3f5 ("unicode: Don't special case
ignorable code points") to fix the red heart unicode issue, we can set
an encoding flag to disable the fallback and recover performance.
The way is kept in line w/ ext4, refer to commit 9e28059d5664 ("ext4:
introduce linear search for dentries").
Bug: 410768629
(cherry picked from commit aa00c6d5d05a80ef5946984025c25ab231b722f9
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Link: https://lore.kernel.org/linux-f2fs-devel/20250401035800.51504-1-chao@kernel.org
[Chao: backport dependent macro from commit 9e28059d5664 ("ext4:
introduce linear search for dentries")]
Cc: Daniel Lee <chullee@google.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Change-Id: Idff994f832dcaa1b9a5eb750b8ae03a5e580c1c6
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
||
|
|
5b87067cdd |
UPSTREAM: mm/memcg: use kmem_cache when alloc memcg pernode info
When tracing mem_cgroup_per_node allocations with kmalloc ftrace:
kmalloc: call_site=mem_cgroup_css_alloc+0x1d8/0x5b4 ptr=00000000d798700c
bytes_req=2896 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
This reveals the slab allocator provides 4096B chunks for 2896B
mem_cgroup_per_node due to:
1. The slab allocator predefines bucket sizes from 64B to 8192B
2. The mem_cgroup_per_node allocation size (2896B) falls between the 2KB
and 4KB slabs
3. The allocator rounds up to the nearest larger slab (4KB), resulting in
~1.2KB wasted memory per memcg alloc - per node.
This patch introduces a dedicated kmem_cache for mem_cgroup_per_node structs,
achieving precise memory allocation. Post-patch ftrace verification shows:
kmem_cache_alloc: call_site=mem_cgroup_css_alloc+0x1b8/0x5d4
ptr=000000002989e63a bytes_req=2896 bytes_alloc=2944
gfp_flags=GFP_KERNEL|__GFP_ZERO node=0 accounted=false
Each mem_cgroup_per_node allocation now takes 2944 bytes (including
hardware cacheline alignment) instead of 4096, which avoids the waste.
Link: https://lkml.kernel.org/r/20250425031935.76411-4-link@vivo.com
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Francesco Valla <francesco@valla.it>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Huang Shijie <shijie@os.amperecomputing.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Raul E Rangel <rrangel@chromium.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1b6a58e205ed0bbeeeca46388f0649f322b04f06)
Change-Id: I83a4f675dd8b539a8a40617cc1c1f10e3588e955
Bug: 417296244
Signed-off-by: T.J. Mercier <tjmercier@google.com>
|
||
|
|
19d3046bc5 |
BACKPORT: mm/memcg: use kmem_cache when alloc memcg
When tracing mem_cgroup_alloc() with kmalloc ftrace, we observe:
kmalloc: call_site=mem_cgroup_css_alloc+0xd8/0x5b4 ptr=000000003e4c3799
bytes_req=2312 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
The output indicates that while allocating mem_cgroup struct (2312 bytes),
the slab allocator actually provides 4096-byte chunks. This occurs because:
1. The slab allocator predefines bucket sizes from 64B to 8192B
2. The mem_cgroup allocation size (2312B) falls between the 2KB and 4KB
slabs
3. The allocator rounds up to the nearest larger slab (4KB), resulting in
~1.7KB wasted memory per allocation
This patch introduces a dedicated kmem_cache for mem_cgroup structs,
achieving precise memory allocation. Post-patch ftrace verification shows:
kmem_cache_alloc: call_site=mem_cgroup_css_alloc+0xbc/0x5d4
ptr=00000000695c1806 bytes_req=2312 bytes_alloc=2368
gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
Each memcg allocation now takes 2368 bytes (including hardware cacheline
alignment) instead of 4096, which avoids the waste.
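The two bytes_alloc values in the traces can be reproduced with simple rounding. This is a sketch, assuming power-of-two kmalloc buckets (the 96B and 192B buckets are ignored) and a 64-byte cache line, which is consistent with the traced sizes. The helper names are illustrative, not kernel functions.

```python
def kmalloc_bucket(size: int) -> int:
    """Smallest power-of-two bucket that fits size (96B/192B buckets ignored)."""
    bucket = 8
    while bucket < size:
        bucket *= 2
    return bucket


def hwcache_aligned(size: int, cacheline: int = 64) -> int:
    """Round size up to a multiple of the cache line, as a dedicated cache would."""
    return (size + cacheline - 1) // cacheline * cacheline


if __name__ == "__main__":
    for name, req in [("mem_cgroup", 2312), ("mem_cgroup_per_node", 2896)]:
        before = kmalloc_bucket(req)
        after = hwcache_aligned(req)
        print(f"{name}: req={req} kmalloc={before} kmem_cache={after} "
              f"saved={before - after}")
```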
Link: https://lkml.kernel.org/r/20250425031935.76411-3-link@vivo.com
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Francesco Valla <francesco@valla.it>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Huang Shijie <shijie@os.amperecomputing.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Raul E Rangel <rrangel@chromium.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 97e4fc4b35dc1b98f28671fda35bd37d4f401bca)
[TJ: struct_size_t not available. Open code it with struct_size.]
Change-Id: Ib5c73e7af6eaf2b3066088566690cd0c6f4c796c
Bug: 417296244
Signed-off-by: T.J. Mercier <tjmercier@google.com>
|
||
|
|
db3b80e22e |
UPSTREAM: mm/memcg: move mem_cgroup_init() ahead of cgroup_init()
Patch series "Use kmem_cache for memcg alloc", v3.
(willy tldr: "you've gone from allocating 8 objects per 32KiB to
allocating 13 objects per 32KiB, a 62% improvement in memory consumption"
[1])
The mem_cgroup_alloc function creates the mem_cgroup struct and its
associated structures including mem_cgroup_per_node. Through detailed
analysis on our test machine (Arm64, 16GB RAM, 6.6 kernel, 1 NUMA node,
memcgv2 with nokmem,nosocket,cgroup_disable=pressure), we can observe the
memory allocation for these structures using the following shell commands:
# Enable tracing
echo 1 > /sys/kernel/tracing/events/kmem/kmalloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmalloc | grep mem_cgroup
# Trigger allocation if the cgroup subtree does not enable memcg
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
Ftrace Output:
# mem_cgroup struct allocation
sh-6312 [000] ..... 58015.698365: kmalloc:
call_site=mem_cgroup_css_alloc+0xd8/0x5b4
ptr=000000003e4c3799 bytes_req=2312 bytes_alloc=4096
gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
# mem_cgroup_per_node allocation
sh-6312 [000] ..... 58015.698389: kmalloc:
call_site=mem_cgroup_css_alloc+0x1d8/0x5b4
ptr=00000000d798700c bytes_req=2896 bytes_alloc=4096
gfp_flags=GFP_KERNEL|__GFP_ZERO node=0 accounted=false
Key Observations:
1. Both structures use kmalloc with requested sizes between 2KB-4KB
2. Allocation alignment forces 4KB slab usage due to pre-defined sizes
(64B, 128B,..., 2KB, 4KB, 8KB)
3. Memory waste per memcg instance:
Base struct: 4096 - 2312 = 1784 bytes
Per-node struct: 4096 - 2896 = 1200 bytes
Total waste: 2984 bytes (1-node system)
NUMA scaling: (1200 + 8) * nr_node_ids bytes
So a little memory is wasted for every memcg.
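The waste figures above and the "8 objects per 32KiB to 13 objects per 32KiB" summary quoted earlier can be recomputed as follows. This is a sketch: the 32KiB slab size is taken from that quote, 2368 is the cacheline-aligned mem_cgroup size reported later in this message, and the extra 8 bytes per node in the NUMA scaling line are not modelled.

```python
MEMCG_REQ = 2312       # bytes requested for struct mem_cgroup
PER_NODE_REQ = 2896    # bytes requested for struct mem_cgroup_per_node
KMALLOC_BUCKET = 4096  # bucket both requests land in


def kmalloc_waste(nr_nodes: int) -> int:
    """Bytes lost to bucket rounding for one memcg on nr_nodes NUMA nodes."""
    base = KMALLOC_BUCKET - MEMCG_REQ
    per_node = KMALLOC_BUCKET - PER_NODE_REQ
    return base + per_node * nr_nodes


def objects_per_slab(object_size: int, slab_bytes: int = 32 * 1024) -> int:
    """How many objects of object_size fit in one slab of slab_bytes."""
    return slab_bytes // object_size


if __name__ == "__main__":
    print("waste on a 1-node system:", kmalloc_waste(1), "bytes")
    before = objects_per_slab(4096)
    after = objects_per_slab(2368)
    print(f"objects per 32KiB: {before} -> {after} "
          f"(+{(after / before - 1) * 100:.1f}%)")
```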
This patchset introduces dedicated kmem_cache:
Patch2 - mem_cgroup kmem_cache - memcg_cachep
Patch3 - mem_cgroup_per_node kmem_cache - memcg_pn_cachep
The benefits of this change can be observed with the following tracing
commands:
# Enable tracing
echo 1 > /sys/kernel/tracing/events/kmem/kmem_cache_alloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmem_cache_alloc | grep mem_cgroup
# In another terminal:
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
The output might now look like this:
# mem_cgroup struct allocation
sh-9827 [000] ..... 289.513598: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=00000000695c1806
bytes_req=2312 bytes_alloc=2368 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
# mem_cgroup_per_node allocation
sh-9827 [000] ..... 289.513602: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=000000002989e63a
bytes_req=2896 bytes_alloc=2944 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
This indicates that the `mem_cgroup` struct now requests 2312 bytes and is
allocated 2368 bytes, while `mem_cgroup_per_node` requests 2896 bytes and
is allocated 2944 bytes. The slight increase in allocated size is due to
`SLAB_HWCACHE_ALIGN` in the `kmem_cache`.
Without `SLAB_HWCACHE_ALIGN`, the allocation might appear as:
# mem_cgroup struct allocation
sh-9269 [003] ..... 80.396366: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=000000005b12b475
bytes_req=2312 bytes_alloc=2312 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
# mem_cgroup_per_node allocation
sh-9269 [003] ..... 80.396411: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=00000000f347adc6
bytes_req=2896 bytes_alloc=2896 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
While the `bytes_alloc` now matches the `bytes_req`, this patchset
defaults to using `SLAB_HWCACHE_ALIGN` as it is generally considered more
beneficial for performance. Please let me know if there are any issues or
if I've misunderstood anything.
This patchset also moves mem_cgroup_init() ahead of cgroup_init() because
cgroup_init() allocates root_mem_cgroup, but every initcall is invoked
after cgroup_init(). If the kmem_caches were not prepared by then, we would
need to test for NULL before using them.
This patch (of 3):
When cgroup_init() creates root_mem_cgroup through css_alloc callback,
some critical resources might not be fully initialized, forcing later
operations to perform conditional checks for resource availability.
This patch moves mem_cgroup_init() to address the init order. It is invoked
before cgroup_init(), so, unlike a subsys_initcall, it can be used to prepare
key resources before root_mem_cgroup is allocated.
Link: https://lkml.kernel.org/r/aAsRCj-niMMTtmK8@casper.infradead.org [1]
Link: https://lkml.kernel.org/r/20250425031935.76411-1-link@vivo.com
Link: https://lkml.kernel.org/r/20250425031935.76411-2-link@vivo.com
Signed-off-by: Huan Yang <link@vivo.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Francesco Valla <francesco@valla.it>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Huang Shijie <shijie@os.amperecomputing.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Raul E Rangel <rrangel@chromium.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit bc9817bb7a21f64fbca2c4b83811d943036ec870)
Change-Id: I71005c1ad3d826b99fa698358bcf357ff7924c8c
Bug: 417296244
Signed-off-by: T.J. Mercier <tjmercier@google.com>
|
||
|
|
db710ea87c |
BACKPORT: mm: avoid unconditional one-tick sleep when swapcache_prepare fails
Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
introduced an unconditional one-tick sleep when `swapcache_prepare()`
fails, which has led to reports of UI stuttering on latency-sensitive
Android devices. To address this, we can use a waitqueue to wake up tasks
that fail `swapcache_prepare()` sooner, instead of always sleeping for a
full tick. While tasks may occasionally be woken by an unrelated
`do_swap_page()`, this method is preferable to two scenarios: rapid
re-entry into page faults, which can cause livelocks, and multiple
millisecond sleeps, which visibly degrade user experience.
Oven's testing shows that a single waitqueue resolves the UI stuttering
issue. If a 'thundering herd' problem becomes apparent later, a waitqueue
hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit
locks can be introduced.
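The idea above, replacing a fixed sleep with a wait that a finishing task can cut short, can be sketched in userspace as follows. This is an illustration only, not the kernel code: `SwapcacheGate` and its methods are hypothetical names, a `threading.Condition` stands in for the single waitqueue, and the waiter re-checks a predicate, whereas the kernel waiter is simply woken (possibly by an unrelated fault) and retries.

```python
import threading


class SwapcacheGate:
    """Toy stand-in for claiming a swap entry plus one shared waitqueue (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()  # plays the role of the single waitqueue
        self._claimed = set()               # swap entries currently being faulted in
        self._waiters = 0                   # lets release() skip the wake-up when idle

    def prepare(self, entry):
        """Claim an entry; returns False if another task already holds it."""
        with self._cond:
            if entry in self._claimed:
                return False
            self._claimed.add(entry)
            return True

    def release(self, entry):
        """Drop the claim and wake waiters, but only if someone is waiting."""
        with self._cond:
            self._claimed.discard(entry)
            if self._waiters:
                self._cond.notify_all()

    def wait_for_release(self, entry, timeout):
        """Block until the entry is released or `timeout` seconds pass.

        Returns True if the entry was released in time, False if the wait timed out.
        """
        with self._cond:
            self._waiters += 1
            try:
                return self._cond.wait_for(
                    lambda: entry not in self._claimed, timeout
                )
            finally:
                self._waiters -= 1
```

A task that loses `prepare()` calls `wait_for_release()` with the old sleep length as the upper bound, so the worst case matches the previous behavior while the common case returns as soon as the winner calls `release()`.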
[v-songbaohua@oppo.com: wake_up only when swapcache_wq waitqueue is active]
Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com
Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
Change-Id: I6cd3d6ef318d660ee6290554b5e864d90a70b920
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reported-by: Oven Liyang <liyangouwen1@oppo.com>
Tested-by: Oven Liyang <liyangouwen1@oppo.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 01626a18230246efdcea322aa8f067e60ffe5ccd)
Bug: 313807618
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Oven <liyangouwen1@oppo.com>
|
||
|
|
9e207186c7 |
Merge tag 'android14-6.1.138_r00' into android14-6.1
This merges the android14-6.1.138_r00 tag into the android14-6.1 branch, catching it up with the latest LTS releases. It contains the following commits: * |
||
|
|
3c6d0251e1 |
ANDROID: ABI: Update pixel symbol list
Adding the following symbols:
- param_ops_ullong
- __traceiter_android_rvh_setscheduler_prio
- __tracepoint_android_rvh_setscheduler_prio
- usb_gadget_connect
- usb_gadget_disconnect
Bug: 409176857
Change-Id: I026c6a80ef4c31577bb2fc28b0b3d9e2e709a200
Signed-off-by: Chungkai Mei <chungkai@google.com>
|
||
|
|
ed6999107e |
ANDROID: vendor_hook: add trace_android_rvh_setscheduler_prio
To modify the priority of specific tasks, add a vendor hook in __setscheduler_prio().
Bug: 409176857
Change-Id: Id5a2309378f1a8c3ecc1de71c20f44f73b3f7557
Signed-off-by: Chungkai Mei <chungkai@google.com>
|
||
|
|
5b71d36425 |
ANDROID: binder: fix minimum node priority comparison
The "desired" priority for a transaction can be adjusted depending on
various factors. For instance, it might be set to SCHED_NORMAL 120, when
the caller is RT and the target node has !inherit_rt.
However, instead of using these adjustments, the existing logic compares
the minimum node priority against the original transaction priority.
If the transaction priority is "higher", then the minimum node priority
is ignored. This is particularly a problem when the "desired" priority
has been changed to SCHED_NORMAL.
This patch corrects the logic, comparing the minimum node priority
against the (potentially adjusted) "desired" priority. This guarantees
that the node's minimum priority is honored.
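A minimal sketch of the comparison described above, under simplifying assumptions: `Prio`, `pick_priority` and the `compare_adjusted` switch are hypothetical names, priorities use kernel-style values where a lower number means a higher priority, and only the RT-demotion adjustment mentioned in this message is modeled.

```python
from dataclasses import dataclass

SCHED_NORMAL = 0
SCHED_FIFO = 1


@dataclass(frozen=True)
class Prio:
    policy: int
    prio: int  # kernel-style value: lower number means higher priority


def pick_priority(requested, node_min, inherit_rt, compare_adjusted=True):
    """Return the priority the target thread should run at (illustrative only)."""
    desired = requested
    # Adjustment step: an RT caller is demoted when the node does not inherit RT.
    if desired.policy == SCHED_FIFO and not inherit_rt:
        desired = Prio(SCHED_NORMAL, 120)
    # The fix: compare the node minimum against the adjusted value.
    # compare_adjusted=False reproduces the old comparison against `requested`.
    reference = desired if compare_adjusted else requested
    if node_min.prio < reference.prio:
        desired = node_min
    return desired
```

With an RT caller at 98, a node minimum of SCHED_NORMAL 110 and `inherit_rt=False`, the old comparison leaves the thread at SCHED_NORMAL 120 because 110 is not "higher" than 98. The corrected comparison sees 110 < 120 and applies the node minimum.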
Bug: 417382411
Cc: Martijn Coenen <maco@google.com>
Fixes:
|
||
|
|
785e577258 |
BACKPORT: KVM: arm64: Eagerly switch ZCR_EL{1,2}
[ Upstream commit 59419f10045bc955d2229819c7cf7a8b0b9c5b59 ]
In non-protected KVM modes, while the guest FPSIMD/SVE/SME state is live on the
CPU, the host's active SVE VL may differ from the guest's maximum SVE VL:
* For VHE hosts, when a VM uses NV, ZCR_EL2 contains a value constrained
by the guest hypervisor, which may be less than or equal to that
guest's maximum VL.
Note: in this case the value of ZCR_EL1 is immaterial due to E2H.
* For nVHE/hVHE hosts, ZCR_EL1 contains a value written by the guest,
which may be less than or greater than the guest's maximum VL.
Note: in this case hyp code traps host SVE usage and lazily restores
ZCR_EL2 to the host's maximum VL, which may be greater than the
guest's maximum VL.
This can be the case between exiting a guest and kvm_arch_vcpu_put_fp().
If a softirq is taken during this period and the softirq handler tries
to use kernel-mode NEON, then the kernel will fail to save the guest's
FPSIMD/SVE state, and will pend a SIGKILL for the current thread.
This happens because kvm_arch_vcpu_ctxsync_fp() binds the guest's live
FPSIMD/SVE state with the guest's maximum SVE VL, and
fpsimd_save_user_state() verifies that the live SVE VL is as expected
before attempting to save the register state:
| if (WARN_ON(sve_get_vl() != vl)) {
| force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
| return;
| }
Fix this and make this a bit easier to reason about by always eagerly
switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this
happening, there's no need to trap host SVE usage, and the nVHE/hVHE
__deactivate_cptr_traps() logic can be simplified to enable host access
to all present FPSIMD/SVE/SME features.
In protected nVHE/hVHE modes, the host's state is always saved/restored
by hyp, and the guest's state is saved prior to exit to the host, so
from the host's PoV the guest never has live FPSIMD/SVE/SME state, and
the host's ZCR_EL1 is never clobbered by hyp.
Bug: 411040189
Change-Id: Ifecd5024230fadd0b73755587950ba651b94dae0
Fixes:
|
||
|
|
6a31e426c6 |
BACKPORT: KVM: arm64: Calculate cptr_el2 traps on activating traps
[ Upstream commit 2fd5b4b0e7b440602455b79977bfa64dea101e6c ] Similar to VHE, calculate the value of cptr_el2 from scratch on activate traps. This removes the need to store cptr_el2 in every vcpu structure. Moreover, some traps, such as whether the guest owns the fp registers, need to be set on every vcpu run. [tabba@ Kept cptr_el2 as to not break the KMI.] Bug: 411040189 Reported-by: James Clark <james.clark@linaro.org> Fixes: 5294afdbf45a ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch") Change-Id: Iba65e9bb65d8498007423dc5b137dedc602359de Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
89720e9e1b |
BACKPORT: KVM: arm64: Mark some header functions as inline
[ Upstream commit f9dd00de1e53a47763dfad601635d18542c3836d ] The shared hyp switch header has a number of static functions which might not be used by all files that include the header, and when unused they will provoke compiler warnings, e.g. | In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:703:13: warning: 'kvm_hyp_handle_dabt_low' defined but not used [-Wunused-function] | 703 | static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:682:13: warning: 'kvm_hyp_handle_cp15_32' defined but not used [-Wunused-function] | 682 | static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:662:13: warning: 'kvm_hyp_handle_sysreg' defined but not used [-Wunused-function] | 662 | static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:458:13: warning: 'kvm_hyp_handle_fpsimd' defined but not used [-Wunused-function] | 458 | static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:329:13: warning: 'kvm_hyp_handle_mops' defined but not used [-Wunused-function] | 329 | static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~ Mark these functions as 'inline' to suppress this warning. This shouldn't result in any functional change. At the same time, avoid the use of __alias() in the header and alias kvm_hyp_handle_iabt_low() and kvm_hyp_handle_watchpt_low() to kvm_hyp_handle_memory_fault() using CPP, matching the style in the rest of the kernel. For consistency, kvm_hyp_handle_memory_fault() is also marked as 'inline'. 
Bug: 411040189 Change-Id: I5766401542afda440f737c1fee1810a73e89e86d Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-8-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fuad Tabba <tabba@google.com> |
||
|
|
b9b8d84f6c |
BACKPORT: KVM: arm64: Refactor exit handlers
[ Upstream commit 9b66195063c5a145843547b1d692bd189be85287 ] The hyp exit handling logic is largely shared between VHE and nVHE/hVHE, with common logic in arch/arm64/kvm/hyp/include/hyp/switch.h. The code in the header depends on function definitions provided by arch/arm64/kvm/hyp/vhe/switch.c and arch/arm64/kvm/hyp/nvhe/switch.c when they include the header. This is an unusual header dependency, and prevents the use of arch/arm64/kvm/hyp/include/hyp/switch.h in other files as this would result in compiler warnings regarding missing definitions, e.g. | In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:733:31: warning: 'kvm_get_exit_handler_array' used but never defined | 733 | static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu); | | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:735:13: warning: 'early_exit_filter' used but never defined | 735 | static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code); | | ^~~~~~~~~~~~~~~~~ Refactor the logic such that the header doesn't depend on anything from the C files. There should be no functional change as a result of this patch. Bug: 411040189 Change-Id: I4e58bad80763afd73fd03f9653ed4e66dfe97255 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-7-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fuad Tabba <tabba@google.com> |
||
|
|
c00c44bea2 |
BACKPORT: KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN
[ Upstream commit 407a99c4654e8ea65393f412c421a55cac539f5b ] When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.SMEN (i.e. CPTR_EL2.SMEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic has historically been broken, and is currently redundant. This logic was originally introduced in commit: |
||
|
|
c952e23cf8 |
BACKPORT: KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN
[ Upstream commit 459f059be702056d91537b99a129994aa6ccdd35 ] When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.ZEN (i.e. CPTR_EL2.ZEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic is currently redundant. The VHE hyp code unconditionally configures CPTR_EL2.ZEN to 0b01 when returning to the host, permitting host kernel usage of SVE. Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME state, there's no need to save/restore the state of the EL0 SVE trap. The kernel can safely save/restore state without trapping, as described above, and will restore userspace state (including trap controls) before returning to userspace. Remove the redundant logic. Bug: 411040189 Change-Id: I43bf5587223aae54caf9389eb3de17f155043d96 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-4-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> [Rework for refactoring of where the flags are stored -- broonie] Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fuad Tabba <tabba@google.com> |
||
|
|
a08391468f |
BACKPORT: KVM: arm64: Remove host FPSIMD saving for non-protected KVM
[ Upstream commit 8eca7f6d5100b6997df4f532090bc3f7e0203bef ] Now that the host eagerly saves its own FPSIMD/SVE/SME state, non-protected KVM never needs to save the host FPSIMD/SVE/SME state, and the code to do this is never used. Protected KVM still needs to save/restore the host FPSIMD/SVE state to avoid leaking guest state to the host (and to avoid revealing to the host whether the guest used FPSIMD/SVE/SME), and that code needs to be retained. Remove the unused code and data structures. To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the VHE hyp code, the nVHE/hVHE version is moved into the shared switch header, where it is only invoked when KVM is in protected mode. [tabba@ Kept user_fpsimd_state as to not break the KMI.] Bug: 411040189 Change-Id: I0088db7c5f75c9331956867040b8eb69976aabf8 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fuad Tabba <tabba@google.com> |
||
|
|
12921b6e23 |
BACKPORT: KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state
[ Upstream commit fbc7e61195e23f744814e78524b73b59faa54ab4 ] There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including: * Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger: https://issues.redhat.com/browse/RHEL-68997 * Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state. * The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory. Avoid these by eagerly saving and "flushing" the host's FPSIMD/SVE/SME state when loading a vCPU such that KVM does not need to save any of the host's FPSIMD/SVE/SME state. For clarity, fpsimd_kvm_prepare() is removed and the necessary call to fpsimd_save_and_flush_cpu_state() is placed in kvm_arch_vcpu_load_fp(). As 'fpsimd_state' and 'fpmr_ptr' should not be used, they are set to NULL; all uses of these will be removed in subsequent patches. Historical problems go back at least as far as v5.17, e.g. erroneous assumptions about TIF_SVE being clear in commit: |
||
|
|
f1df93017e |
BACKPORT: KVM: arm64: Discard any SVE state when entering KVM guests
[ Upstream commit |
||
|
|
21c687a8c5 |
ANDROID: KVM: arm64: Eagerly restore host FPSIMD/SVE state in pKVM
Eagerly restore the host fpsimd/sve state after every vcpu run in protected mode if the fpsimd/sve unit was used by the guest, instead of setting fpsimd/sve traps and restoring if the host triggers them.
Note that the behavior with this patch is the existing behavior in Android 16 (except for restoring ZCR_EL2, which is being fixed in conjunction with this patch there as well).
Bug: 411040189
Change-Id: I5702590331093937c1cd0d08ac754c634054c7f7
Signed-off-by: Fuad Tabba <tabba@google.com>
|
||
|
|
d871a6444c |
ANDROID: KVM: arm64: Move __deactivate_fpsimd_traps() to switch.h
Move __deactivate_fpsimd_traps() to the shared switch header, instead of having separate implementations in the vhe/nvhe switch.c files. Subsequent patches will remove all specific implementations from switch.c and include switch.h in other files. Bug: 411040189 Change-Id: I42c545e939b230366fbd9ad8e41a614193169bce Signed-off-by: Fuad Tabba <tabba@google.com> |
||
|
|
1b3dfc7c38 |
ANDROID: KVM: arm64: Move kvm_hyp_handle_fpsimd_host() to switch.h
Move kvm_hyp_handle_fpsimd_host() to the shared switch header, instead of having separate implementations in the vhe/nvhe switch.c files. Subsequent patches will remove all specific implementations from switch.c and include switch.h in other files. Bug: 411040189 Change-Id: I07f1d92f96b072435ded5f0b84a446df4e6a81ab Signed-off-by: Fuad Tabba <tabba@google.com> |
||
|
|
c3b505e78c |
ANDROID: KVM: arm64: Remove pkvm_set_max_sve_vq()
This function doesn't encapsulate that much code, and removing it makes backporting SVE-fix patches easier and cleaner. No functional change intended. Bug: 411040189 Change-Id: I27b3fe467b1896a393751349b86771ddbb1bd62b Signed-off-by: Fuad Tabba <tabba@google.com> |
||
|
|
d653b32842 |
Revert "ANDROID: KVM: arm64: Use enum instead of helper for fp state"
This reverts commit
|
||
|
|
b07be5e511 |
ANDROID: GKI: update symbol list file for xiaomi
Add 2 functions:
- trace_android_vh_drain_all_pages_bypass()
- trace_android_vh_pageset_update()
Bug: 418695654
Change-Id: Id1bbb269b7650528dcb2dfac29e7a611154954b3
Signed-off-by: Marcus Ma <maminghui5@xiaomi.corp-partner.google.com>
|
||
|
|
58b3f63bc6 |
ANDROID: vendor_hooks: Add hooks for pcp related optimization.
We want to make some optimizations to the pcp buffer. First, during direct reclaim, we skip drain_all_pages() when the pcp buffer is known to be small, to reduce zone->lock contention. In addition, the default pcp buffer size is still relatively small for mobile phones with large memory, so we want to increase the pcp buffer size to reduce zone->lock contention further. Bug: 418695654 Change-Id: I38c7a3715500918d839e4363bbcc41cdbf4bd643 Signed-off-by: Marcus Ma <maminghui5@xiaomi.corp-partner.google.com> |
||
|
|
ad7902a401 |
BACKPORT: mm: userfaultfd: correct dirty flags set for both present and swap pte
As David pointed out, what truly matters for mremap and userfaultfd move operations is the soft dirty bit. The current comment and implementation—which always sets the dirty bit for present PTEs and fails to set the soft dirty bit for swap PTEs—are incorrect. This could break features like Checkpoint-Restore in Userspace (CRIU). This patch updates the behavior to correctly set the soft dirty bit for both present and swap PTEs in accordance with mremap. Link: https://lkml.kernel.org/r/20250508220912.7275-1-21cnbao@gmail.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reported-by: David Hildenbrand <david@redhat.com> Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 75cb1cca2c880179a11c7dd9380b6f14e41a06a4) Merge Conflicts: 1. pte_mkwrite() doesn't take vma as second argument, so removed it. Change-Id: I5fc25f9028ad7972ea1b6d873f072fd15f9c7214 Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> |
||
|
|
e30317e116 |
ANDROID: 16K: Remove ELF padding entry from map_file ranges
Symbolization techniques use address ranges as reported in /proc/*/maps
to infer the corresponding /proc/*/map_files/ entry.
Per Daniel, this is done because the path in /proc/*/maps is problematic
for at least two reasons:
1. The file could have been deleted from the file system (this is
indicated with the (deleted) suffix), meaning that you can't
actually open it through the "regular" file system. However,
while the mapping is alive, the kernel keeps the inode accessible
via the corresponding /proc/*/map_files entry, allowing for
access after all.
2. It makes dealing with changed root and file system namespaces
much more painful. The /proc/*/maps path is relative, and so now
you need to concatenate paths etc. Accessing the file through
/proc/*/map_files just works (assuming necessary permissions), as
the kernel redirects the request to the proper inode,
irrespective of how it is exposed through the non-proc
filesystem.
Android extends ELF padding regions to be contiguously mapped in memory
to mitigate the increase in unreclaimable VMA slab memory usage.
Commit 8c2a805a857914324b077708b45c31c2f20d02da [1] emulates the padding
region of such extended mappings, reporting it as PROT_NONE
[page size compat] entries in /proc/*/[s]maps. This breaks the use
case of /proc/*/map_files/, as the ranges in /proc/*/map_files/ are
the true ranges of the actual underlying VMA layout, while those in
/proc/*/[s]maps are the emulated (shortened) ranges.
Remove the padding (extended) ranges from /proc/*/map_files entries.
====== Example Output ======
=== maps ===
❯ adb shell cat /proc/1/maps | grep -A1 libdl_android.so | sed '$d'
7f76663df000-7f76663e0000 r--p 00000000 fe:09 1911 /system/lib64/bootstrap/libdl_android.so
7f76663e0000-7f76663e3000 ---p 00000000 00:00 0 [page size compat]
7f76663e3000-7f76663e4000 r-xp 00004000 fe:09 1911 /system/lib64/bootstrap/libdl_android.so
7f76663e4000-7f76663e7000 ---p 00000000 00:00 0 [page size compat]
7f76663e7000-7f76663e8000 r--p 00008000 fe:09 1911 /system/lib64/bootstrap/libdl_android.s
=== map_files - Before patch ===
❯ adb shell ls /proc/1/map_files | grep -A2 7f76663df000
7f76663df000-7f76663e3000
7f76663e3000-7f76663e7000
7f76663e7000-7f76663e8000
=== map_files - After patch ===
❯ adb shell ls /proc/1/map_files | grep -A2 7f76663df000
7f76663df000-7f76663e0000
7f76663e3000-7f76663e4000
7f76663e7000-7f76663e8000
[1]
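As a rough userspace illustration of what a symbolizer relies on after this patch, the sketch below derives the expected /proc/*/map_files entry names from /proc/*/maps text. It is an assumption-laden sketch: `expected_map_files` and `SAMPLE_MAPS` are hypothetical names, the sample lines are copied as shown in the example output above, and "path starts with /" is used as a simplified test for a file-backed mapping.

```python
PADDING_LABEL = "[page size compat]"

# Lines copied from the example output above, exactly as shown there.
SAMPLE_MAPS = """\
7f76663df000-7f76663e0000 r--p 00000000 fe:09 1911 /system/lib64/bootstrap/libdl_android.so
7f76663e0000-7f76663e3000 ---p 00000000 00:00 0 [page size compat]
7f76663e3000-7f76663e4000 r-xp 00004000 fe:09 1911 /system/lib64/bootstrap/libdl_android.so
7f76663e4000-7f76663e7000 ---p 00000000 00:00 0 [page size compat]
7f76663e7000-7f76663e8000 r--p 00008000 fe:09 1911 /system/lib64/bootstrap/libdl_android.s
"""


def expected_map_files(maps_text):
    """Return the address ranges that should appear as map_files entry names."""
    names = []
    for line in maps_text.splitlines():
        # Fields: range, perms, offset, dev, inode, optional path or label.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        path = fields[5].strip() if len(fields) > 5 else ""
        if path == PADDING_LABEL:
            continue  # emulated padding entry, not a separate mapping
        if not path.startswith("/"):
            continue  # simplified: treat only "/..." paths as file-backed
        names.append(fields[0])
    return names
```

For the sample, the function returns the three ranges listed under "map_files - After patch", which is the property the patch restores: every file-backed range in maps has a map_files entry with the same name.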
|
||
|
|
228e0f23bd |
UPSTREAM: net_sched: sch_sfq: move the limit validation
[ Upstream commit b3bf8f63e6179076b57c9de660c9f80b5abefe70 ]
It is not sufficient to directly validate the limit on the data that
the user passes as it can be updated based on how the other parameters
are changed.
Move the check at the end of the configuration update process to also
catch scenarios where the limit is indirectly updated, for example
with the following configurations:
tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1
tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 divisor 1
This fixes the following syzkaller reported crash:
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:6
index 65535 is out of range for type 'struct sfq_head[128]'
CPU: 1 UID: 0 PID: 3037 Comm: syz.2.16 Not tainted 6.14.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x201/0x300 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_out_of_bounds+0xf5/0x120 lib/ubsan.c:429
sfq_link net/sched/sch_sfq.c:203 [inline]
sfq_dec+0x53c/0x610 net/sched/sch_sfq.c:231
sfq_dequeue+0x34e/0x8c0 net/sched/sch_sfq.c:493
sfq_reset+0x17/0x60 net/sched/sch_sfq.c:518
qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035
tbf_reset+0x41/0x110 net/sched/sch_tbf.c:339
qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035
dev_reset_queue+0x100/0x1b0 net/sched/sch_generic.c:1311
netdev_for_each_tx_queue include/linux/netdevice.h:2590 [inline]
dev_deactivate_many+0x7e5/0xe70 net/sched/sch_generic.c:1375
Bug: 413623519
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 10685681bafc ("net_sched: sch_sfq: don't allow 1 packet limit")
Signed-off-by: Octavian Purdila <tavip@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f86293adce0c201cfabb283ef9d6f21292089bb8)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie5fc222b52c59eaa1070cc03402f8a624af60cd9
|
||
|
|
3e7cb920f1 |
UPSTREAM: net_sched: sch_sfq: use a temporary work area for validating configuration
[ Upstream commit 8c0cea59d40cf6dd13c2950437631dd614fbade6 ] Many configuration parameters have influence on others (e.g. divisor -> flows -> limit, depth -> limit) and so it is difficult to correctly do all of the validation before applying the configuration. And if a validation error is detected late it is difficult to roll back a partially applied configuration. To avoid these issues use a temporary work area to update and validate the configuration and only then apply the configuration to the internal state. Bug: 413623519 Signed-off-by: Octavian Purdila <tavip@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: b3bf8f63e617 ("net_sched: sch_sfq: move the limit validation") Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 70449ca40609ec77f58b93ed154d54e1fdb197b6) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: Icab9dc62eddd23f6a2c5d06dd1f8457294716fb8 |
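The "temporary work area" pattern described in this message can be sketched as below. This is a simplified illustration, not the sch_sfq code: `SfqConfig`, `apply_change`, the default values and the `MAX_DEPTH` / `MAX_FLOWS` caps are assumptions made for the example, and only the depth -> limit and flows -> limit dependencies are modeled.

```python
from dataclasses import dataclass, replace

# Illustrative caps; assumed values for this sketch only.
MAX_DEPTH = 127
MAX_FLOWS = 65408


@dataclass(frozen=True)
class SfqConfig:
    limit: int = 127
    flows: int = 128
    depth: int = 127


def apply_change(current, limit=None, flows=None, depth=None):
    """Return the new config, or raise ValueError and leave `current` untouched."""
    # 1. Update a temporary copy; `current` is frozen, so nothing is applied early.
    tmp = replace(
        current,
        flows=current.flows if flows is None else min(flows, MAX_FLOWS),
        depth=current.depth if depth is None else min(depth, MAX_DEPTH),
    )
    requested_limit = current.limit if limit is None else limit
    # 2. Derive dependent values: the effective limit is capped by depth * flows.
    tmp = replace(tmp, limit=min(requested_limit, tmp.depth * tmp.flows))
    # 3. Validate the derived value, not the raw user input.
    if tmp.limit < 2:
        raise ValueError("effective limit must be at least 2 packets")
    # 4. Only a fully validated config is handed back for installation.
    return tmp
```

With `limit=2, flows=1, depth=1` the raw limit looks acceptable, but the derived limit is 1, so the change is rejected and the live configuration is never partially updated.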
||
|
|
a0fa2316cc |
ANDROID: ABI: Update pixel symbol list
Adding the following symbols:
- irq_check_status_bit
- irq_get_percpu_devid_partition
- irq_work_run
- perf_aux_output_skip
- this_cpu_has_cap
Bug: 393467632
Change-Id: I8e9f34b6b40ec078586d175efb835a6898cbc4f1
Signed-off-by: Yabin Cui <yabinc@google.com>
|
||
|
|
218e2bd245 |
FROMGIT: perf/aux: Allocate non-contiguous AUX pages by default
perf always allocates contiguous AUX pages based on aux_watermark. However, this contiguous allocation doesn't benefit all PMUs. For instance, ARM SPE and TRBE operate with virtual pages, and Coresight ETR allocates a separate buffer. For these PMUs, allocating contiguous AUX pages unnecessarily exacerbates memory fragmentation. This fragmentation can prevent their use on long-running devices. This patch modifies the perf driver to be memory-friendly by default, by allocating non-contiguous AUX pages. For PMUs requiring contiguous pages (Intel BTS and some Intel PT), the existing PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't require but can benefit from contiguous pages (some Intel PT), a new capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their existing behavior. Bug: 393467632 (cherry picked from commit 18049c8cff9cc89daadc4df6975f7d9069638926 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core) Change-Id: Iaff554201726bf271c7625a6df59fb35c6cfbc5d Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250508232642.148767-1-yabinc@google.com |
||
|
|
3cd01bb5bd |
UPSTREAM: mm: Fix is_zero_page() usage in try_grab_page()
The backport of upstream commit |
||
|
|
53b26534cc |
UPSTREAM: usb-storage: Optimize scan delay more precisely
Current storage scan delay is reduced by the following old commit.
|