Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
state when it goes into split_folio() with raised folio refcount.
split_folio() expects the reference count to be exactly mapcount +
num_pages_in_folio + 1 (see can_split_folio()) and fails with EAGAIN
otherwise.
If multiple processes are trying to move the same large folio, they raise
the refcount (all tasks succeed in that) then one of them succeeds in
locking the folio, while others will block in folio_lock() while keeping
the refcount raised. The winner of this race will proceed with calling
split_folio() and will fail returning EAGAIN to the caller and unlocking
the folio. The next competing process will get the folio locked and will
go through the same flow. In the meantime the original winner will be
retried and will block in folio_lock(), getting into the queue of waiting
processes only to repeat the same path. All this results in a livelock.
An easy fix would be to avoid waiting for the folio lock while holding
folio refcount, similar to madvise_free_huge_pmd() where folio lock is
acquired before raising the folio refcount. Since we lock and take a
refcount of the folio while holding the PTE lock, changing the order of
these operations should not break anything.
Modify move_pages_pte() to try locking the folio first and if that fails
and the folio is large then return EAGAIN without touching the folio
refcount. If the folio is single-page then split_folio() is not called,
so we don't have this issue. Lokesh has a reproducer [1] and I verified
that this change fixes the issue.
[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock
[akpm@linux-foundation.org: reflow comment to 80 cols, s/end/end up/]
Link: https://lkml.kernel.org/r/20250226185510.2732648-2-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
(cherry-picked from commit 37b338eed10581784e854d4262da05c8d960c748 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-hotfixes-stable)
Change-Id: I71b307add9707ad3518a44623aea2e2ca417b95a
Bug: 401790618
userfaultfd_move() checks whether the PTE entry is present or a
swap entry.
- If the PTE entry is present, move_present_pte() handles folio
migration by setting:
src_folio->index = linear_page_index(dst_vma, dst_addr);
- If the PTE entry is a swap entry, move_swap_pte() simply copies
the PTE to the new dst_addr.
This approach is incorrect because, even if the PTE is a swap entry,
it can still reference a folio that remains in the swap cache.
This creates a race window between steps 2 and 4.
1. add_to_swap: The folio is added to the swapcache.
2. try_to_unmap: PTEs are converted to swap entries.
3. pageout: The folio is written back.
4. Swapcache is cleared.
If userfaultfd_move() occurs in the window between steps 2 and 4,
after the swap PTE has been moved to the destination, accessing the
destination triggers do_swap_page(), which may locate the folio in
the swapcache. However, since the folio's index has not been updated
to match the destination VMA, do_swap_page() will detect a mismatch.
This can result in two critical issues depending on the system
configuration.
If KSM is disabled, both small and large folios can trigger a BUG
during the add_rmap operation due to:
page_pgoff(folio, page) != linear_page_index(vma, address)
[ 13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
[ 13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
[ 13.337716] memcg:ffff00000405f000
[ 13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
[ 13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
[ 13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[ 13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
[ 13.340190] ------------[ cut here ]------------
[ 13.340316] kernel BUG at mm/rmap.c:1380!
[ 13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 13.340969] Modules linked in:
[ 13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
[ 13.341470] Hardware name: linux,dummy-virt (DT)
[ 13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
[ 13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
[ 13.342018] sp : ffff80008752bb20
[ 13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
[ 13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
[ 13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
[ 13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
[ 13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
[ 13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
[ 13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
[ 13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
[ 13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
[ 13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
[ 13.343876] Call trace:
[ 13.344045] __page_check_anon_rmap+0xa0/0xb0 (P)
[ 13.344234] folio_add_anon_rmap_ptes+0x22c/0x320
[ 13.344333] do_swap_page+0x1060/0x1400
[ 13.344417] __handle_mm_fault+0x61c/0xbc8
[ 13.344504] handle_mm_fault+0xd8/0x2e8
[ 13.344586] do_page_fault+0x20c/0x770
[ 13.344673] do_translation_fault+0xb4/0xf0
[ 13.344759] do_mem_abort+0x48/0xa0
[ 13.344842] el0_da+0x58/0x130
[ 13.344914] el0t_64_sync_handler+0xc4/0x138
[ 13.345002] el0t_64_sync+0x1ac/0x1b0
[ 13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
[ 13.345504] ---[ end trace 0000000000000000 ]---
[ 13.345715] note: a.out[107] exited with irqs disabled
[ 13.345954] note: a.out[107] exited with preempt_count 2
If KSM is enabled, Peter Xu also discovered that do_swap_page() may
trigger an unexpected CoW operation for small folios because
ksm_might_need_to_copy() allocates a new folio when the folio index
does not match linear_page_index(vma, addr).
This patch also checks the swapcache when handling swap entries. If a
match is found in the swapcache, it processes it similarly to a present
PTE.
However, there are some differences. For example, the folio is no longer
exclusive because folio_try_share_anon_rmap_pte() is performed during
unmapping.
Furthermore, in the case of swapcache, the folio has already been
unmapped, eliminating the risk of concurrent rmap walks and removing the
need to acquire src_folio's anon_vma or lock.
Note that for large folios, in the swapcache handling path, we directly
return -EBUSY since split_folio() will return -EBUSY regardless if
the folio is under writeback or unmapped. This is not an urgent issue,
so a follow-up patch may address it separately.
[v-songbaohua@oppo.com: minor cleanup according to Peter Xu]
Link: https://lkml.kernel.org/r/20250226024411.47092-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20250226001400.9129-1-21cnbao@gmail.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
1. mm/userfaultfd.c
[Removed pmd arguments being passed to move_swap_pte() to resolve conflicts - Lokesh Gidra]
[Replaced swap_cache_index() with swp_offset() as the former doesn't exist - Lokesh Gidra]
[Replaced folio_move_anon_rmap() with page_move_anon_rmap() as the
former doesn't exist - Lokesh Gidra]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
(cherry-picked from commit c50f8e6053b0503375c2975bf47f182445aebb4c https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-hotfixes-stable)
Change-Id: I94caeac5bf78add4d78650929303a25d54d8a638
Bug: 401790618
The size of the buffer used to retrieve the memfd file name is
currently calculated at runtime, making the buffer a variable
length array. However, all of the terms used in the buffer size
calculation are known at compile time, so use compile time constants
for the calculation.
Bug: 399839316
Change-Id: Ie1edf9a28f735ebeffab07f64efc4de45f1f095a
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
In the current vendor hook, if next_uptodate_folio returns NULL, the
first_pgoff is set to zero, and the last_pgoff is set to start_pgoff.
Therefore, the collection range is from 0 to the start_pgoff.
|-----------|------------|-------------|------------------|
0 start_pgoff first_pgoff last_pgoff end_pgoff
We want to collect the first_pgoff to last_pgoff, so we have to add a
new vendor hook.
Bug: 398130226
Change-Id: I19d54c601e2ffc5de5ec2dafcd43fbdcdc84b0d2
Signed-off-by: zhanghui <zhanghui31@xiaomi.com>
Allow the memfd-ashmem-shim ioctl handler to run for any shmem file,
so that memfds can handle ashmem ioctl commands.
While this allows ashmem ioctl commands to be invoked on more than just
memfds, this should be fine, since the ioctl commands don't expose any
additional functionality than what is already achievable via other
system calls.
Bug: 111903542
Change-Id: I0bf57ac5a90dba66e5c2c32beff70bcf9d26db6b
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Certain applications treat any shared memory buffer that they obtain
as an ashmem buffer, meaning that they will attempt to invoke ashmem
ioctl commands on that buffer.
Android is transitioning to replacing ashmem with memfd, and memfd
currently does not support ashmem ioctl commands. So, when an
application attempts to invoke an ashmem ioctl command on a memfd,
the invocation will fail and report an error back to the app.
In order to preserve compatibility between these apps and memfds,
add a shim layer which will handle ashmem ioctl commands for memfds.
Bug: 111903542
Change-Id: I268a29ee2805739550d79fd2c21d3cfb5a852642
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Memfd does not support preventing a file from being mapped with PROT_EXEC,
as ashmem does. It would be useful to expose a knob to userspace to
change ashmem's behavior to match memfd to see if any issues arise
during tests.
Therefore, expose a tunable that userspace can use to cause ashmem to
ignore requests to deny PROT_EXEC mappings.
Bug: 111903542
Change-Id: I3da63d899c4753aa704092bf8e8a2568500fa833
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Memfd does not support preventing a file from being mapped with PROT_READ,
as ashmem does. It would be useful to expose a knob to userspace to
change ashmem's behavior to match memfd to see if any issues arise
during tests.
Therefore, expose a tunable that userspace can use to cause ashmem to
ignore requests to deny PROT_READ mappings.
Bug: 111903542
Change-Id: Id4d1770e93a4fd5a6b3be04fd82c67d0eff0200e
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
The final version of ashmem will not support unpinning buffers.
Therefore, to be able to have the ashmem driver behave as close as
possible to its final configuration for testing, add a device node
that can be used to disable unpinning.
This node will make it so that the ashmem shrinker stops running,
and that all unpinning requests are ignored.
Bug: 111903542
Change-Id: I99ae9b1a4e56ee8a5224d647a6f2f9eeeb86ef02
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Adding the following symbols:
- page_swap_info
Bug: 397308736
Change-Id: Ica1c945fd0401c0276d0409ff284fe9debc352a3
Signed-off-by: Marcus Ma <maminghui5@xiaomi.corp-partner.google.com>
We present a specific requirement regarding the memory management
and I/O operations.In our project,we're focused on handling scenarios
where I/O delays are triggered by anoymous pages.During this period,we
need to obtain swap_info_struct according to page to obtain the
corresponding block device id.
Bug: 397308736
Change-Id: Ibc11f412964245658cec60af42cf9486adc96e1a
Signed-off-by: Marcus Ma <maminghui5@xiaomi.corp-partner.google.com>
io_req_prep_async() can import provided buffers, commit the ring state
by giving up on that before, it'll be reimported later if needed.
Bug: 397153671
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Reported-by: Jacob Soo <jacob.soo@starlabs.sg>
Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a94592ec30)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I0887e3efb936c793feb399d29640522215abc36b
Symbols updated to QCOM abi symbol list for updating DT property:
of_update_property
Bug: 390562181
Change-Id: I1c19c4aeba3ad3a928d4d90bee06952f70dfc194
Signed-off-by: Srinath Pandey <quic_srinpand@quicinc.com>
The change Ifdb1ba19f7147da286ea5e044e84dfb679050a94 ("FROMGIT: usb:
typec: tcpci: Prevent Sink disconnection before vPpsShutdown in SPR
PPS") breaks the KMI. Prevent the breakage by combining the parameters
"requested_vbus_voltage" and "pps_apdo_min_voltage" to a single u32
variable whose value is selected according to the values of parameter
"mode" and parameter "pps_active".
Bug: 388029777
Change-Id: I85872b9490561d248169bc8e008f3d907cc6c3c0
Signed-off-by: Kyle Tso <kyletso@google.com>
The Source can drop its output voltage to the minimum of the requested
PPS APDO voltage range when it is in Current Limit Mode. If this voltage
falls within the range of vPpsShutdown, the Source initiates a Hard
Reset and discharges Vbus. However, currently the Sink may disconnect
before the voltage reaches vPpsShutdown, leading to unexpected behavior.
Prevent premature disconnection by setting the Sink's disconnect
threshold to the minimum vPpsShutdown value. Additionally, consider the
voltage drop due to IR drop when calculating the appropriate threshold.
This ensures a robust and reliable interaction between the Source and
Sink during SPR PPS Current Limit Mode operation.
Fixes: 4288debeaa ("usb: typec: tcpci: Fix up sink disconnect thresholds for PD")
Cc: stable <stable@kernel.org>
Signed-off-by: Kyle Tso <kyletso@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Link: https://lore.kernel.org/r/20250114142435.2093857-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 388029777
(cherry picked from commit 4d27afbf256028a1f54363367f30efc8854433c3
https: //git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/
usb-next)
Change-Id: Ifdb1ba19f7147da286ea5e044e84dfb679050a94
Signed-off-by: Kyle Tso <kyletso@google.com>
Function f2fs_invalidate_blocks() can process consecutive
blocks at a time, so f2fs_truncate_data_blocks_range() is
optimized to use the new functionality of
f2fs_invalidate_blocks().
Add two variables @blkstart and @blklen, @blkstart records
the first address of the consecutive blocks, and @blkstart
records the number of consecutive blocks.
Bug: 394006856
Change-Id: I219866b6c60a8f23f92aee64429064a04e7282d2
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 120ac1dc322f402544423582234f441d98ea4a6e)
New function can process some consecutive blocks at a time.
Function f2fs_invalidate_blocks()->down_write() and up_write()
are very time-consuming, so if f2fs_invalidate_blocks() can
process consecutive blocks at one time, it will save a lot of time.
Bug: 394006856
Change-Id: I6600c5be55f0261b142285fc45212921da8121fb
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit e53c568f4603e997426712146dce0bc194c1db12)
This function can process some consecutive blocks at a time.
When using update_sit_entry() to release consecutive blocks,
ensure that the consecutive blocks belong to the same segment.
Because after update_sit_entry_for_realese(), @segno is still
in use in update_sit_entry().
Bug: 394006856
Change-Id: Ia6be213c3838351292d1000a52bd54a1090f1137
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 81ffbd224e5f926bf8df01d6107db9c8779f7d57)
No logical changes, just for cleanliness.
Bug: 394006856
Change-Id: I4dddab6be974476879af46cda814dee2223ed21d
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Dylan: Resolved minor conflict in fs/f2fs/segment.c ]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 66baee2b886d72ab6be11a08d4c7897f9612e25b)
New function can process some consecutive blocks at a time.
Bug: 394006856
Change-Id: I6741915ec3fba137ae6295688b6c4f8474411177
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit d217b5cea488c0f644c189f91b636aeefa12deb9)
New function f2fs_invalidate_compress_pages_range() adds the @len
parameter. So it can process some consecutive blocks at a time.
Bug: 394006856
Change-Id: I3b30396567771e1d3608395fa0b7e5e379ddc805
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 3d56fbb1f03f2abf8c806aee14df2f462350dfba)
Below race case can cause data corruption:
Thread A GC thread
- gc_data_segment
- ra_data_block
- locked meta_inode page
- f2fs_inplace_write_data
- invalidate_mapping_pages
: fail to invalidate meta_inode page
due to lock failure or dirty|writeback
status
- f2fs_submit_page_bio
: write last dirty data to old blkaddr
- move_data_block
- load old data from meta_inode page
- f2fs_submit_page_write
: write old data to new blkaddr
Because invalidate_mapping_pages() will skip invalidating page which
has unclear status including locked, dirty, writeback and so on, so
we need to use truncate_inode_pages_range() instead of
invalidate_mapping_pages() to make sure meta_inode page will be dropped.
Fixes: 6aa58d8ad2 ("f2fs: readahead encrypted block during GC")
Fixes: e3b49ea368 ("f2fs: invalidate META_MAPPING before IPU/DIO write")
Bug: 394006856
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 7bb861b0b15b42d58669a0a23ce0b7efb1b6c6e5)
Change-Id: I4ba1f509a7257dd5a6636d22a0908edd91318dd8
Commit "mm,page_alloc,cma: conditionally prefer cma pageblocks for
movable allocations" (1686766493) introduced balancing of movable
allocations between CMA and normal areas.
Commit "ANDROID: cma: redirect page allocation to CMA" (f60c5572d2)
removes it, making allocations go in CMA area first.
1. Reintroduce the condition, so that CMA and normal area are used
in a balanced way(as it used to be), so it prevents depleting of CMA
region;
2. Back-port a command line option(from 6.6), "restrict_cma_redirect",
that can be used if only MOVABLE allocations marked as __GFP_CMA are
eligible to be redirected to CMA region. By default it is true.
The purpose of this change is to keep using CMA for movable allocations,
but at the same time, to have enough free CMA pages for critical system
areas such as modem initialization, GPU initialization and so on.
Bug: 381168812
Signed-off-by: Sebastian Achim <sebastian.1.achim@sony.com>
Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Change-Id: I5fd6d022340715e27754c687189c5ea0e56d9ee6
When allocating guest stage-2 page-table pages at EL2, pKVM can consume
pages from the host-provided kvm_hyp_memcache. As pgtable.c expects
zeroed pages, guest_s2_zalloc_page() actively implements this zeroing
with a PAGE_SIZE memset. Unfortunately, we don't check the page
alignment of the host-provided address before doing so, which could
lead to the memset overrunning the page if the host was malicious.
Fix this by simply force-aligning all kvm_hyp_memcache allocations to
page boundaries.
Bug: 396116221
Fixes: 60dfe093ec ("KVM: arm64: Instantiate guest stage-2 page-tables at EL2")
Reported-by: Ben Simner <ben.simner@cl.cam.ac.uk>
Link: https://lore.kernel.org/r/20250213153615.3642515-1-qperret@google.com
Change-Id: Icd8c79495a28c014aa3b320ca44a03ee46ede2ce
Signed-off-by: Quentin Perret <qperret@google.com>
INFO: 1 function symbol(s) added
'int snd_ctl_remove_id(struct snd_card*, struct snd_ctl_elem_id*)'
snd_ctl_remove_id: remove the control of the given id and release it
audio amp driver need to remove kcontrol when reload dapm control.
Bug: 391517506
Change-Id: Ic8641be7a06d2dfcacd93e1c065f71704690a5bb
Signed-off-by: Injune Choi <injune.choi@samsung.com>
6.1.127 includes commit "hrtimers: Handle CPU state correctly on
hotplug". This commit is supposed to fix "hrtimers: Push pending
hrtimers away from outgoing CPU earlier". However commit "hrtimers:
Push pending hrtimers away from outgoing CPU earlier" was reverted
earlier on 6.1-lts. 6.1.127 incorrectly merged parts of "hrtimers:
Handle CPU state correctly on hotplug", which leads to random reboots.
Let´s revert those parts.
Fixes: 79f1b689da ("Merge 6.1.127 into android14-6.1-lts")
Change-Id: Ieb5e3910d631034eb8d4376c6f5a8e9751cae936
Signed-off-by: Micha Lechner <michalechner92@googlemail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
These symbols are missing from the symbol list and are not available at runtime for unsigned modules:
iio_get_channel_type
Bug: 386822958
Change-Id: I188d191a5a23451f155584e86d66ae56c235ad64
Signed-off-by: Rishi Sikka <rishisikka@google.com>
Adding the following symbols:
- __traceiter_android_trigger_vendor_lmk_kill
- __tracepoint_android_trigger_vendor_lmk_kill
Bug: 385050909
Change-Id: I82e202177d870fe0a55c69172817c260ece41ff6
Signed-off-by: Martin Liu <liumartin@google.com>
This change adds an android_trigger_vendor_lmk_kill() trace event
which can be used by vendor modules to send LMKD kill requests.
LMKD attaches a BPF program to this trace event if it exists and
expects it to be in a particular format. To provide a standardized
definition for this event, we define and export it inside the GKI
even though there are no users. Android vendors can use this event
to emit this trace event inside their modules and experiment with
different kill strategies before upstreaming them into LMKD.
Bug: 385050909
Test: build and check the ftrace event
Change-Id: Ida4d9202675a90d6cc891e242c0621c5386df8cc
Signed-off-by: Martin Liu <liumartin@google.com>
Changes in 6.1.128
ASoC: wm8994: Add depends on MFD core
ASoC: samsung: Add missing selects for MFD_WM8994
seccomp: Stub for !CONFIG_SECCOMP
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
drm/amd/display: Use HW lock mgr for PSR1
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
ASoC: samsung: midas_wm1811: Map missing jack kcontrols
ASoC: samsung: Add missing depends on I2C
regmap: detach regmap from dev on regmap_exit
ipv6: Fix soft lockups in fib6_select_path under high next hop churn
softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel
xfs: bump max fsgeom struct version
xfs: hoist freeing of rt data fork extent mappings
xfs: prevent rt growfs when quota is enabled
xfs: rt stubs should return negative errnos when rt disabled
xfs: fix units conversion error in xfs_bmap_del_extent_delay
xfs: make sure maxlen is still congruent with prod when rounding down
xfs: introduce protection for drop nlink
xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
xfs: allow read IO and FICLONE to run concurrently
xfs: factor out xfs_defer_pending_abort
xfs: abort intent items when recovery intents fail
xfs: only remap the written blocks in xfs_reflink_end_cow_extent
xfs: up(ic_sema) if flushing data device fails
xfs: fix internal error from AGFL exhaustion
xfs: inode recovery does not validate the recovered inode
xfs: clean up dqblk extraction
xfs: dquot recovery does not validate the recovered dquot
xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
xfs: respect the stable writes flag on the RT device
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
io_uring: fix waiters missing wake ups
net: sched: fix ets qdisc OOB Indexing
block: fix integer overflow in BLKSECDISCARD
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
vfio/platform: check the bounds of read/write syscalls
ext4: fix access to uninitialised lock in fc replay path
ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find()
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
wifi: iwlwifi: add a few rate index validity checks
smb: client: fix UAF in async decryption
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
ALSA: usb-audio: Add delay quirk for USB Audio Device
Input: atkbd - map F23 key to support default copilot shortcut
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Input: xpad - add support for wooting two he (arm)
smb: client: fix NULL ptr deref in crypto_aead_setkey()
ASoC: samsung: midas_wm1811: Fix 'Headphone Switch' control creation
drm/v3d: Assign job pointer to NULL before signaling the fence
Linux 6.1.128
Change-Id: Ia1ddc5824b498862a5eb730dd99bd3a76dd16015
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit bb00b1190b which is
commit 2b2fc0be98a828cf33a88a28e9745e8599fb05cf upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: Ib6926beda39531de3e1a1025b981b560535347a4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 760f415e08 which is
commit fd4f101edbd9f99567ab2adb1f2169579ede7c13 upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: I7cd575ae9e9d99f5181fdc297649d7e9f96d56fc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit a3fdd5f3d6 which is
commit 6eedda01b2bfdcf427b37759e053dc27232f3af1 upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: I6d0ce9375026632d6e27f9ec83c408fdee344963
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit c91e694619 which is
commit 46841c7053e6d25fb33e0534ef023833bf03e382 upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: I1b6ed883c57437964fcfbcb2479ecf19f6a167f7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit efec287cba which is
commit eb28fd76c0a08a47b470677c6cef9dd1c60e92d1 upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: I8f64380576053dce28f0f73ef06ccf0a27469ed1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 6.1.127
net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()
bpf: Fix bpf_sk_select_reuseport() memory leak
openvswitch: fix lockup on tx to unregistering netdev with carrier
pktgen: Avoid out-of-bounds access in get_imix_entries
net: add exit_batch_rtnl() method
gtp: use exit_batch_rtnl() method
gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().
gtp: Destroy device along with udp socket's netns dismantle.
nfp: bpf: prevent integer overflow in nfp_bpf_event_output()
net: xilinx: axienet: Fix IRQ coalescing packet count overflow
net/mlx5: Fix RDMA TX steering prio
net/mlx5: Clear port select structure when fail to create
drm/v3d: Ensure job pointer is set to NULL after job completion
hwmon: (tmp513) Fix division of negative numbers
Revert "mtd: spi-nor: core: replace dummy buswidth from addr to data"
i2c: mux: demux-pinctrl: check initial mux selection, too
i2c: rcar: fix NACK handling when being a target
nvmet: propagate npwg topology
mac802154: check local interfaces before deleting sdata list
hfs: Sanity check the root record
fs: fix missing declaration of init_files
kheaders: Ignore silly-rename files
cachefiles: Parse the "secctx" immediately
scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers
selftests: tc-testing: reduce rshift value
ACPI: resource: acpi_dev_irq_override(): Check DMI match last
iomap: avoid avoid truncating 64-bit offset to 32 bits
poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
x86/asm: Make serialize() always_inline
ALSA: hda/realtek: Add support for Ayaneo System using CS35L41 HDA
zram: fix potential UAF of zram table
mptcp: be sure to send ack when mptcp-level window re-opens
selftests: mptcp: avoid spurious errors on disconnect
net: ethernet: xgbe: re-add aneg to supported features in PHY quirks
vsock/virtio: discard packets if the transport changes
vsock/virtio: cancel close work in the destructor
vsock: reset socket state when de-assigning the transport
vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
filemap: avoid truncating 64-bit offset to 32 bits
fs/proc: fix softlockup in __read_vmcore (part 2)
gpiolib: cdev: Fix use after free in lineinfo_changed_notify
pmdomain: imx8mp-blk-ctrl: add missing loop break condition
irqchip: Plug a OF node reference leak in platform_irqchip_probe()
irqchip/gic-v3: Handle CPU_PM_ENTER_FAILED correctly
irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()
hrtimers: Handle CPU state correctly on hotplug
drm/i915/fb: Relax clear color alignment to 64 bytes
Revert "PCI: Use preserve_config in place of pci_flags"
iio: imu: inv_icm42600: fix spi burst write not supported
iio: imu: inv_icm42600: fix timestamps after suspend if sensor is on
iio: adc: rockchip_saradc: fix information leak in triggered buffer
drm/amd/display: Fix out-of-bounds access in 'dcn21_link_encoder_create'
drm/amdgpu: fix usage slab after free
block: fix uaf for flush rq while iterating tags
Revert "drm/amdgpu: rework resume handling for display (v2)"
RDMA/rxe: Fix the qp flush warnings in req
scsi: sg: Fix slab-use-after-free read in sg_release()
Revert "regmap: detach regmap from dev on regmap_exit"
wifi: ath10k: avoid NULL pointer error during sdio remove
erofs: tidy up EROFS on-disk naming
erofs: handle NONHEAD !delta[1] lclusters gracefully
nfsd: add list_head nf_gc to struct nfsd_file
x86/xen: fix SLS mitigation in xen_hypercall_iret()
net: fix data-races around sk->sk_forward_alloc
Linux 6.1.127
Change-Id: I5621f4287b21d7fbcc2f19d46e02d97afe5f0451
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
If a command is queued to the final usable TRB of a ring segment, the
enqueue pointer is advanced to the subsequent link TRB and no further.
If the command is later aborted, when the abort completion is handled
the dequeue pointer is advanced to the first TRB of the next segment.
If no further commands are queued, xhci_handle_stopped_cmd_ring() sees
the ring pointers unequal and assumes that there is a pending command,
so it calls xhci_mod_cmd_timer() which crashes if cur_cmd was NULL.
Don't attempt timer setup if cur_cmd is NULL. The subsequent doorbell
ring likely is unnecessary too, but it's harmless. Leave it alone.
This is probably Bug 219532, but no confirmation has been received.
The issue has been independently reproduced and confirmed fixed using
a USB MCU programmed to NAK the Status stage of SET_ADDRESS forever.
Everything continued working normally after several prevented crashes.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219532
Fixes: c311e391a7 ("xhci: rework command timeout and cancellation,")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241227120142.1035206-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 393469846
(cherry picked from commit 1e0a19912adb68a4b2b74fd77001c96cd83eb073)
[Sriram: Resolved minor conflict drivers/usb/host/xhci-ring.c ]
Change-Id: I0228fb40e139eb06b9ab68fd3e66ca3444da7077
Signed-off-by: Sriram Dash <quic_sriramd@quicinc.com>
Commit b50a013d33 ("BACKPORT: OPP: Extend support for the opp-level
beyond required-opps") used dev_pm_genpd_set_performance_state()
as a substitute for dev_pm_domain_set_performance_state(), since
introducing dev_pm_domain_set_performance_state() required breaking the
ABI on this branch.
However, directly invoking dev_pm_genpd_set_performance_state() is not
equivalent to dev_pm_domain_set_performance_state(), as the latter
only invokes a PM domain's set_performance_state callback if the device
it is invoked on uses a PM domain, and if that PM domain supports that
callback. If that check fails, then the invocation is simply a nop, and
no error code is returned.
In contrast, dev_pm_genpd_set_performance_state() checks to ensure that the
device uses a PM domain, and that the PM domain is a generic PM domain.
If that is not the case, then the invocation returns an error which is
then propagated up the call chain.
Therefore, fix _set_opp_level() to function as a nop for devices without
PM domains to align it to its original intent.
Bug: 394178898
Fixes: b50a013d33 ("BACKPORT: OPP: Extend support for the opp-level beyond required-opps")
Change-Id: I664d45168404d62aecf59a0afcd2e001d6b7a247
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>