Update whitelist for the symbols used by the unisoc in abi_gki_aarch64_unisoc.
1 variable symbol(s) added
'int percpu_counter_batch'
Bug: 296338673
Change-Id: Idd1d03e9482c5f9c3ea2184066371cd6705ddd0e
Signed-off-by: Yue Shen <yue.shen@unisoc.com>
Currently, zsmalloc has a hierarchy of locks, which includes a pool-level
migrate_lock, and a lock for each size class. We have to obtain both
locks in the hotpath in most cases anyway, except for zs_malloc. This
exception will no longer exist when we introduce a LRU into the zs_pool
for the new writeback functionality - we will need to obtain a pool-level
lock to synchronize LRU handling even in zs_malloc.
In preparation for zsmalloc writeback, consolidate these locks into a
single pool-level lock, which drastically reduces the complexity of
synchronization in zsmalloc.
We have also benchmarked the lock consolidation to see the performance
effect of this change on zram.
First, we ran a synthetic FS workload on a server machine with 36 cores
(same machine for all runs), using
fs_mark -d ../zram1mnt -s 100000 -n 2500 -t 32 -k
before and after for btrfs and ext4 on zram (FS usage is 80%).
Here is the result (unit is file/second):
With lock consolidation (btrfs):
Average: 13520.2, Median: 13531.0, Stddev: 137.5961482019028
Without lock consolidation (btrfs):
Average: 13487.2, Median: 13575.0, Stddev: 309.08283679298665
With lock consolidation (ext4):
Average: 16824.4, Median: 16839.0, Stddev: 89.97388510006668
Without lock consolidation (ext4)
Average: 16958.0, Median: 16986.0, Stddev: 194.7370021336469
As you can see, we observe a 0.3% regression for btrfs, and a 0.9%
regression for ext4. This is a small, barely measurable difference in my
opinion.
For a more realistic scenario, we also tries building the kernel on zram.
Here is the time it takes (in seconds):
With lock consolidation (btrfs):
real
Average: 319.6, Median: 320.0, Stddev: 0.8944271909999159
user
Average: 6894.2, Median: 6895.0, Stddev: 25.528415540334656
sys
Average: 521.4, Median: 522.0, Stddev: 1.51657508881031
Without lock consolidation (btrfs):
real
Average: 319.8, Median: 320.0, Stddev: 0.8366600265340756
user
Average: 6896.6, Median: 6899.0, Stddev: 16.04057355583023
sys
Average: 520.6, Median: 521.0, Stddev: 1.140175425099138
With lock consolidation (ext4):
real
Average: 320.0, Median: 319.0, Stddev: 1.4142135623730951
user
Average: 6896.8, Median: 6878.0, Stddev: 28.621670111997307
sys
Average: 521.2, Median: 521.0, Stddev: 1.7888543819998317
Without lock consolidation (ext4)
real
Average: 319.6, Median: 319.0, Stddev: 0.8944271909999159
user
Average: 6886.2, Median: 6887.0, Stddev: 16.93221781102523
sys
Average: 520.4, Median: 520.0, Stddev: 1.140175425099138
The difference is entirely within the noise of a typical run on zram.
This hardly justifies the complexity of maintaining both the pool lock and
the class lock. In fact, for writeback, we would need to introduce yet
another lock to prevent data races on the pool's LRU, further complicating
the lock handling logic. IMHO, it is just better to collapse all of these
into a single pool-level lock.
Link: https://lkml.kernel.org/r/20221128191616.1261026-4-nphamcs@gmail.com
Change-Id: Ib0eb09d7a69190fc4ffea8f819423c7f66d83379
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c0547d0b6a)
Bug: 297093100
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Compared to all the other work we're already doing to deliver
an skb to userspace this is very cheap - at worse an extra
call to ktime_get_real() - and very useful.
(and indeed it may even be cheaper if we're running from other hooks)
(background: Android occasionally logs packets which
caused wake from sleep/suspend and we'd like to have
timestamps reliably associated with these events)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
(cherry picked from commit 1d85594fd3)
Change-Id: Id9b8bc046204c11bf3321e73a67b444777d387dd
Resource manager can now return GH_RM_RESOURCE_NO_VIRQ (-1) instead of 0
as the value to mean "there's no vIRQ for this resource".
Bug: 297100131
Change-Id: I93c4f41b881bfc9e094fa6115df7ba6fcdaa7e6e
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
Add hooks at process waking up and exiting routines so that oems
can control these procedures. One possible benifit is the peak of
system load can be shaved and load can be more smooth when a large
number of threads is killed once upon a time, while a sudden peak
of system load can probably lead to user junk issues.
Bug: 296493318
Change-Id: Ide5f9e63a4f50d6a9e3ffbc9516de9ce48ededef
Signed-off-by: xieliujie <xieliujie@oppo.com>
We need a new vendor hook for two reasons:
1.The position of the previous vendor hook is inappropriate: when the task wakes up from percpu_rwsem_wait, it will enter a long runnable state, which will cause frame loss when the application starts. In order to solve this problem, we need to let the process enter the "vip" queue when it is woken up, so we need to set a flag for the process holding the lock to prove that it is about to hold the lock. The timing of setting the flag should be at the beginning of percpu_down_read/percpu_down_write rather than the end.
2.Most of this long runnable state occurs in the cgroup_threadgroup_rwsem, so we only care cgroup_threadgroup_rwsem, and cgroup_threadgroup_rwsem should be exported. At the same time, one more parameter "struct percpu_rw_semaphore *sem", is needed for this vendor hook.
Bug: 294496814
Change-Id: I5f014cfb68a60c29bbfd21452336e381e31e81b1
Signed-off-by: liuxudong5 <liuxudong5@xiaomi.com>
Commit 63f46b45dd ("ANDROID: fips140: eliminate crypto-fips.a build
step") made all fips140 source files other than fips140-module.c be
compiled in the "fake built-in code" mode. This broke the fail_selftest
and fail_integrity_check module parameters, as they are defined in
fips140-eval-testing.c. Fix this by making fips140-eval-testing.c be
compiled "normally", overriding fips140-defs.h.
Bug: 188620248
Fixes: 63f46b45dd ("ANDROID: fips140: eliminate crypto-fips.a build step")
Change-Id: Iebb70bdcbb698b92a7791fa7307e2325b1a9e4b6
Signed-off-by: Eric Biggers <ebiggers@google.com>
blk_crypto_profile_init() calls lockdep_register_key(), which warns and
does not register if the provided memory is a static object.
blk-crypto-fallback currently has a static blk_crypto_profile and calls
blk_crypto_profile_init() thereupon, resulting in the warning and
failure to register.
Fortunately it is simple enough to use a dynamically allocated profile
and make lockdep function correctly.
Fixes: 2fb48d88e7 ("blk-crypto: use dynamic lock class for blk_crypto_profile::lock")
Cc: stable@vger.kernel.org
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230817141615.15387-1-sweettea-kernel@dorminy.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit c984ff1423)
(resolved conflict due to HW-wrapped key support)
Change-Id: I8c889550f97dc3d326930bd5745da6ea64061309
Signed-off-by: Eric Biggers <ebiggers@google.com>
[ Upstream commit 3044b16e7c ]
When u32_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Bug: 296347075
Fixes: de5df63228 ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Reported-by: valis <sec@valis.email>
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit aab2d095ce)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I1a8381c308cc97cf61d6f95a02992d2c553455c5
[ Upstream commit 6f489a966f ]
The previous commit ebad8e731c ("media: usb: siano: Fix use after
free bugs caused by do_submit_urb") adds cancel_work_sync() in
smsusb_stop_streaming(). But smsusb_stop_streaming() may be called,
even if the work_struct surb->wq has not been initialized. As a result,
the warning will occur. One of the processes that could lead to warning
is shown below:
smsusb_probe()
smsusb_init_device()
if (!dev->in_ep || !dev->out_ep || align < 0) {
smsusb_term_device(intf);
smsusb_stop_streaming()
cancel_work_sync(&dev->surbs[i].wq);
__cancel_work_timer()
__flush_work()
if (WARN_ON(!work->func)) // work->func is null
The log reported by syzbot is shown below:
WARNING: CPU: 0 PID: 897 at kernel/workqueue.c:3066 __flush_work+0x798/0xa80 kernel/workqueue.c:3063
Modules linked in:
CPU: 0 PID: 897 Comm: kworker/0:2 Not tainted 6.2.0-rc1-syzkaller #0
RIP: 0010:__flush_work+0x798/0xa80 kernel/workqueue.c:3066
...
RSP: 0018:ffffc9000464ebf8 EFLAGS: 00010246
RAX: 1ffff11002dbb420 RBX: 0000000000000021 RCX: 1ffffffff204fa4e
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff888016dda0e8
RBP: ffffc9000464ed98 R08: 0000000000000001 R09: ffffffff90253b2f
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888016dda0e8
R13: ffff888016dda0e8 R14: ffff888016dda100 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd4331efe8 CR3: 000000000b48e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__cancel_work_timer+0x315/0x460 kernel/workqueue.c:3160
smsusb_stop_streaming drivers/media/usb/siano/smsusb.c:182 [inline]
smsusb_term_device+0xda/0x2d0 drivers/media/usb/siano/smsusb.c:344
smsusb_init_device+0x400/0x9ce drivers/media/usb/siano/smsusb.c:419
smsusb_probe+0xbbd/0xc55 drivers/media/usb/siano/smsusb.c:567
...
This patch adds check before cancel_work_sync(). If surb->wq has not
been initialized, the cancel_work_sync() will not be executed.
Bug: 295075980
Reported-by: syzbot+27b0b464864741b18b99@syzkaller.appspotmail.com
Fixes: ebad8e731c ("media: usb: siano: Fix use after free bugs caused by do_submit_urb")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8abb53c516)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie2946408cfde466d0138c23093ec6738b7e51161
This is now implemented with defconfig fragments.
define_common_kernels use the regular
build.config.gki.aarch64 and apply
16k_defconfig on it.
Bug: 286589887
Test: TH
Signed-off-by: Yifan Hong <elsk@google.com>
(cherry picked from https://android-review.googlesource.com/q/commit:03d155e488ab9e5192cb344419e219203b82ea54)
Merged-In: I71d9abd8faa19a2e517b1c9cb82f9b1a0c9b9197
Change-Id: I71d9abd8faa19a2e517b1c9cb82f9b1a0c9b9197
In current design of the PPS APDO selection, TCPM power supply only
accepts the requested voltage which is inside the range of the selected
PPS profile. To extend the flexibility and usability, remove the checks
about the voltage range in current profile. And try to search all PPS
APDOs of the Source that fit the requested voltage.
Also remove some redundant checks in tcpm_pd_build_pps_request.
Signed-off-by: Kyle Tso <kyletso@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230731162159.19483-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 273608315
(cherry picked from commit 40f362ffa5
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-next)
Change-Id: If7969af6acbda6769f6a3581fcf1d2325a2b3355
Signed-off-by: Kyle Tso <kyletso@google.com>
commit 4270d2b484 upstream.
Do not transition to SNK_UNATTACHED state when receiving vsafe0v event
while in SNK_HARD_RESET_WAIT_VBUS. Ignore VBUS off events as well as
in some platforms VBUS off can be signalled more than once.
[143515.364753] Requesting mux state 1, usb-role 2, orientation 2
[143515.365520] pending state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_SINK_ON @ 650 ms [rev3 HARD_RESET]
[143515.632281] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_HARD_RESET_SINK_OFF, polarity 1, disconnected]
[143515.637214] VBUS on
[143515.664985] VBUS off
[143515.664992] state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_WAIT_VBUS [rev3 HARD_RESET]
[143515.665564] VBUS VSAFE0V
[143515.665566] state change SNK_HARD_RESET_WAIT_VBUS -> SNK_UNATTACHED [rev3 HARD_RESET]
Fixes: 28b43d3d74 ("usb: typec: tcpm: Introduce vsafe0v for vbus")
Cc: <stable@vger.kernel.org>
Change-Id: I0279d8abde2ceb42aefea29b4ca21972dbe4065c
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230712085722.1414743-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 290878685
(cherry picked from commit c2372b1559)
Change-Id: I9cfd4f5533edf7b3a0893a7bef2845448d21b650
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
This reverts commit eb57c31115.
This branch looks clean of WERROR warnings. Let's try to re-enable it.
Fixes: eb57c31115 ("ANDROID: allmodconfig: disable WERROR")
Change-Id: I0106dcd43d7e4b4e20ac768f3faac40285bc837b
(cherry picked from commit d19f8758ae)
Signed-off-by: Lee Jones <joneslee@google.com>
Move FAULT_FLAG_VMA_LOCK check out of handle_pte_fault(). This should
have a significant performance improvement for mmaped files. Write faults
(on read-only shared pages) still take the mmap lock as we do not want to
audit all the implementations of ->pfn_mkwrite() and ->page_mkwrite().
However write-faults on private mappings are handled under the VMA lock.
Link: https://lkml.kernel.org/r/20230724185410.1124082-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 88e2667632d43928d3ed50d0163ecd73aaa2d455
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: replaced folio_put() with put_page() in wp_page_shared()]
Bug: 293665307
Change-Id: I27ac40bb0f7347083f641e0cfc8ab33e182c4c5b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Push the VMA_LOCK check down from __handle_mm_fault() to
handle_pte_fault(). Once again, we refuse to call ->huge_fault() with the
VMA lock held, but we will wait for a PMD migration entry with the VMA
lock held, handle NUMA migration and set the accessed bit. We were
already doing this for anonymous VMAs, so it should be safe.
Link: https://lkml.kernel.org/r/20230724185410.1124082-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b7b8f56db92f56ce812e305f84aef0404287b534
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: resolved merge conflicts in create_huge_pmd() and wp_huge_pmd()]
Bug: 293665307
Change-Id: I3ec9042b2e39a5caf6b6f3a478bf9ba337012aa4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Postpone checking the VMA_LOCK flag until we've attempted to handle faults
on PUDs. There's a mild upside to this patch in that we'll allocate the
page tables while under the VMA lock rather than the mmap lock, reducing
the hold time on the mmap lock, since the retry will find the page tables
already populated. The real purpose here is to make a commit that shows
we don't call ->huge_fault under the VMA lock. We do now handle setting
the accessed bit on a PUD fault under the VMA lock, but that doesn't seem
likely to be a measurable difference.
Link: https://lkml.kernel.org/r/20230724185410.1124082-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 3c04dd18ba57c6753a7ddc6e6c902550a7ac54d9
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: resolved merge conflicts in wp_huge_pud()]
Bug: 293665307
Change-Id: Ife20ed7de6444c0e424e12f9fdcdc8f8ecaed2aa
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Handle a little more of the page fault path outside the mmap sem. The
hugetlb path doesn't need to check whether the VMA is anonymous; the
VM_HUGETLB flag is only set on hugetlbfs VMAs. There should be no
performance change from the previous commit; this is simply a step to ease
bisection of any problems.
Link: https://lkml.kernel.org/r/20230724185410.1124082-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 51db5e8974cafee10b2252efa78f89af7d60cd11
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 293665307
Change-Id: I300c7105fa3530e8eb05862cb3f66b7adac99420
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Remove the TCP layering violation by allowing per-VMA locks on all VMAs.
The fault path will immediately fail in handle_mm_fault(). There may be a
small performance reduction from this patch as a little unnecessary work
will be done on each page fault. See later patches for the improvement.
Link: https://lkml.kernel.org/r/20230724185410.1124082-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 698dcd77360a3ce15dfc6fe55f9b5572ad4c4291
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: skip tcp-related changes]
Bug: 293665307
Change-Id: I73d9d1e4f96419d4723a920fc5960e806749c368
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
vma_prepare() is currently the central place where vmas are being locked
before vma_complete() applies changes to them. While this is convenient,
it also obscures vma locking and makes it harder to follow the locking
rules. Move vma locking out of vma_prepare() and take vma locks
explicitly at the locations where vmas are being modified. Move vma
locking and replace it with an assertion inside dup_anon_vma() to further
clarify the locking pattern inside vma_merge().
Link: https://lkml.kernel.org/r/20230804152724.3090321-7-surenb@google.com
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b1985ca5e7e6464d205a98a78cca229224346c21
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: skip changes in vma_prepare() which does not exist, skip
changes in vma_merge() since required locks are already in __vma_adjust(),
skip change in dup_anon_vma() since required locks are already in place,
skip unnecessary lock in do_brk_flags()]
Bug: 293665307
Change-Id: I99261aa1db3bec73795e63c333768bc68da8045c
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
While it's not strictly necessary to lock a newly created vma before
adding it into the vma tree (as long as no further changes are performed
to it), it seems like a good policy to lock it and prevent accidental
changes after it becomes visible to the page faults. Lock the vma before
adding it into the vma tree.
Link: https://lkml.kernel.org/r/20230804152724.3090321-6-surenb@google.com
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c3249c06c48dda30f93e62b57773d5ed409d4f77
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: resolved conflicts due to changes in vma_merge() and
__vma_adjust()]
Bug: 293665307
Change-Id: I4ee0d2abcc8a3f45545f470f1bf7f0be728d6f44
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Implicit vma locking inside vm_flags_reset() and vm_flags_reset_once() is
not obvious and makes it hard to understand where vma locking is happening.
Also in some cases (like in dup_userfaultfd()) vma should be locked earlier
than vma_flags modification. To make locking more visible, change these
functions to assert that the vma write lock is taken and explicitly lock
the vma beforehand. Fix userfaultfd functions which should lock the vma
earlier.
Link: https://lkml.kernel.org/r/20230804152724.3090321-5-surenb@google.com
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f26ee2701ab3ecd771084b44f262bd010accab72
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 293665307
Change-Id: I62f0f25c883588c3ba7a322b3a4929df01413591
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Vma write lock assertion always includes mmap write lock assertion and
additional vma lock checks when per-VMA locks are enabled. Replace
weaker mmap_assert_write_locked() assertions with stronger
vma_assert_write_locked() ones when we are operating on a vma which
is expected to be locked.
Link: https://lkml.kernel.org/r/20230804152724.3090321-4-surenb@google.com
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 928a31b91cf64aa99a8999dcd66bec0ad02f64ef
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 293665307
Change-Id: I861db0510612f571f2ca44e0a9d7e01274d4eb36
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap
lock is held write-locked by the caller, and the caller is responsible for
dropping the mmap lock at a later point (which will also release the VMA
locks).
Calling vma_end_write_all() here is dangerous because the caller might
have write-locked a VMA with the expectation that it will stay
write-locked until the mmap_lock is released, as usual.
This _almost_ becomes a problem in the following scenario:
An anonymous VMA A and an SGX VMA B are mapped adjacent to each other.
Userspace calls munmap() on a range starting at the start address of A and
ending in the middle of B.
Hypothetical call graph with additional notes in brackets:
do_vmi_align_munmap
[begin first for_each_vma_range loop]
vma_start_write [on VMA A]
vma_mark_detached [on VMA A]
__split_vma [on VMA B]
sgx_vma_open [== new->vm_ops->open]
sgx_encl_mm_add
__mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN]
mm_take_all_locks
mm_drop_all_locks
vma_end_write_all [drops VMA lock taken on VMA A before]
vma_start_write [on VMA B]
vma_mark_detached [on VMA B]
[end first for_each_vma_range loop]
vma_iter_clear_gfp [removes VMAs from maple tree]
mmap_write_downgrade
unmap_region
mmap_read_unlock
In this hypothetical scenario, while do_vmi_align_munmap() thinks it still
holds a VMA write lock on VMA A, the VMA write lock has actually been
invalidated inside __split_vma().
The call from sgx_encl_mm_add() to __mmu_notifier_register() can't
actually happen here, as far as I understand, because we are duplicating
an existing SGX VMA, but sgx_encl_mm_add() only calls
__mmu_notifier_register() for the first SGX VMA created in a given
process. So this could only happen in fork(), not on munmap(). But in my
view it is just pure luck that this can't happen.
Also, we wouldn't actually have any bad consequences from this in
do_vmi_align_munmap(), because by the time the bug drops the lock on VMA
A, we've already marked VMA A as detached, which makes it completely
ineligible for any VMA-locked page faults. But again, that's just pure
luck.
So remove the vma_end_write_all(), so that VMA write locks are only ever
released on mmap_write_unlock() or mmap_write_downgrade().
Also add comments to document the locking rules established by this patch.
Link: https://lkml.kernel.org/r/20230720193436.454247-1-jannh@google.com
Fixes: eeff9a5d47 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 28ed252b44fb2f1efaef1287eea267d54e79f7d5
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 293665307
Change-Id: Ic0b28229d175e3125de1ef274282fbf43b556db7
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
A simple running the ebizzy benchmark on Lichee Pi 4A shows that
PER_VMA_LOCK can improve the ebizzy benchmark by about 32.68%. In
theory, the more CPUs, the bigger improvement, but I don't have any
HW platform which has more than 4 CPUs.
This is the riscv variant of "x86/mm: try VMA lock-based page fault
handling first".
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/r/20230523165942.2630-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
(cherry picked from commit 648321fa0d)
Bug: 293665307
Change-Id: I59b63add96645d2483f87c2b680d4a7afa86f7b6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
walk_page_range() and friends often operate under write-locked mmap_lock.
With introduction of vma locks, the vmas have to be locked as well during
such walks to prevent concurrent page faults in these areas. Add an
additional member to mm_walk_ops to indicate locking requirements for the
walk.
The change ensures that page walks which prevent concurrent page faults
by write-locking mmap_lock, operate correctly after introduction of
per-vma locks. With per-vma locks page faults can be handled under vma
lock without taking mmap_lock at all, so write locking mmap_lock would
not stop them. The change ensures vmas are properly locked during such
walks.
A sample issue this solves is do_mbind() performing queue_pages_range()
to queue pages for migration. Without this change a concurrent page
can be faulted into the area and be left out of migration.
Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Suggested-by: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2ebc368f59eedcef0de7c832fe1d62935cd3a7ff
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: changed locking in break_ksm since it's done differently,
skipped the change in the missing __ksm_del_vma(), skipped the change in
the missing walk_page_range_vma(), removed unused local variables]
Bug: 293665307
Change-Id: Iede9eaa950ea59a268a2e74a8d3022162f0bbd80
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d8ab9f7b64)
[surenb: since dup_anon_vma() is missing, add vma_start_write() directly
before anon_vma is assigned]
Bug: 293665307
Change-Id: I1b44e6278e464157e666cc5dbdb0fcc29bcf665e
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b1f02b9575)
Bug: 293665307
Change-Id: Ifbf30a8ee7211f9c7fe26b923ca33ffde68b6a7b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Some NXP processor using chipidea IP has a bug when frame babble is
detected.
As per 4.15.1.1.1 Serial Bus Babble:
A babble condition also exists if IN transaction is in progress at
High-speed SOF2 point. This is called frame babble. The host controller
must disable the port to which the frame babble is detected.
The USB controller has disabled the port (PE cleared) and has asserted
USBERRINT when frame babble is detected, but PEC is not asserted.
Therefore, the SW isn't aware that port has been disabled. Then the
SW keeps sending packets to this port, but all of the transfers will
fail.
This workaround will firstly assert PCD by SW when USBERRINT is detected
and then judge whether port change has really occurred or not by polling
roothub status. Because the PEC doesn't get asserted in our case, this
patch will also assert it by SW when specific conditions are satisfied.
Bug: 295046582
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20230809024432.535160-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dda4b60ed7https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-next)
[JD: replaced has_ci_pec_bug with existing has_fsl_port_bug to avoid abi breakage]
Change-Id: I7d36cf656efda2dd46c0ddcca252b3de6ea434ee
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>