Commit Graph

1149939 Commits

Author SHA1 Message Date
Peifeng Li
c3d26e2b5a ANDROID: vendor_hooks: Add hooks for lookaround
Add hooks to support lookaround in memory reclamation.

- android_vh_test_clear_look_around_ref
- android_vh_check_folio_look_around_ref
- android_vh_look_around_migrate_folio
- android_vh_look_around

Bug: 292051411

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I9a606ae71d2f1303df3b02403b30bc8fdc9d06dd
(cherry picked from commit f50f24e781)
[huzhanyuan: changed page to folio where appropriate]
2023-08-02 21:57:15 +00:00
Giuliano Procida
29e2f3e3d1 ANDROID: ABI: Update STG ABI to format version 2
If you have trouble reading this new file format, please refresh your
prebuilt version of STG with repo sync.

Bug: 294213765
Change-Id: I4d7ee716231956c5f4da1343cc0db5170aaaa3b1
Signed-off-by: Giuliano Procida <gprocida@google.com>
2023-08-02 18:33:42 +00:00
Jindong Yue
3bd3d13701 ANDROID: ABI: Update symbol list for imx
2 function symbol(s) added
  'bool kthread_freezable_should_stop(bool*)'
  'int v4l2_enum_dv_timings_cap(struct v4l2_enum_dv_timings*, const struct v4l2_dv_timings_cap*, v4l2_check_dv_timings_fnc*, void*)'

Bug: 283014063
Change-Id: Ib4f8f9c67277501dcaa2fa5d8f2867d5fa670de3
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-02 14:56:10 +00:00
sunshijie
ad0b008167 FROMGIT: erofs: fix wrong primary bvec selection on deduplicated extents
When handling deduplicated compressed data, there can be multiple
decompressed extents pointing to the same compressed data in one shot.

In such cases, the bvecs which belong to the longest extent will be
selected as the primary bvecs for real decompressors to decode and the
other duplicated bvecs will be directly copied from the primary bvecs.

Previously, only relative offsets of the longest extent were checked to
decompress the primary bvecs.  On rare occasions, it can be incorrect
if there are several extents with the same start relative offset.
As a result, some short bvecs could be selected for decompression and
then cause data corruption.

For example, as Shijie Sun reported off-list, considering the following
extents of a file:
117:   903345..  915250 |   11905 :     385024..    389120 |    4096
...
119:   919729..  930323 |   10594 :     385024..    389120 |    4096
...
124:   968881..  980786 |   11905 :     385024..    389120 |    4096

The start relative offset is the same: 2225, but extent 119 (919729..
930323) is shorter than the others.

Let's restrict the bvec length in addition to the start offset if bvecs
are not full.

Reported-by: Shijie Sun <sunshijie@xiaomi.com>
Fixes: 5c2a64252c ("erofs: introduce partial-referenced pclusters")
Tested-by: Shijie Sun <sunshijie@xiaomi.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230719065459.60083-1-hsiangkao@linux.alibaba.com
(cherry picked from commit 7d15c91a75aae55767f368e8abbabd7cedf4ec94
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 293245292
Change-Id: Ic8ded9b2d3592ffd0863f4f0d2ac4ae6a1821a1b
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
2023-08-01 21:50:12 +00:00
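The three extents quoted in ad0b008167 can be checked by hand. The sketch below is a simplified userspace illustration, not the erofs code: it assumes the "start relative offset" is the logical start modulo a 4096-byte block, and every struct and helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB blocks, matching the 4096-byte compressed extents listed. */
#define TOY_BLOCK_SIZE 4096UL

/* Hypothetical stand-in for one decompressed extent: [start, end). */
struct toy_extent {
    unsigned long start;
    unsigned long end;
};

/* Offset of the extent start within its block. */
static unsigned long toy_rel_offset(const struct toy_extent *e)
{
    return e->start % TOY_BLOCK_SIZE;
}

static unsigned long toy_length(const struct toy_extent *e)
{
    assert(e->end >= e->start);
    return e->end - e->start;
}

/* Offset-only check: cannot tell a short extent from a long one. */
static bool toy_match_offset_only(const struct toy_extent *primary,
                                  const struct toy_extent *other)
{
    return toy_rel_offset(primary) == toy_rel_offset(other);
}

/* Stricter check: the primary must also be at least as long as the other. */
static bool toy_match_offset_and_length(const struct toy_extent *primary,
                                        const struct toy_extent *other)
{
    return toy_match_offset_only(primary, other) &&
           toy_length(primary) >= toy_length(other);
}
```

With extents 117 and 119 from the message, the offset-only check would accept the shorter extent 119 as a primary for extent 117, while the stricter check rejects it.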
Ming Qian
126ef64cba UPSTREAM: media: Add ABGR64_12 video format
ABGR64_12 is a reversed RGB format with alpha channel last,
12 bits per component like ABGR32,
expanded to 16bits.
Data in the 12 high bits, zeros in the 4 low bits,
arranged in little endian order.

Bug: 293213303
Change-Id: Idc4e1100c9e2134a48b594151e3398f6436b010d
(cherry picked from commit 302b988ca0)
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-01 21:45:37 +00:00
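The "12 high bits, zeros in the 4 low bits, little endian" layout shared by the 12-bit formats in this series can be sketched for a single component as follows. This is an illustrative userspace helper with hypothetical names, not V4L2 code.

```c
#include <assert.h>
#include <stdint.h>

/* Store one 12-bit component in the 12 high bits of a little-endian 16-bit word. */
static void toy_store12(uint8_t out[2], uint16_t sample)
{
    assert(sample <= 0x0FFF);
    uint16_t word = (uint16_t)(sample << 4);  /* the 4 low bits stay zero */
    out[0] = (uint8_t)(word & 0xFF);          /* least significant byte first */
    out[1] = (uint8_t)(word >> 8);
}

/* Recover the 12-bit component from its two stored bytes. */
static uint16_t toy_load12(const uint8_t in[2])
{
    uint16_t word = (uint16_t)(in[0] | (in[1] << 8));
    return (uint16_t)(word >> 4);
}
```

A sample of 0xABC is stored as the bytes 0xC0, 0xAB and loads back as 0xABC.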
Ming Qian
86e2e8fd05 BACKPORT: media: Add BGR48_12 video format
BGR48_12 is a reversed RGB format with 12 bits per component like BGR24,
expanded to 16bits.
Data in the 12 high bits, zeros in the 4 low bits,
arranged in little endian order.

Bug: 293213303
Change-Id: I27d14a33c8e2b4847a63ea05b285786766949ebf
(cherry picked from commit da0b7a400e)
[Jindong: Fixed conflicts in .rst file and v4l2-ioctl.c]
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-01 21:45:37 +00:00
Ming Qian
892293272c UPSTREAM: media: Add YUV48_12 video format
YUV48_12 is a YUV format with 12-bits per component like YUV24,
expanded to 16bits.
Data in the 12 high bits, zeros in the 4 low bits,
arranged in little endian order.

[hverkuil: replaced a . by ,]

Bug: 293213303
Change-Id: I12e6f02b99918a429224320da2127d6b4d777584
(cherry picked from commit 99c9549677)
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-01 21:45:37 +00:00
Ming Qian
b2cf7e4268 UPSTREAM: media: Add Y212 v4l2 format info
Y212 is a YUV format with 12-bits per component like YUYV,
expanded to 16bits.
Data in the 12 high bits, zeros in the 4 low bits,
arranged in little endian order.

Add the missing v4l2 format info for Y212.

Bug: 293213303
Change-Id: Ibdf9bb3a3f1eb895da9eca52d115e08b656b5153
(cherry picked from commit a178dd3bbe)
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-01 21:45:37 +00:00
Tomi Valkeinen
0f3f7a21af UPSTREAM: media: Add Y210, Y212 and Y216 formats
Add Y210, Y212 and Y216 formats.

Bug: 293213303
Change-Id: I2d580dd82481f6a1364dfcedfd918e82d25ac211
(cherry picked from commit 0dc1d7a79a)
Signed-off-by: Tomi Valkeinen <tomi.valkeinen+renesas@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-01 21:45:37 +00:00
Ming Qian
ca7b45b128 UPSTREAM: media: Add Y012 video format
Y012 is a luma-only format with 12 bits per pixel,
expanded to 16bits.
Data in the 12 high bits, zeros in the 4 low bits,
arranged in little endian order.

Bug: 293213303
Change-Id: I1a8f73162932e0760aabbe44525d7c74ace9f7bd
(cherry picked from commit a490ea6844)
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-01 21:45:37 +00:00
Ming Qian
343b85ecad UPSTREAM: media: Add P012 and P012M video format
P012 is a YUV format with 12-bits per component with interleaved UV,
like NV12, expanded to 16 bits.
Data in the 12 high bits, zeros in the 4 low bits,
arranged in little endian order.
P012M has two non-contiguous planes.

Bug: 293213303
Change-Id: I1fbfa7c445bc682766f479cca07eb8cb16cbb44f
(cherry picked from commit aa10804042)
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
2023-08-01 21:45:37 +00:00
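Because P012 is described as "like NV12, expanded to 16 bits", its plane sizes can be estimated as below. This is a rough sketch with hypothetical names; it assumes NV12-style 4:2:0 subsampling, even dimensions, and no line padding, none of which is stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_p012_sizes {
    size_t luma_bytes;    /* Y plane */
    size_t chroma_bytes;  /* interleaved UV plane */
};

/* Assumes 4:2:0 subsampling and tightly packed lines (no stride padding). */
static struct toy_p012_sizes toy_p012_plane_sizes(size_t width, size_t height)
{
    struct toy_p012_sizes s;

    assert(width % 2 == 0 && height % 2 == 0);
    /* One 16-bit word per luma sample. */
    s.luma_bytes = width * height * 2;
    /* One U and one V word per 2x2 block of pixels. */
    s.chroma_bytes = (width / 2) * (height / 2) * 2 * 2;
    return s;
}
```

Under these assumptions a 1920x1080 frame needs 4147200 bytes of luma and 2073600 bytes of chroma. For P012M the two planes would live in separate buffers rather than back to back.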
Ramji Jiyani
7beed73af0 ANDROID: GKI: Create symbol files in include/config
Create input symbol files to generate the GKI module headers
under include/config. By placing files in this generated
directory, the default filters that ignore certain files
will work without any special handling required, and they
will also be available to inspect after the build for
debugging purposes.

abi_gki_protected_exports: Input for gki_module_protected_exports.h
From :- ${objtree}/abi_gki_protected_exports
To :- include/config/abi_gki_protected_exports

all_kmi_symbols: Input for gki_module_unprotected.h
- Rename to abi_gki_kmi_symbols
From :- all_kmi_symbols
To :- include/config/abi_gki_kmi_symbols

Bug: 286529877
Test: TH
Test: Manual verification of the generated files
Change-Id: Iafa10631e7712a8e1e87a2f56cfd614de6b1053a
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
2023-08-01 21:21:29 +00:00
Paul Lawrence
295e779e8f ANDROID: fuse-bpf: Use stored bpf for create_open
create_open would always take its parent directory's bpf for the created
object. Modify to use the bpf stored in fuse_dentry which is set by
lookup.

Bug: 291705489
Test: fuse_test passes, adb push file /sdcard/Android/data works
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I0a1ea2a291a8fdf67923f1827176b2ea96bd4c2d
2023-07-31 23:09:25 +00:00
Paul Lawrence
74d9daa59a ANDROID: fuse-bpf: Add bpf to negative fuse_dentry
Store the results of a negative lookup in the fuse_dentry so later
opcodes can use them to create files

Bug: 291705489
Test: fuse_test passes
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I725e714a1d6ce43f24431d07c24e96349ef1a55c
2023-07-31 23:09:25 +00:00
Paul Lawrence
6aef06abba ANDROID: fuse-bpf: Check inode not null
fuse_iget_backing returns an inode or NULL, not an ERR_PTR, so check
that it is not NULL.

Also make sure we put the inode if d_splice_alias fails

Bug: 293349757
Test: fuse_test runs
Signed-off-by: Paul Lawrence <paullawrence@google.com>

Change-Id: I1eadad32f80bab6730e461412b4b7ab4d6c56bf2
2023-07-31 23:09:25 +00:00
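The distinction behind 6aef06abba is that an error-pointer test does not catch NULL. The following is a userspace imitation of the kernel's error-pointer convention with hypothetical `toy_` names; it is a sketch of the idea, not the fuse-bpf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error pointers encode a negative errno in the top 4095 addresses. */
#define TOY_MAX_ERRNO 4095

static void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static bool toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-TOY_MAX_ERRNO);
}

/* Correct check for a helper that signals failure with NULL. */
static int toy_check_null_on_failure(const void *inode)
{
    if (inode == NULL)
        return -1;
    return 0;
}
```

Since `toy_is_err(NULL)` is false, a caller that only tests for an error pointer would treat a NULL result as success and dereference it.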
Paul Lawrence
4bbda90bd8 ANDROID: fuse-bpf: Fix flock test compile error
Bug: 293161755
Test: fuse_test compiles
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I249672bab85966e20a26018f65f135fe15c6eff5
2023-07-31 23:09:25 +00:00
Daniel Rosenberg
84ac22a0d3 ANDROID: fuse-bpf: Add partial ioctl support
This adds passthrough-only support for ioctls with fuse-bpf.
compat_ioctls will return -ENOTTY.

Bug: 279519292
Test: F2fsMiscTest#testAtomicWrite
Change-Id: Ia3052e465d87dc1d15ae13955fba8a7f93bc387b
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2023-07-31 23:09:25 +00:00
xieliujie
e341d2312c ANDROID: ABI: Update oplus symbol list
3 function symbol(s) added
  'int __traceiter_android_rvh_rtmutex_force_update(void*, struct task_struct*, struct task_struct*, int*)'
  'int __traceiter_android_vh_rtmutex_waiter_prio(void*, struct task_struct*, int*)'
  'int __traceiter_android_vh_task_blocks_on_rtmutex(void*, struct rt_mutex_base*, struct rt_mutex_waiter*, struct task_struct*, struct ww_acquire_ctx*, unsigned int*)'

3 variable symbol(s) added
  'struct tracepoint __tracepoint_android_rvh_rtmutex_force_update'
  'struct tracepoint __tracepoint_android_vh_rtmutex_waiter_prio'
  'struct tracepoint __tracepoint_android_vh_task_blocks_on_rtmutex'

Bug: 290585456
Change-Id: I4af3d1c8df44822b7f5fd5d5682e65d7c6c4dcc3
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-07-31 22:47:04 +00:00
Jann Horn
f5c707dc65 UPSTREAM: mm/mempolicy: Take VMA lock before replacing policy
mbind() calls down into vma_replace_policy() without taking the per-VMA
locks, replaces the VMA's vma->vm_policy pointer, and frees the old
policy.  That's bad; a concurrent page fault might still be using the
old policy (in vma_alloc_folio()), resulting in use-after-free.

Normally this will manifest as a use-after-free read first, but it can
result in memory corruption, including because vma_alloc_folio() can
call mpol_cond_put() on the freed policy, which conditionally changes
the policy's refcount member.

This bug is specific to CONFIG_NUMA, but it does also affect non-NUMA
systems as long as the kernel was built with CONFIG_NUMA.

Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 293665307
(cherry picked from commit 6c21e066f9)
Change-Id: I2e3a4ee8bad97457ee3e127694f0609e7a240a2f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-29 07:25:37 +00:00
Jann Horn
890b1aabb1 BACKPORT: mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't
be called in the VMA-locked page fault path by ensuring that
vma->anon_vma is set.

However, this check happens before the VMA is locked, which means a
concurrent move_vma() can concurrently call unlink_anon_vmas(), which
disassociates the VMA's anon_vma.

This means we can get UAF in the following scenario:

  THREAD 1                   THREAD 2
  ========                   ========
  <page fault>
    lock_vma_under_rcu()
      rcu_read_lock()
      mas_walk()
      check vma->anon_vma

                             mremap() syscall
                               move_vma()
                                vma_start_write()
                                 unlink_anon_vmas()
                             <syscall end>

    handle_mm_fault()
      __handle_mm_fault()
        handle_pte_fault()
          do_pte_missing()
            do_anonymous_page()
              anon_vma_prepare()
                __anon_vma_prepare()
                  find_mergeable_anon_vma()
                    mas_walk() [looks up VMA X]

                             munmap() syscall (deletes VMA X)

                    reusable_anon_vma() [called on freed VMA X]

This is a security bug if you can hit it, although an attacker would
have to win two races at once where the first race window is only a few
instructions wide.

This patch is based on some previous discussion with Linus Torvalds on
the security list.

Cc: stable@vger.kernel.org
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 293665307
(cherry picked from commit 657b514695)
[surenb: removed vma_is_tcp() call not present in 6.1]
Change-Id: I4bd91e1db337ff35eb7c1d436f4372944556dd7d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-29 06:57:25 +00:00
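The core of 890b1aabb1 is that a condition must be tested after the lock is taken, not before. A toy single-object model of that ordering is sketched below; the `toy_` types and the use of a C11 `atomic_flag` as a lock are assumptions for illustration and do not reflect the kernel's per-VMA lock implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
    atomic_flag lock;  /* stand-in for the per-VMA lock */
    void *anon_vma;    /* may be cleared by a concurrent writer */
};

static bool toy_trylock(struct toy_vma *v)
{
    return !atomic_flag_test_and_set(&v->lock);
}

static void toy_unlock(struct toy_vma *v)
{
    atomic_flag_clear(&v->lock);
}

/*
 * Lock first, then check: returns true with the lock held only if
 * anon_vma was observed as set while the lock was already owned.
 */
static bool toy_lock_vma_checked(struct toy_vma *v)
{
    if (!toy_trylock(v))
        return false;
    if (v->anon_vma == NULL) {
        toy_unlock(v);  /* back out so the caller can use a slower path */
        return false;
    }
    return true;
}
```

Checking `anon_vma` before `toy_trylock()` would leave a window in which another thread could clear the field between the check and the lock acquisition.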
Lorenzo Pieralisi
d3b37a712a BACKPORT: FROMGIT: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627
GIC-700 erratum 2941627 may cause the GIC-700 to miss SPI wake
requests when SPIs are deactivated while targeting a
sleeping CPU, i.e. a CPU for which the redistributor reports:

GICR_WAKER.ProcessorSleep == 1

This runtime situation can happen if an SPI that has been
activated on a core is retargeted to a different core, it
becomes pending and the target core subsequently enters a
power state quiescing the respective redistributor.

When this situation is hit, the de-activation carried out
on the core that activated the SPI (through either ICC_EOIR1_EL1
or ICC_DIR_EL1 register writes) does not trigger a wake
request for the sleeping GIC redistributor even if the SPI
is pending.

Work around the erratum by de-activating the SPI using the
redistributor GICD_ICACTIVER register if the runtime
conditions require it (i.e. the IRQ was retargeted between
activation and de-activation).

Bug: 292459437
Change-Id: Ide915b8c925a631a7fc9ccebca19d9175def162e
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704155034.148262-1-lpieralisi@kernel.org
(cherry picked from commit 6fe5c68ee6 https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/irqchip-fixes)
Signed-off-by: Carlos Galo <carlosgalo@google.com>
2023-07-27 19:40:08 +00:00
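To make the workaround in d3b37a712a more concrete, the sketch below computes which GICD_ICACTIVER<n> word and bit correspond to an interrupt ID, using the GICv3 layout of one bit per interrupt and 32 interrupts per 32-bit register starting at offset 0x0380. The helper names and the CPU-comparison predicate are hypothetical simplifications, not the driver's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of GICD_ICACTIVER<0> within the distributor register frame. */
#define TOY_GICD_ICACTIVER 0x0380u

struct toy_icactiver_write {
    uint32_t reg_offset;  /* which GICD_ICACTIVER<n> word to write */
    uint32_t bit_mask;    /* which bit clears the active state */
};

static struct toy_icactiver_write toy_icactiver_for(uint32_t intid)
{
    struct toy_icactiver_write w;

    w.reg_offset = TOY_GICD_ICACTIVER + (intid / 32) * 4;
    w.bit_mask = UINT32_C(1) << (intid % 32);
    return w;
}

/* Simplified condition: the IRQ was retargeted between activation and deactivation. */
static bool toy_needs_workaround(unsigned activated_on_cpu, unsigned current_target_cpu)
{
    return activated_on_cpu != current_target_cpu;
}
```

For interrupt ID 42 this gives register offset 0x384 and bit mask 0x400.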
wangshuai12
a89e2cbbc0 ANDROID: GKI: update xiaomi symbol list
1 function symbol(s) added
  'int __blk_mq_debugfs_rq_show(struct seq_file*, struct request*)'

Bug: 290730657
Change-Id: Ib3711e9e875e3d6ccc809a87c607fae149159a58
Signed-off-by: wangshuai12 <wangshuai12@xiaomi.corp-partner.google.com>
2023-07-27 15:11:16 +00:00
Hugh Dickins
371f8d901a UPSTREAM: mm: lock newly mapped VMA with corrected ordering
Lockdep is certainly right to complain about

  (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
                 but task is already holding lock:
  (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db

Invert those to the usual ordering.

Fixes: 33313a747e ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1c7873e336)
Change-Id: I85f9cfb6ee8f3d9fefda5518c5637a7dff64bac3
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Suren Baghdasaryan
0d9960403c UPSTREAM: fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().

We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable.  For example, the anon_vma_fork() expects the parent's
vma->anon_vma to not change during the vma copy.

A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and an anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.

Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults.  But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.

This fix can potentially regress some fork-heavy workloads.  Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression.  If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
optimizations are possible if this regression proves to be problematic.

Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit fb49c45532)
Change-Id: Ic5aa9dc51a888b5b0319ec4ec6d2941424573ca0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Suren Baghdasaryan
e3601b25ae UPSTREAM: mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock.  This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified.  Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.

Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.

Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 33313a747e)
Change-Id: I3bb6a7bc8dd579e11f9c18cbc8e4a6e7279bbfb2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Suren Baghdasaryan
05f7c7fe72 UPSTREAM: mm: lock a vma before stack expansion
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.

Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c137381f71)
Change-Id: I3e6a8c89c1fb7c0669e1232176bb04ea6b09bc0a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Greg Kroah-Hartman
c0ba567af1 ANDROID: GKI: bring back find_extend_vma()
In commit 8d7071af89 ("mm: always expand the stack with the mmap write
lock held"), find_extend_vma() was no longer being used in the tree, so
it was removed.  Unfortunately some GKI external module is using this,
so bring it back to allow things to continue to work.

Bug: 161946584
Fixes: 8d7071af89 ("mm: always expand the stack with the mmap write lock held")
Change-Id: I6f1fb1fd8193625fe3dac0bbc5b0aff653b3d879
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 11:47:21 +00:00
Linus Torvalds
188ce9572f BACKPORT: mm: always expand the stack with the mmap write lock held
commit 8d7071af89 upstream

This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.

For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks.  Let's see if any
strange users really wanted that.

It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma.  This makes it fairly
straightforward to convert the remaining architectures.

As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid.  So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1: Patch drivers/iommu/io-pgfault.c instead]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: change in io-pgfault.c was done in iommu-sva.c]
Change-Id: Icdcdded08d7ad4eda8fae1120a3c8b3d957516c1
(cherry picked from commit 8d7071af89)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 11:47:21 +00:00
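The changed calling convention described in 188ce9572f (return the vma on success, return NULL with the lock already dropped on failure) affects how callers clean up. The toy model below uses a boolean in place of the mmap lock and hypothetical `toy_` names; it only illustrates the caller-side pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mm { bool locked; };
struct toy_vma { unsigned long start; unsigned long end; };

/*
 * Called with the lock held. On success returns the vma with the lock
 * still held; on failure drops the lock and returns NULL.
 */
static struct toy_vma *toy_expand_stack(struct toy_mm *mm, struct toy_vma *vma,
                                        unsigned long addr,
                                        unsigned long lowest_allowed)
{
    assert(mm->locked);
    if (addr < lowest_allowed || addr >= vma->end) {
        mm->locked = false;
        return NULL;
    }
    if (addr < vma->start)
        vma->start = addr;  /* grow the stack area downwards */
    return vma;
}

static int toy_handle_fault(struct toy_mm *mm, struct toy_vma *vma,
                            unsigned long addr, unsigned long lowest_allowed)
{
    mm->locked = true;
    vma = toy_expand_stack(mm, vma, addr, lowest_allowed);
    if (vma == NULL)
        return -1;          /* lock already dropped: do not unlock again */
    mm->locked = false;     /* normal path: the caller releases the lock */
    return 0;
}
```

The failure branch returns without unlocking because the helper has already released the lock on the caller's behalf.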
Linus Torvalds
74efdc0966 BACKPORT: execve: expand new process stack manually ahead of time
commit f313c51d26 upstream.

This is a small step towards a model where GUP itself would not expand
the stack, and any user that needs GUP to not look up existing mappings,
but actually expand on them, would have to do so manually before-hand,
and with the mm lock held for writing.

It turns out that execve() already did almost exactly that, except it
didn't take the mm lock at all (it's single-threaded so no locking
technically needed, but it could cause lockdep errors).  And it only did
it for the CONFIG_STACK_GROWSUP case, since in that case GUP has
obviously never expanded the stack downwards.

So just make that CONFIG_STACK_GROWSUP case do the right thing with
locking, and enable it generally.  This will eventually help GUP, and in
the meantime avoids a special case and the lockdep issue.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1 Minor context from still having FOLL_FORCE flags set]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I24c652740dcfc674b0aef8e09ef72f09ad61254c
(cherry picked from commit f313c51d26)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 11:47:21 +00:00
jianzhou
c8ad906849 ANDROID: abi_gki_aarch64_qcom: ufshcd_mcq_poll_cqe_lock
Symbols added:
   ufshcd_mcq_poll_cqe_lock

Bug: 292490611
Change-Id: I0e26f360c56d302f9f980c9d43b7a3cc80d3a616
Signed-off-by: jianzhou <quic_jianzhou@quicinc.com>
2023-07-27 10:40:33 +00:00
Liam R. Howlett
1afccd4255 UPSTREAM: mm: make find_extend_vma() fail if write lock not held
commit f440fa1ac9 upstream.

Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.

To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked".  That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the common
cases until they have been fully converted to the new world order.

Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: If12d2d68429b6d71393f02d5ed7e6939c3cd5405
(cherry picked from commit f440fa1ac9)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:05:44 +00:00
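The "is it write-locked" flag described in 1afccd4255 can be pictured with a toy lookup: the trivial case works under either lock mode, while actual expansion is refused unless the caller states that it holds the write lock. Names and the address model are hypothetical; this is not the mm code.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vma {
    unsigned long start;
    unsigned long end;
};

/* Returns true if addr is covered by the vma, expanding it downwards if permitted. */
static bool toy_find_extend(struct toy_vma *vma, unsigned long addr, bool write_locked)
{
    if (addr >= vma->end)
        return false;
    if (addr >= vma->start)
        return true;        /* trivial case: no expansion needed */
    if (!write_locked)
        return false;       /* expansion requires the write lock */
    vma->start = addr;
    return true;
}
```

Read-locked legacy callers keep working as long as the address already lies inside the vma, which is the common case the message refers to.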
Linus Torvalds
4087cac574 UPSTREAM: powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
commit 2cd76c50d0 upstream.

This is one of the simple cases, except there's no pt_regs pointer.
Which is fine, as lock_mm_and_find_vma() is set up to work fine with a
NULL pt_regs.

Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting,
so we can just use the helper without any extra work.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I5736f498b2f45625e46554520d3aeb679e680907
(cherry picked from commit 2cd76c50d0)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:05:22 +00:00
Linus Torvalds
6c33246824 UPSTREAM: mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
commit a050ba1e74 upstream.

This does the simple pattern conversion of alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa to the lock_mm_and_find_vma()
helper.  They all have the regular fault handling pattern without odd
special cases.

The remaining architectures all have something that keeps us from a
straightforward conversion: ia64 and parisc have stacks that can grow
both up as well as down (and ia64 has special address region checks).

And m68k, microblaze, openrisc, sparc64, and um end up having extra
rules about only expanding the stack down a limited amount below the
user space stack pointer.  That is something that x86 used to do too
(long long ago), and it probably could just be skipped, but it still
makes the conversion less than trivial.

Note that this conversion was done manually and with the exception of
alpha without any build testing, because I have a fairly limited cross-
building environment.  The cases are all simple, and I went through the
changes several times, but...

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I93e4ce3cb077329e202699a16db576be3a40285b
(cherry picked from commit a050ba1e74)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:04:57 +00:00
Ben Hutchings
add0a1ea04 UPSTREAM: arm/mm: Convert to using lock_mm_and_find_vma()
commit 8b35ca3e45 upstream.

arm has an additional check for address < FIRST_USER_ADDRESS before
expanding the stack.  Since FIRST_USER_ADDRESS is defined everywhere
(generally as 0), move that check to the generic expand_downwards().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie1090f587090ef16de4bce224bbc52334bfe78fa
(cherry picked from commit 8b35ca3e45)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:04:33 +00:00
Ben Hutchings
9f136450af UPSTREAM: riscv/mm: Convert to using lock_mm_and_find_vma()
commit 7267ef7b0b upstream.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1: Kconfig context]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I601c5e4625e0357be7043026359aa85e5a63ade1
(cherry picked from commit 7267ef7b0b)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:03:27 +00:00
Ben Hutchings
053053fc68 UPSTREAM: mips/mm: Convert to using lock_mm_and_find_vma()
commit 4bce37a68f upstream.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie1ec8bd98c52086790adcd691370a76d135a333e
(cherry picked from commit 4bce37a68f)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:03:08 +00:00
Michael Ellerman
9cdce804c0 UPSTREAM: powerpc/mm: Convert to using lock_mm_and_find_vma()
commit e6fe228c4f upstream.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ifeaee70ad1bdb9e583aaba137526cc49e2ecf8be
(cherry picked from commit e6fe228c4f)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:02:41 +00:00
SeongJae Park
1016faf509 BACKPORT: arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault()
commit 24be4d0b46 upstream.

Commit ae870a68b5 ("arm64/mm: Convert to using
lock_mm_and_find_vma()") made do_page_fault() use 'vma' even if
CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the
ifdef.

As a result, building the kernel without that config fails with an
undeclared variable error, as shown below:

    arch/arm64/mm/fault.c: In function 'do_page_fault':
    arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'?
      624 |  vma = lock_mm_and_find_vma(mm, addr, regs);
          |  ^~~
          |  vmap

Fix it by moving the declaration out of the ifdef.
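
The shape of the fix can be illustrated outside the kernel. The following is a minimal userspace sketch, not the actual arm64 code: DEMO_PER_VMA_LOCK, struct vma, find_vma_locked() and handle_fault() are hypothetical stand-ins. The point is only that a variable used on the common path must be declared outside the conditional block.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types or helpers. */
struct vma {
	unsigned long start;
	unsigned long end;
};

static struct vma demo_vma = { 0x1000, 0x2000 };

/* Return the region containing addr, or NULL if there is none. */
static struct vma *find_vma_locked(unsigned long addr)
{
	if (addr >= demo_vma.start && addr < demo_vma.end)
		return &demo_vma;
	return NULL;
}

int handle_fault(unsigned long addr)
{
	/* Declared unconditionally: the common path below always uses it. */
	struct vma *vma;

#ifdef DEMO_PER_VMA_LOCK
	/* Optional fast path, compiled only when the option is enabled. */
	vma = find_vma_locked(addr);
	if (vma)
		return 0;
#endif

	/* Common path, compiled in every configuration. */
	vma = find_vma_locked(addr);
	return vma ? 0 : -1;
}
```

Had the declaration of vma been placed inside the #ifdef block, a build without DEMO_PER_VMA_LOCK would fail on the common path in the same way as the error quoted above.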

Fixes: ae870a68b5 ("arm64/mm: Convert to using lock_mm_and_find_vma()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: this one is taken from 6.4.y stable branch]
Change-Id: Iba3153aa67f2dab347e4bc04a09c566b47cf4f63
(cherry picked from commit 24be4d0b46)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:01:53 +00:00
Linus Torvalds
89298b8b3c BACKPORT: arm64/mm: Convert to using lock_mm_and_find_vma()
commit ae870a68b5 upstream.

This converts arm64 to use the new page fault helper.  It was very
straightforward, but still needed a fix for the "obvious" conversion I
initially did.  Thanks to Suren for the fix and testing.

Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com>
Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: this one is taken from 6.4.y stable branch]
Change-Id: Ibda94ca9b3893b8961e1d6536c854c0aee559a6b
(cherry picked from commit ae870a68b5)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:01:26 +00:00
Linus Torvalds
cf70cb4f1f UPSTREAM: mm: make the page fault mmap locking killable
commit eda0047296 upstream.

This is done as a separate patch from introducing the new
lock_mm_and_find_vma() helper, because while it's an obvious change,
it's not what x86 used to do in this area.

We already abort the page fault on fatal signals anyway, so why should
we wait for the mmap lock only to then abort later? With the new helper
function that returns without the lock held on failure anyway, this is
particularly easy and straightforward.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I9730b4543265a20253cbfc02de135cc77927f821
(cherry picked from commit eda0047296)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:01:22 +00:00
xieliujie
544ae28cf6 ANDROID: Inherit "user-aware property" across rtmutex.
Since upstream commit 715f7f9ece ("locking/rtmutex: Squash !RT
tasks to DEFAULT_PRIO"), non-rt tasks do not inherit the
nice-priority values across rt_mutexes. This removes the minor
(and indirect) priority-inheritance that rt-mutexes provided for
CFS tasks.

Without priority inheritance, however, time-bounded priority
inversion can occur between CFS tasks of different nice
priorities / cgroup limitations.  The proxy-execution efforts
are a work-in-progress to resolve this upstream, but in the
meantime it is left to vendor hooks to provide a near term
solution to avoid priority inversion between CFS tasks.

In our OEM scheduler, if a CFS thread has a "user-aware
property", we will always pick it even if its vruntime is
bigger than the smallest one in the runqueue. That's why the
trace_android_rvh_replace_next_task_fair vendor hook was added
previously in commit 53e8099784 ("ANDROID: vendor_hooks: Add
hooks for scheduler").

Thus for our OEM scheduler, important CFS tasks (like
RenderThread) are marked with the "user-aware property" in their
struct task_struct. If those tasks are blocked on an rtmutex, we
want to allow the "user-aware property" to be inherited by the lock
owner, so it will be selected to run immediately to release the
lock.

To support this, we need new hooks to map "user-aware property"
into different rtmutex_waiter prio and update the owner's
"user-aware property" if needed. Thus these additional vendor
hooks are needed.

In the future, once a generalized upstream solution for CFS
priority inheritance is in place, this will no longer be needed.

Bug: 290585456
Change-Id: I6521ed2086b147400a54da6b84a324baf16bc649
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-07-27 00:04:07 +00:00
Eric Biggers
5e4a5dc820 BACKPORT: blk-crypto: use dynamic lock class for blk_crypto_profile::lock
When a device-mapper device is passing through the inline encryption
support of an underlying device, calls to blk_crypto_evict_key() take
the blk_crypto_profile::lock of the device-mapper device, then take the
blk_crypto_profile::lock of the underlying device (nested).  This isn't
a real deadlock, but it causes a lockdep report because there is only
one lock class for all instances of this lock.

Lockdep subclasses don't really work here because the hierarchy of block
devices is dynamic and could have more than 2 levels.

Instead, register a dynamic lock class for each blk_crypto_profile, and
associate that with the lock.

This avoids false-positive lockdep reports like the following:

    ============================================
    WARNING: possible recursive locking detected
    6.4.0-rc5 #2 Not tainted
    --------------------------------------------
    fscryptctl/1421 is trying to acquire lock:
    ffffff80829ca418 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0x44/0x1c0

                   but task is already holding lock:
    ffffff8086b68ca8 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0xc8/0x1c0

                   other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&profile->lock);
      lock(&profile->lock);

                    *** DEADLOCK ***

     May be due to missing lock nesting notation

Fixes: 1b26283970 ("block: Keyslot Manager for Inline Encryption")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230610061139.212085-1-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Bug: 286427075
(cherry picked from commit 2fb48d88e7)
(added '#ifdef CONFIG_LOCKDEP' to keep the KMI tooling happy)
Change-Id: I21c0f941a36663c956a5c89324813bbaac0633ef
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-07-27 00:02:13 +00:00
Kyongho Cho
db2c29e53d ANDROID: ABI: update symbol list for Xclipse GPU
1 function symbol(s) added
  'void ttm_tt_unpopulate(struct ttm_device*, struct ttm_tt*)'

Bug: 291101811
Change-Id: I0be29227b37734304f00fc7b8e2612a0fa6c3fff
Signed-off-by: Kyongho Cho <pullip.cho@samsung.com>
2023-07-26 21:11:54 +00:00
Kyongho Cho
7edb035c79 ANDROID: drm/ttm: export ttm_tt_unpopulate()
The Xclipse GPU driver depends on TTM for graphics buffer allocation and
management. Customers require graphics memory swap to improve overall
memory efficiency. However, TTM's swap feature can't be used since it
selects the victim buffer by LRU and we can't choose a specific buffer
to swap.
The Xclipse GPU driver implements its own swap feature by means of TTM
APIs. The problem is that TTM's buffer allocation statistics in ttm_tt.c
are local to that file. Whenever a graphics buffer is swapped out,
the total page allocation size should be decreased, but that is not
possible from outside ttm_tt.c.
If the statistics are not maintained well, TTM ends up swapping out TTM
buffers globally, which is unexpected.

Bug: 291101811
Change-Id: I143c705834bcc196432c3ef59b49c9ec31f2e971
Signed-off-by: Kyongho Cho <pullip.cho@samsung.com>
2023-07-26 21:11:54 +00:00
lambert wang
b61f298c0d ANDROID: GKI: Add ABI symbol list(devlink) for MTK
17 function symbol(s) added
  'bool device_remove_file_self(struct device*, const struct device_attribute*)'
  'struct devlink* devlink_alloc_ns(const struct devlink_ops*, size_t, struct net*, struct device*)'
  'void devlink_flash_update_status_notify(struct devlink*, const char*, const char*, unsigned long, unsigned long)'
  'int devlink_fmsg_binary_pair_nest_end(struct devlink_fmsg*)'
  'int devlink_fmsg_binary_pair_nest_start(struct devlink_fmsg*, const char*)'
  'int devlink_fmsg_binary_put(struct devlink_fmsg*, const void*, u16)'
  'void devlink_free(struct devlink*)'
  'int devlink_health_report(struct devlink_health_reporter*, const char*, void*)'
  'struct devlink_health_reporter* devlink_health_reporter_create(struct devlink*, const struct devlink_health_reporter_ops*, u64, void*)'
  'void devlink_health_reporter_destroy(struct devlink_health_reporter*)'
  'void* devlink_health_reporter_priv(struct devlink_health_reporter*)'
  'void devlink_health_reporter_state_update(struct devlink_health_reporter*, enum devlink_health_reporter_state)'
  'void* devlink_priv(struct devlink*)'
  'struct devlink_region* devlink_region_create(struct devlink*, const struct devlink_region_ops*, u32, u64)'
  'void devlink_region_destroy(struct devlink_region*)'
  'void devlink_register(struct devlink*)'
  'void devlink_unregister(struct devlink*)'

type 'struct devlink' changed
  was only declared, is now fully defined

type 'struct devlink_linecard' changed
  was only declared, is now fully defined

Bug: 283707518
Change-Id: I686fd14c13863c27b3dfdb29cd7c6b6d5a0a3127
Signed-off-by: lambert wang <lambert.wang@mediatek.com>
Signed-off-by: iven yang <iven.yang@mediatek.com>
Signed-off-by: michael cai <michael.cai@mediatek.com>
2023-07-26 20:55:37 +00:00
lambert wang
ec419af28f ANDROID: devlink: Select CONFIG_NET_DEVLINK in Kconfig.gki
Select hidden Kconfig: NET_DEVLINK.

Required by device drivers to provide a unified interface to expose
device info, capture coredumps and perform device flash.

Bug: 283707518

Change-Id: I1cc5b7dce36c79549cd7f1d9b755f7bab3973f0e
Signed-off-by: michael cai <michael.cai@mediatek.com>
Signed-off-by: lambert wang <lambert.wang@mediatek.com>
2023-07-26 20:55:37 +00:00
Vincent Donnefort
1e114e6efa ANDROID: KVM: arm64: Fix memory ordering for pKVM module callbacks
Registration of module callbacks for the pKVM hypervisor is lockless
thanks to the use of a cmpxchg.

The problem is that a CPU can speculatively execute an indirect branch
and speculatively read variables used in that branch. We then need to
order the memory accesses between variables potentially set in the driver
init (before the callback registration happens) and the call to that
registered callback.

For example, in the case of the serial driver:

 CPU0:                                   CPU1:

   driver_init():                        hyp_serial_enabled()
     base_addr = 0xdeadbeef;               enabled = __hyp_putc
     barrier();                            barrier();
     ops->register_serial_driver(putc);    if (enabled)
                                                __hyp_putc(); /* read base_addr */

This is the same for the SMC and PSCI handler callbacks. The abort and
fault callbacks are not impacted: the driver init can only happen before
the kernel is deprivileged i.e. before the host stage-2 is in place and
then before any of those callbacks can be triggered.

Instead of a full barrier, we can use acquire/release semantics:
relaxing cmpxchg to cmpxchg_release in the registration path and using a
load_acquire in hyp_serial_enabled().
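
The pairing can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions, not the pKVM code: putc_fn, hyp_putc_cb, demo_putc(), serial_putc() and last_char are hypothetical names, and the C11 release compare-exchange and acquire load stand in for the kernel's cmpxchg_release and load-acquire primitives.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*putc_fn)(char);

static unsigned long base_addr;       /* plain variable set by driver init */
static _Atomic(putc_fn) hyp_putc_cb;  /* NULL until a callback is registered */
static char last_char;                /* records what the callback saw */

/* Hypothetical callback; a real one would write to the device at base_addr. */
static void demo_putc(char c)
{
	(void)base_addr;
	last_char = c;
}

/* Lockless registration: returns 0 on success, -1 if already registered. */
int register_serial_driver(putc_fn fn)
{
	putc_fn expected = NULL;

	/*
	 * Release ordering on success: every store made before this point
	 * (such as base_addr) is visible to a reader that observes fn
	 * through an acquire load.
	 */
	if (atomic_compare_exchange_strong_explicit(&hyp_putc_cb, &expected, fn,
						    memory_order_release,
						    memory_order_relaxed))
		return 0;
	return -1;
}

int driver_init(void)
{
	base_addr = 0xdeadbeef;
	return register_serial_driver(demo_putc);
}

/* Returns 1 if a callback was invoked, 0 if none is registered yet. */
int serial_putc(char c)
{
	/* Acquire load pairs with the release in register_serial_driver(). */
	putc_fn fn = atomic_load_explicit(&hyp_putc_cb, memory_order_acquire);

	if (!fn)
		return 0;
	fn(c);
	return 1;
}
```

In this sketch a caller of serial_putc() that sees a non-NULL callback is also guaranteed to see the base_addr value written before registration, which is the ordering the diagram above asks for.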

Bug: 292470326
Change-Id: I4b5fe3713fe40cc5ab42ea0e9cdf54e8315dfb44
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
2023-07-26 15:00:04 +00:00
Linus Torvalds
3803ae4a28 BACKPORT: mm: introduce new 'lock_mm_and_find_vma()' page fault helper
commit c2508ec5a5 upstream.

.. and make x86 use it.

This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.

We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.

That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.

So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.

Note: I say "common helper", but it really only handles the normal
stack-grows-down case.  Which is all architectures except for PA-RISC
and IA64.  So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.

It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.

But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: this one is taken from 6.4.y stable branch]
Change-Id: I6e16e6751245ac24adcbe78114bc57c726463acb
(cherry-picked from commit d6a5c7a1a6)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:34 +00:00
Peng Zhang
66b5ad3507 BACKPORT: maple_tree: fix potential out-of-bounds access in mas_wr_end_piv()
commit cd00dd2585 upstream.

Check the write offset end bounds before using it as the offset into the
pivot array.  This avoids a possible out-of-bounds access on the pivot
array if the write extends to the last slot in the node, in which case the
node maximum should be used as the end pivot.
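
The bounds check described above can be sketched as follows. This is a simplified model and not the maple tree implementation: DEMO_SLOTS, struct demo_node and demo_end_pivot() are hypothetical, and the only assumption carried over is that a node stores one fewer pivot than it has slots, with the node maximum bounding the last slot.

```c
#include <stddef.h>

#define DEMO_SLOTS 4	/* hypothetical node capacity */

struct demo_node {
	unsigned long pivots[DEMO_SLOTS - 1];	/* one fewer pivot than slots */
	unsigned long max;			/* upper bound of the whole node */
};

/* Return the end pivot for a write that ends at slot offset_end. */
unsigned long demo_end_pivot(const struct demo_node *node, size_t offset_end)
{
	/* Check the offset before using it as an index into pivots[]. */
	if (offset_end < DEMO_SLOTS - 1)
		return node->pivots[offset_end];

	/* The last slot has no stored pivot; the node maximum is its end. */
	return node->max;
}
```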

akpm: this doesn't affect any current callers, but new users of mapletree
may encounter this problem if backported into earlier kernels, so let's
fix it in -stable kernels just in case.

Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I992549af25fa9c22f587893d004002d2e004d317
(cherry-picked from commit 4e2ad53aba)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:29 +00:00
Thomas Gleixner
19dd4101e0 UPSTREAM: x86/smp: Cure kexec() vs. mwait_play_dead() breakage
commit d7893093a7 upstream.

TLDR: It's a mess.

When kexec() is executed on a system with offline CPUs which are parked in
mwait_play_dead(), it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.

The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernel's data.

Cure this by bringing the offlined CPUs out of MWAIT into HLT.

Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.

That does not help if a stray NMI, MCE or SMI hits the offlined CPUs, as
those make them come out of HLT.

A follow up change will put them into INIT, which protects at least against
NMI and SMI.

Fixes: ea53069231 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I80035e671b55732ac3d56c71dc53364e82238fe2
(cherry-picked from commit 0af4750eaa)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:29 +00:00