mirror of https://github.com/hardkernel/linux.git synced 2026-06-06 02:50:49 +09:00


Suren Baghdasaryan 04f73ad5b4 UPSTREAM: mm: introduce CONFIG_PER_VMA_LOCK

Patch series "Per-VMA locks", v4.

LWN article describing the feature: https://lwn.net/Articles/906852/

The per-VMA locks idea was discussed during the SPF [1] discussion at LSF/MM
last year [2], which concluded with the suggestion that “a reader/writer
semaphore could be put into the VMA itself; that would have the effect of
using the VMA as a sort of range lock.  There would still be contention at
the VMA level, but it would be an improvement.” This patchset implements
this suggested approach.

When handling page faults, we look up the VMA that contains the faulting
page under RCU protection and try to acquire its lock.  If that fails, we
fall back to using mmap_lock, similar to how SPF handled this situation.

One notable way the implementation deviates from the proposal is the way
VMAs are write-locked.  During some mm updates, multiple VMAs need to be
locked until the end of the update (e.g. vma_merge, split_vma, etc.).
Tracking all the locked VMAs, avoiding recursive locks, and figuring out when
it's safe to unlock previously locked VMAs would make the code more
complex.  So, instead of the usual lock/unlock pattern, the proposed
solution marks a VMA as locked and provides an efficient way to:

1. Identify locked VMAs.

2. Unlock all locked VMAs in bulk.

We also postpone unlocking the locked VMAs until the end of the update,
when we do mmap_write_unlock.  Potentially this keeps a VMA locked for
longer than is absolutely necessary, but it results in a big reduction of
code complexity.

Write-locking a VMA is done using two sequence numbers: one in the
vm_area_struct and one in the mm_struct.  A VMA is considered write-locked
when these sequence numbers are equal.  To write-lock a VMA, we set the
sequence number in vm_area_struct to be equal to the sequence number in
mm_struct.  To unlock all VMAs, we increment mm_struct's sequence number.
This allows for an efficient way to track locked VMAs and to drop the locks
on all VMAs at the end of the update.
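A minimal userspace sketch of the two-sequence-number scheme, under stated assumptions: the toy spinning reader/writer semaphore, the struct layouts, and the names vma_mark_locked(), mm_unlock_all_vmas() and vma_try_read_lock() are hypothetical and are not the kernel's API (upstream, the corresponding helpers are vma_start_write(), vma_end_write_all() and vma_start_read()). It uses C11 atomics instead of kernel primitives and assumes the updater already holds mmap_lock for write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reader/writer semaphore: 0 = free, >0 = reader count, -1 = writer. */
struct rwsem { atomic_int count; };

static bool rwsem_try_read(struct rwsem *s)
{
	int c = atomic_load(&s->count);
	/* A failed CAS reloads c; give up as soon as a writer holds it. */
	while (c >= 0)
		if (atomic_compare_exchange_weak(&s->count, &c, c + 1))
			return true;
	return false;
}

static void rwsem_up_read(struct rwsem *s)
{
	assert(atomic_load(&s->count) > 0);
	atomic_fetch_sub(&s->count, 1);
}

static void rwsem_down_write(struct rwsem *s)
{
	int expected = 0;
	/* Spin until there are no readers and no writer (sketch only). */
	while (!atomic_compare_exchange_weak(&s->count, &expected, -1))
		expected = 0;
}

static void rwsem_up_write(struct rwsem *s) { atomic_store(&s->count, 0); }

/* Hypothetical stand-ins for mm_struct and vm_area_struct. */
struct mm  { atomic_int mm_lock_seq; };
struct vma { struct mm *mm; struct rwsem lock; atomic_int vm_lock_seq; };

static void mm_init(struct mm *mm) { atomic_init(&mm->mm_lock_seq, 0); }

static void vma_init(struct vma *v, struct mm *mm)
{
	v->mm = mm;
	atomic_init(&v->lock.count, 0);
	atomic_init(&v->vm_lock_seq, -1); /* differs from the initial mm_lock_seq */
}

/* A VMA is "locked" when its sequence number matches the mm's. */
static bool vma_is_locked(struct vma *v)
{
	return atomic_load(&v->vm_lock_seq) == atomic_load(&v->mm->mm_lock_seq);
}

/* Updater side; assumes the caller holds mmap_lock for write. */
static void vma_mark_locked(struct vma *v)
{
	if (vma_is_locked(v))
		return;                 /* already marked during this update */
	rwsem_down_write(&v->lock);     /* wait for in-flight readers to finish */
	atomic_store(&v->vm_lock_seq, atomic_load(&v->mm->mm_lock_seq));
	rwsem_up_write(&v->lock);
}

/* Updater side, at mmap_write_unlock time: unlock every marked VMA at once. */
static void mm_unlock_all_vmas(struct mm *mm)
{
	atomic_fetch_add(&mm->mm_lock_seq, 1);
}

/* Page-fault side; false means "fall back to mmap_lock". */
static bool vma_try_read_lock(struct vma *v)
{
	if (vma_is_locked(v))
		return false;
	if (!rwsem_try_read(&v->lock))
		return false;
	/* Re-check under the semaphore: an updater may have marked it meanwhile. */
	if (vma_is_locked(v)) {
		rwsem_up_read(&v->lock);
		return false;
	}
	return true;
}

static void vma_read_unlock(struct vma *v) { rwsem_up_read(&v->lock); }
```

The bulk unlock is a single increment: every VMA whose vm_lock_seq still holds the old value simply stops matching, so no list of locked VMAs has to be kept.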

The patchset implements per-VMA locking only for anonymous pages which are
not in swap, and avoids userfaultfd as its implementation is more
complex.  Additional support for file-backed page faults, swapped pages and
userfaultfd pages can be added incrementally.

Performance benchmarks show benefits similar to, although slightly smaller
than, those of the SPF patchset (~75% of SPF benefits).  Still, with lower
complexity this approach might be more desirable.

Since the RFC was posted in September 2022, two separate Google teams outside
of Android have evaluated the patchset and confirmed positive results.  Here
are the known use cases where per-VMA locks show benefits:

Android:

Launch times of apps with a high number of threads (~100) improve by up to 20%.
Each thread mmaps several areas upon startup (stack and thread-local
storage (TLS), thread signal stack, indirect ref table), which requires
taking mmap_lock in write mode.  Page faults take mmap_lock in read mode.
During app launch, both thread creation and page faults establishing the
active working set happen in parallel, and that causes lock contention
between mm writers and readers even if updates and page faults are
happening in different VMAs.  Per-VMA locks prevent this contention by
providing more granular locking.

Google Fibers:

We have several dynamically sized thread pools that spawn new threads
under increased load and reduce their number when idling. For example,
Google's in-process scheduling/threading framework, UMCG/Fibers, is backed
by such a thread pool. When idling, only a small number of idle worker
threads are available; when a spike of incoming requests arrives, each
request is handled in its own "fiber", which is a work item posted onto a
UMCG worker thread; quite often these spikes lead to a number of new
threads spawning. Each new thread needs to allocate and register an RSEQ
section on its TLS, then register itself with the kernel as a UMCG worker
thread, and only after that can it be considered by the in-process
UMCG/Fiber scheduler as available to do useful work. In short, during an
incoming workload spike, new threads have to be spawned, and they perform
several syscalls (RSEQ registration, UMCG worker registration, memory
allocations) before they can actually start doing useful work. Removing
any bottlenecks on this thread startup path will greatly improve our
services' latencies when faced with request/workload spikes.

At high scale, mmap_lock contention during thread creation and stack page
faults leads to user-visible multi-second serving latencies in a similar
pattern to Android app startup.  The per-VMA locking patchset has been run
successfully in limited experiments with user-facing production workloads.
In these experiments, we observed that the peak thread creation rate was
high enough that thread creation is no longer a bottleneck.

TCP zerocopy receive:

From the point of view of TCP zerocopy receive, the per-vma lock patch is
massively beneficial.

In today's implementation, a process with N threads where N - 1 are
performing zerocopy receive and 1 thread is performing madvise() with the
write lock taken (e.g. it needs to change vm_flags) will result in all N - 1
receive threads blocking until the madvise is done.  Conversely, on a busy
process receiving a lot of data, an madvise operation that does need to
take the mmap lock in write mode will need to wait for all of the receives
to be done: a lose-lose proposition.  Per-VMA locking, by definition,
_removes_ this source of contention entirely.

There are other benefits for receive as well, chiefly a reduction in
cacheline bouncing across receiving threads for locking/unlocking the
single mmap lock.  On an RPC-style synthetic workload with 4KB RPCs:

1a) The find+lock+unlock VMA path in the base case, without the
    per-vma lock patchset, is about 0.7% of cycles as measured by perf.

1b) mmap_read_lock + mmap_read_unlock in the base case is about 0.5% of
    cycles overall; most of this is within the TCP read hotpath (a small
    fraction is 'other' usage in the system).

2a) The find+lock+unlock VMA path, with the per-vma patchset and a
    trivial patch written to take advantage of it in TCP, is about 0.4% of
    cycles (down from 0.7% above).

2b) mmap_read_lock + mmap_read_unlock in the per-vma patchset is < 0.1% of
    cycles and is out of the TCP read hotpath entirely (down from 0.5%
    before; the remaining usage is the 'other' usage in the system).
    So, in addition to entirely removing an onerous source of contention,
    it also reduces the CPU cycles of TCP receive zerocopy by about 0.5%+
    (compared to overall cycles in perf) for the 'small' RPC scenario.

In https://lkml.kernel.org/r/87fsaqouyd.fsf_-_@stealth, Punit
demonstrated throughput improvements of as much as 188% from this
patchset.

This patch (of 25):

This configuration variable will be used to build support for VMA
locking during page fault handling.

It is enabled on supported architectures when SMP and MMU are set.

The architecture support is needed since the page fault handler is called
from the architecture's page faulting code which needs modifications to
handle faults under VMA lock.
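A hypothetical userspace model of the control flow the architecture's fault handler needs once CONFIG_PER_VMA_LOCK is set: try the per-VMA path first, then fall back to mmap_lock. PER_VMA_LOCK_ENABLED stands in for IS_ENABLED(CONFIG_PER_VMA_LOCK), and struct vma, find_vma_model() and arch_handle_fault() are invented for this sketch. Upstream, arch code obtains a read-locked VMA via lock_vma_under_rcu() and falls back to mmap_lock when it returns NULL; this sketch reduces that to flag checks and does not model RCU or the fault handling itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_PER_VMA_LOCK). */
#define PER_VMA_LOCK_ENABLED 1

enum fault_path { FAULT_VIA_VMA_LOCK, FAULT_VIA_MMAP_LOCK, FAULT_SIGSEGV };

/* Hypothetical model of a VMA covering [start, end). */
struct vma {
	unsigned long start, end;
	bool anonymous;    /* the series only handles anonymous VMAs */
	bool write_locked; /* marked locked by an in-progress mm update */
};

/* Stand-in for the VMA lookup; the kernel searches the VMA tree under RCU. */
static struct vma *find_vma_model(struct vma *vmas, size_t n, unsigned long addr)
{
	assert(vmas != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/* Models the arch page-fault entry: per-VMA path first, then fall back. */
static enum fault_path arch_handle_fault(struct vma *vmas, size_t n,
					 unsigned long addr)
{
	if (PER_VMA_LOCK_ENABLED) {
		struct vma *v = find_vma_model(vmas, n, addr);
		/* The VMA read lock cannot be taken while an update holds it. */
		if (v && v->anonymous && !v->write_locked)
			return FAULT_VIA_VMA_LOCK;
	}
	/* Slow path: take mmap_lock for read and look the VMA up again. */
	return find_vma_model(vmas, n, addr) ? FAULT_VIA_MMAP_LOCK : FAULT_SIGSEGV;
}
```

Because the fallback is always available, an architecture can opt in without changing behavior for faults the per-VMA path does not cover.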

Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-10-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 0b6cc04f3d)

Bug: 161210518
Change-Id: I787e1d28194655fb717d38718b2b839ef4e6226c
Signed-off-by: Suren Baghdasaryan <surenb@google.com>

2023-06-07 14:24:58 +00:00

android

ANDROID: GKI: Add symbols and update symbol list for Unisoc(2nd)

2023-06-07 10:06:20 +00:00

arch

UPSTREAM: mm: introduce __vm_flags_mod and use it in untrack_pfn

2023-06-07 14:24:58 +00:00

block

ANDROID: GKI: elevator: add Android ABI padding to some structures

2023-06-07 14:24:56 +00:00

certs

Merge 6.1.11 into android14-6.1

2023-02-09 13:29:55 +00:00

crypto

Merge 6.1.25 into android14-6.1

2023-04-26 13:13:19 +00:00

Documentation

ANDROID: gunyah: Sync with latest documentation and UAPI

2023-05-16 20:35:28 +00:00

drivers

UPSTREAM: mm: replace vma->vm_flags direct modifications with modifier calls

2023-06-07 14:24:57 +00:00

fs

UPSTREAM: mm: replace vma->vm_flags direct modifications with modifier calls

2023-06-07 14:24:57 +00:00

include

UPSTREAM: mm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates

2023-06-07 14:24:58 +00:00

init

UPSTREAM: gcc: disable '-Warray-bounds' for gcc-13 too

2023-05-16 17:26:14 +00:00

io_uring

io_uring: fix memory leak when removing provided buffers

2023-04-13 16:55:31 +02:00

ipc

ipc: fix memory leak in init_mqueue_fs()

2022-12-31 13:32:01 +01:00

kernel

UPSTREAM: mm: replace vma->vm_flags direct modifications with modifier calls

2023-06-07 14:24:57 +00:00

lib

FROMGIT: maple_tree: clear up index and last setting in single entry tree

2023-06-06 20:05:25 +00:00

LICENSES

LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers

2021-12-16 14:33:10 +01:00

mm

UPSTREAM: mm: introduce CONFIG_PER_VMA_LOCK

2023-06-07 14:24:58 +00:00

net

UPSTREAM: mm: replace vma->vm_flags direct modifications with modifier calls

2023-06-07 14:24:57 +00:00

rust

rust: kernel: Mark rust_fmt_argument as extern "C"

2023-04-26 14:28:38 +02:00

samples

ANDROID: gunyah: Sync remaining gunyah drivers with latest

2023-05-16 20:35:28 +00:00

scripts

BACKPORT: arm64: unwind: add asynchronous unwind tables to kernel and modules

2023-05-25 15:37:14 -07:00

security

UPSTREAM: mm: replace vma->vm_flags direct modifications with modifier calls

2023-06-07 14:24:57 +00:00

sound

UPSTREAM: mm: replace vma->vm_flags direct modifications with modifier calls

2023-06-07 14:24:57 +00:00

tools

FROMGIT: maple_tree: avoid unnecessary ascending

2023-06-06 20:05:25 +00:00

usr

usr/gen_init_cpio.c: remove unnecessary -1 values from int file

2022-10-03 14:21:44 -07:00

virt

KVM: Register /dev/kvm as the _very_ last thing during initialization

2023-03-10 09:34:11 +01:00

.clang-format

inet: ping: use hlist_nulls rcu iterator during lookup

2022-12-01 12:42:46 +01:00

.cocciconfig

…

.get_maintainer.ignore

get_maintainer: add Alan to .get_maintainer.ignore

2022-08-20 15:17:44 -07:00

.gitattributes

.gitattributes: use 'dts' diff driver for dts files

2019-12-04 19:44:11 -08:00

.gitignore

Kbuild: add Rust support

2022-09-28 09:02:20 +02:00

.mailmap

Merge tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2022-12-10 17:10:52 -08:00

.rustfmt.toml

rust: add .rustfmt.toml

2022-09-28 09:02:20 +02:00

BUILD.bazel

ANDROID: GKI: update symbol list file for honor

2023-05-17 21:26:09 +00:00

build.config.aarch64

ANDROID: Move NDK_TRIPLE to build.config.constants.

2023-02-14 14:13:51 -08:00

build.config.allmodconfig

ANDROID: Disable AF_RXRPC for allmodconfig.

2023-03-15 14:09:33 +00:00

build.config.allmodconfig.aarch64

ANDROID: drop KERNEL_DIR setting in build.config.common

2020-08-31 15:20:37 +00:00

build.config.allmodconfig.arm

ANDROID: drop KERNEL_DIR setting in build.config.common

2020-08-31 15:20:37 +00:00

build.config.allmodconfig.x86_64

ANDROID: drop KERNEL_DIR setting in build.config.common

2020-08-31 15:20:37 +00:00

build.config.amlogic

ANDROID: Unnest MAKE_GOALS from build configs

2023-05-02 13:37:21 +00:00

build.config.arm

ANDROID: kleaf: move NDK_TRIPLE for arm to build.config.constants.

2023-05-09 22:36:11 +00:00

build.config.common

ANDROID: 5/24/2023 KMI update

2023-05-24 14:06:40 +00:00

build.config.constants

ANDROID: clang: update to 17.0.2

2023-05-15 18:53:36 +00:00

build.config.db845c

ANDROID: db845c: Remove MAKE_GOALS from build.config

2023-05-15 07:01:39 +00:00

build.config.gki

ANDROID: GKI: Source GKI_BUILD_CONFIG_FRAGMENT after setting all variables

2022-12-27 13:52:08 -08:00

build.config.gki_kasan

ANDROID: build.config: re-disable LTO properly for KASAN

2022-03-24 12:41:35 -07:00

build.config.gki_kasan.aarch64

ANDROID: drop KERNEL_DIR setting in build.config.common

2020-08-31 15:20:37 +00:00

build.config.gki_kasan.x86_64

ANDROID: drop KERNEL_DIR setting in build.config.common

2020-08-31 15:20:37 +00:00

build.config.gki_kprobes

ANDROID: build.configs: migrate away from CC_LD_ARG

2021-07-02 09:49:23 +00:00

build.config.gki_kprobes.aarch64

ANDROID: Adding kprobes build configs for Cuttlefish

2021-03-01 15:29:45 +00:00

build.config.gki_kprobes.x86_64

ANDROID: Adding kprobes build configs for Cuttlefish

2021-03-01 15:29:45 +00:00

build.config.gki-debug.aarch64

ANDROID: drop KERNEL_DIR setting in build.config.common

2020-08-31 15:20:37 +00:00

build.config.gki-debug.x86_64

ANDROID: drop KERNEL_DIR setting in build.config.common

2020-08-31 15:20:37 +00:00

build.config.gki.aarch64

ANDROID: GKI: Remove MAKE_GOALS from build.config

2023-05-10 17:05:37 +00:00

build.config.gki.aarch64.16k

ANDROID: 16k target: don't write defconfig to source tree

2022-10-03 11:24:35 -07:00

build.config.gki.aarch64.fips140

ANDROID: fips140: add kernel crypto module

2023-01-09 21:33:43 +00:00

build.config.gki.riscv64

ANDROID: GKI: Remove MAKE_GOALS from build.config

2023-05-10 17:05:37 +00:00

build.config.gki.x86_64

ANDROID: GKI: Source GKI_BUILD_CONFIG_FRAGMENT after setting all variables

2022-12-27 13:52:08 -08:00

build.config.khwasan

ANDROID: Add a build config fragment for KHWASan.

2021-10-13 19:44:44 +00:00

build.config.riscv64

ANDROID: GKI: Add 64-bit RISC-V config

2022-12-08 20:01:15 +00:00

build.config.rockpi4

ANDROID: db845c: Remove MAKE_GOALS from build.config

2023-05-15 07:01:39 +00:00

build.config.x86_64

ANDROID: Move NDK_TRIPLE to build.config.constants.

2023-02-14 14:13:51 -08:00

COPYING

COPYING: state that all contributions really are covered by this file

2020-02-10 13:32:20 -08:00

CREDITS

MAINTAINERS: Remove Michal Marek from Kbuild maintainers

2022-11-16 14:53:00 +09:00

Kbuild

Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

2022-10-10 12:00:45 -07:00

Kconfig

ANDROID: kbuild: add Kconfig support for external modules

2021-12-13 18:33:18 +00:00

Kconfig.ext

ANDROID: kbuild: add Kconfig support for external modules

2021-12-13 18:33:18 +00:00

MAINTAINERS

ANDROID: Revert "mm: remove cleancache"

2023-04-26 17:01:50 +00:00

Makefile

UPSTREAM: scs: add support for dynamic shadow call stacks

2023-05-25 15:37:14 -07:00

modules.bzl

ANDROID: GKI: cfg/mac 80211 as vendor modules

2023-05-11 05:22:29 +00:00

OWNERS

ANDROID: add smuckle to OWNERS

2022-06-18 10:41:40 -07:00

OWNERS_DrNo

ANDROID: Updating OWNERS_DrNo

2022-08-22 16:34:52 +00:00

README

Drop all 00-INDEX files from Documentation/

2018-09-09 15:08:58 -06:00

README.md

ANDROID: README.md: fix checkpatch.pl path typo

2021-04-07 23:16:50 +00:00

README.md

How do I submit patches to Android Common Kernels

BEST: Make all of your changes to upstream Linux. If appropriate, backport to the stable releases. These patches will be merged automatically in the corresponding common kernels. If the patch is already in upstream Linux, post a backport of the patch that conforms to the patch requirements below.
- Do not send patches upstream that contain only symbol exports. To be considered for upstream Linux, additions of EXPORT_SYMBOL_GPL() require an in-tree modular driver that uses the symbol -- so include the new driver or changes to an existing driver in the same patchset as the export.
- When sending patches upstream, the commit message must contain a clear case for why the patch is needed and beneficial to the community. Enabling out-of-tree drivers or functionality is not a persuasive case.
LESS GOOD: Develop your patches out-of-tree (from an upstream Linux point-of-view). Unless these are fixing an Android-specific bug, these are very unlikely to be accepted unless they have been coordinated with kernel-team@android.com. If you want to proceed, post a patch that conforms to the patch requirements below.

Common Kernel patch requirements

- All patches must conform to the Linux kernel coding standards and pass scripts/checkpatch.pl
- Patches shall not break gki_defconfig or allmodconfig builds for arm, arm64, x86, x86_64 architectures (see https://source.android.com/setup/build/building-kernels)
- If the patch is not merged from an upstream branch, the subject must be tagged with the type of patch: UPSTREAM:, BACKPORT:, FROMGIT:, FROMLIST:, or ANDROID:.
- All patches must have a Change-Id: tag (see https://gerrit-review.googlesource.com/Documentation/user-changeid.html)
- If an Android bug has been assigned, there must be a Bug: tag.
- All patches must have a Signed-off-by: tag by the author and the submitter

Additional requirements are listed below based on patch type

Requirements for backports from mainline Linux: `UPSTREAM:`, `BACKPORT:`

If the patch is a cherry-pick from Linux mainline with no changes at all
- tag the patch subject with UPSTREAM:.
- add upstream commit information with a (cherry picked from commit ...) line
- Example:
  - if the upstream commit message is

        important patch from upstream

        This is the detailed description of the important patch

        Signed-off-by: Fred Jones <fred.jones@foo.org>

then Joe Smith would upload the patch for the common kernel as

        UPSTREAM: important patch from upstream

        This is the detailed description of the important patch

        Signed-off-by: Fred Jones <fred.jones@foo.org>

        Bug: 135791357
        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
        (cherry picked from commit c31e73121f4c1ec41143423ac6ce3ce6dafdcec1)
        Signed-off-by: Joe Smith <joe.smith@foo.org>

If the patch requires any changes from the upstream version, tag the patch with BACKPORT: instead of UPSTREAM:.
- use the same tags as UPSTREAM:
- add comments about the changes under the (cherry picked from commit ...) line
- Example:

        BACKPORT: important patch from upstream

        This is the detailed description of the important patch

        Signed-off-by: Fred Jones <fred.jones@foo.org>

        Bug: 135791357
        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
        (cherry picked from commit c31e73121f4c1ec41143423ac6ce3ce6dafdcec1)
        [joe: Resolved minor conflict in drivers/foo/bar.c ]
        Signed-off-by: Joe Smith <joe.smith@foo.org>

Requirements for other backports: `FROMGIT:`, `FROMLIST:`

If the patch has been merged into an upstream maintainer tree, but has not yet been merged into Linux mainline
- tag the patch subject with FROMGIT:
- add info on where the patch came from as (cherry picked from commit <sha1> <repo> <branch>). This must be a stable maintainer branch (not rebased, so don't use linux-next for example).
- if changes were required, use BACKPORT: FROMGIT:
- Example:
  - if the commit message in the maintainer tree is

        important patch from upstream

        This is the detailed description of the important patch

        Signed-off-by: Fred Jones <fred.jones@foo.org>

then Joe Smith would upload the patch for the common kernel as

        FROMGIT: important patch from upstream

        This is the detailed description of the important patch

        Signed-off-by: Fred Jones <fred.jones@foo.org>

        Bug: 135791357
        (cherry picked from commit 878a2fd9de10b03d11d2f622250285c7e63deace
         https://git.kernel.org/pub/scm/linux/kernel/git/foo/bar.git test-branch)
        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
        Signed-off-by: Joe Smith <joe.smith@foo.org>

If the patch has been submitted to LKML, but not accepted into any maintainer tree
- tag the patch subject with FROMLIST:
- add a Link: tag with a link to the submittal on lore.kernel.org
- add a Bug: tag with the Android bug (required for patches not accepted into a maintainer tree)
- if changes were required, use BACKPORT: FROMLIST:
- Example:

        FROMLIST: important patch from upstream

        This is the detailed description of the important patch

        Signed-off-by: Fred Jones <fred.jones@foo.org>

        Bug: 135791357
        Link: https://lore.kernel.org/lkml/20190619171517.GA17557@someone.com/
        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
        Signed-off-by: Joe Smith <joe.smith@foo.org>

Requirements for Android-specific patches: `ANDROID:`

If the patch is fixing a bug in Android-specific code
- tag the patch subject with ANDROID:
- add a Fixes: tag that cites the patch with the bug
- Example:

        ANDROID: fix android-specific bug in foobar.c

        This is the detailed description of the important fix

        Fixes: 1234abcd2468 ("foobar: add cool feature")
        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
        Signed-off-by: Joe Smith <joe.smith@foo.org>

If the patch is a new feature
- tag the patch subject with ANDROID:
- add a Bug: tag with the Android bug (required for android-specific features)
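As a hypothetical illustration of the tag rules above, here is a small C checker for a (non-merge) commit message. check_commit_msg() and its helpers are invented for this sketch and are not part of any Android or kernel tooling; it covers only the mechanically checkable rules and does not run checkpatch.pl, test the builds, or verify that both the author and the submitter signed off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* True if some line after the subject starts with prefix (leading spaces ignored). */
static bool has_line(const char *msg, const char *prefix)
{
	const char *p = strchr(msg, '\n');
	while (p) {
		p++; /* step past the '\n' */
		while (*p == ' ')
			p++;
		if (starts_with(p, prefix))
			return true;
		p = strchr(p, '\n');
	}
	return false;
}

/* Returns NULL if msg passes the checked rules, else the first problem found. */
static const char *check_commit_msg(const char *msg)
{
	assert(msg != NULL);
	bool backport = starts_with(msg, "BACKPORT: ");
	/* BACKPORT: may prefix FROMGIT: or FROMLIST: when changes were needed. */
	const char *subj = backport ? msg + strlen("BACKPORT: ") : msg;
	bool fromgit = starts_with(subj, "FROMGIT: ");
	bool fromlist = starts_with(subj, "FROMLIST: ");
	bool upstream = !backport && starts_with(msg, "UPSTREAM: ");
	bool android = !backport && starts_with(msg, "ANDROID: ");

	if (!(backport || upstream || fromgit || fromlist || android))
		return "subject lacks a patch-type tag";
	if (!has_line(msg, "Change-Id: "))
		return "missing Change-Id: tag";
	if (!has_line(msg, "Signed-off-by: "))
		return "missing Signed-off-by: tag";
	if ((upstream || fromgit || (backport && !fromlist)) &&
	    !has_line(msg, "(cherry picked from commit "))
		return "missing (cherry picked from commit ...) line";
	if (fromlist && !has_line(msg, "Link: "))
		return "FROMLIST: needs a Link: tag";
	if (fromlist && !has_line(msg, "Bug: "))
		return "FROMLIST: needs a Bug: tag";
	if (android && !has_line(msg, "Bug: ") && !has_line(msg, "Fixes: "))
		return "ANDROID: needs a Bug: (feature) or Fixes: (bug fix) tag";
	return NULL;
}
```

The UPSTREAM: example message shown earlier passes this check, while the same message without its subject tag does not.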
