mirror of
https://github.com/hardkernel/linux.git
synced 2026-06-06 10:58:48 +09:00
4d3dd339dee404356e2c144014b924eecff94450
Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also* get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will *see* a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388132/ Conflicts: arch/arm64/Kconfig fs/userfaultfd.c mm/hugetlb.c (All related to SPF feature. Resolved by manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I43b37272d531341439ceaa03213d0e2415e04688
How do I submit patches to Android Common Kernels
-
BEST: Make all of your changes to upstream Linux. If appropriate, backport to the stable releases. These patches will be merged automatically in the corresponding common kernels. If the patch is already in upstream Linux, post a backport of the patch that conforms to the patch requirements below.
- Do not send patches upstream that contain only symbol exports. To be considered for upstream Linux,
additions of
EXPORT_SYMBOL_GPL()require an in-tree modular driver that uses the symbol -- so include the new driver or changes to an existing driver in the same patchset as the export. - When sending patches upstream, the commit message must contain a clear case for why the patch is needed and beneficial to the community. Enabling out-of-tree drivers or functionality is not not a persuasive case.
- Do not send patches upstream that contain only symbol exports. To be considered for upstream Linux,
additions of
-
LESS GOOD: Develop your patches out-of-tree (from an upstream Linux point-of-view). Unless these are fixing an Android-specific bug, these are very unlikely to be accepted unless they have been coordinated with kernel-team@android.com. If you want to proceed, post a patch that conforms to the patch requirements below.
Common Kernel patch requirements
- All patches must conform to the Linux kernel coding standards and pass
script/checkpatch.pl - Patches shall not break gki_defconfig or allmodconfig builds for arm, arm64, x86, x86_64 architectures (see https://source.android.com/setup/build/building-kernels)
- If the patch is not merged from an upstream branch, the subject must be tagged with the type of patch:
UPSTREAM:,BACKPORT:,FROMGIT:,FROMLIST:, orANDROID:. - All patches must have a
Change-Id:tag (see https://gerrit-review.googlesource.com/Documentation/user-changeid.html) - If an Android bug has been assigned, there must be a
Bug:tag. - All patches must have a
Signed-off-by:tag by the author and the submitter
Additional requirements are listed below based on patch type
Requirements for backports from mainline Linux: UPSTREAM:, BACKPORT:
- If the patch is a cherry-pick from Linux mainline with no changes at all
- tag the patch subject with
UPSTREAM:. - add upstream commit information with a
(cherry picked from commit ...)line - Example:
- if the upstream commit message is
- tag the patch subject with
important patch from upstream
This is the detailed description of the important patch
Signed-off-by: Fred Jones <fred.jones@foo.org>
- then Joe Smith would upload the patch for the common kernel as
UPSTREAM: important patch from upstream
This is the detailed description of the important patch
Signed-off-by: Fred Jones <fred.jones@foo.org>
Bug: 135791357
Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
(cherry picked from commit c31e73121f4c1ec41143423ac6ce3ce6dafdcec1)
Signed-off-by: Joe Smith <joe.smith@foo.org>
- If the patch requires any changes from the upstream version, tag the patch with
BACKPORT:instead ofUPSTREAM:.- use the same tags as
UPSTREAM: - add comments about the changes under the
(cherry picked from commit ...)line - Example:
- use the same tags as
BACKPORT: important patch from upstream
This is the detailed description of the important patch
Signed-off-by: Fred Jones <fred.jones@foo.org>
Bug: 135791357
Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
(cherry picked from commit c31e73121f4c1ec41143423ac6ce3ce6dafdcec1)
[joe: Resolved minor conflict in drivers/foo/bar.c ]
Signed-off-by: Joe Smith <joe.smith@foo.org>
Requirements for other backports: FROMGIT:, FROMLIST:,
- If the patch has been merged into an upstream maintainer tree, but has not yet
been merged into Linux mainline
- tag the patch subject with
FROMGIT: - add info on where the patch came from as
(cherry picked from commit <sha1> <repo> <branch>). This must be a stable maintainer branch (not rebased, so don't uselinux-nextfor example). - if changes were required, use
BACKPORT: FROMGIT: - Example:
- if the commit message in the maintainer tree is
- tag the patch subject with
important patch from upstream
This is the detailed description of the important patch
Signed-off-by: Fred Jones <fred.jones@foo.org>
- then Joe Smith would upload the patch for the common kernel as
FROMGIT: important patch from upstream
This is the detailed description of the important patch
Signed-off-by: Fred Jones <fred.jones@foo.org>
Bug: 135791357
(cherry picked from commit 878a2fd9de10b03d11d2f622250285c7e63deace
https://git.kernel.org/pub/scm/linux/kernel/git/foo/bar.git test-branch)
Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
Signed-off-by: Joe Smith <joe.smith@foo.org>
- If the patch has been submitted to LKML, but not accepted into any maintainer tree
- tag the patch subject with
FROMLIST: - add a
Link:tag with a link to the submittal on lore.kernel.org - add a
Bug:tag with the Android bug (required for patches not accepted into a maintainer tree) - if changes were required, use
BACKPORT: FROMLIST: - Example:
- tag the patch subject with
FROMLIST: important patch from upstream
This is the detailed description of the important patch
Signed-off-by: Fred Jones <fred.jones@foo.org>
Bug: 135791357
Link: https://lore.kernel.org/lkml/20190619171517.GA17557@someone.com/
Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
Signed-off-by: Joe Smith <joe.smith@foo.org>
Requirements for Android-specific patches: ANDROID:
- If the patch is fixing a bug to Android-specific code
- tag the patch subject with
ANDROID: - add a
Fixes:tag that cites the patch with the bug - Example:
- tag the patch subject with
ANDROID: fix android-specific bug in foobar.c
This is the detailed description of the important fix
Fixes: 1234abcd2468 ("foobar: add cool feature")
Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
Signed-off-by: Joe Smith <joe.smith@foo.org>
- If the patch is a new feature
- tag the patch subject with
ANDROID: - add a
Bug:tag with the Android bug (required for android-specific features)
- tag the patch subject with
Description
Languages
C
97.7%
Assembly
1.6%
Makefile
0.3%
Perl
0.1%