Commit Graph

1053322 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
8aa6f1cde0 Revert "Revert "bpf: Fix possible race in inc_misses_counter""
This reverts commit bb592b6898.

It is no longer needed as we can modify the KABI at this point in time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1467db884a714b1379d31d65e650efccbc17ac5c
2022-03-23 11:32:21 -07:00
Greg Kroah-Hartman
57270a84df Revert "Revert "bpf: Use u64_stats_t in struct bpf_prog_stats""
This reverts commit beb134d21a.

It is no longer needed as we can modify the KABI at this point in time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibabb6d2e2a1e00d18ad2e8c39b4459ba118c7002
2022-03-23 11:32:21 -07:00
Greg Kroah-Hartman
85c1108fd6 Revert "Revert "ethtool: Fix link extended state for big endian""
This reverts commit 74d434ad67.

It is no longer needed as we can modify the KABI at this point in time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8be50594477ec53edb4005e5227c2df26218afdd
2022-03-23 11:32:21 -07:00
Greg Kroah-Hartman
97e323626c Revert "ANDROID: fix up rndis ABI breakage"
This reverts commit fc94364a70.

It is no longer needed as we can modify the KABI at this point in time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Icb196c5ca88a1bb2dca12e132f012be16b2be11e
2022-03-23 11:32:21 -07:00
Guangming Cao
af2ae8657c FROMLIST: dma-buf: support users to change dma_buf.name
User space can call DMA_BUF_SET_NAME to set dma_buf.name, but until now
there has been no way to set it from the kernel side, which makes kernel
dma_buf users difficult to debug.

Some kernel users of dma_heap at MTK also need this. The camera driver,
for example, has its own allocator for other camera components and, unlike
most cases, it runs in the kernel rather than in userspace.
To identify buffer owners while debugging, add a way to set a debug name
for each dmabuf, so that the owner can be determined from dma_buf.name.

Leaf changes summary: 1 artifact changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 1 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

1 Added function:

  [A] 'function long int dma_buf_set_name(dma_buf*, const char*)'

Bug: 223353875
Link: https://lore.kernel.org/patchwork/patch/1459719/
Change-Id: Iac5c6b8838b9b4d976f4525d000e17a3abab94f6
Signed-off-by: Guangming Cao <Guangming.Cao@mediatek.com>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
2022-03-23 11:32:20 -07:00
Isaac J. Manjarres
d2ca738f11 FROMLIST: iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
Add support for IOMMU drivers to have their own map_sg() callbacks.
This completes the path for having iommu_map_sg() invoke an IOMMU
driver's map_sg() callback, which can then invoke the io-pgtable
map_sg() callback with the entire scatter-gather list, so that it
can be processed entirely in the io-pgtable layer.

For IOMMU drivers that do not provide a callback, the default
implementation of iterating through the scatter-gather list, while
calling iommu_map() will be used.

Bug: 190544587
Link: https://lore.kernel.org/linux-iommu/1610376862-927-1-git-send-email-isaacm@codeaurora.org/T/#t
Change-Id: I3d5a8a9e8648649d8dcdda3fa1df41d72f87a528
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
2022-03-23 11:32:20 -07:00
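The dispatch described in d2ca738f11 above (use the driver's map_sg() callback when one exists, otherwise iterate over the list and map each element) can be sketched in user-space C. This is a minimal illustration, not the kernel's IOMMU API: every type and function name below (`sg_entry_sketch`, `iommu_ops_sketch`, `sketch_iommu_map_sg`, the `demo_*` callbacks and counters) is hypothetical.

```c
#include <stddef.h>

/* All names below are hypothetical stand-ins, not the kernel's IOMMU API. */
struct sg_entry_sketch {
    unsigned long phys;  /* physical address of the segment */
    size_t len;          /* segment length in bytes */
};

struct iommu_ops_sketch {
    /* Maps one contiguous range; returns 0 on success. */
    int (*map)(unsigned long iova, unsigned long phys, size_t len);
    /* Optional: maps a whole list in one call; may be NULL. */
    int (*map_sg)(unsigned long iova, const struct sg_entry_sketch *sg,
                  size_t nents, size_t *mapped);
};

static unsigned int demo_map_calls;     /* counts per-element callbacks */
static unsigned int demo_map_sg_calls;  /* counts whole-list callbacks */

static int demo_map(unsigned long iova, unsigned long phys, size_t len)
{
    (void)iova; (void)phys; (void)len;
    demo_map_calls++;
    return 0;
}

static int demo_map_sg(unsigned long iova, const struct sg_entry_sketch *sg,
                       size_t nents, size_t *mapped)
{
    (void)iova;
    demo_map_sg_calls++;
    *mapped = 0;
    for (size_t i = 0; i < nents; i++)
        *mapped += sg[i].len;
    return 0;
}

/* Prefer the driver's map_sg(); otherwise fall back to one map() per element. */
static int sketch_iommu_map_sg(const struct iommu_ops_sketch *ops,
                               unsigned long iova,
                               const struct sg_entry_sketch *sg,
                               size_t nents, size_t *mapped)
{
    *mapped = 0;
    if (ops->map_sg)
        return ops->map_sg(iova, sg, nents, mapped);

    for (size_t i = 0; i < nents; i++) {
        int ret = ops->map(iova + *mapped, sg[i].phys, sg[i].len);
        if (ret)
            return ret;
        *mapped += sg[i].len;
    }
    return 0;
}
```

With a three-element list, the fallback path makes three indirect `map` calls while the batched path makes a single `map_sg` call, which is the saving the commit messages describe.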
Isaac J. Manjarres
536fdf792d FROMLIST: iommu/io-pgtable: Introduce map_sg() as a page table op
While mapping a scatter-gather list, iommu_map_sg() calls
into the IOMMU driver through an indirect call, which can
call into the io-pgtable code through another indirect call.

This sequence of going through the IOMMU core code, the IOMMU
driver, and finally the io-pgtable code, occurs for every
element in the scatter-gather list, in the worst case, which
is not optimal.

Introduce a map_sg callback in the io-pgtable ops so that
IOMMU drivers can invoke it with the complete scatter-gather
list, so that it can be processed within the io-pgtable
code entirely, reducing the number of indirect calls, and
boosting overall iommu_map_sg() performance.

Bug: 190544587
Link: https://lore.kernel.org/linux-iommu/1610376862-927-1-git-send-email-isaacm@codeaurora.org/T/#t
Change-Id: I4b2088dd08eb97dcd94a6c6968082a3c4395351a
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
2022-03-23 11:32:20 -07:00
JianMin Liu
46dfaf84ab ANDROID: rwsem: Add vendor hook to the rw-semaphore
Add a hook that lets vendors apply performance tuning for the owner
of an rwsem.

Add a hook for the rwsem waiter list to allow vendors to implement
wait queue enhancements.

Add ANDROID_VENDOR_DATA to rw_semaphore.

Bug: 222402411

Signed-off-by: JianMin Liu <jian-min.liu@mediatek.com>
Signed-off-by: Jino Hsu <jino.hsu@mediatek.com>
Change-Id: I007a5e26f3db2adaeaf4e5ccea414ce7abfa83b8
2022-03-23 11:32:20 -07:00
Suren Baghdasaryan
3a105c3caf ANDROID: ABI: modify exports for find_vma
A previous change [1] inlined the find_vma function, resulting in its
removal from the exported kernel symbols and replacement with
__find_vma. find_vma is now implemented in the header file and is
still available to drivers, but the exported function is changed to
__find_vma. This causes ABI breakage with the following error:

ERROR: Differences between ksymtab and symbol list detected!
Symbols missing from ksymtab:
 - find_vma

Replace find_vma with new __find_vma in the symbol lists.

[1] https://lore.kernel.org/all/20220128131006.67712-13-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I23fdb68b7fd4d907354fc5902dca9ddec8060319
2022-03-23 11:32:20 -07:00
Vijayanand Jitta
385b0dd1f9 ANDROID: mm: Fix page table lookup in speculative fault path
In the speculative fault path, the page table lookup obtains the offset
at each level, reads the value at that offset, and performs checks on it.
Later, to get the next-level offset, it reads from the previous-level
offset again. A concurrent page table reclamation operation could change
the value at this offset, so going ahead and accessing it would result in
reading an invalid entry.
Fix this by reading from the previous-level offset again and comparing
before performing the next-level access.

Bug: 221005439
Change-Id: I66b3d24ae79c7ee5ccce4ba7a94f028f4cf3fda0
Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com>
2022-03-23 11:32:20 -07:00
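The re-read-and-compare idea from 385b0dd1f9 above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `ENTRY_PRESENT`, `sketch_read_stable_entry`, and the `demo_*` hooks are hypothetical names, the hook parameter exists purely to simulate a concurrent reclaim in a single thread, and the comparison only detects a change inside that window (the real code relies on additional protection against the table being freed).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ENTRY_PRESENT 0x1UL  /* hypothetical "present" bit */

/* Test hook standing in for a concurrent page table reclaim. */
typedef void (*race_hook_t)(_Atomic unsigned long *slot);

static void demo_no_race(_Atomic unsigned long *slot) { (void)slot; }
static void demo_reclaim(_Atomic unsigned long *slot) { atomic_store(slot, 0); }

/*
 * Snapshot the upper-level entry, run checks on the snapshot, then re-read
 * the slot and compare before the caller descends to the next level.
 */
static bool sketch_read_stable_entry(_Atomic unsigned long *slot,
                                     race_hook_t hook,
                                     unsigned long *entry)
{
    unsigned long snap = atomic_load(slot);

    if (!(snap & ENTRY_PRESENT))
        return false;           /* checks run on the snapshot only */

    hook(slot);                 /* window in which a reclaimer could run */

    if (atomic_load(slot) != snap)
        return false;           /* entry changed: abort the speculative walk */

    *entry = snap;              /* caller descends using the snapshot */
    return true;
}
```

On a `false` return the caller would abandon the speculative walk and fall back to the non-speculative path.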
Michel Lespinasse
6febc3942c BACKPORT: FROMLIST: f2fs: implement speculative fault handling
We just need to make sure f2fs_filemap_fault() doesn't block in the
speculative case as it is called with an rcu read lock held.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-33-michel@lespinasse.org/

Conflicts:
    fs/f2fs/file.c

1. The change in f2fs_filemap_fault is not needed since i_mmap_sem is not
used anymore.

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If7a46e131ee38ca02a4c5b8a76ab4eb742acbe95
2022-03-23 11:32:19 -07:00
Michel Lespinasse
a21ca34904 BACKPORT: FROMLIST: ext4: implement speculative fault handling
We just need to make sure ext4_filemap_fault() doesn't block in the
speculative case as it is called with an rcu read lock held.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-32-michel@lespinasse.org/

Conflicts:
    fs/ext4/inode.c

1. The change in fs/ext4/inode.c is not needed since i_mmap_sem is not
used anymore.

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idafc81074cf7f4b31985bdb24e0cc1597c91b875
2022-03-23 11:32:19 -07:00
Michel Lespinasse
7d6787088d BACKPORT: FROMLIST: mm: enable speculative fault handling for supported file types.
Introduce vma_can_speculate(), which allows speculative handling for
VMAs mapping supported file types.

From do_handle_mm_fault(), speculative handling will follow through
__handle_mm_fault(), handle_pte_fault() and do_fault().

At this point, we expect speculative faults to continue through one of:
- do_read_fault(), fully implemented;
- do_cow_fault(), which might abort if missing anon vmas;
- do_shared_fault(), not implemented yet
  (would require ->page_mkwrite() changes).

vma_can_speculate() provides an early abort for the do_shared_fault() case,
limiting the time spent on trying that unimplemented case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-31-michel@lespinasse.org/

Conflicts:
    include/linux/vm_event_item.h
    mm/vmstat.c

1. SPF_ATTEMPT_FILE is taken from https://lore.kernel.org/all/20210407014502.24091-36-michel@lespinasse.org/
since the patch posted upstream at the time had a different structure
with stats for anonymous and file-backed pagefaults introduced in a
separate patch.

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I3a28af63b41b649f02f8b73d53f6494ad114ee5a
2022-03-23 11:32:19 -07:00
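The early-abort decision described in 7d6787088d above might look roughly like the following user-space sketch. It assumes, without confirmation from the log, that anonymous VMAs are recognized by a NULL `vm_ops` pointer and that the per-file-type opt-in is a boolean `speculative` field; all `SK_*` constants and `*_sketch` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_VM_SHARED        0x1u  /* hypothetical: mapping is shared */
#define SK_FAULT_FLAG_WRITE 0x1u  /* hypothetical: fault is a write */

struct vm_ops_sketch {
    bool speculative;             /* file type supports speculative faults */
};

struct vma_sketch {
    const struct vm_ops_sketch *vm_ops;  /* NULL for anonymous mappings */
    unsigned int vm_flags;
};

static bool sketch_vma_can_speculate(const struct vma_sketch *vma,
                                     unsigned int fault_flags)
{
    if (!vma->vm_ops)
        return true;    /* anonymous mapping */
    if (!vma->vm_ops->speculative)
        return false;   /* file type has not opted in */
    if (!(fault_flags & SK_FAULT_FLAG_WRITE))
        return true;    /* read fault: do_read_fault() path */
    if (!(vma->vm_flags & SK_VM_SHARED))
        return true;    /* private write: do_cow_fault() path */
    return false;       /* shared write: do_shared_fault() not implemented */
}
```

Rejecting the shared-write case up front avoids walking the speculative path only to abort once do_shared_fault() is reached.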
Michel Lespinasse
a2138fee6c FROMLIST: fs: list file types that support speculative faults.
Add a speculative field to the vm_operations_struct, which indicates if
the associated file type supports speculative faults.

Initially this is set for files that implement fault() with filemap_fault().

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-30-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic92efdf13283c45e7da7bf703f4f85f8b392ba69
2022-03-23 11:32:19 -07:00
Michel Lespinasse
4979ff3738 FROMLIST: mm: implement speculative handling in filemap_map_pages()
In the speculative case, we know the page table already exists, and it
must be locked with pte_map_lock(). In the case where no page is found
for the given address, return VM_FAULT_RETRY which will abort the
fault before we get into the vm_ops->fault() callback. This is fine
because if filemap_map_pages does not find the page in page cache,
vm_ops->fault() will not either.

Initialize addr and last_pgoff to correspond to the pte at the original
fault address (which was mapped with pte_map_lock()), rather than the
pte at start_pgoff. The choice of initial values doesn't matter as
they will all be adjusted together before use, so they just need to be
consistent with each other, and using the original fault address and
pte allows us to reuse pte_map_lock() without any changes to it.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-29-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I0acf4f9626ec0126cdc9a95a7ff1cd735c1af2ca
2022-03-23 11:32:19 -07:00
Michel Lespinasse
7045d2d838 FROMLIST: mm: implement speculative handling in do_fault_around()
Call the vm_ops->map_pages method within an rcu read locked section.
In the speculative case, verify the mmap sequence lock at the start of
the section. A match guarantees that the original vma is still valid
at that time, and that the associated vma->vm_file stays valid while
the vm_ops->map_pages() method is running.

Do not test vmf->pmd in the speculative case - we only speculate when
a page table already exists, and this saves us from having to handle
synchronization around the vmf->pmd read.

Change xfs_filemap_map_pages() to account for the fact that it can not
block anymore, as it is now running within an rcu read lock.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-28-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Id771c1e6fa9b883595a48d4df63f448a05916eda
2022-03-23 11:32:19 -07:00
Michel Lespinasse
6877640598 BACKPORT: FROMLIST: mm: implement speculative fault handling in finish_fault()
In the speculative case, we want to avoid direct pmd checks (which
would require some extra synchronization to be safe), and rely on
pte_map_lock which will both lock the page table and verify that the
pmd has not changed from its initial value.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-27-michel@lespinasse.org/

Conflicts:
    mm/memory.c

1. Merge conflict due to new vmf->prealloc_pte usage in finish_fault.

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If6046592083eaf12caf5c51c3fbb287a4dfa1ace
2022-03-23 11:32:18 -07:00
Michel Lespinasse
cd333a037c BACKPORT: FROMLIST: mm: implement speculative handling in filemap_fault()
Extend filemap_fault() to handle speculative faults.

In the speculative case, we will only be fishing existing pages out of
the page cache. The logic we use mirrors what is done in the
non-speculative case, assuming that pages are found in the page cache,
are up to date and not already locked, and that readahead is not
necessary at this time. In all other cases, the fault is aborted to be
handled non-speculatively.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-26-michel@lespinasse.org/

Conflicts:
    mm/filemap.c

1. Added back file_ra_state variable used by SPF path.

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I82eba7fcfc81876245c2e65bc5ae3d33ddfcc368
2022-03-23 11:32:18 -07:00
Michel Lespinasse
b12e52ca98 FROMLIST: mm: implement speculative handling in __do_fault()
In the speculative case, call the vm_ops->fault() method from within
an rcu read locked section, and verify the mmap sequence lock at the
start of the section. A match guarantees that the original vma is still
valid at that time, and that the associated vma->vm_file stays valid
while the vm_ops->fault() method is running.

Note that this implies that speculative faults can not sleep within
the vm_ops->fault method. We will only attempt to fetch existing pages
from the page cache during speculative faults; any miss (or prefetch)
will be handled by falling back to non-speculative fault handling.

The speculative handling case also does not preallocate page tables,
as it is always called with a pre-existing page table.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-25-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I995ba94d8e96014ef83ac93fe5a4669afcde34b9
2022-03-23 11:32:18 -07:00
Michel Lespinasse
48e35d053f FROMLIST: mm: rcu safe vma->vm_file freeing
Defer freeing of vma->vm_file when freeing vmas.
This is to allow speculative page faults in the mapped file case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-24-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic766bc2086db82eae9f3aadf0f23dd743be1c464
2022-03-23 11:32:18 -07:00
Michel Lespinasse
fea117c94a FROMLIST: powerpc/mm: attempt speculative mm faults first
Attempt speculative mm fault handling first, and fall back to the
existing (non-speculative) code if that fails.

This follows the lines of the x86 speculative fault handling code,
but with some minor arch differences such as the way that the
access_pkey_error case is handled.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-36-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic12bc3d5070d1502fc5df182a19c92b4a8d59723
2022-03-23 11:32:18 -07:00
Michel Lespinasse
c3b8c726b8 FROMLIST: powerpc/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT so that the speculative fault
handling code can be compiled on this architecture.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-35-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia016c531d264c1022af4896b1e33db7b7b4d5013
2022-03-23 11:32:18 -07:00
Michel Lespinasse
ac39e2e1eb FROMLIST: arm64/mm: attempt speculative mm faults first
Attempt speculative mm fault handling first, and fall back to the
existing (non-speculative) code if that fails.

This follows the lines of the x86 speculative fault handling code,
but with some minor arch differences such as the way that the
VM_FAULT_BADACCESS case is handled.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-34-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iccd87036b15eebf2ff28fbb8022b07c9f91d7353
2022-03-23 11:32:17 -07:00
Michel Lespinasse
f03ec9d1c6 FROMLIST: arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT so that the speculative fault
handling code can be compiled on this architecture.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-33-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I162b3272a7d2736addf22430ef79c0092baa5842
2022-03-23 11:32:17 -07:00
Michel Lespinasse
9b92402808 FROMLIST: mm: anon spf statistics
Add a new CONFIG_SPECULATIVE_PAGE_FAULT_STATS config option,
and dump extra statistics about executed spf cases and abort reasons
when the option is set.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-32-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia53cd88e4a7140aeb26bf8f3869e1fc5270012da
2022-03-23 11:32:17 -07:00
Michel Lespinasse
956cb3f228 FROMLIST: mm: create new include/linux/vm_event.h header file
Split off the definitions necessary to update event counters from vmstat.h
into a new vm_event.h file.

The rationale is to allow header files included from mm.h to update
counter events. vmstat.h can not be included from such header files,
because it refers to page_pgdat() which is only defined later down
in mm.h, and thus results in compile errors. vm_event.h does not refer
to page_pgdat() and thus does not result in such errors.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-31-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ie70dd435b3dcbad80a4a9bfc294b78a9107c1ac2
2022-03-23 11:32:17 -07:00
Michel Lespinasse
12230588f3 FROMLIST: mm: disable rcu safe vma freeing for single threaded user space
Performance tuning: as single threaded userspace does not use
speculative page faults, it does not require rcu safe vma freeing.
Turn this off to avoid the related (small) extra overheads.

For multi threaded userspace, we often see a performance benefit from
the rcu safe vma freeing - even in tests that do not have any frequent
concurrent page faults! This is because rcu safe vma freeing prevents
recently released vmas from being immediately reused in a new thread.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-30-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I81ef7ab43e2757f268c567d5bfe6ab02f1e43a1c
2022-03-23 11:32:17 -07:00
Michel Lespinasse
959fc0f0f1 FROMLIST: mm: disable speculative faults for single threaded user space
Performance tuning: single threaded userspace does not benefit from
speculative page faults, so we turn them off to avoid any related
(small) extra overheads.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-29-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I52720f24949d69b3ccaa7dbc1173e47b030fcaaf
2022-03-23 11:32:17 -07:00
Michel Lespinasse
aa9ae5c915 FROMLIST: mm: implement and enable speculative fault handling in handle_pte_fault()
In handle_pte_fault(), allow speculative execution to proceed.

Use pte_spinlock() to validate the mmap sequence count when locking
the page table.

If speculative execution proceeds through do_wp_page(), ensure that we
end up in the wp_page_reuse() or wp_page_copy() paths, rather than
wp_pfn_shared() or wp_page_shared() (both unreachable as we only
handle anon vmas so far) or handle_userfault() (needs an explicit
abort to handle non-speculatively).

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-28-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia45d095ec7b8e23f1c5d68b7a7f572a3f6f6df97
2022-03-23 11:32:17 -07:00
Michel Lespinasse
40bc9ed389 FROMLIST: mm: implement speculative handling in wp_page_copy()
Change wp_page_copy() to handle the speculative case. This involves
aborting speculative faults if they have to allocate an anon_vma,
read-locking the mmu_notifier_lock to avoid races with
mmu_notifier_register(), and using pte_map_lock() instead of
pte_offset_map_lock() to complete the page fault.

Also change call sites to clear vmf->pte after unmapping the page table,
in order to satisfy pte_map_lock()'s preconditions.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-27-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icd2188e9facf5a7fea42000a2808bcda1ad6f0fc
2022-03-23 11:32:16 -07:00
Michel Lespinasse
81863f7422 FROMLIST: mm: add mmu_notifier_trylock() and mmu_notifier_unlock()
These new functions are to be used when firing MMU notifications
without holding any of the mmap or rmap locks, as is the case with
speculative page fault handlers.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-26-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I3789c44f509d8d7c7cb445e39d891300795cac3c
2022-03-23 11:32:16 -07:00
Michel Lespinasse
3e15787d22 FROMLIST: mm: write lock mmu_notifier_lock when registering mmu notifiers
Change mm_take_all_locks to also take the mmu_notifier_lock.
Note that mm_take_all_locks is called from mmu_notifier_register() only.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-25-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I7ad82c6bc66f8f59a718dc4bf030674d9306a53d
2022-03-23 11:32:16 -07:00
Michel Lespinasse
1ae855f191 FROMLIST: mm: add mmu_notifier_lock
Introduce mmu_notifier_lock as a per-mm percpu_rw_semaphore,
as well as the code to initialize and destroy it together with the mm.

This lock will be used to prevent races between mmu_notifier_register()
and speculative fault handlers that need to fire MMU notifications
without holding any of the mmap or rmap locks.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-24-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I453ebe979c8b9dcc6159b41c5ec7a1ea17d85ee2
2022-03-23 11:32:16 -07:00
Suren Baghdasaryan
3f4fefc1a4 FROMLIST: percpu-rwsem: enable percpu_sem destruction in atomic context
Calling percpu_free_rwsem in atomic context results in a "scheduling while
atomic" bug being triggered:

BUG: scheduling while atomic: klogd/158/0x00000002
...
  __schedule_bug+0x191/0x290
  schedule_debug+0x97/0x180
  __schedule+0xdc/0xba0
  schedule+0xda/0x250
  schedule_timeout+0x92/0x2d0
  __wait_for_common+0x25b/0x430
  wait_for_completion+0x1f/0x30
  rcu_barrier+0x440/0x4f0
  rcu_sync_dtor+0xaa/0x190
  percpu_free_rwsem+0x41/0x80

Introduce percpu_rwsem_destroy function to perform semaphore destruction
in a worker thread.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-23-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic6df09ff048755cd862d340c89a83dfe8efa1bfb
2022-03-23 11:32:16 -07:00
Michel Lespinasse
009020e3d1 FROMLIST: mm: enable speculative fault handling in do_numa_page()
Change handle_pte_fault() to allow speculative fault execution to proceed
through do_numa_page().

do_swap_page() does not implement speculative execution yet, so it
needs to abort with VM_FAULT_RETRY in that case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-22-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I0390331facc9ecd37534012abdd9f255ab5bbb12
2022-03-23 11:32:16 -07:00
Michel Lespinasse
fedc4d513e FROMLIST: mm: implement speculative handling in do_numa_page()
Change do_numa_page() to use pte_spinlock() when locking the page table,
so that the mmap sequence counter will be validated in the speculative case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-21-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If252547faf2a8a6cbba4c0a7ff929071a5f6a657
2022-03-23 11:32:15 -07:00
Michel Lespinasse
c2b2abe724 FROMLIST: mm: enable speculative fault handling through do_anonymous_page()
In the x86 fault handler, only attempt spf if the vma is anonymous.

In do_handle_mm_fault(), let speculative page faults proceed as long
as they fall into anonymous vmas. This enables the speculative
handling code in __handle_mm_fault() and do_anonymous_page().

In handle_pte_fault(), if vmf->pte is set (the original pte was not
pte_none), catch speculative faults and return VM_FAULT_RETRY as
those cases are not implemented yet. Also assert that do_fault()
is not reached in the speculative case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-20-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I875106fcfa1084f570c2bf8f24a129bdce55316b
2022-03-23 11:32:15 -07:00
Michel Lespinasse
31cf1fd564 FROMLIST: mm: implement speculative handling in do_anonymous_page()
Change do_anonymous_page() to handle the speculative case.
This involves aborting speculative faults if they have to allocate a new
anon_vma, and using pte_map_lock() instead of pte_offset_map_lock()
to complete the page fault.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-19-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I5ad955323faabc142c21f62415db039ac889066a
2022-03-23 11:32:15 -07:00
Michel Lespinasse
6e6766ab76 BACKPORT: FROMLIST: mm: add pte_map_lock() and pte_spinlock()
pte_map_lock() and pte_spinlock() are used by fault handlers to ensure
the pte is mapped and locked before they commit the faulted page to the
mm's address space at the end of the fault.

The functions differ in their preconditions; pte_map_lock() expects
the pte to be unmapped prior to the call, while pte_spinlock() expects
it to be already mapped.

In the speculative fault case, the functions verify, after locking the pte,
that the mmap sequence count has not changed since the start of the fault,
and thus that no mmap lock writers have been running concurrently with
the fault. After that point the page table lock serializes any further
races with concurrent mmap lock writers.

If the mmap sequence count check fails, both functions will return false
with the pte being left unmapped and unlocked.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-18-michel@lespinasse.org/

Conflicts:
    include/linux/mm.h

1. Fixed pte_map_lock and pte_spinlock macros not to fail when
CONFIG_SPECULATIVE_PAGE_FAULT=n

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ibd7ccc2ead4fdf29f28c7657b312b2f677ac8836
2022-03-23 11:32:15 -07:00
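The lock-then-validate behaviour described in 6e6766ab76 above can be sketched in user-space C. This models only the locking and the sequence check, not the mapping step that distinguishes pte_map_lock() from pte_spinlock(); the `spf_fault_sketch` structure, the `atomic_flag` standing in for the page table lock, and both function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical fault context; not the kernel's struct vm_fault. */
struct spf_fault_sketch {
    atomic_ulong *mmap_seq;  /* per-mm sequence counter */
    unsigned long seq;       /* value sampled at the start of the fault */
    bool speculative;        /* true when handling without the mmap lock */
    atomic_flag *ptl;        /* stand-in for the page table spinlock */
};

/*
 * Take the page table lock, then (speculative case only) verify that the
 * sequence count is unchanged. On mismatch, drop the lock and return false.
 */
static bool sketch_pte_lock_and_validate(struct spf_fault_sketch *f)
{
    while (atomic_flag_test_and_set(f->ptl))
        ;  /* spin until the lock is acquired */

    if (f->speculative && atomic_load(f->mmap_seq) != f->seq) {
        atomic_flag_clear(f->ptl);
        return false;
    }
    return true;  /* caller now holds the lock */
}

static void sketch_pte_unlock(struct spf_fault_sketch *f)
{
    atomic_flag_clear(f->ptl);
}
```

Checking the counter after taking the lock matters: once the lock is held and the count matches, later writers are serialized by the page table lock itself, as the commit message notes.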
Michel Lespinasse
6ab660d7cb FROMLIST: mm: implement speculative handling in __handle_mm_fault().
The speculative path calls speculative_page_walk_begin() before walking
the page table tree to prevent page table reclamation. The logic is
otherwise similar to the non-speculative path, but with additional
restrictions: in the speculative path, we do not handle huge pages or
wiring new page tables.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-17-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If099534da8b0ac105bbaa5ea4714a6654032592a
2022-03-23 11:32:15 -07:00
Michel Lespinasse
f3f9f17a32 FROMLIST: mm: refactor __handle_mm_fault() / handle_pte_fault()
Move the code that initializes vmf->pte and vmf->orig_pte from
handle_pte_fault() to its single call site in __handle_mm_fault().

This ensures vmf->pte is now initialized together with the higher levels
of the page table hierarchy. This also prepares for speculative page fault
handling, where the entire page table walk (higher levels down to ptes)
needs special care in the speculative case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-16-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Id550086fe568331aa71c91468f8314faad993b20
2022-03-23 11:32:15 -07:00
Michel Lespinasse
f8a4611b47 FROMLIST: mm: add speculative_page_walk_begin() and speculative_page_walk_end()
Speculative page faults will use these to protect against races with
page table reclamation.

This could always be handled by disabling local IRQs as the fast GUP
code does; however speculative page faults do not need to protect
against races with THP page splitting, so a weaker rcu read lock is
sufficient in the MMU_GATHER_RCU_TABLE_FREE case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-15-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I3efe5fc6a5a49d537cf33e8093daeea42550077a
2022-03-23 11:32:14 -07:00
Michel Lespinasse
4dea585cfe FROMLIST: x86/mm: attempt speculative mm faults first
Attempt speculative mm fault handling first, and fall back to the
existing (non-speculative) code if that fails.

The speculative handling closely mirrors the non-speculative logic.
This includes some x86 specific bits such as the access_error() call.
This is why we chose to implement the speculative handling in arch/x86
rather than in common code.

The vma is first looked up and copied, under protection of the rcu
read lock. The mmap lock sequence count is used to verify the
integrity of the copied vma, and passed to do_handle_mm_fault() to
allow checking against races with mmap writers when finalizing the fault.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-14-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I2c078a173ee39f35af16daeee8c6a1466d10c3e8
2022-03-23 11:32:14 -07:00
Michel Lespinasse
0823d516af FROMLIST: mm: separate mmap locked assertion from find_vma
This adds a new __find_vma() function, which implements find_vma minus
the mmap_assert_locked() assertion.

find_vma() is then implemented as an inline wrapper around __find_vma().
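
The split described above can be sketched in userspace C. This is only an illustration: vma_mm, vma_area, vma_lookup_unchecked() and vma_lookup() are hypothetical names standing in for the kernel's mm_struct, vm_area_struct, __find_vma() and find_vma(). A plain boolean models the lock state, and a linked list replaces the real vma lookup structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for vm_area_struct and mm_struct. */
struct vma_area {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	struct vma_area *next;	/* sorted by address */
};

struct vma_mm {
	struct vma_area *vmas;
	bool mmap_locked;	/* models "mmap lock is held" */
};

/* Lookup only: first area whose end is above addr, or NULL. No lock check. */
static struct vma_area *vma_lookup_unchecked(struct vma_mm *mm,
					     unsigned long addr)
{
	for (struct vma_area *v = mm->vmas; v; v = v->next)
		if (addr < v->end)
			return v;
	return NULL;
}

/* Inline wrapper: same lookup, plus the "lock must be held" assertion. */
static inline struct vma_area *vma_lookup(struct vma_mm *mm,
					  unsigned long addr)
{
	assert(mm->mmap_locked);
	return vma_lookup_unchecked(mm, addr);
}
```

Callers that hold the lock keep using the asserting wrapper; a speculative caller that relies on some other protection would call the unchecked variant directly.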

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-13-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia999b8cb8f5eed93040ab4b3caaf90d739da908d
2022-03-23 11:32:14 -07:00
Michel Lespinasse
67cc8ce9a6 FROMLIST: mm: rcu safe vma freeing
This prepares for speculative page faults looking up and copying vmas
under protection of an rcu read lock, instead of the usual mmap read lock.

Note - it might also be feasible to just use SLAB_TYPESAFE_BY_RCU when
creating the vm_area_cachep, but that's probably too subtle to consider here.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-12-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I992fddb7c32c61bb4ab10b387f91c4e54c2250ef
2022-03-23 11:32:14 -07:00
Michel Lespinasse
29e9bee6fc FROMLIST: mm: add per-mm mmap sequence counter for speculative page fault handling.
The counter's write side is hooked into the existing mmap locking API:
mmap_write_lock() increments the counter to the next (odd) value, and
mmap_write_unlock() increments it again to the next (even) value.

The counter's speculative read side is supposed to be used as follows:

seq = mmap_seq_read_start(mm);
if (seq & 1)
	goto fail;
.... speculative handling here ....
if (!mmap_seq_read_check(mm, seq))
	goto fail;

This API guarantees that, if none of the "fail" tests abort
speculative execution, the speculative code section did not run
concurrently with any mmap writer.

This is very similar to a seqlock, but both the writer and speculative
readers are allowed to block. In the fail case, the speculative reader
does not spin on the sequence counter; instead it should fall back to
a different mechanism such as grabbing the mmap lock read side.
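
As a rough userspace model of this protocol, the following sketch uses C11 atomics. All seq_* names are hypothetical and do not match the kernel implementation; the mmap lock itself is not modelled, and the memory ordering is simplified compared with what the kernel needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mm counter. */
struct seq_mm {
	atomic_ulong mmap_seq;	/* even: no writer, odd: writer active */
};

/* Write side: assumed to run with the (unmodelled) write lock held. */
static void seq_write_lock(struct seq_mm *mm)
{
	unsigned long prev = atomic_fetch_add(&mm->mmap_seq, 1);

	assert((prev & 1) == 0);	/* counter is now odd */
}

static void seq_write_unlock(struct seq_mm *mm)
{
	unsigned long prev = atomic_fetch_add(&mm->mmap_seq, 1);

	assert((prev & 1) == 1);	/* counter is now even */
}

static unsigned long seq_read_start(struct seq_mm *mm)
{
	return atomic_load_explicit(&mm->mmap_seq, memory_order_acquire);
}

static bool seq_read_check(struct seq_mm *mm, unsigned long seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&mm->mmap_seq, memory_order_relaxed) == seq;
}

/* Helper for demonstration: a writer that runs inside the speculative section. */
static void seq_racing_writer(struct seq_mm *mm)
{
	seq_write_lock(mm);
	seq_write_unlock(mm);
}

/*
 * Runs work() speculatively. Returns true if no writer overlapped;
 * returns false without spinning, so the caller can fall back.
 */
static bool seq_speculate(struct seq_mm *mm, void (*work)(struct seq_mm *))
{
	unsigned long seq = seq_read_start(mm);

	if (seq & 1)
		return false;
	if (work)
		work(mm);
	return seq_read_check(mm, seq);
}
```

On a false return, the caller is expected to take the slow path, for example by acquiring the mmap lock read side, rather than retrying in a loop.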

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-11-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I60ba909e789371217cd77c39a562a66e156b68bb
2022-03-23 11:32:14 -07:00
Michel Lespinasse
4e2e391ff7 BACKPORT: FROMLIST: mm: add do_handle_mm_fault()
Add a new do_handle_mm_fault function, which extends the existing
handle_mm_fault() API by adding an mmap sequence count, to be used
in the FAULT_FLAG_SPECULATIVE case.

In the initial implementation, FAULT_FLAG_SPECULATIVE always fails
(by returning VM_FAULT_RETRY).

The existing handle_mm_fault() API is kept as a wrapper around
do_handle_mm_fault() so that we do not have to immediately update
every handle_mm_fault() call site.
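
The wrapper arrangement can be illustrated with a minimal sketch. The flt_* names, the flag and return values, and the reduced argument lists are all assumptions made for the example; the real functions take a vma, flags and register state, and use the kernel's own constants.

```c
/* Hypothetical values; the kernel's constants differ. */
#define FLT_FLAG_SPECULATIVE	0x1u
#define FLT_RETRY		0x2u

typedef unsigned int flt_result_t;

/* Extended entry point: takes the mmap sequence count as an extra argument. */
static flt_result_t flt_do_handle(unsigned long address, unsigned int flags,
				  unsigned long seq)
{
	(void)address;
	(void)seq;

	/* Initial behaviour: speculative attempts always ask for a retry. */
	if (flags & FLT_FLAG_SPECULATIVE)
		return FLT_RETRY;

	return 0;	/* stands in for the existing non-speculative path */
}

/* Existing API kept as a thin wrapper, so call sites need no changes. */
static inline flt_result_t flt_handle(unsigned long address,
				      unsigned int flags)
{
	return flt_do_handle(address, flags, 0);
}
```

Passing 0 as the sequence count in the wrapper is a choice made for this sketch; the value is unused on the non-speculative path.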

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>

Conflicts:
    mm/memory.c

1. Trivial merge conflict due to folios.

Link: https://lore.kernel.org/all/20220128131006.67712-10-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic07b6d84af3e5d1fcc856e0968f1a6dd1544fa88
2022-03-23 11:32:14 -07:00
Michel Lespinasse
f2fa9aae2e BACKPORT: FROMLIST: mm: add FAULT_FLAG_SPECULATIVE flag
Define the new FAULT_FLAG_SPECULATIVE flag, which indicates when we are
attempting speculative fault handling (without holding the mmap lock).

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>

Conflicts:
    include/linux/mm_types.h

1. Merge conflict due to enum fault_flag being defined in mm.h instead of
mm_types.h

Link: https://lore.kernel.org/all/20220128131006.67712-9-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I48ab427dfa4d7bdbe9932588bec7ae99e9e80ae9
2022-03-23 11:32:14 -07:00
Michel Lespinasse
f4108b362f FROMLIST: x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT so that the speculative fault
handling code can be compiled on this architecture.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-8-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ica804f098ea7c342a0749511d729470a0e978a2a
2022-03-23 11:32:13 -07:00
Michel Lespinasse
67ad4ad4de FROMLIST: mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
This configuration variable will be used to build the code needed to
handle speculative page faults.

This is enabled by default on supported architectures with SMP and MMU set.

The architecture support is needed since the speculative page fault handler
is called from the architecture's page faulting code, and some code has to
be added there to try speculative fault handling first.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-7-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ie1dc3af30bf3949173b126e6469f372c4505ec8e
2022-03-23 11:32:13 -07:00