This patch is to prepare for an up coming patch where we read
pre-translated fds from the sender buffer and translate them before
copying them to the target. It does not change run time.
The patch adds two new parameters to binder_translate_fd_array() to
hold the sender buffer and sender buffer parent. These parameters let
us call copy_from_user() directly from the sender instead of using
binder_alloc_copy_from_buffer() to copy from the target. Also the patch
adds some new alignment checks. Previously the alignment checks would
have been done in a different place, but this lets us print more
useful error messages.
Reviewed-by: Martijn Coenen <maco@android.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20211130185152.437403-4-tkjos@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 137131904
Bug: 257685302
(cherry picked from commit 656e01f3ab)
Change-Id: Ib786020e49bd33e35aec88d43965f9d98021fa53
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Transactions are copied from the sender to the target
first and objects like BINDER_TYPE_PTR and BINDER_TYPE_FDA
are then fixed up. This means there is a short period where
the sender's version of these objects are visible to the
target prior to the fixups.
Instead of copying all of the data first, copy data only
after any needed fixups have been applied.
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Reviewed-by: Martijn Coenen <maco@android.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20211130185152.437403-3-tkjos@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 137131904
Bug: 257685302
(cherry picked from commit 6d98eb95b4)
Change-Id: I8c14a03a2ee23c5f060c82e1626686f72eff33d9
Signed-off-by: Carlos Llamas <cmllamas@google.com>
vm_write_begin() was added before variable definition, producing a
"mixing declarations and code is a C99 extension" warning. Fix by
rearranging the code.
Fixes: ("ANDROID: mm/khugepaged: add missing vm_write_{begin|end}")
Bug: 257443051
Change-Id: I6e85ccfabd5e37b1397c654d61d0b8177326c3d8
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When CONFIG_USERFAULTFD=n __VM_UFFD_FLAGS mask is undefined and produces
build error. Fix it by making the check conditional on CONFIG_USERFAULTFD.
Fixes: ("ANDROID: mm: prevent speculative page fault handling for userfaults")
Bug: 257443051
Change-Id: Ie9ff98b840032eb18183b49e3566cf178359948f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
NUMA support with speculative page faults might be broken if
vma_replace_policy() replaces the mempolicy object used in
do_anonymous_page()
alloc_zeroed_user_highpage_movable()
alloc_page_vma()
alloc_pages_vma()
get_vma_policy()
__get_vma_policy() in speculative path does not always refcounts the
mempolicy object, therefore can't be relied on stabilizing it.
Rather than fixing this, just disable speculation for CONFIG_NUMA
for now and fix it if it's ever needed in Android.
Bug: 257443051
Change-Id: Ib5750b9809979a69a42ebfa6c130e123f416f1aa
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Invalid condition was introduced when porting the original SPF patch
which would affect NUMA mode.
Fixes: 736ae8bde8 ("FROMLIST: mm: adding speculative page fault failure trace events")
Bug: 257443051
Change-Id: Ib20c625615b279dc467588933a1f598dc179861b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
move_page_tables() can move entire pmd or pud without locking individual
ptes. This is problematic for speculative page faults which do not take
mmap_lock because they rely on ptl lock when writing new pte value. To
avoid possible race, disable move_page_tables() optimization when
CONFIG_SPECULATIVE_PAGE_FAULT is enabled.
Bug: 257443051
Change-Id: Ib48dda08ecad1abc60d08fc089a6566a63393c13
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
vm_write_{begin|end} has to be called when mmap_lock is taken
exlusively. Add an assert statement in vm_write_begin to enforce
that. free_pgtables can free page tables without exclusive mmap_lock
if the vma was isolated, therefore avoid assertions in such cases.
Bug: 257443051
Change-Id: Ie81aefe025c743cda6f66717d2f08f4d78a55608
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
In a number of cases vm_write_{begin|end} is called while mmap_lock is
not owned exclusively. This is unnecessary and can affect correctness of
the sequence counting protecting speculative page fault handlers. Remove
extra calls.
Bug: 257443051
Change-Id: I1278638a0794448e22fbdab5601212b3b2eaebdc
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Speculative page fault handler needs to detect concurrent pmd changes
and relies on vma seqcount for that. pmdp_collapse_flush(), set_huge_pmd() and collapse_and_free_pmd() can modify a pmd.
vm_write_{begin|end} are needed in the paths which can call these
functions for page fault handler to detect pmd changes.
Bug: 257443051
Change-Id: Ieb784b5f44901b66a594f61b9e7c91190ff97f80
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Extend filemap_fault() to handle speculative faults.
In the speculative case, we will only be fishing existing pages out of
the page cache. The logic we use mirrors what is done in the
non-speculative case, assuming that pages are found in the page cache,
are up to date and not already locked, and that readahead is not
necessary at this time. In all other cases, the fault is aborted to be
handled non-speculatively.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-26-michel@lespinasse.org/
Conflicts:
mm/filemap.c
1. Added back file_ra_state variable used by SPF path.
2. Updated comment for filemap_fault to reflect SPF locking rules.
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I82eba7fcfc81876245c2e65bc5ae3d33ddfcc368
Checks of pmd during speculative page fault handling are racy because
pmd is unprotected and might be modified or cleared. This might cause
use-after-free reads from speculative path, therefore prevent such
checks. At the beginning of speculation pmd is checked to be valid and
if it's changed before page fault is handled, the change will be detected
and page fault will be retried under mmap_lock protection.
Bug: 257443051
Change-Id: I0cbd3b0b44e8296cf0d6cb298fae48c696580068
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
handle_userfault() should be protected against a concurrent
userfaultfd_release(), therefore handling a userfaults speculatively
without mmap_lock protection should be disallowed.
Bug: 257443051
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic6ae39329c73e8849048ea15b5351a49346404d3
Speculative page fault checks pmd to be valid before starting to handle
the page fault and pte_alloc() should do nothing if pmd stays valid.
If pmd gets changed during speculative page fault, we will detect the
change later and retry with mmap_lock. Therefore pte_alloc() can be
safely skipped and this prevents the racy pmd_lock() call which can
access pmd->ptl after pmd was cleared.
Bug: 257443051
Change-Id: Iec57df5530dba6e0e0bdf9f7500f910851c3d3fd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When MADV_PAGEOUT is called on a private file mapping VMA region,
we bail out early if the process is neither owner nor write capable
of the file. However, this VMA may have both private/shared clean
pages and private dirty pages. The opportunity of paging out the
private dirty pages (Anon pages) is missed. Fix this by caching
the file access check and use it later along with PageAnon() during
page walk.
We observe ~10% improvement in zram usage, thus leaving more available
memory on a 4GB RAM system running Android.
Link: https://lkml.kernel.org/r/1667971116-12900-1-git-send-email-quic_pkondeti@quicinc.com
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8fc5be8efc3cf356f25098fbd4bda7c0e949c2ea
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 259329159
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Change-Id: I5f2d425aec94e5a75ebeaf90f9f5d7adf1975c59
Revert submission 2302443
Reason for revert: Series is not queued in a maintainer tree and has not been posted to a public mailing list.
Reverted Changes:
Iffd38bf97:FROMGIT: arm64: Work around Cortex-A510 erratum 24...
I694523564:FROMGIT: mm/vmalloc: Add override for lazy vunmap
Change-Id: I345e32bac76292413908b4a81295a228003fa4c0
Signed-off-by: Will Deacon <willdeacon@google.com>
Revert submission 2302443
Reason for revert: Series is not queued in a maintainer tree and has not been posted to a public mailing list.
Reverted Changes:
Iffd38bf97:FROMGIT: arm64: Work around Cortex-A510 erratum 24...
I694523564:FROMGIT: mm/vmalloc: Add override for lazy vunmap
Change-Id: I254d427b9dad0791ca8df4dc51be92e458c58728
Signed-off-by: Will Deacon <willdeacon@google.com>
commit 9cb636b5f6 upstream.
A race condition may occur if the user calls close() on another thread
during a write() operation on the device node of the efi capsule.
This is a race condition that occurs between the efi_capsule_write() and
efi_capsule_flush() functions of efi_capsule_fops, which ultimately
results in UAF.
So, the page freeing process is modified to be done in
efi_capsule_release() instead of efi_capsule_flush().
Bug: 246690517
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Link: https://lore.kernel.org/all/20220907102920.GA88602@ubuntu/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I6b11df91a87c027ebed4a7b239610a9b9e28cec0
Cortex-A510 erratum 2454944 may cause clean cache lines to be
erroneously written back to memory, breaking the assumptions we rely on
for non-coherent DMA. Try to mitigate this by implementing special DMA
ops that do their best to avoid cacheable aliases via a combination of
bounce-buffering and manipulating the linear map directly, to minimise
the chance of DMA-mapped pages being speculated back into caches.
The other main concern is initial entry, where cache lines covering the
kernel image might potentially become affected between being cleaned by
the bootloader and the kernel being called, so perform some additional
maintenance to be safe in that regard too. Cortex-A510 supports S2FWB,
so KVM should be unaffected.
Bug: 223346425
(cherry picked from commit 5bb88dd8ed70973eeb15722710a46d60951c8255
https://git.gitlab.arm.com/linux-arm/linux-rm.git arm64/2454944)
Change-Id: Iffd38bf97114f7151f01c70750b465fc991c89c8
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Add an interface for arch code to disable lazy vunmap by forcing the
threshold to zero. This might be interesting for debugging/testing in
general, but primarily helps a horrible situation which needs to
guarantee that vmalloc aliases are up-to-date from atomic context,
wherein the only practical solution is to never let them get stale in
the first place.
Bug: 223346425
(cherry picked from commit 2a34c1503b85f49dd472dfd932dfcd16cab8ee8a
https://git.gitlab.arm.com/linux-arm/linux-rm.git arm64/2454944)
Change-Id: I694523564357b4c43d30c129af1e89fd803824d3
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Patch series "mm: ensure consistency of memory map poisoning".
Currently memory map allocation for FLATMEM case does not poison the
struct pages regardless of CONFIG_PAGE_POISON setting.
This happens because allocation of the memory map for FLATMEM and SPARSMEM
use different memblock functions and those that are used for SPARSMEM case
(namely memblock_alloc_try_nid_raw() and memblock_alloc_exact_nid_raw())
implicitly poison the allocated memory.
Another side effect of this implicit poisoning is that early setup code
that uses the same functions to allocate memory burns cycles for the
memory poisoning even if it was not intended.
These patches introduce memmap_alloc() wrapper that ensure that the memory
map allocation is consistent for different memory models.
This patch (of 4):
Currently memory map for the holes is initialized only when SPARSEMEM
memory model is used. Yet, even with FLATMEM there could be holes in the
physical memory layout that have memory map entries.
For instance, the memory reserved using e820 API on i386 or
"reserved-memory" nodes in device tree would not appear in memblock.memory
and hence the struct pages for such holes will be skipped during memory
map initialization.
These struct pages will be zeroed because the memory map for FLATMEM
systems is allocated with memblock_alloc_node() that clears the allocated
memory. While zeroed struct pages do not cause immediate problems, the
correct behaviour is to initialize every page using __init_single_page().
Besides, enabling page poison for FLATMEM case will trigger
PF_POISONED_CHECK() unless the memory map is properly initialized.
Make sure init_unavailable_range() is called for both SPARSEMEM and
FLATMEM so that struct pages representing memory holes would appear as
PG_Reserved with any memory layout.
[rppt@kernel.org: fix microblaze]
Link: https://lkml.kernel.org/r/YQWW3RCE4eWBuMu/@kernel.org
(cherry picked from commit c3ab6baf6a)
Bug: 258556132
Link: https://lkml.kernel.org/r/20210714123739.16493-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20210714123739.16493-2-rppt@kernel.org
Change-Id: Ib60682288ba76e65384de91b70a08662ead12934
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to the databook ep0 should be in setup phase during reset.
If host issues reset between control transfers, ep0 will be in an
invalid state. Fix this by issuing stall and restart on ep0 if it
is not in setup phase.
Also SW needs to complete pending control transfer and setup core for
next setup stage as per data book. Hence check ep0 state during reset
interrupt handling and make sure active transfers on ep0 out/in
endpoint are stopped by queuing ENDXFER command for that endpoint and
restart ep0 out again to receive next setup packet.
Signed-off-by: Mayank Rana <quic_mrana@quicinc.com>
Link: https://lore.kernel.org/r/1651693001-29891-1-git-send-email-quic_mrana@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9d778f0c5f)
Bug: 258997352
Change-Id: Ie7482ba08d4f77ad65f404b3014ac880f5a5a75e
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
'struct damon_target' creation function, 'damon_new_target()' is not
initializing its '->list' field, unlike other DAMON structs creator
functions such as 'damon_new_region()'. Normal users of
'damon_new_target()' initializes the field by adding the target to DAMON
context's targets list, but some code could access the uninitialized
field.
This commit avoids the case by initializing the field in
'damon_new_target()'.
Bug: 254441685
Link: https://lkml.kernel.org/r/20221002193130.8227-1-sj@kernel.org
Fixes: f23b8eee18 ("mm/damon/core: implement region-based sampling")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b1f44cdaba)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie500358e0cc7d5bf82225e6e2b5229f6629736f4
pmd_huge() is used to validate if the pmd entry is mapped by a huge page,
also including the case of non-present (migration or hwpoisoned) pmd entry
on arm64 or x86 architectures. This means that pmd_pfn() can not get the
correct pfn number for a non-present pmd entry, which will cause
damon_get_page() to get an incorrect page struct (also may be NULL by
pfn_to_online_page()), making the access statistics incorrect.
This means that the DAMON may make incorrect decision according to the
incorrect statistics, for example, DAMON may can not reclaim cold page
in time due to this cold page was regarded as accessed mistakenly if
DAMOS_PAGEOUT operation is specified.
Moreover it does not make sense that we still waste time to get the page
of the non-present entry. Just treat it as not-accessed and skip it,
which maintains consistency with non-present pte level entries.
So add pmd entry present validation to fix the above issues.
Bug: 254441685
Link: https://lkml.kernel.org/r/58b1d1f5fbda7db49ca886d9ef6783e3dcbbbc98.1660805030.git.baolin.wang@linux.alibaba.com
Fixes: 3f49584b26 ("mm/damon: implement primitives for the virtual memory address spaces")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c8b9aff419)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Idda1765dcbc93a28ad38ccc53688d69b64202330
When user tries to create a DAMON context via the DAMON debugfs interface
with a name of an already existing context, the context directory creation
fails but a new context is created and added in the internal data
structure, due to absence of the directory creation success check. As a
result, memory could leak and DAMON cannot be turned on. An example test
case is as below:
# cd /sys/kernel/debug/damon/
# echo "off" > monitor_on
# echo paddr > target_ids
# echo "abc" > mk_context
# echo "abc" > mk_context
# echo $$ > abc/target_ids
# echo "on" > monitor_on <<< fails
Return value of 'debugfs_create_dir()' is expected to be ignored in
general, but this is an exceptional case as DAMON feature is depending
on the debugfs functionality and it has the potential duplicate name
issue. This commit therefore fixes the issue by checking the directory
creation failure and immediately return the error in the case.
Bug: 254441685
Link: https://lkml.kernel.org/r/20220821180853.2400-1-sj@kernel.org
Fixes: 75c1c2b53c ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: Badari Pulavarty <badari.pulavarty@intel.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [ 5.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d26f607036)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8951b95f41306818ef1b4a5789369a84d8ca2cf2
CRYPTO_LIB_CHACHA_GENERIC doesn't need to select XOR_BLOCKS. It perhaps
was thought that it's needed for __crypto_xor, but that's not the case.
Enabling XOR_BLOCKS is problematic because the XOR_BLOCKS code runs a
benchmark when it is initialized. That causes a boot time regression on
systems that didn't have it enabled before.
Therefore, remove this unnecessary and problematic selection.
Bug: 254441685
Fixes: e56e189855 ("lib/crypto: add prompts back to crypto libraries")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 874b301985)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I89d552f31062ad677407107280874bc7eafe60bf
On page 362 of the USB3.2 specification (
https://usb.org/sites/default/files/usb_32_20210125.zip),
The 'SuperSpeed Endpoint Companion Descriptor' shall only be returned
by Enhanced SuperSpeed devices that are operating at Gen X speed.
Each endpoint described in an interface is followed by a 'SuperSpeed
Endpoint Companion Descriptor'.
If users use SuperSpeed UDC, host can't recognize the device if endpoint
doesn't have 'SuperSpeed Endpoint Companion Descriptor' followed.
Currently in the uac2 driver code:
1. ss_epout_desc_comp follows ss_epout_desc;
2. ss_epin_fback_desc_comp follows ss_epin_fback_desc;
3. ss_epin_desc_comp follows ss_epin_desc;
4. Only ss_ep_int_desc endpoint doesn't have 'SuperSpeed Endpoint
Companion Descriptor' followed, so we should add it.
Bug: 254441685
Fixes: eaf6cbe099 ("usb: gadget: f_uac2: add volume and mute support")
Cc: stable <stable@kernel.org>
Signed-off-by: Jing Leng <jleng@ambarella.com>
Signed-off-by: Jack Pham <quic_jackp@quicinc.com>
Link: https://lore.kernel.org/r/20220721014815.14453-1-quic_jackp@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f511aef2eb)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I7e4a0ce5482f44df32cfa13cc011281c2bc6393d
KVM does not support AArch32 EL0 on asymmetric systems. To that end,
prevent userspace from configuring a vCPU in such a state through
setting PSTATE.
It is already ABI that KVM rejects such a write on a system where
AArch32 EL0 is unsupported. Though the kernel's definition of a 32bit
system changed in commit 2122a83331 ("arm64: Allow mismatched
32-bit EL0 support"), KVM's did not.
Bug: 254441685
Fixes: 2122a83331 ("arm64: Allow mismatched 32-bit EL0 support")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220816192554.1455559-3-oliver.upton@linux.dev
(cherry picked from commit b10d86fb8e)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I73b63bf79bfbade51dc417fe2c76fd0057eb21b8
The patch d0be8347c6: "Bluetooth: L2CAP: Fix use-after-free caused
by l2cap_chan_put" from Jul 21, 2022, leads to the following Smatch
static checker warning:
net/bluetooth/l2cap_core.c:1977 l2cap_global_chan_by_psm()
error: we previously assumed 'c' could be null (see line 1996)
Bug: 254441685
Fixes: d0be8347c6 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit 332f1795ca)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I81c57064d558d8304d889fa3448a8aff45c7a408
We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size. This application started
leaking loads of huge pages when we upgraded to a recent kernel.
Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.
I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges. This very quickly reproduced the problem.
The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped. One thread maps
the PMD, the other thread loses the race and then returns 0. However at
this point we already have the page, and we are no longer putting this
page into the processes address space, and so we leak the page. We
actually did the correct thing prior to f9ce0be71d, however it looks
like Kirill copied what we do in the anonymous page case. In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything. Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.
Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case, this makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.
Bug: 254441685
[josef@toxicpanda.com: v2]
Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: f9ce0be71d ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 3fe2895cfe)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I982509aab4bcbf22d66aff5e1d3dfce927426f51
Both genpd_debug_add() and genpd_debug_remove() may be called
indirectly by other drivers while genpd_debugfs_dir is not yet
set. For example, drivers can call pm_genpd_init() in probe or
pm_genpd_init() in probe fail/cleanup path:
pm_genpd_init()
--> genpd_debug_add()
pm_genpd_remove()
--> genpd_remove()
--> genpd_debug_remove()
At this time, genpd_debug_init() may not yet be called.
genpd_debug_add() checks that if genpd_debugfs_dir is NULL, it
will return directly. Make sure this is also checked
in pm_genpd_remove(), otherwise components under debugfs root
which has the same name as other components under pm_genpd may
be accidentally removed, since NULL represents debugfs root.
Bug: 254441685
Fixes: 718072ceb2 ("PM: domains: create debugfs nodes when adding power domains")
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 37101d3c71)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I0b7e93f4bacf5d537f7f44cbb51237165d859054
If CONFIG_DMA_DECLARE_COHERENT is not set,
make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- will be failed, like this:
drivers/remoteproc/remoteproc_core.c: In function ‘rproc_rvdev_release’:
./include/linux/dma-map-ops.h:182:42: error: statement with no effect [-Werror=unused-value]
#define dma_release_coherent_memory(dev) (0)
^
drivers/remoteproc/remoteproc_core.c:464:2: note: in expansion of macro ‘dma_release_coherent_memory’
dma_release_coherent_memory(dev);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The return type of function dma_release_coherent_memory in CONFIG_DMA_DECLARE_COHERENT area is void, so in !CONFIG_DMA_DECLARE_COHERENT area it should neither return any value nor be defined as zero.
Bug: 254441685
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: e61c451476 ("dma-mapping: Add dma_release_coherent_memory to DMA API")
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220630123528.251181-1-renzhijie2@huawei.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
(cherry picked from commit 50d6281ce9)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I2af85ae87d77721972a3c3b01288da43d8fb16bb
Commit 64dd68497b relocated and renamed the alloc_calls and
free_calls files from /sys/kernel/slab/NAME/*_calls over to
/sys/kernel/debug/slab/NAME/*_calls but didn't update the slabinfo tool
with the new location.
This change will now have slabinfo look at the new location (and filenames)
with a fallback to the prior files.
Bug: 254441685
Fixes: 64dd68497b ("mm: slub: move sysfs slab alloc/free interfaces to debugfs")
Cc: stable@vger.kernel.org
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Tested-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
(cherry picked from commit 0c7e0d699e)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I7312aa9e86213bd37916e14c8e0e430e9c3cd2d4
blk_mq_run_hw_queues() could be run when there isn't queued request and
after queue is cleaned up, at that time tagset is freed, because tagset
lifetime is covered by driver, and often freed after blk_cleanup_queue()
returns.
So don't touch ->tagset for figuring out current default hctx by the mapping
built in request queue, so use-after-free on tagset can be avoided. Meantime
this way should be fast than retrieving mapping from tagset.
Bug: 254441685
Cc: "yukuai (C)" <yukuai3@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Fixes: b6e68ee825 ("blk-mq: Improve performance of non-mq IO schedulers with multiple HW queues")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220522122350.743103-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 5d05426e2d)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ifebb3d15ddfab0b41d8f30b556969ac68058ca8b
In the genpd governor we walk the list of child-domains to take into
account their next_wakeup. If the child-domain itself, doesn't have a
governor assigned to it, we can end up using the next_wakeup value before
it has been properly initialized. To prevent a possible incorrect behaviour
in the governor, let's initialize next_wakeup to KTIME_MAX.
Bug: 254441685
Fixes: c79aa080fb ("PM: domains: use device's next wakeup to determine domain idle state")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 622d9b5577)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iee5350e44c89dc566c2058f55103985928347a23
Tryng to rename a directory that has all following properties fails with
EINVAL and triggers the 'WARN_ON_ONCE(!fscrypt_has_encryption_key(dir))'
in f2fs_match_ci_name():
- The directory is casefolded
- The directory is encrypted
- The directory's encryption key is not yet set up
- The parent directory is *not* encrypted
The problem is incorrect handling of the lookup of ".." to get the
parent reference to update. fscrypt_setup_filename() treats ".." (and
".") specially, as it's never encrypted. It's passed through as-is, and
setting up the directory's key is not attempted. As the name isn't a
no-key name, f2fs treats it as a "normal" name and attempts a casefolded
comparison. That breaks the assumption of the WARN_ON_ONCE() in
f2fs_match_ci_name() which assumes that for encrypted directories,
casefolded comparisons only happen when the directory's key is set up.
We could just remove this WARN_ON_ONCE(). However, since casefolding is
always a no-op on "." and ".." anyway, let's instead just not casefold
these names. This results in the standard bytewise comparison.
Bug: 254441685
Fixes: 7ad08a58bf ("f2fs: Handle casefolding with Encryption")
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit b5639bb431)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Id53cfda129b034aa1ebefba8e9e3135e3def62d7