Patch series "mm: lru related cleanups", v2.
The cleanups are intended to reduce the verbosity in lru list operations
and make them less error-prone. A typical example would be how the
patches change __activate_page():
  static void __activate_page(struct page *page, struct lruvec *lruvec)
  {
          if (!PageActive(page) && !PageUnevictable(page)) {
  -               int lru = page_lru_base_type(page);
                  int nr_pages = thp_nr_pages(page);

  -               del_page_from_lru_list(page, lruvec, lru);
  +               del_page_from_lru_list(page, lruvec);
                  SetPageActive(page);
  -               lru += LRU_ACTIVE;
  -               add_page_to_lru_list(page, lruvec, lru);
  +               add_page_to_lru_list(page, lruvec);
                  trace_mm_lru_activate(page);
There are a few more places like __activate_page() and they are
unnecessarily repetitive in terms of figuring out which list a page should
be added onto or deleted from. And with the duplicated code removed, they
are easier to read, IMO.
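
As a rough illustration of why the explicit lru argument becomes redundant,
the list index can be derived from the page's own state. The following is a
minimal userspace sketch, not the kernel implementation: struct toy_page and
toy_page_lru() are hypothetical stand-ins, and only the enum layout is meant
to mirror the kernel's enum lru_list.

```c
#include <assert.h>
#include <stdbool.h>

/* List indices laid out like the kernel's enum lru_list. */
enum lru_list {
	LRU_INACTIVE_ANON,	/* LRU_BASE */
	LRU_ACTIVE_ANON,	/* LRU_BASE + LRU_ACTIVE */
	LRU_INACTIVE_FILE,	/* LRU_BASE + LRU_FILE */
	LRU_ACTIVE_FILE,	/* LRU_BASE + LRU_FILE + LRU_ACTIVE */
	LRU_UNEVICTABLE,
	NR_LRU_LISTS
};

/* Hypothetical stand-in for the page flags the helpers would consult. */
struct toy_page {
	bool active;
	bool unevictable;
	bool file_backed;
};

/* Derive the list from the page itself, so callers need not pass it. */
static enum lru_list toy_page_lru(const struct toy_page *page)
{
	enum lru_list lru;

	if (page->unevictable)
		return LRU_UNEVICTABLE;

	lru = page->file_backed ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
	if (page->active)
		lru += 1;	/* step from the inactive to the active list */

	assert(lru < LRU_UNEVICTABLE);
	return lru;
}
```

With a helper of this shape, setting the active flag between the delete and
the add is enough to move the page to the right list, which is what the
__activate_page() diff above relies on.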
Patches 1 to 5 basically cover the above. Patches 6 and 7 make the code more
robust by improving bug reporting. Patches 8, 9 and 10 take care of some
dangling helpers left in header files.
This patch (of 10):
There is add_page_to_lru_list(), and move_pages_to_lru() should reuse it,
not duplicate it.
Link: https://lkml.kernel.org/r/20210122220600.906146-1-yuzhao@google.com
Link: https://lore.kernel.org/linux-mm/20201207220949.830352-2-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 42895ea73b)
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I7e09be6bedcd451c4e8c790c969306b6ca3adebd
This allows Bazel to load the value of $BRANCH in order
to determine the value of --dist_dir of copy_to_dist_dir
statically.
Test: TH
Bug: 229268271
Change-Id: Iff759b8188360ea1b2bc204d29750eece9095582
Signed-off-by: Yifan Hong <elsk@google.com>
Currently trying to move or delete a memslot results in a warning
and a failure. Userspace shouldn't be able to trigger kernel
warnings.
The cause is that in protected mode, stage-2 is managed by hyp.
Modifying a memslot flushes the shadow memslot, which tries to
unmap any stage-2 mapped pages.
Bug: 226890762
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Icc6a0aada76e8492285cd5509bad1ee57700af7c
We had a size mismatch for the return value, leading to EIOCBQUEUED
getting interpreted as a return size instead of an error code.
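
The message does not say which types were mismatched, so the following is only
a sketch of the general failure mode, assuming a negative error code passes
through an unsigned 32-bit value before being widened. The helper names are
hypothetical; 529 is the kernel-internal value of EIOCBQUEUED.

```c
#include <assert.h>
#include <stdint.h>

#define EIOCBQUEUED 529	/* kernel-internal: iocb queued, completion comes later */

/* Buggy path: the negative error passes through an unsigned 32-bit value. */
static int64_t widen_through_unsigned(int32_t ret)
{
	uint32_t narrow = (uint32_t)ret;

	return (int64_t)narrow;	/* zero-extended, so the sign is lost */
}

/* Fixed path: the value stays signed end to end, so the sign extends. */
static int64_t widen_signed(int32_t ret)
{
	return (int64_t)ret;
}

/* Callers treat negative values as errors and everything else as a size. */
static int is_error(int64_t ret)
{
	return ret < 0;
}
```

In the buggy path -EIOCBQUEUED turns into a large positive number, so a caller
that checks for ret < 0 would treat it as a byte count.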
Test: generic/467, generic/013, and fuse_test
Bug: 217570523
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I64f9d5263f8b37d3c0e286467f9351997b294cc2
Allocate the iocb we create for asynchronous I/O from a cache instead of
a regular kzalloc().
Test: generic/467 and fuse_test
Bug: 217570523
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I27dcec89cd585835f6a8e80e1ae30c503f4038c8
The current name is a bit confusing. iocb_fuse could refer to the iocb
passed to fuse or created by fuse. The new name unambiguously refers to
the one passed in to fuse.
Test: compiles, behavior unchanged
Bug: 217570523
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I955500eb8a3186252427fd06ca6e99b4fec469b6
Existing fixattr was adjusting the same node twice.
Bug: 226655982
Test: generic/241 generic/269
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I4b1cb6d626ee6bd9010012ac126b78f14d6157d0
Fuse uses generic_file_llseek, so we must account for that in readdir to
ensure we read from the correct offset in the lower filesystem.
Bug: 226655281
Test: generic/257, fuse_test
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: Ie752c1c645e95b7c03ef9497562758a5c42b514a
MIGRATE_CMA is defined only when CONFIG_CMA is enabled. Thus, we
cannot use MIGRATE_CMA directly in code that must build for both
!CONFIG_CMA and CONFIG_CMA.
Let's use MIGRATE_RECLAIMABLE in that case.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Idb4fc6f4ea02ab074f270ce62001182c8fff3b37
Since Android has a pcp list for MIGRATE_CMA[1], CMA allocation latency
can increase because a MIGRATE_ISOLATE page is not freed to the buddy
list immediately.
Originally, a MIGRATE_ISOLATE page is supposed to go to the buddy list,
skipping the pcp list. Otherwise, the page could be reallocated
from the pcp list or stay on the pcp list until the pcp is drained,
so CMA keeps retrying because it cannot find the freed page
in the buddy list. That worked before because the CMA pfnblocks changed
only from MIGRATE_CMA to MIGRATE_ISOLATE, and the free path
in the page allocator checked for MIGRATE_ISOLATE on every CMA
page using the logic below.
    free_unref_page_commit
        if (migratetype >= MIGRATE_PCPTYPES)
            if (is_migrate_isolate(migratetype))
                free_one_page(page);
It worked because MIGRATE_CMA was greater than MIGRATE_PCPTYPES,
but since [1] MIGRATE_CMA is less than MIGRATE_PCPTYPES, so the
logic above no longer works.
This can cause the following race:
        CPU 0                                   CPU 1

    free_unref_page
    migratetype = get_pfnblock_migratetype()
    set_pcppage_migratetype(MIGRATE_CMA)
                                            cma_alloc
                                            alloc_contig_range
                                            set_migrate_isolate(MIGRATE_ISOLATE)
    add the page into pcp list
    the page could be reallocated
This patch cannot fix the race completely because the order-0 page free
path does not take zone->lock (for performance reasons). However, this is
not a new problem, so it needs to be dealt with separately.
[1] ANDROID: mm: add cma pcp list
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ibea20085ce5bfb4b74b83b041f9bda9a380120f9
Ensure that the FFA memory range to be checked and annotated in the host
stage-2 page-table is page-aligned and that its size is calculated using
64-bit arithmetic to avoid the host triggering overflow and subsequent
truncation.
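
A minimal sketch of the two checks described above, assuming 4 KiB pages and a
32-bit page count supplied by the host. The toy_* names are hypothetical and
do not correspond to the hypervisor's actual functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(UINT64_C(1) << TOY_PAGE_SHIFT)

/* Truncating variant: the product is computed in 32 bits and can wrap. */
static uint32_t toy_size_32bit(uint32_t pg_cnt)
{
	return pg_cnt * (uint32_t)TOY_PAGE_SIZE;
}

/*
 * Checked variant: reject unaligned addresses and widen the page count
 * before multiplying, so the size cannot be truncated.
 */
static bool toy_range_size(uint64_t addr, uint32_t pg_cnt, uint64_t *size)
{
	if (addr & (TOY_PAGE_SIZE - 1))
		return false;	/* not page-aligned */

	*size = (uint64_t)pg_cnt * TOY_PAGE_SIZE;
	return true;
}
```

With a page count of 0x100000 the 32-bit product wraps to 0, while the 64-bit
version yields the full 4 GiB size.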
Bug: 228889679
Reported-by: Gulshan Singh <gsgx@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ifc51ee9598905cf2926d19c53159804f89d74040
Gulshan reports that the hypervisor is not pinning the host FFA mailbox
pages, therefore allowing the host to unshare them after registration
and to later donate them for things like page-table pages.
Pin the host FFA mailboxes to prevent the host from unsharing them while
they are in use.
Bug: 228931886
Reported-by: Gulshan Singh <gsgx@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I18ecad6ccaa3ef89015a71d97890fad55f0568f2
There are more vfs-only symbols that OEMs want to use, so place them in
the proper vfs-only namespace.
Bug: 157965270
Bug: 210074446
Bug: 227656251
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I99b9facc8da45fb329f6627d204180d1f89bcf97
Xiling reports that the hypervisor dereferences the host memcache struct
twice when refilling its own memcache. This allows the host to change its
memcache head after it has been admitted and before it is consumed,
leading to an arbitrary write in hypervisor memory.
Fix this by copying the host memcache onto the stack before starting to
refill, hence guaranteeing its stability.
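
The pattern can be sketched as follows: take one snapshot of the shared
structure, validate the snapshot, and only ever consume the snapshot. This is
an illustrative userspace model; struct toy_memcache, the admission rule and
the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_PAGES	64u

/* Hypothetical layout of a memcache shared with an untrusted host. */
struct toy_memcache {
	uint64_t head;
	uint32_t nr_pages;
};

/* Hypothetical admission rule: bounded count and page-aligned head. */
static bool toy_admit(const struct toy_memcache *mc)
{
	return mc->nr_pages <= TOY_MAX_PAGES && (mc->head & 0xfff) == 0;
}

/*
 * Read the shared structure exactly once into a local copy, then check
 * and use only that copy. Later writes by the host cannot affect it.
 */
static bool toy_refill(const volatile struct toy_memcache *host_mc,
		       struct toy_memcache *out)
{
	struct toy_memcache snap;

	snap.head = host_mc->head;
	snap.nr_pages = host_mc->nr_pages;

	if (!toy_admit(&snap))
		return false;

	*out = snap;
	return true;
}
```

Because the check and the use both operate on snap, there is no window in
which the host can swap the head after it has been admitted.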
Bug: 228435321
Reported-by: Xiling Gong <xiling@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ib7c5db203e4a4a7f27eb9f0c0083f4b5c726b4d9
This patch removes dump_page_pinner since it was not useful (IOW,
the page_pinner buffer that keeps the history is enough).
This patch also fixes a mismatched printf format specifier.
Bug: 218731671
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I80c6f5ad656b3b0d27a50eabff4d1382559aa105
[Backport: resolve conflicts caused by CONFIG_CMA.]
KASAN changes that added new GFP flags mistakenly updated
__GFP_BITS_SHIFT as the total number of GFP bits instead of as a shift
used to define __GFP_BITS_MASK.
This broke LOCKDEP, as __GFP_BITS_MASK now gets the 25th bit enabled
instead of the 28th for __GFP_NOLOCKDEP.
Update __GFP_BITS_SHIFT to always count KASAN GFP bits.
In the future, we could handle all combinations of KASAN and LOCKDEP to
occupy as few bits as possible. For now, we have enough GFP bits to be
inefficient in this quick fix.
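
To illustrate the relationship between the shift and the mask, here is a small
sketch assuming the mask is built as (1 << shift) - 1. The helper names are
hypothetical and the bit positions used in the usage note are illustrative,
not the exact kernel layout.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits 0 .. shift-1, built the same way as a BITS_MASK. */
static uint32_t toy_bits_mask(unsigned int shift)
{
	assert(shift < 32);
	return (UINT32_C(1) << shift) - 1;
}

/* A flag at bit index 'bit' survives masking only if shift > bit. */
static int toy_flag_in_mask(unsigned int bit, unsigned int shift)
{
	return (toy_bits_mask(shift) & (UINT32_C(1) << bit)) != 0;
}
```

For example, a flag stored at bit index 27 is dropped by a mask built from a
shift of 25 but kept by a mask built from a shift of 28, which is why the
shift must count every bit that may be defined below it.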
Link: https://lkml.kernel.org/r/462ff52742a1fcc95a69778685737f723ee4dfb3.1648400273.git.andreyknvl@google.com
Fixes: 9353ffa6e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS")
Fixes: 53ae233c30 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS")
Fixes: f49d9c5bb1 ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 78d104f8b401c81d140adad91e027d7d83b3315c)
Bug: 217222520
Change-Id: I82484635012c5773c6ef9164a9368d9e61157f87
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Currently the generic IOMMU code lets the driver initialize its PT and
then invokes callbacks to set the permissions across the entire PA
range. Optimize this by making it a requirement on the driver to
initialize its PTs to all memory owned by the host. snapshot_host_stage2
then only calls the driver's callback for memory regions not owned by
the host.
Bug: 190463801
Bug: 218012133
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I51ff38cb4f4e28e19903af942776b401504c363e
Change the permissions that MPTs are initialized with from PROT_NONE to
PROT_RW. No functional change intended, as the generic IOMMU code
sets permissions for the entire address space later. This will allow
boot time to be optimized by only unmapping pages not available to the host.
Bug: 190463801
Bug: 218012133
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Ic29ec690a84cde22a2ce8fe33e7127711c6f0f3e
The second argument of the kvm_pgtable_walker callback was
misinterpreted as the end of the current entry, where in fact it is
the end of the walked memory region. Fix this by computing the end of
the current entry from the start and the level.
This did not affect correctness, as the code iterates linearly over
the entire address space, but it did affect boot time.
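
A sketch of the corrected computation, assuming 4 KiB pages and a four-level
table with levels 0 to 3. The toy_* helpers are hypothetical and only model
the arithmetic, not the walker itself.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12

/* Bytes covered by one entry at 'level' (level 3 maps a single page). */
static uint64_t toy_granule(unsigned int level)
{
	assert(level <= 3);
	return UINT64_C(1) << ((TOY_PAGE_SHIFT - 3) * (4 - level) + 3);
}

/* End of the entry containing 'start', derived from start and level only. */
static uint64_t toy_entry_end(uint64_t start, unsigned int level)
{
	uint64_t granule = toy_granule(level);

	return (start & ~(granule - 1)) + granule;
}
```

A level-2 entry starting at 0x200000 therefore ends at 0x400000, regardless of
where the walked region as a whole ends.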
Bug: 190463801
Bug: 218012133
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I6d189b87645f47cd215a783c1bc9e1f032ff8c62
When vendor hooks are added to a file that previously didn't have any
vendor hooks, we end up indirectly including linux/tracepoint.h. This
causes some data types that used to be opaque (forward declared) to the
code to become visible to the code.
Modversions correctly catches this change in visibility, but we don't
really care about the data types made visible when linux/tracepoint.h is
included. So, hide this from modversions in the central vendor_hooks.h file
instead of having to fix this on a case by case basis.
This change itself will cause a one time CRC breakage/churn because it's
fixing the existing vendor hook headers, but should reduce unnecessary CRC
churns in the future.
To avoid future pointless CRC churn, vendor hook header files that include
vendor_hooks.h should not include linux/tracepoint.h directly.
Bug: 227513263
Bug: 226140073
Signed-off-by: Saravana Kannan <saravanak@google.com>
Change-Id: Ia88e6af11dd94fe475c464eb30a6e5e1e24c938b
It needs an additional struct page **pages parameter to judge whether
the pages can be migrated out of CMA.
Bug: 227475444
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I9a8aa57ff91228baf0fc970b8499464c07872c09
Introduce $debugfs/page_pinner/buffer_size to change
buffer_size on demand. Changing buffer_size resets
the buffer.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I505cdc2ee29aa0c6ed4e2dc2c0b6fcff77c388e4
We shouldn't waste memory for vendors who don't use
page_pinner so remove the page_pinner static buffer.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I46ae2fb5000c4eb59253159032182ca106b39eb9
From experience, longterm_pinner is not worth maintaining
considering how much it churns MM. Just drop the feature;
alloc_contig_failed is sufficient.
The visible effects of this patch are:
1. drop $debugfs/page_pinner/longterm_pinner
2. drop the exported put_user_page API
3. rename alloc_contig_failed to buffer
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I68cc11db448260987a9e26b99647ecb55f571616
Currently, the output format makes it hard to work out how long a
page has been pinned, since the user needs to reconstruct the timeline
from migration failure detection to the put event. Sometimes the log
buffer even overflows, so the migration failure event is lost from the
timeline. This patch stores the page pinning time on the kernel
side and keeps that information whenever the page is released. Thus,
the user can understand the output more easily and never loses the information.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I396f0c12438e0ff8a3497253b750a7e5bb342f57
Currently, the page_pinner code is ugly because we missed refactoring it
last time due to the GKI deadline. Let's improve it this time before
the GKI freeze.
This patch cleans up __reset_page_pinner, which is used both for
freeing pages and for putting pages, depending on the free parameter.
That hurts readability and makes further changes hard, so split it
into separate put and free functions.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I610ffc629eea5e996b7d55340b5589a3f49574d7
For various reasons based on the allocator behaviour and typical
use-cases at the time, when the max32_alloc_size optimisation was
introduced it seemed reasonable to couple the reset of the tracked
size to the update of cached32_node upon freeing a relevant IOVA.
However, since subsequent optimisations focused on helping genuine
32-bit devices make best use of even more limited address spaces, it
is now a lot more likely for cached32_node to be anywhere in a "full"
32-bit address space, and as such more likely for space to become
available from IOVAs below that node being freed.
At this point, the short-cut in __cached_rbnode_delete_update() really
doesn't hold up any more, and we need to fix the logic to reliably
provide the expected behaviour. We still want cached32_node to only move
upwards, but we should reset the allocation size if *any* 32-bit space
has become available.
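
The size-reset half of that logic can be modelled in a few lines. This is a
simplified sketch that leaves out the rbtree and the cached32_node update
entirely; the toy_* types and functions are hypothetical, and only the field
names follow the description above.

```c
#include <assert.h>
#include <stdint.h>

struct toy_iova {
	uint64_t pfn_lo;
	uint64_t pfn_hi;
};

struct toy_domain {
	uint64_t dma_32bit_pfn;		/* first pfn above the 32-bit space */
	uint64_t max32_alloc_size;	/* sizes >= this are known to fail */
};

/* Remember that a 32-bit allocation of 'size' pfns just failed. */
static void toy_note_failed_alloc(struct toy_domain *d, uint64_t size)
{
	d->max32_alloc_size = size;
}

/* Fast-path check: is a 32-bit allocation of 'size' worth attempting? */
static int toy_should_try(const struct toy_domain *d, uint64_t size)
{
	return size < d->max32_alloc_size;
}

/* On free, reset the tracked size if any 32-bit space became available. */
static void toy_on_free(struct toy_domain *d, const struct toy_iova *freed)
{
	assert(freed->pfn_lo <= freed->pfn_hi);
	if (freed->pfn_lo < d->dma_32bit_pfn)
		d->max32_alloc_size = d->dma_32bit_pfn;
}
```

In this model the reset depends only on whether the freed range starts below
the 32-bit boundary, not on where it sits relative to the cached node.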
Reported-by: Yunfei Wang <yf.wang@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Bug: 223712131
(cherry picked from commit 5b61343b50
 https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Change-Id: I5026411dd022c6ddea5c0e4da6e69c7b14162c3f
(cherry picked from commit ec48b1892e)
When dealing with a guest with SVE enabled, we don't populate
the shadow SVE state, nor pin the SVE state at S1 EL2.
Fix both issues in one go.
Bug: 227292021
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <mzyngier@google.com>
Change-Id: I88dc7e9c84e5970ec2466a0aa98ad4e3c94711a0
It isn't always obvious what vcpu (host or shadow) should be used
in which context, nor whether the provided vcpu is valid or not.
To make this less error prone, provide a helper that will always
return a vcpu in the HYP address space as well as the corresponding
shadow state if we're in protected mode. If the host-provided vcpu
doesn't match the loaded vcpu, NULL is returned for both pointers.
In non-protected mode, no state is provided, of course, but the vcpu
is converted to its HYP pointer.
Bug: 227292021
Bug: 227768863
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <mzyngier@google.com>
Change-Id: Idcfc4e042ff05d97ae52f7991666935c1c570f10
We have namespaces, so use them for all vfs-exported symbols so that
filesystems can use them, but not anything else.
Some in-kernel drivers that do direct filesystem accesses (because they
serve up files) are also allowed access to these symbols to keep 'make
allmodconfig' builds working properly, but it is not needed for Android
kernel images.
Bug: 157965270
Bug: 210074446
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaf6140baf3a18a516ab2d5c3966235c42f3f70de
The MODULE_IMPORT_NS() macro does not allow defined strings to work
properly with it, so add a layer of indirection to allow this to happen.
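
The underlying preprocessor behaviour can be shown in isolation, assuming the
macro stringifies its argument with the # operator. The TOY_* macros below are
hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

/* A namespace name supplied through a define, as a build system might. */
#define TOY_NS VFS_toy_namespace

/* Direct stringification: the argument is NOT macro-expanded first. */
#define TOY_STR_DIRECT(ns)	#ns

/* One extra layer: the argument is expanded before it reaches '#'. */
#define TOY_STR_INNER(ns)	#ns
#define TOY_STR(ns)		TOY_STR_INNER(ns)

static const char *toy_direct(void)
{
	return TOY_STR_DIRECT(TOY_NS);
}

static const char *toy_indirect(void)
{
	return TOY_STR(TOY_NS);
}

/* Small helper so callers can compare the resulting names. */
static int toy_name_is(const char *name, const char *expected)
{
	return strcmp(name, expected) == 0;
}
```

The direct form yields the literal text "TOY_NS", while the indirect form
yields "VFS_toy_namespace", which is the behaviour a defined string needs.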
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
(cherry picked from commit ca321ec743)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibd64ba139912ea10e81ac22490831129b23a31e1
Add code to head.S's el2_setup to detect MPAM and disable any EL2 traps.
This register resets to an unknown value; setting it to the default
partitions/pmg before we enable the MMU is the best thing to do.
Kexec/kdump will depend on this if the previous kernel left the CPU
configured with a restrictive configuration.
If Linux is booted at the highest implemented exception level, el2_setup
will clear the enable bit, disabling MPAM.
Signed-off-by: James Morse <james.morse@arm.com>
Bug: 221768437
(cherry picked from commit fa0ff38f06b397d8a92d88eb8083c2c5a20ac87f
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v5.16)
Change-Id: I2758f7f7b236d09a207e13d1165efb6887e8611a
Signed-off-by: Valentin Schneider <Valentin.Schneider@arm.com>
[bm: amended commit msg, dropped config option and switched to named labels]
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
This is a partial cherry-pick of commit:
7fe77616f156 ("arm64: cpufeature: discover CPU support for MPAM")
from git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
Bug: 221768437
Change-Id: I77101abb07f9b73dbc7cc2a53ac44fbf772f0b1d
Signed-off-by: Valentin Schneider <Valentin.Schneider@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
virtio pci config structures may in the future have non-standard bar
values in the bar field. We should anticipate this by skipping any
structures containing such a reserved value.
The bar value should never change: check for harmful modified values
when we re-read it from the config space in vp_modern_map_capability().
Also clean up an existing check to consistently use PCI_STD_NUM_BARS.
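
A minimal sketch of the skip rule, modelling capabilities as a plain array
rather than a walk of PCI config space. struct toy_cap and toy_find_cap() are
hypothetical; PCI_STD_NUM_BARS is 6, matching standard BARs 0 to 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STD_NUM_BARS 6	/* standard BARs are numbered 0..5 */

/* Hypothetical, trimmed-down view of one virtio PCI capability. */
struct toy_cap {
	uint8_t cfg_type;
	uint8_t bar;
};

/*
 * Return the index of the first capability of the wanted type whose bar
 * field holds a standard value, or -1. Reserved bar values are skipped
 * rather than treated as an error.
 */
static int toy_find_cap(const struct toy_cap *caps, size_t n, uint8_t cfg_type)
{
	for (size_t i = 0; i < n; i++) {
		if (caps[i].bar >= PCI_STD_NUM_BARS)
			continue;	/* reserved value: ignore this structure */
		if (caps[i].cfg_type == cfg_type)
			return (int)i;
	}
	return -1;
}
```

Using the same PCI_STD_NUM_BARS bound when the bar field is re-read later
keeps the two checks consistent.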
Signed-off-by: Keir Fraser <keirf@google.com>
Link: https://lore.kernel.org/r/20220323140727.3499235-1-keirf@google.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 3f63a1d7f6)
Bug: 222232623
[keirf@: Pass virtio_pci_device to map_capability. Move everything
into virtio_pci_modern.c]
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: Idbba48154a051cf173b9cb0bd40c77fcf02902a4
This reverts commit 9e35276a53. Issues
were reported for drivers that use affinity-managed IRQs,
where manually toggling IRQ status is not expected. We also forgot to
enable the interrupts in the restore path.
In the future, we will rework the interrupt hardening.
Fixes: 9e35276a53 ("virtio_pci: harden MSI-X interrupts")
Reported-by: Marc Zyngier <maz@kernel.org>
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220323031524.6555-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit eb4cecb453)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I05264d9e61d558522a8a20cf87399aa3578b3a6e