In current iommu_unmap, this code is:
iommu_iotlb_gather_init(&iotlb_gather);
ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
iommu_iotlb_sync(domain, &iotlb_gather);
We could gather the whole iova range in __iommu_unmap, and then do tlb
synchronization in the iommu_iotlb_sync.
This patch implement this, Gather the range in mtk_iommu_unmap.
then iommu_iotlb_sync call tlb synchronization for the gathered iova range.
we don't call iommu_iotlb_gather_add_page since our tlb synchronization
could be regardless of granule size.
In this way, gather->start is impossible ULONG_MAX, remove the checking.
This patch aims to do tlb synchronization *once* in the iommu_unmap.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107122909.16317-7-yong.wu@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit f21ae3b100)
BUG=b:174513569
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Change-Id: Ieab4a1b914d7a1f0fe5100306f3d5ff13437127a
iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole
mapping. This patch adds iova and size as the parameters in it. then the
IOMMU driver could flush tlb with the whole range once after iova mapping
to improve performance.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107122909.16317-3-yong.wu@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 2ebbd25873)
(YongWu: this branch has this commit which is not in mainline:
c8ab4bae58 FROMLIST: iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
)
BUG=b:174513569
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Change-Id: Ie463d3cf7efa9320661e7e5fb0042f54d9f1a892
The only user of tlb_flush_leaf is a particularly hairy corner of the
Arm short-descriptor code, which wants a synchronous invalidation to
minimise the races inherent in trying to split a large page mapping.
This is already far enough into "here be dragons" territory that no
sensible caller should ever hit it, and thus it really doesn't need
optimising. Although using tlb_flush_walk there may technically be
more heavyweight than needed, it does the job and saves everyone else
having to carry around useless baggage.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/9844ab0c5cb3da8b2f89c6c2da16941910702b41.1606324115.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit fefe8527a1)
BUG=b:174513569
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Change-Id: I04128ee941e78b3aad53d7d227d8d825e9ee6fe6
Add below kernel symbols for vendor modules to collect
debug information from running/panic kernel. These debug
information could be related to ftrace, irqstat, dmesg
etc.
android_debug_per_cpu_symbol
android_debug_symbol
copy_from_kernel_nofault
ipi_desc_get
kstat
kstat_irqs_cpu
kstat_irqs_usr
log_buf_addr_get
log_buf_len_get
nr_ipi_get
nr_irqs
per_cpu_ptr_to_phys
register_die_notifier
register_module_notifier
seq_buf_printf
__tracepoint_android_vh_ftrace_dump_buffer
__tracepoint_android_vh_ftrace_format_check
__tracepoint_android_vh_ftrace_oops_enter
__tracepoint_android_vh_ftrace_oops_exit
__tracepoint_android_vh_ftrace_size_check
unregister_die_notifier
unregister_module_notifier
Bug: 183479351
Change-Id: I8547e3f15a2cb12a72bc43e449fbaa8f31ec8759
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Add vendor hook to get the binder message for vendor-specific tuning.
Bug: 182496370
Signed-off-by: Zhuguangqing <zhuguangqing@xiaomi.com>
Change-Id: Id47e59c4e3ccd07b26eef758ada147b98cd1964e
timerfd doesn't create any wakelocks, but eventpoll can. When it does,
it names them after the underlying file descriptor, and since all
timerfd file descriptors are named "[timerfd]" (which saves memory on
systems like desktops with potentially many timerfd instances), all
wakesources created as a result of using the eventpoll-on-timerfd idiom
are called... "[timerfd]".
However, it becomes impossible to tell which "[timerfd]" wakesource is
affliated with which process and hence troubleshooting is difficult.
Adding vendor hooks to allow vendor to assign appropriate names to
timerfd descriptors and eventoll wakesource.
Bug: 155142106
Signed-off-by: Manish Varma <varmam@google.com>
Change-Id: I330a42ab48bed4b26d5eb2f636925c66061165ec
This reverts commit 7d212a5102.
This commit is superseded by commit a0a0b3f42e ("FROMLIST: mm: fs:
Invalidate BH LRU during page migration").
Conflicts:
fs/buffer.c
1. In fs/buffer.c had to keep invalidate_bh_lrus_cpu() function that was
introduced in commit a0a0b3f42e as well as in the reverted commit.
Bug: 174118021
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I60d0a4e8beb35389727f29fea6ca4640ecee40a7
commit 67e3f828bd upstream.
The ARM 'adrl' pseudo instruction is a bit problematic, as it does not
exist in Thumb mode, and it is not implemented by Clang either. Since
the Thumb variant has a slightly bigger range, it is sometimes necessary
to emit the 'adrl' variant in ARM mode where Thumb mode can use adr just
fine. However, that still leaves the Clang issue, which does not appear
to be supporting this any time soon.
So let's switch to the adr_l macro, which works for both ARM and Thumb,
and has unlimited range.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Bug: 141693040
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: Ia296c0782cf8201ad8f96095849d034eefe5124b
commit 0b1674638a upstream.
Like arm64, ARM supports position independent code sequences that
produce symbol references with a greater reach than the ordinary
adr/ldr instructions. Since on ARM, the adrl pseudo-instruction is
only supported in ARM mode (and not at all when using Clang), having
a adr_l macro like we do on arm64 is useful, and increases symmetry
as well.
Currently, we use open coded instruction sequences involving literals
and arithmetic operations. Instead, we can use movw/movt pairs on v7
CPUs, circumventing the D-cache entirely.
E.g., on v7+ CPUs, we can emit a PC-relative reference as follows:
movw <reg>, #:lower16:<sym> - (1f + 8)
movt <reg>, #:upper16:<sym> - (1f + 8)
1: add <reg>, <reg>, pc
For older CPUs, we can emit the literal into a subsection, allowing it
to be emitted out of line while retaining the ability to perform
arithmetic on label offsets.
E.g., on pre-v7 CPUs, we can emit a PC-relative reference as follows:
ldr <reg>, 2f
1: add <reg>, <reg>, pc
.subsection 1
2: .long <sym> - (1b + 8)
.previous
This is allowed by the assembler because, unlike ordinary sections,
subsections are combined into a single section in the object file, and
so the label references are not true cross-section references that are
visible as relocations. (Subsections have been available in binutils
since 2004 at least, so they should not cause any issues with older
toolchains.)
So use the above to implement the macros mov_l, adr_l, ldr_l and str_l,
all of which will use movw/movt pairs on v7 and later CPUs, and use
PC-relative literals otherwise.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Bug: 141693040
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: I030ddac6e4c0681897e7e379d3207ab386eb89b0
commit 3c9f5708b7 upstream.
This patch replaces 6 IWMMXT instructions Clang's integrated assembler
does not support in iwmmxt.S using macros, while making sure GNU
assembler still emit the same instructions. This should be easier than
providing full IWMMXT support in Clang. This is one of the last bits of
kernel code that could be compiled but not assembled with clang. Once
all of it works with IAS, we no longer need to special-case 32-bit Arm
in Kbuild, or turn off CONFIG_IWMMXT when build-testing.
"Intel Wireless MMX Technology - Developer Guide - August, 2002" should
be referenced for the encoding schemes of these extensions.
Link: https://github.com/ClangBuiltLinux/linux/issues/975
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jian Cai <jiancai@google.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Bug: 141693040
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: Iff763cbe46155a05d10d943057a34057809b8cc7
User space needs to know if binder transactions occurred to frozen
processes. Introduce a new BINDER_GET_FROZEN ioctl and keep track of
transactions occurring to frozen proceses.
Bug: 180989544
(cherry picked from commit c55019c24b22d6770bd8e2f12fbddf3f83d37547
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing)
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Li Li <dualli@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-4-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie631f331ba4ca94a3bcdd43dec25fe9ba1306af2
when interrupted by a signal, binder_wait_for_work currently returns
-ERESTARTSYS. This error code isn't propagated to user space, but a way
to handle interruption due to signals must be provided to code using
this API.
Replace this instance of -ERESTARTSYS with -EINTR, which is propagated
to user space.
Bug: 180989544
(cherry picked from commit 48f10b7ed0c23e2df7b2c752ad1d3559dad007f9
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing)
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Li Li <dualli@google.com>
Test: built, booted, interrupted a worker thread within
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-3-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie6c7993cab699bc2c1a25a2f9d94b200a1156e5d
Frozen tasks can't process binder transactions, so a way is required to
inform transmitting ends of communication failures due to the frozen
state of their receiving counterparts. Additionally, races are possible
between transitions to frozen state and binder transactions enqueued to
a specific process.
Implement BINDER_FREEZE ioctl for user space to inform the binder driver
about the intention to freeze or unfreeze a process. When the ioctl is
called, block the caller until any pending binder transactions toward
the target process are flushed. Return an error to transactions to
processes marked as frozen.
Bug: 180989544
(cherry picked from commit 15949c3cdd97bccdcd45c0c0f6c31058520b6494
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing)
Co-developed-by: Todd Kjos <tkjos@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Li Li <dualli@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ia1b5951cd99eeb98b59e06c3e27d59062dc725f6
Add pci_dev_present to qcom symbol list for a vendor module to check a
particular PCI device is present in the device list or not.
Bug: 182815465
Change-Id: I2d861a549d2baf57bb86925f1094fb4df46833af
Signed-off-by: Mahesh Kumar Kalikot Veetil <mkalikot@codeaurora.org>
This change adds the sysfs_emit function to the symbol list, which
allows for safely writing sysfs attributes to the input buffer.
Bug: 183479354
Change-Id: Ie3e6c2a014cde06362d0786bc5f59c9f0616bdc8
Signed-off-by: Siddharth Gupta <quic_sidgup@quicinc.com>
This is for rate limiting.
Test: TreeHugger
Bug: 179454839
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I3bb4540791e0db408ab996b270a2d85ca5dd3bda
Per drivers/net/usb/Kconfig:252:
This driver provides support for CDC NCM (Network Control Model Device USB Class Specification).
For example this includes 2.5gbps capable RTL8156 USB Ethernet dongles.
Test: TreeHugger
Bug: 183564444
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I2ca14791e641f242f677d3f3b164b678396e6ca0
Per drivers/net/usb/Kconfig:620
USB_NET_AQC111 is "Aquantia AQtion USB to 5/2.5GbE Controllers support"
This option adds support for Aquantia AQtion USB
Ethernet adapters based on AQC111U/AQC112 chips.
This driver should work with at least the following devices:
* Aquantia AQtion USB to 5GbE
Test: TreeHugger
Bug: 183564444
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I4ac418ed4a16c12900fe4ad334407fe58dd64afa
If the gadget driver doesn't specify a max_speed, then use the
controller's maximum supported speed as default. For DWC_usb32 IP, the
gadget's speed maybe limited to gen2x1 rate only if the driver's
max_speed is unknown. This scenario should not occur with the current
implementation since the default gadget driver's max_speed should always
be specified. However, to make the driver more robust and help with
readability, let's cover all the scenarios in __dwc3_gadget_set_speed().
Fixes: 450b9e9fab ("usb: dwc3: gadget: Set speed only up to the max supported")
Cc: <stable@vger.kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/55ac7001af73bfe9bc750c6446ef4ac8cf6f9313.1615254129.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 93f1d43c57)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8cadeeb472317c2d7fe29a4f534386cb77427841
Exporting the symbol freezer_cgrp_subsys, in that vendor module can
add can_attach & cancel_attach member function. It is vendor-specific
tuning.
Bug: 182496370
Signed-off-by: Zhuguangqing <zhuguangqing@xiaomi.com>
Change-Id: I153682b9d1015eed3f048b45ea6495ebb8f3c261
The ACPI probe starts failing since commit bea46b9815 ("usb: dwc3:
qcom: Add interconnect support in dwc3 driver"), because there is no
interconnect support for ACPI, and of_icc_get() call in
dwc3_qcom_interconnect_init() will just return -EINVAL.
Fix the problem by skipping interconnect init for ACPI probe, and then
the NULL icc_path_ddr will simply just scheild all ICC calls.
Fixes: bea46b9815 ("usb: dwc3: qcom: Add interconnect support in dwc3 driver")
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210311060318.25418-1-shawn.guo@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5e4010e36a)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iec6a686e1445a562894cafadad9099b6ae6ba0bf
The current dwc3_gadget_reset_interrupt() will stop any active
transfers, but only addresses blocking of EP queuing for while we are
coming from a disconnected scenario, i.e. after receiving the disconnect
event. If the host decides to issue a bus reset on the device, the
connected parameter will still be set to true, allowing for EP queuing
to continue while we are disabling the functions. To avoid this, set the
connected flag to false until the stop active transfers is complete.
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1616146285-19149-3-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 71ca43f30dhttps://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-next)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9d439e917639999f6e5219e1c053087dd2e762b1
Whitelist sched_setattr, so that we may set scheduler properties for
threads from the kernel, just as one can do from user space.
Bug: 183420374
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Change-Id: Ie2a7b611a22f9bc01b2317f4af3ac811080a257a
Try to mitigate potential future driver core api changes by adding
padding to a number of core internal scheduler structures.
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0ef2f8dd5f3259dcf443c5045aa1e8505ed78a76
Try to mitigate potential future driver core api changes by adding a
padding to stuct vm_area_struct and struct zone.
Based on a patch from Michal Marek <mmarek@suse.cz> from the SLES kernel
Leaf changes summary: 3 artifacts changed
Changed leaf types summary: 3 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct vm_area_struct at mm_types.h:292:1' changed:
type size changed from 1472 to 1728 (in bits)
4 data member insertions:
'u64 vm_area_struct::android_kabi_reserved1', at offset 1472 (in bits) at mm_types.h:365:1
'u64 vm_area_struct::android_kabi_reserved2', at offset 1536 (in bits) at mm_types.h:366:1
'u64 vm_area_struct::android_kabi_reserved3', at offset 1600 (in bits) at mm_types.h:367:1
'u64 vm_area_struct::android_kabi_reserved4', at offset 1664 (in bits) at mm_types.h:368:1
1435 impacted interfaces:
'struct zone at mmzone.h:420:1' changed:
type size changed from 12800 to 13312 (in bits)
4 data member insertions:
'u64 zone::android_kabi_reserved1', at offset 12672 (in bits) at mmzone.h:569:1
'u64 zone::android_kabi_reserved2', at offset 12736 (in bits) at mmzone.h:570:1
'u64 zone::android_kabi_reserved3', at offset 12800 (in bits) at mmzone.h:571:1
'u64 zone::android_kabi_reserved4', at offset 12864 (in bits) at mmzone.h:572:1
624 impacted interfaces:
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I81702aa833f419928e0e32e9609722b98592c171
Try to mitigate potential future driver core api changes by adding a
padding to struct vfsmount.
Based on a patch from Michal Marek <mmarek@suse.cz> from the SLES kernel
Leaf changes summary: 1 artifact changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct vfsmount at mount.h:68:1' changed:
type size changed from 256 to 512 (in bits)
4 data member insertions:
'u64 vfsmount::android_kabi_reserved1', at offset 192 (in bits) at mount.h:73:1
'u64 vfsmount::android_kabi_reserved2', at offset 256 (in bits) at mount.h:74:1
'u64 vfsmount::android_kabi_reserved3', at offset 320 (in bits) at mount.h:75:1
'u64 vfsmount::android_kabi_reserved4', at offset 384 (in bits) at mount.h:76:1
there are data member changes:
'void* vfsmount::data' offset changed from 192 to 448 (in bits) (by +256 bits)
8 impacted interfaces:
function vfsmount* mntget(vfsmount*)
function int notify_change2(vfsmount*, dentry*, iattr*, inode**)
function int vfs_create2(vfsmount*, inode*, dentry*, umode_t, bool)
function int vfs_mkdir2(vfsmount*, inode*, dentry*, umode_t)
function int vfs_path_lookup(dentry*, vfsmount*, const char*, unsigned int, path*)
function int vfs_rename2(vfsmount*, inode*, dentry*, inode*, dentry*, inode**, unsigned int)
function int vfs_rmdir2(vfsmount*, inode*, dentry*)
function int vfs_unlink2(vfsmount*, inode*, dentry*, inode**)
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9ce1b63f05c90af168eeea1312ac88d3cc5cfdf3
Pages containing buffer_heads that are in one of the per-CPU
buffer_head LRU caches will be pinned and thus cannot be migrated.
This can prevent CMA allocations from succeeding, which are often used
on platforms with co-processors (such as a DSP) that can only use
physically contiguous memory. It can also prevent memory
hot-unplugging from succeeding, which involves migrating at least
MIN_MEMORY_BLOCK_SIZE bytes of memory, which ranges from 8 MiB to 1
GiB based on the architecture in use.
Correspondingly, invalidate the BH LRU caches before a migration
starts and stop any buffer_head from being cached in the LRU caches,
until migration has finished.
Bug: 180018981
Link: https://lore.kernel.org/linux-mm/20210319175127.886124-3-minchan@kernel.org/
Tested-by: Oliver Sang <oliver.sang@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Idb8279cb561812f5f1b43ddbb742c1808700754e
Currently, migrate_[prep|finish] is merely a wrapper of
lru_cache_[disable|enable]. There is not much to gain from
having additional abstraction.
Use lru_cache_[disable|enable] instead of migrate_[prep|finish],
which would be more descriptive.
note: migrate_prep_local in compaction.c changed into lru_add_drain
to avoid CPU schedule cost with involving many other CPUs to keep
keep old behavior.
Bug: 180018981
Link: https://lore.kernel.org/linux-mm/20210319175127.886124-2-minchan@kernel.org/
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I2f298c9ff53c8693527f1207ff25ab76a4ac3ada
LRU pagevec holds refcount of pages until the pagevec are drained.
It could prevent migration since the refcount of the page is greater
than the expection in migration logic. To mitigate the issue,
callers of migrate_pages drains LRU pagevec via migrate_prep or
lru_add_drain_all before migrate_pages call.
However, it's not enough because pages coming into pagevec after the
draining call still could stay at the pagevec so it could keep
preventing page migration. Since some callers of migrate_pages have
retrial logic with LRU draining, the page would migrate at next trail
but it is still fragile in that it doesn't close the fundamental race
between upcoming LRU pages into pagvec and migration so the migration
failure could cause contiguous memory allocation failure in the end.
To close the race, this patch disables lru caches(i.e, pagevec)
during ongoing migration until migrate is done.
Since it's really hard to reproduce, I measured how many times
migrate_pages retried with force mode(it is about a fallback to a
sync migration) with below debug code.
int migrate_pages(struct list_head *from, new_page_t get_new_page,
..
..
if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
printk(KERN_ERR, "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
dump_page(page, "fail to migrate");
}
The test was repeating android apps launching with cma allocation
in background every five seconds. Total cma allocation count was
about 500 during the testing. With this patch, the dump_page count
was reduced from 400 to 30.
The new interface is also useful for memory hotplug which currently
drains lru pcp caches after each migration failure. This is rather
suboptimal as it has to disrupt others running during the operation.
With the new interface the operation happens only once. This is also in
line with pcp allocator cache which are disabled for the offlining as
well.
Bug: 180018981
Link: https://lore.kernel.org/linux-mm/20210319175127.886124-1-minchan@kernel.org/
Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I838c63d11ca49a8734d8b37a7d5272ab6b802f9f