Patch series "mm: page_alloc: fixes for high atomic reserve
calculations", v3.
The state of the system where the issue was exposed, as shown in the OOM kill logs:
[ 295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB local_pcp:492kB free_cma:0kB
[ 295.998656] lowmem_reserve[]: 0 32
[ 295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7752kB
From the above, it is seen that ~16% of the zone's managed memory (8MB) is
reserved for high atomic allocations, against the expectation of 1%. This
is fixed in the 1st patch.
Don't reserve the high atomic page blocks if 1% of zone memory size is
below a pageblock size.
This patch (of 2):
reserve_highatomic_pageblock() aims to reserve the 1% of the managed pages
of a zone, which is used for the high order atomic allocations.
It uses the below calculation to reserve:
static void reserve_highatomic_pageblock(struct page *page, ....) {
	.......
	max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
	if (zone->nr_reserved_highatomic >= max_managed)
		goto out;
	zone->nr_reserved_highatomic += pageblock_nr_pages;
	set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
	move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
out:
	....
}
Since 1% of the zone's managed page count is always added to
pageblock_nr_pages, and nr_reserved_highatomic is incremented/decremented
in pageblock-sized steps, the reserve effectively grows to a minimum of 2
pageblocks.
This was encountered on a system (actually a VM running on the Linux
kernel) with the below zone configuration:
Normal free:7728kB boost:0kB min:804kB low:1004kB high:1204kB
reserved_highatomic:8192KB managed:49224kB
The existing calculation reserves 8MB (with a pageblock size of 4MB),
i.e. 16% of the zone's managed memory. Reserving such a large amount of
memory can easily exert memory pressure on the system and may lead to
unnecessary reclaim until the high atomic reserves are unreserved.
Since high atomic reserves are managed in pageblock-sized granules, with
MIGRATE_HIGHATOMIC set on each such pageblock, fix the calculation so that
the minimum reserve is one pageblock and the maximum is approximately 1%
of the zone's managed pages.
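To make the arithmetic concrete, the userspace sketch below replays the reservation loop for the zone above. The function names are hypothetical, and the rounded-up limit in limit_fixed() is only an assumption about how "minimum one pageblock, maximum roughly 1%" could be expressed; it is not a copy of the kernel change.

```c
#include <assert.h>

/* Limit used by the existing code: 1% of managed pages plus one pageblock. */
static unsigned long limit_old(unsigned long managed_pages,
			       unsigned long pageblock_pages)
{
	return managed_pages / 100 + pageblock_pages;
}

/*
 * Assumed shape of the fixed limit: 1% of managed pages rounded up to a
 * whole number of pageblocks, so any nonzero 1% yields at least one
 * pageblock.
 */
static unsigned long limit_fixed(unsigned long managed_pages,
				 unsigned long pageblock_pages)
{
	unsigned long one_percent = managed_pages / 100;

	assert(pageblock_pages > 0);
	return (one_percent + pageblock_pages - 1) / pageblock_pages *
	       pageblock_pages;
}

/* Reserve whole pageblocks until the limit check stops further growth. */
static unsigned long reserved_pages(unsigned long limit,
				    unsigned long pageblock_pages)
{
	unsigned long reserved = 0;

	while (reserved < limit)
		reserved += pageblock_pages;
	return reserved;
}
```

For the zone above (49224kB managed is 12306 pages of 4kB, and a 4MB pageblock is 1024 pages), the old limit is 1147 pages, which admits two pageblocks (2048 pages, 8192kB). The rounded-up limit is 1024 pages, which admits one pageblock (4096kB).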
Bug: 332219324
Link: https://lkml.kernel.org/r/cover.1700821416.git.quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/1660034138397b82a0a8b6ae51cbe96bd583d89e.1700821416.git.quic_charante@quicinc.com
Change-Id: Icc15fb88ef6166f691f5aa14311bc45bff972b99
(cherry picked from commit d68e39fc45f70e35eb74df2128d315c1d91e4dc4)
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This function was called task_may_not_preempt() in older versions and was
used by modules that carry their own extensions to RT. Export it so that
those users can continue to use it.
Bug: 332629555
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: I04affb8e9e6258f9fb36ebab4d7956a265e9e299
This reverts commit 6bad1052c2, which is the LTS merge that previously had
to be reverted because it was merged too early.
Cc: Todd Kjos <tkjos@google.com>
Change-Id: I31b7d660bd833cf022ac4870f6d01e723fda5182
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 75266774b9.
It conflicts with the LTS merge revert and will be brought back after
that is properly merged.
Bug: 308663717
Bug: 319125789
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Juan Yescas <jyescas@google.com>
Change-Id: Ieb275cba965a604019aa51e174e52082959c8872
This reverts commit 7932afa9bb.
It broke the Android kernel ABI and can be brought back in the future
in an ABI-safe way if needed.
Bug: 332277393
Change-Id: I10abcbf5237536b0ee382d3db16a5ebd82b3222c
Signed-off-by: Giuliano Procida <gprocida@google.com>
Commit 6d98eb95b4 ("binder: avoid potential data leakage when copying
txn") introduced changes to how binder objects are copied. In doing so,
it unintentionally removed an offset alignment check done through calls
to binder_alloc_copy_from_buffer() -> check_buffer().
These calls were replaced in binder_get_object() with copy_from_user(),
so now an explicit offset alignment check is needed here. This avoids
later complications when unwinding the objects gets harder.
It is worth noting this check existed prior to commit 7a67a39320
("binder: add function to copy binder object from buffer"), likely
removed due to redundancy at the time.
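A minimal userspace sketch of such a check is shown below. The helper name object_offset_ok() and the 4-byte alignment are assumptions made for illustration; the text above only states that an explicit offset alignment check is needed in binder_get_object().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept an object offset only if it is 4-byte
 * aligned and a header of hdr_size bytes fits inside the buffer.
 */
static bool object_offset_ok(size_t offset, size_t data_size, size_t hdr_size)
{
	const size_t align = sizeof(uint32_t); /* assumed alignment */

	assert((align & (align - 1)) == 0); /* mask test needs a power of two */
	if (offset & (align - 1))
		return false; /* misaligned offset */
	if (offset > data_size || data_size - offset < hdr_size)
		return false; /* header would run past the buffer */
	return true;
}
```

Rejecting a misaligned offset up front means later code never has to unwind an object that was read from a bad position.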
Fixes: 6d98eb95b4 ("binder: avoid potential data leakage when copying txn")
Cc: <stable@vger.kernel.org>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Bug: 320661088
Link: https://lore.kernel.org/all/20240330190115.1877819-1-cmllamas@google.com/
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Change-Id: Iaddabaa28de7ba7b7d35dbb639d38ca79dbc5077
Add the symbol list for Nothing for the first time.
1 Added function:
[A] 'function int __traceiter_android_vh_thermal_pm_notify_suspend(void*, thermal_zone_device*, int*)'
1 Added variable:
[A] 'tracepoint __tracepoint_android_vh_thermal_pm_notify_suspend'
Bug: 332221925
Change-Id: I07b727a5c340f8f1cc7fb72e16a1c47881aa348b
Signed-off-by: Dylan Chang <dylan.chang@nothing.tech>
Currently, most of the thermal_zones are IRQ capable and do not need to be
updated while resuming. To improve system performance and reduce the
resume time, add a vendor function to check if the thermal_zone is not
IRQ capable and needs to be updated.
Bug: 170905417
Bug: 332221925
Test: boot and vendor function worked properly.
Change-Id: I9389985bba29b551a7a20b55e1ed26b6c4da9b3d
Signed-off-by: David Chao <davidchao@google.com>
Signed-off-by: Dylan Chang <dylan.chang@nothing.tech>
This reverts commit 8a2f432fcb.
It conflicts with the LTS merge revert and will be brought back after
that is properly merged.
Bug: 308663717
Bug: 319125789
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Juan Yescas <jyescas@google.com>
Change-Id: I764adf995cae6b485d4d98e410c78128a88647e0
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Make sure trace_android_vh_binder_has_special_work_ilocked is called in
every case in the Android native logic.
The path fixed here is:
binder_thread_read (non_block case) ->
| binder_wait_for_work ->
| if (binder_has_work_ilocked(...)) ->
| false: schedule, true: break
If binder_has_work_ilocked() does not call
trace_android_vh_binder_has_special_work_ilocked, it may return true for a
VIP thread because the proc->todo list is not empty, even though the list
holds no VIP work (special work with a special binder_transaction flag).
Fix it by moving trace_android_vh_binder_has_special_work_ilocked from
binder_has_work to binder_has_work_ilocked.
Fixes: 24bb8fc82e60 ("ANDROID: vendor_hooks: add hooks in driver/android/binder.c")
| https://android-review.googlesource.com/c/kernel/common/+/2897624
Bug: 318782978
Change-Id: I8ced722c71c82942e626f04dce950e8df580ae95
Signed-off-by: songfeng <songfeng@oppo.com>
kthread_park and wait_woken have a race similar to the one that
kthread_stop and wait_woken used to have before it was fixed in
commit cb6538e740 ("sched/wait: Fix a kthread race with
wait_woken()"). Extend that fix to also cover kthread_park.
[jstultz: Made changes suggested by Peter to optimize
memory loads]
Change-Id: Idd1381e297efb1f2493deedcc0fc0288f0027fef
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20230602212350.535358-1-jstultz@google.com
(cherry picked from commit ef73d6a4ef)
When coalescing a table into a block, the break-before-make sequence
must invalidate the whole range of addresses translated by the entry in
order to avoid the possibility of a TLB conflict.
Fix the coalescing post-table walker so that the whole range of the old
table is invalidated, rather than just the first address, since a
refcount of 1 on the child page is not sufficient to ensure the absence
of any valid mappings.
Cc: Sebastian Ene <sebastianene@google.com>
Reported-by: Mostafa Saleh <smostafa@google.com>
Fixes: 9e7e5db52c ("ANDROID: KVM: arm64: Coalesce host stage2 entries on ownership reclaim")
Bug: 331232642
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I4c94f552e4385599ad88b1be50b69ffbafa64a9b
There are no new symbols to be added to the GKI symbol list.
We simply update our symbol list.
Bug: 330957400
Change-Id: I5a6420884a82ccc1aa2c4aa149554e910fe0701e
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
Clean up the symbol order so that changes are easier to recognize.
No symbols are changed.
Bug: 330635812
Change-Id: I10a83b2ceb56c34bad57068bab58e48e7e753991
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
The hyp event host_hcall was missing when a custom HVC runs.
Bug: 278749606
Bug: 244543039
Bug: 244373730
Change-Id: I760cab4fbd36a13ad262842880d9ec484f23fd22
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
(cherry picked from commit a1836ffbea9fcb70fa9d49af7382b9343285036f)
Adding the following symbols:
- blk_queue_max_segment_size
Bug: 269652215
Change-Id: Ie599f23056fd0f6641df6dba57e48413b144fa1b
Signed-off-by: Robin Peng <robinpeng@google.com>
The LPM feature of the DWC2 module integrated in Rockchip SoCs doesn't work
properly or needs some additional handling, so disable it for now.
Without disabling the LPM feature, USB ADB communication fails with
the following error log:
dwc2 ff580000.usb: new address 27
dwc2 ff580000.usb: Failed to exit L1 sleep state in 200us.
dwc2 ff580000.usb: dwc2_hsotg_send_reply: cannot queue req
dwc2 ff580000.usb: dwc2_hsotg_process_req_status: failed to send reply
dwc2 ff580000.usb: dwc2_hsotg_enqueue_setup: failed queue (-11)
dwc2 ff580000.usb: Failed to exit L1 sleep state in 200us.
[diff vs vendor kernel: added lpm_clock_gating, besl and
hird_threshold_en settings as seen in commit 53febc9569 ("usb: dwc2:
disable Link Power Management on STM32MP15 HS OTG")]
Bug: 300024866
Change-Id: Ib8ae241dce5993e34b6cf8e83254b2730effe009
Signed-off-by: William Wu <william.wu@rock-chips.com>
Signed-off-by: Frank Wang <frank.wang@rock-chips.com>
Signed-off-by: Quentin Schulz <quentin.schulz@theobroma-systems.com>
Link: https://lore.kernel.org/r/20221206-dwc2-gadget-dual-role-v1-1-36515e1092cd@theobroma-systems.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 42a317d076)
This reverts commit fa6c89a93f.
Reason for revert: Only reverting the prerequisite CL. Will revert the dependent aosp/3007694 once it's safe.
Bug: 324290965
Test: tools/bazel run //common-modules/virtual-device:virtual_device_aarch64_dist
Change-Id: I4f4caf7916e8030b6fe4f407c41a593c4916f43f
Signed-off-by: Istvan Nador <istvannador@google.com>
The tail pages in a THP can have swap entry information stored in their
private field. When migrating to a new page, all tail pages of the new
page need to update ->private to avoid future data corruption.
This fix is stable-only, since after commit 07e09c483c ("mm/huge_memory:
work on folio->swap instead of page->private when splitting folio"),
subpages of a swapcached THP no longer require this maintenance.
Adding THPs to the swapcache was introduced in commit
38d8b4e6bd ("mm, THP, swap: delay splitting THP during swap out"),
where each subpage of a THP added to the swapcache had its own swapcache
entry and required the ->private field to point to the correct swapcache
entry. Later, when THP migration functionality was implemented in commit
616b837153 ("mm: thp: enable thp migration in generic path"),
it initially did not handle the subpages of swapcached THPs, failing to
update their ->private fields or replace the subpage pointers in the
swapcache. Subsequently, commit e71769ae52 ("mm: enable thp migration
for shmem thp") addressed the swapcache update aspect. This patch fixes
the update of subpage ->private fields.
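The idea can be pictured with a toy model in which a THP is an array of pages. The struct and function names below are hypothetical stand-ins, not kernel types, and the loop only sketches "update ->private for every subpage of the new page" under that assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; only the field discussed above. */
struct toy_page {
	unsigned long private; /* swap entry value for swapcached pages */
};

/*
 * Copy ->private for the head page and every tail page, so each subpage
 * of the destination keeps pointing at its own swap entry.
 */
static void toy_copy_private(struct toy_page *dst, const struct toy_page *src,
			     size_t nr_pages)
{
	assert(dst && src);
	for (size_t i = 0; i < nr_pages; i++)
		dst[i].private = src[i].private;
}
```

Copying only index 0 would leave the tail pages of the destination with stale ->private values, which is the corruption risk described above.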
Bug: 324818390
Fixes: 616b837153 ("mm: thp: enable thp migration in generic path")
Link: https://lore.kernel.org/linux-mm/20240306155217.118467-1-zi.yan@sent.com/
Reported-and-tested-by: Charan Teja Kalla <quic_charante@quicinc.com>
Change-Id: Ia4603cd58b76dc6ff46a2c53a735942a87221419
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_charante@quicinc.com/
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Gunyah resource manager has limited internal buffering to send messages
to VMs and it is possible to fill the buffer and cause RM to drop
replies. Prevent the "drop" scenario by serializing the entire
send/receive RPC flow.
Bug: 330201551
Change-Id: I65f2f6daf495eb24e1bc120a6a4d0b84c966e3cc
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Add support for configuring the maximum segment size.
Add support for segments smaller than the page size.
This patch enables testing segments smaller than the page size with a
driver that does not call blk_rq_map_sg().
Bug: 308663717
Bug: 319125789
Change-Id: Idd7094e9f773c295017b44377d2a3f10abea95cf
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Juan Yescas <jyescas@google.com>
Add a kernel module parameter for configuring the maximum segment size.
This patch enables testing SCSI support for segments smaller than the
page size.
Bug: 308663717
Bug: 319125789
Change-Id: I1d9f1714876de72630cbf3150e7082b988dd7322
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Juan Yescas <jyescas@google.com>
Add support in the bio splitting code and also in the bio submission code
for bios with segments smaller than the page size.
Bug: 308663717
Bug: 319125789
Change-Id: I056659cf86c04fb095aa01cd3d274d29417782ac
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Juan Yescas <jyescas@google.com>
If the segment size is smaller than the page size there may be multiple
segments per bvec even if a bvec only contains a single page. Hence this
patch.
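As a rough illustration of why one page can need several segments, the hypothetical helper below counts segments for a given length. It ignores segment boundary masks and any offset within the page, so it is a simplification rather than the block layer's actual logic.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of segments needed for len bytes when each
 * segment holds at most max_seg_size bytes (segment boundaries ignored).
 */
static unsigned int segs_for_len(unsigned int len, unsigned int max_seg_size)
{
	assert(max_seg_size > 0);
	return (len + max_seg_size - 1) / max_seg_size;
}
```

With a 4096-byte bvec and a 512-byte maximum segment size this gives 8 segments, while any maximum segment size of 4096 bytes or more gives 1.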
Bug: 308663717
Bug: 319125789
Change-Id: I81516bf6da8ce3e4e60651ab2bd379080e7d3482
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Juan Yescas <jyescas@google.com>
This new debugfs attribute makes it easier to verify the code that tracks
how many queues require limits below the page size.
Bug: 308663717
Bug: 319125789
Change-Id: I2ee54a9e4544866e71b505ae9296b68039d5ca82
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[jyescas@google.com: Wrap #include "blk-mq-debugfs.h" with
#ifndef __GENKSYMS__ to avoid ABI CRC
changes.]
Signed-off-by: Juan Yescas <jyescas@google.com>
Allow block drivers to configure the following:
* Maximum number of hardware sectors values smaller than
PAGE_SIZE >> SECTOR_SHIFT. For PAGE_SIZE = 4096 this means that values
below 8 become supported.
* A maximum segment size below the page size. This is most useful
for page sizes above 4096 bytes.
The blk_sub_page_segments static branch will be used in later patches to
avoid affecting the performance of block drivers that support segments >=
PAGE_SIZE and max_hw_sectors >= PAGE_SIZE >> SECTOR_SHIFT.
This patch may change the behavior of existing block drivers from not
working into working. If a block driver calls
blk_queue_max_hw_sectors() or blk_queue_max_segment_size(), this is
usually done to configure the maximum supported limits. An attempt to
configure a limit below what is supported by the block layer causes the
block layer to select a larger value. If that value is not supported by
the block driver, this may cause other data to be transferred than
requested, a kernel crash or other undesirable behavior.
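The "select a larger value" behavior can be sketched as follows. The function names and TOY_SECTOR_SHIFT are hypothetical; the shift of 9 is inferred from the statement above that PAGE_SIZE = 4096 corresponds to 8 sectors.

```c
#include <assert.h>

#define TOY_SECTOR_SHIFT 9 /* 512-byte sectors: 4096 >> 9 == 8 */

/* Smallest max_hw_sectors value that covers one full page. */
static unsigned int page_sectors(unsigned int page_size)
{
	assert(page_size >= (1u << TOY_SECTOR_SHIFT));
	return page_size >> TOY_SECTOR_SHIFT;
}

/* Behavior described for the unpatched block layer: raise small requests. */
static unsigned int clamp_old(unsigned int requested, unsigned int page_size)
{
	unsigned int min_sectors = page_sectors(page_size);

	return requested < min_sectors ? min_sectors : requested;
}
```

A driver that asks for 4 sectors on a 4096-byte-page system ends up with 8 under this model, which is the mismatch that can lead to transfers larger than the hardware supports.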
Keep the ABI stable by taking advantage of a hole in struct queue_limits.
Bug: 308663717
Bug: 319125789
Bug: 324152549
Change-Id: I7358f3e16aa0c80a6d345cb7887fbe9276e52912
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[jyescas@google.com: disable subpage limits in block/blk-sysfs.c
instead of block/blk-core.c because the function
blk_free_queue() is not defined in 6.1 kernel]
Signed-off-by: Juan Yescas <jyescas@google.com>
Introduce variables that represent the lower configuration bounds. This
patch does not change any functionality.
Bug: 308663717
Bug: 319125789
Change-Id: Ie88bfa6b716a43ca7e95a67fad267bdb1507015f
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Juan Yescas <jyescas@google.com>
Switch to the modern style of printing kernel messages. Use %u instead
of %d to print unsigned integers.
The pr_fmt() format is added on top of the file to include __func__
in the pr_info() calls.
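A userspace sketch of the pr_fmt() idea follows. toy_pr_info() and report_max_segment_size() are hypothetical stand-ins that write into a buffer instead of the kernel log; only the pr_fmt() definition mirrors the pattern described above.

```c
#include <assert.h>
#include <stdio.h>

/* Prefix every format string with the name of the calling function. */
#define pr_fmt(fmt) "%s: " fmt, __func__

/* Userspace stand-in for pr_info() that writes into a caller's buffer. */
#define toy_pr_info(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)

/* Hypothetical caller; %u matches the unsigned argument. */
static int report_max_segment_size(char *buf, size_t size, unsigned int bytes)
{
	assert(buf && size > 0);
	return toy_pr_info(buf, size, "max_segment_size=%u\n", bytes);
}
```

Because pr_fmt() is applied inside the macro, each call site gets its function name without repeating it in every format string.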
Bug: 308663717
Bug: 319125789
Change-Id: I11dbb559263ae5ef18febc7ab89f27f231e511e2
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[jyescas@google.com: define pr_fmt(fmt) to include __func__ in the output]
Signed-off-by: Juan Yescas <jyescas@google.com>
Add vendor hook android_vh_sound_check_support_cpu_suspend
to allow ACPU to suspend during USB playback/capture,
if this is supported.
Bug: 329345852
Bug: 192206510
Change-Id: Ia8d4c335db27de5fcefab13cab653fd1ae34f691
Signed-off-by: JJ Lee <leejj@google.com>
(cherry picked from commit e8516fd3af)
(cherry picked from commit 4cbf19a6f8)
Add a hook that lets vendors implement and bypass suspend/resume.
When the bypass is set, skip the original suspend/resume methods.
On mobile devices, a co-processor can be used with USB audio, and the ACPU
may be able to sleep in that condition to reduce power consumption.
A vendor hook is needed to support this.
Bug: 329345852
Bug: 302982919
Signed-off-by: Puma Hsu <pumahsu@google.com>
Change-Id: Ic62a8a1e662bbe3fb0aa17af7491daace0b9f18a
(cherry picked from commit 98085b5dd8)
(cherry picked from commit 358b59f1bc)
Currently, the linker script's support for merging a module's sections is
guarded by either CONFIG_LTO_CLANG or CONFIG_CRYPTO_FIPS140_MOD. This
functionality is also needed by additional fips140 modules built out of
tree. So, add an explicit config (CRYPTO_FIPS140_MERGE_MOD_SECTIONS)
that can be selected by the various fips140 modules without having to
depend on and enable CONFIG_CRYPTO_FIPS140_MOD.
Bug: 281657135
Change-Id: I2af727813151ba839a95696bc847e2a841a7175a
Signed-off-by: Konstantin Vyshetsky <vkon@google.com>
When a driver uses pm_runtime_force_suspend() as the system suspend callback
function and registers the wake irq with reverse enable ordering, the wake
irq will be re-enabled when entering system suspend, triggering an
'Unbalanced enable for IRQ xxx' warning. In this scenario, the call
sequence during system suspend is as follows:
suspend_devices_and_enter()
  -> dpm_suspend_start()
    -> dpm_run_callback()
      -> pm_runtime_force_suspend()
        -> dev_pm_enable_wake_irq_check()
        -> dev_pm_enable_wake_irq_complete()
  -> suspend_enter()
    -> dpm_suspend_noirq()
      -> device_wakeup_arm_wake_irqs()
        -> dev_pm_arm_wake_irq()
To fix this issue, complete the setting of WAKE_IRQ_DEDICATED_ENABLED flag
in dev_pm_enable_wake_irq_complete() to avoid redundant irq enablement.
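The double enable can be modeled with a toy state machine. Everything below (the struct, the toy_ functions, the flag value) is a hypothetical simplification of the sequence described above, not the PM core's code; it only shows how recording the flag in the "complete" step lets the later "arm" step skip a second enable.

```c
#include <assert.h>
#include <stdbool.h>

#define WAKE_IRQ_DEDICATED_ENABLED 0x1u /* flag value is illustrative */

struct toy_wake_irq {
	unsigned int status;
	int disable_depth;      /* 1 = disabled once, 0 = enabled */
	int unbalanced_enables; /* counts enables of an already enabled IRQ */
};

static void toy_enable_irq(struct toy_wake_irq *w)
{
	assert(w);
	if (w->disable_depth == 0)
		w->unbalanced_enables++; /* the "Unbalanced enable" case */
	else
		w->disable_depth--;
}

/* Runtime-suspend step for a wake IRQ with reverse enable ordering. */
static void toy_enable_wake_irq_complete(struct toy_wake_irq *w, bool set_flag)
{
	toy_enable_irq(w);
	if (set_flag)
		w->status |= WAKE_IRQ_DEDICATED_ENABLED;
}

/* System-suspend noirq step: enable only if not already enabled. */
static void toy_arm_wake_irq(struct toy_wake_irq *w)
{
	if (!(w->status & WAKE_IRQ_DEDICATED_ENABLED))
		toy_enable_irq(w);
}

/* Replay the suspend sequence; return the number of unbalanced enables. */
static int toy_suspend(bool set_flag)
{
	struct toy_wake_irq w = { 0, 1, 0 };

	toy_enable_wake_irq_complete(&w, set_flag);
	toy_arm_wake_irq(&w);
	return w.unbalanced_enables;
}
```

In this model toy_suspend(false) reports one unbalanced enable, matching the warning, while toy_suspend(true) reports none.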
Bug: 330244514
(cherry picked from commit e7a7681c859643f3f2476b2a28a494877fd89442
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
linux-next)
Fixes: 8527beb120 ("PM: sleep: wakeirq: fix wake irq arming")
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Qingliang Li <qingliang.li@mediatek.com>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change-Id: I46ba27631ed5561123bd98dd32872837b726b5bd
There is no new symbol to be added. We just try to update our symbol list.
Bug: 330272507
Change-Id: I1acb83c75c0dd4f594f8cbcdf341fe8dbef5bf26
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>