linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-11 13:27:06 +09:00

Author	SHA1	Message	Date
Mauro (mdrjr) Ribeiro	7e9ab6bbff	Merge tag 'v4.9.312' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.312 stable release Change-Id: I82d358bf9c3d99a65dfcd19f5ed46d48cad014b1	2022-04-27 17:04:31 -03:00
Mauro (mdrjr) Ribeiro	922b7e13c3	Merge tag 'v4.9.311' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.311 stable release Change-Id: I671e8e5aa10f2aaa12fc35b6ba1a0c8978c412d9	2022-04-27 17:04:21 -03:00
Mauro (mdrjr) Ribeiro	43c55a77e9	Merge tag 'v4.9.283' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.283 stable release Change-Id: I6cf9304183b00aff4c3b47c3fc072cc95ff18c6b	2022-04-27 13:37:12 -03:00
Xiongwei Song	2705733346	mm: page_alloc: fix building error on -Werror=array-compare commit `ca831f29f8` upstream. Arthur Marsh reported we would hit the error below when building kernel with gcc-12: CC mm/page_alloc.o mm/page_alloc.c: In function `mem_init_print_info': mm/page_alloc.c:8173:27: error: comparison between two arrays [-Werror=array-compare] 8173 \| if (start <= pos && pos < end && size > adj) \ \| In C++20, the comparision between arrays should be warned. Link: https://lkml.kernel.org/r/20211125130928.32465-1-sxwjean@me.com Signed-off-by: Xiongwei Song <sxwjean@gmail.com> Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Khem Raj <raj.khem@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-27 13:14:10 +02:00
Juergen Gross	2f2ef479bd	mm, page_alloc: fix build_zonerefs_node() commit `e553f62f10` upstream. Since commit `6aa303defb` ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") only zones with free memory are included in a built zonelist. This is problematic when e.g. all memory of a zone has been ballooned out when zonelists are being rebuilt. The decision whether to rebuild the zonelists when onlining new memory is done based on populated_zone() returning 0 for the zone the memory will be added to. The new zone is added to the zonelists only, if it has free memory pages (managed_zone() returns a non-zero value) after the memory has been onlined. This implies, that onlining memory will always free the added pages to the allocator immediately, but this is not true in all cases: when e.g. running as a Xen guest the onlined new memory will be added only to the ballooned memory list, it will be freed only when the guest is being ballooned up afterwards. Another problem with using managed_zone() for the decision whether a zone is being added to the zonelists is, that a zone with all memory used will in fact be removed from all zonelists in case the zonelists happen to be rebuilt. Use populated_zone() when building a zonelist as it has been done before that commit. There was a report that QubesOS (based on Xen) is hitting this problem. Xen has switched to use the zone device functionality in kernel 5.9 and QubesOS wants to use memory hotplugging for guests in order to be able to start a guest with minimal memory and expand it as needed. This was the report leading to the patch. Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com Fixes: `6aa303defb` ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") Signed-off-by: Juergen Gross <jgross@suse.com> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-20 09:06:45 +02:00
Alistair Popple	43397f238a	mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node commit `ddbc84f3f5` upstream. ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn is also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is not enough room for ZONE_MOVABLE on that node. Unfortunately this condition is not checked for. This leads to zone_movable_pfn[] getting set to a pfn greater than the last pfn in a node. calculate_node_totalpages() then sets zone->present_pages to be greater than zone->spanned_pages which is invalid, as spanned_pages represents the maximum number of pages in a zone assuming no holes. Subsequently it is possible free_area_init_core() will observe a zone of size zero with present pages. In this case it will skip setting up the zone, including the initialisation of free_lists[]. However populated_zone() checks zone->present_pages to see if a zone has memory available. This is used by iterators such as walk_zones_in_node(). pagetypeinfo_showfree() uses this to walk the free_list of each zone in each node, which are assumed to be initialised due to the zone not being empty. As free_area_init_core() never initialised the free_lists[] this results in the following kernel crash when trying to read /proc/pagetypeinfo: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:pagetypeinfo_show+0x163/0x460 Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82 RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003 RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00 R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000 FS: 00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0 Call Trace: seq_read_iter+0x128/0x460 proc_reg_read_iter+0x51/0x80 new_sync_read+0x113/0x1a0 vfs_read+0x136/0x1d0 ksys_read+0x70/0xf0 __x64_sys_read+0x1a/0x20 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fix this by checking that the aligned zone_movable_pfn[] does not exceed the end of the node, and if it does skip creating a movable zone on this node. Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com Fixes: `2a1e274acf` ("Create the ZONE_MOVABLE zone") Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-20 09:06:29 +02:00
Muchun Song	c88c1fd886	mm/page_alloc: speed up the iteration of max_order commit `7ad69832f3` upstream. When we free a page whose order is very close to MAX_ORDER and greater than pageblock_order, it wastes some CPU cycles to increase max_order to MAX_ORDER one by one and check the pageblock migratetype of that page repeatedly especially when MAX_ORDER is much larger than pageblock_order. We also should not be checking migratetype of buddy when "order == MAX_ORDER - 1" as the buddy pfn may be invalid, so adjust the condition. With the new check, we don't need the max_order check anymore, so we replace it. Also adjust max_order initialization so that it's lower by one than previously, which makes the code hopefully more clear. Link: https://lkml.kernel.org/r/20201204155109.55451-1-songmuchun@bytedance.com Fixes: `d9dddbf556` ("mm/page_alloc: prevent merging between isolated and other pageblocks") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-22 11:42:57 +02:00
Mauro (mdrjr) Ribeiro	6c84941389	Merge tag 'v4.9.239' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.239 stable release	2020-12-22 09:20:30 -03:00
Vijay Balakrishna	189394cf5e	mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged commit `4aab2be098` upstream. When memory is hotplug added or removed the min_free_kbytes should be recalculated based on what is expected by khugepaged. Currently after hotplug, min_free_kbytes will be set to a lower default and higher default set when THP enabled is lost. This change restores min_free_kbytes as expected for THP consumers. [vijayb@linux.microsoft.com: v5] Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com Fixes: `f000565adb` ("thp: set recommended min free kbytes") Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Allen Pais <apais@microsoft.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-10-14 09:48:17 +02:00
Mauro (mdrjr) Ribeiro	3cb1471988	Merge tag 'v4.9.234' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.234 stable release	2020-09-15 10:54:07 -03:00
Mauro (mdrjr) Ribeiro	4d6f772c22	Merge tag 'v4.9.233' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.233 stable release	2020-09-15 10:53:28 -03:00
Charan Teja Reddy	1a4029e931	mm, page_alloc: fix core hung in free_pcppages_bulk() commit `88e8ac11d2` upstream. The following race is observed with the repeated online, offline and a delay between two successive online of memory blocks of movable zone. P1 P2 Online the first memory block in the movable zone. The pcp struct values are initialized to default values,i.e., pcp->high = 0 & pcp->batch = 1. Allocate the pages from the movable zone. Try to Online the second memory block in the movable zone thus it entered the online_pages() but yet to call zone_pcp_update(). This process is entered into the exit path thus it tries to release the order-0 pages to pcp lists through free_unref_page_commit(). As pcp->high = 0, pcp->count = 1 proceed to call the function free_pcppages_bulk(). Update the pcp values thus the new pcp values are like, say, pcp->high = 378, pcp->batch = 63. Read the pcp's batch value using READ_ONCE() and pass the same to free_pcppages_bulk(), pcp values passed here are, batch = 63, count = 1. Since num of pages in the pcp lists are less than ->batch, then it will stuck in while(list_empty(list)) loop with interrupts disabled thus a core hung. Avoid this by ensuring free_pcppages_bulk() is called with proper count of pcp list pages. The mentioned race is some what easily reproducible without [1] because pcp's are not updated for the first memory block online and thus there is a enough race window for P2 between alloc+free and pcp struct values update through onlining of second memory block. With [1], the race still exists but it is very narrow as we update the pcp struct values for the first memory block online itself. This is not limited to the movable zone, it could also happen in cases with the normal zone (e.g., hotplug to a node that only has DMA memory, or no other memory yet). [1]: https://patchwork.kernel.org/patch/11696389/ Fixes: `5f8dcc2121` ("page-allocator: split per-cpu list into one-list-per-migrate-type") Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: <stable@vger.kernel.org> [2.6+] Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-08-26 10:29:04 +02:00
Doug Berger	8c6a0bcb20	mm: include CMA pages in lowmem_reserve at boot commit `e08d3fdfe2` upstream. The lowmem_reserve arrays provide a means of applying pressure against allocations from lower zones that were targeted at higher zones. Its values are a function of the number of pages managed by higher zones and are assigned by a call to the setup_per_zone_lowmem_reserve() function. The function is initially called at boot time by the function init_per_zone_wmark_min() and may be called later by accesses of the /proc/sys/vm/lowmem_reserve_ratio sysctl file. The function init_per_zone_wmark_min() was moved up from a module_init to a core_initcall to resolve a sequencing issue with khugepaged. Unfortunately this created a sequencing issue with CMA page accounting. The CMA pages are added to the managed page count of a zone when cma_init_reserved_areas() is called at boot also as a core_initcall. This makes it uncertain whether the CMA pages will be added to the managed page counts of their zones before or after the call to init_per_zone_wmark_min() as it becomes dependent on link order. With the current link order the pages are added to the managed count after the lowmem_reserve arrays are initialized at boot. This means the lowmem_reserve values at boot may be lower than the values used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the ratio values are unchanged. In many cases the difference is not significant, but for example an ARM platform with 1GB of memory and the following memory layout cma: Reserved 256 MiB at 0x0000000030000000 Zone ranges: DMA [mem 0x0000000000000000-0x000000002fffffff] Normal empty HighMem [mem 0x0000000030000000-0x000000003fffffff] would result in 0 lowmem_reserve for the DMA zone. This would allow userspace to deplete the DMA zone easily. Funnily enough $ cat /proc/sys/vm/lowmem_reserve_ratio would fix up the situation because as a side effect it forces setup_per_zone_lowmem_reserve. This commit breaks the link order dependency by invoking init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages have the chance to be properly accounted in their zone(s) and allowing the lowmem_reserve arrays to receive consistent values. Fixes: `bc22af74f2` ("mm: update min_free_kbytes from khugepaged after core initialization") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jason Baron <jbaron@akamai.com> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-08-26 10:29:04 +02:00
Oscar Salvador	23feab188c	mm: Avoid calling build_all_zonelists_init under hotplug context Recently a customer of ours experienced a crash when booting the system while enabling memory-hotplug. The problem is that Normal zones on different nodes don't get their private zone->pageset allocated, and keep sharing the initial boot_pageset. The sharing between zones is normally safe as explained by the comment for boot_pageset - it's a percpu structure, and manipulations are done with disabled interrupts, and boot_pageset is set up in a way that any page placed on its pcplist is immediately flushed to shared zone's freelist, because pcp->high == 1. However, the hotplug operation updates pcp->high to a higher value as it expects to be operating on a private pageset. The problem is in build_all_zonelists(), which is called when the first range of pages is onlined for the Normal zone of node X or Y: if (system_state == SYSTEM_BOOTING) { build_all_zonelists_init(); } else { #ifdef CONFIG_MEMORY_HOTPLUG if (zone) setup_zone_pageset(zone); #endif /* we have to stop all cpus to guarantee there is no user of zonelist / stop_machine(__build_all_zonelists, pgdat, NULL); / cpuset refresh routine should be here */ } When called during hotplug, it should execute the setup_zone_pageset(zone) which allocates the private pageset. However, with memhp_default_state=online, this happens early while system_state == SYSTEM_BOOTING is still true, hence this step is skipped. (and build_all_zonelists_init() is probably unsafe anyway at this point). Another hotplug operation on the same zone then leads to zone_pcp_update(zone) called from online_pages(), which updates the pcp->high for the shared boot_pageset to a value higher than 1. At that point, pages freed from Node X and Y Normal zones can end up on the same pcplist and from there they can be freed to the wrong zone's freelist, leading to the corruption and crashes. Please, note that upstream has fixed that differently (and unintentionally) by adding another boot state (SYSTEM_SCHEDULING), which is set before smp_init(). That should happen before memory hotplug events even with memhp_default_state=online. Backporting that would be too intrusive. Signed-off-by: Oscar Salvador <osalvador@suse.de> Debugged-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> # for stable trees Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-08-21 11:02:11 +02:00
Mauro (mdrjr) Ribeiro	61b0ff0d89	Merge tag 'v4.9.224' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.224 stable release Change-Id: I0e07ea572bc1a980b42dea56ab1fcb1069640ac1	2020-07-13 17:58:04 -03:00
Mauro (mdrjr) Ribeiro	4a44c1b17a	Merge tag 'v4.9.220' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.220 stable release Change-Id: I5df1feee2bcf4521fe0d4c65868d6b4f4279275d	2020-07-13 17:56:28 -03:00
David Hildenbrand	e254aa027f	mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() commit `e84fe99b68` upstream. Without CONFIG_PREEMPT, it can happen that we get soft lockups detected, e.g., while booting up. watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: __pageblock_pfn_to_page+0x134/0x1c0 Call Trace: set_zone_contiguous+0x56/0x70 page_alloc_init_late+0x166/0x176 kernel_init_freeable+0xfa/0x255 kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The issue becomes visible when having a lot of memory (e.g., 4TB) assigned to a single NUMA node - a system that can easily be created using QEMU. Inside VMs on a hypervisor with quite some memory overcommit, this is fairly easy to trigger. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-05-20 08:15:28 +02:00
Alexander Duyck	b59a2e0a56	mm: Use fixed constant in page_frag_alloc instead of size + 1 commit `8644772637` upstream. This patch replaces the size + 1 value introduced with the recent fix for 1 byte allocs with a constant value. The idea here is to reduce code overhead as the previous logic would have to read size into a register, then increment it, and write it back to whatever field was being used. By using a constant we can avoid those memory reads and arithmetic operations in favor of just encoding the maximum value into the operation itself. Fixes: `2c2ade8174` ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs") Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-24 07:58:57 +02:00
Mauro (mdrjr) Ribeiro	52de765630	Merge tag 'v4.9.183' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.183 stable release	2020-04-07 20:05:53 -03:00
Mauro (mdrjr) Ribeiro	fae8d03d01	Merge tag 'v4.9.165' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.165 stable release	2020-04-07 14:56:52 -03:00
Mauro (mdrjr) Ribeiro	877a56c96e	Merge tag 'v4.9.145' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y This is the 4.9.145 stable release	2020-04-07 14:40:23 -03:00
Tao Zeng	74a35431d2	mm: change lmk/cma using policy [1/1] PD#TV-10462 Problem: Memory will allocation fail if play secure vedio source. Usually seen by zram/wifi driver Solution: 1, wake up kswapd earlier if water mark without free cma is not ok; 2, using zone-filecache to increace active of lmk. Which can be more accurate than using global page status; 3, remove some restrict of using cma when allocate movable page by zram or migrate from cma pool; 4, try allocate hard for atomic request in soft IRQ Verify: T950L Change-Id: Ibf03f3c11a32175e9983ee8a61a14ae4b2436f1e Signed-off-by: Tao Zeng <tao.zeng@amlogic.com>	2019-10-25 00:09:00 -07:00
Tao Zeng	becb83999e	mm: fix wrong kasan report [1/1] PD#SWPL-13281 Problem: There are 2 types of wrong kasan report after merge change of save wasted slab. 1, slab-out-of-bounds, which is caused by krealloc set shadow memory out-of-range, since tail of page was freed. 2, use-after-free, which is caused by kasan_free_pages called after a page freed. Because this function already called in free_page, so it marked shadow memory twice. Solution: 1, make shadow do not out of range if a tail page was freed and been realloc again. 2, remove call of kasan_free_pages. Verify: X301 Change-Id: Ib5bdcbb618a783920009bb97d112c361888b0d7c Signed-off-by: Tao Zeng <tao.zeng@amlogic.com>	2019-08-28 23:36:43 -07:00
Linxu Fang	121100e479	mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE [ Upstream commit `299c83dce9` ] `342332e6a9` ("mm/page_alloc.c: introduce kernelcore=mirror option") and later patches rewrote the calculation of node spanned pages. `e506b99696` ("mem-hotplug: fix node spanned pages when we have a movable node"), but the current code still has problems, When we have a node with only zone_movable and the node id is not zero, the size of node spanned pages is double added. That's because we have an empty normal zone, and zone_start_pfn or zone_end_pfn is not between arch_zone_lowest_possible_pfn and arch_zone_highest_possible_pfn, so we need to use clamp to constrain the range just like the commit <96e907d13602> (bootmem: Reimplement __absent_pages_in_range() using for_each_mem_pfn_range()). e.g. Zone ranges: DMA [mem 0x0000000000001000-0x0000000000ffffff] DMA32 [mem 0x0000000001000000-0x00000000ffffffff] Normal [mem 0x0000000100000000-0x000000023fffffff] Movable zone start for each node Node 0: 0x0000000100000000 Node 1: 0x0000000140000000 Early memory node ranges node 0: [mem 0x0000000000001000-0x000000000009efff] node 0: [mem 0x0000000000100000-0x00000000bffdffff] node 0: [mem 0x0000000100000000-0x000000013fffffff] node 1: [mem 0x0000000140000000-0x000000023fffffff] node 0 DMA spanned:0xfff present:0xf9e absent:0x61 node 0 DMA32 spanned:0xff000 present:0xbefe0 absent:0x40020 node 0 Normal spanned:0 present:0 absent:0 node 0 Movable spanned:0x40000 present:0x40000 absent:0 On node 0 totalpages(node_present_pages): 1048446 node_spanned_pages:1310719 node 1 DMA spanned:0 present:0 absent:0 node 1 DMA32 spanned:0 present:0 absent:0 node 1 Normal spanned:0x100000 present:0x100000 absent:0 node 1 Movable spanned:0x100000 present:0x100000 absent:0 On node 1 totalpages(node_present_pages): 2097152 node_spanned_pages:2097152 Memory: 6967796K/12582392K available (16388K kernel code, 3686K rwdata, 4468K rodata, 2160K init, 10444K bss, 5614596K reserved, 0K cma-reserved) It shows that the current memory of node 1 is double added. After this patch, the problem is fixed. node 0 DMA spanned:0xfff present:0xf9e absent:0x61 node 0 DMA32 spanned:0xff000 present:0xbefe0 absent:0x40020 node 0 Normal spanned:0 present:0 absent:0 node 0 Movable spanned:0x40000 present:0x40000 absent:0 On node 0 totalpages(node_present_pages): 1048446 node_spanned_pages:1310719 node 1 DMA spanned:0 present:0 absent:0 node 1 DMA32 spanned:0 present:0 absent:0 node 1 Normal spanned:0 present:0 absent:0 node 1 Movable spanned:0x100000 present:0x100000 absent:0 On node 1 totalpages(node_present_pages): 1048576 node_spanned_pages:1048576 memory: 6967796K/8388088K available (16388K kernel code, 3686K rwdata, 4468K rodata, 2160K init, 10444K bss, 1420292K reserved, 0K cma-reserved) Link: http://lkml.kernel.org/r/1554178276-10372-1-git-send-email-fanglinxu@huawei.com Signed-off-by: Linxu Fang <fanglinxu@huawei.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-06-22 08:17:12 +02:00
Tao Zeng	4a456864ca	mm: fix cma allocation time too long [1/1] PD#TV-6340 Problem: When quickly enter live tv just after boot to home, video may display more than 10 seconds late after sound comeout. The main problem is cma allocation time too long. Solution: 1, add a page flag for pages under cma allocating. And do not increase page-ref count for cma pages under allocating when it is used by user space again. 2, restrict shmem/swap back pages using cma 3, improve cma using policy check in page allocating process. 4, replace righ page trace for migrated pages. Change-Id: Ie6b591213a9eda974c3443ca9b491fa8d00cee50 Signed-off-by: Tao Zeng <tao.zeng@amlogic.com>	2019-06-19 22:50:16 -07:00
Jann Horn	484e89a9a7	mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs [ Upstream commit `2c2ade8174` ] The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum number of references that we might need to create in the fastpath later, the bump-allocation fastpath only has to modify the non-atomic bias value that tracks the number of extra references we hold instead of the atomic refcount. The maximum number of allocations we can serve (under the assumption that no allocation is made with size 0) is nc->size, so that's the bias used. However, even when all memory in the allocation has been given away, a reference to the page is still held; and in the `offset < 0` slowpath, the page may be reused if everyone else has dropped their references. This means that the necessary number of references is actually `nc->size+1`. Luckily, from a quick grep, it looks like the only path that can call page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which requires CAP_NET_ADMIN in the init namespace and is only intended to be used for kernel testing and fuzzing. To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the `offset < 0` path, below the virt_to_page() call, and then repeatedly call writev() on a TAP device with IFF_TAP\|IFF_NO_PI\|IFF_NAPI_FRAGS\|IFF_NAPI, with a vector consisting of 15 elements containing 1 byte each. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-23 13:19:42 +01:00
Tao Zeng	ffe96a1f0d	mm: reclaim for unevictable cma pages [1/1] PD#SWPL-3902 Problem: If cma page is unevictable, migrate it will cost long time. Solution: 1. Recalim unevictable cma file cache pages. 2. Using CMA after first water mark not ok. Verify: einstern Change-Id: I0ecbf5dd535cb034430c4ea623891e7a7ae6e4dd Signed-off-by: Tao Zeng <tao.zeng@amlogic.com>	2019-02-19 01:43:23 -08:00
tao zeng	6af3c846c0	mm: save wasted memory by slab [1/1] PD#SWPL-1767 Problem: When driver/kernel call kmalloc with large size, memory may waste if size is not equal to 2^n. For example, driver call kmalloc with size 129KB, kmalloc will allocate a 256KB memory block to caller. Then 127kb memory will be wasted if this caller don't free it. Solution: Free tail of slab memory if size is not match to 2^n. This change can save about 900KB memory after boot, and more than 100KB during run time. Verify: P212 Change-Id: Iba378792ec30003358b64384361c0f0c4c2800d8 Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2018-12-17 23:10:21 -08:00
Tetsuo Handa	fbb78e978a	mm: don't warn about allocations which stall for too long commit `400e22499d` upstream. Commit `63f53dea0c` ("mm: warn about allocations which stall for too long") was a great step for reducing possibility of silent hang up problem caused by memory allocation stalls. But this commit reverts it, for it is possible to trigger OOM lockup and/or soft lockups when many threads concurrently called warn_alloc() (in order to warn about memory allocation stalls) due to current implementation of printk(), and it is difficult to obtain useful information due to limitation of synchronous warning approach. Current printk() implementation flushes all pending logs using the context of a thread which called console_unlock(). printk() should be able to flush all pending logs eventually unless somebody continues appending to printk() buffer. Since warn_alloc() started appending to printk() buffer while waiting for oom_kill_process() to make forward progress when oom_kill_process() is processing pending logs, it became possible for warn_alloc() to force oom_kill_process() loop inside printk(). As a result, warn_alloc() significantly increased possibility of preventing oom_kill_process() from making forward progress. ---------- Pseudo code start ---------- Before warn_alloc() was introduced: retry: if (mutex_trylock(&oom_lock)) { while (atomic_read(&printk_pending_logs) > 0) { atomic_dec(&printk_pending_logs); print_one_log(); } // Send SIGKILL here. mutex_unlock(&oom_lock) } goto retry; After warn_alloc() was introduced: retry: if (mutex_trylock(&oom_lock)) { while (atomic_read(&printk_pending_logs) > 0) { atomic_dec(&printk_pending_logs); print_one_log(); } // Send SIGKILL here. mutex_unlock(&oom_lock) } else if (waited_for_10seconds()) { atomic_inc(&printk_pending_logs); } goto retry; ---------- Pseudo code end ---------- Although waited_for_10seconds() becomes true once per 10 seconds, unbounded number of threads can call waited_for_10seconds() at the same time. Also, since threads doing waited_for_10seconds() keep doing almost busy loop, the thread doing print_one_log() can use little CPU resource. Therefore, this situation can be simplified like ---------- Pseudo code start ---------- retry: if (mutex_trylock(&oom_lock)) { while (atomic_read(&printk_pending_logs) > 0) { atomic_dec(&printk_pending_logs); print_one_log(); } // Send SIGKILL here. mutex_unlock(&oom_lock) } else { atomic_inc(&printk_pending_logs); } goto retry; ---------- Pseudo code end ---------- when printk() is called faster than print_one_log() can process a log. One of possible mitigation would be to introduce a new lock in order to make sure that no other series of printk() (either oom_kill_process() or warn_alloc()) can append to printk() buffer when one series of printk() (either oom_kill_process() or warn_alloc()) is already in progress. Such serialization will also help obtaining kernel messages in readable form. ---------- Pseudo code start ---------- retry: if (mutex_trylock(&oom_lock)) { mutex_lock(&oom_printk_lock); while (atomic_read(&printk_pending_logs) > 0) { atomic_dec(&printk_pending_logs); print_one_log(); } // Send SIGKILL here. mutex_unlock(&oom_printk_lock); mutex_unlock(&oom_lock) } else { if (mutex_trylock(&oom_printk_lock)) { atomic_inc(&printk_pending_logs); mutex_unlock(&oom_printk_lock); } } goto retry; ---------- Pseudo code end ---------- But this commit does not go that direction, for we don't want to introduce a new lock dependency, and we unlikely be able to obtain useful information even if we serialized oom_kill_process() and warn_alloc(). Synchronous approach is prone to unexpected results (e.g. too late [1], too frequent [2], overlooked [3]). As far as I know, warn_alloc() never helped with providing information other than "something is going wrong". I want to consider asynchronous approach which can obtain information during stalls with possibly relevant threads (e.g. the owner of oom_lock and kswapd-like threads) and serve as a trigger for actions (e.g. turn on/off tracepoints, ask libvirt daemon to take a memory dump of stalling KVM guest for diagnostic purpose). This commit temporarily loses ability to report e.g. OOM lockup due to unable to invoke the OOM killer due to !__GFP_FS allocation request. But asynchronous approach will be able to detect such situation and emit warning. Thus, let's remove warn_alloc(). [1] https://bugzilla.kernel.org/show_bug.cgi?id=192981 [2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com [3] commit `db73ee0d46` ("mm, vmscan: do not loop on too_many_isolated for ever")) Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [Resolved backport conflict due to missing `8225196`, `a8e9925`, `9e80c71` and `9a67f64` in 4.9 -- all of which modified this hunk being removed.] Signed-off-by: Amit Shah <amit@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2018-12-13 09:20:26 +01:00
tao zeng	1628236e99	mm: exclude free cma pages when calculate watermark [1/1] PD#SWPL-1210 Problem: Some drivers(eth/wifi) occasionally can't allocate memory under atomic context. From mem status print, there are enough free pages but most of them are CMA. Solution: Exclude free cma pages when calculate water mark. This can push kswapd/compaction work more aggressive. Verify: P212 Change-Id: Ia723f21c0316eff1a38e759ff9f044bb59aa8efa Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2018-11-01 10:18:16 +08:00
tao zeng	68a4d449ed	mm: reduce watermark if free cma is too large [1/1] PD#SWPL-807 Problem: Sometimes driver can't allocation memory under atomic environment. And free pages is enough but they are nearly ALL CMA pages. Solution: Reduce watermark with harf of free cma pages even allocation support CMA. Verify: P212 Change-Id: I8e49768d4384ed064775537754a2b7f09a5bbb7c Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2018-10-29 20:07:51 -07:00
Victor Wan	cc7b1eac54	Merge branch 'android-4.9' into amlogic-4.9-dev Signed-off-by: Victor Wan <victor.wan@amlogic.com> Conflicts: drivers/md/dm-bufio.c drivers/media/dvb-core/dvb_frontend.c drivers/usb/dwc3/core.c drivers/usb/gadget/function/f_fs.c	2018-08-07 14:43:24 +08:00
Greg Kroah-Hartman	9e79039544	Merge 4.9.112 into android-4.9 Changes in 4.9.112 usb: cdc_acm: Add quirk for Uniden UBC125 scanner USB: serial: cp210x: add CESINEL device ids USB: serial: cp210x: add Silicon Labs IDs for Windows Update usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub n_tty: Fix stall at n_tty_receive_char_special(). n_tty: Access echo_* variables carefully. staging: android: ion: Return an ERR_PTR in ion_map_kernel vt: prevent leaking uninitialized data to userspace via /dev/vcs* i2c: rcar: fix resume by always initializing registers before transfer ipv4: Fix error return value in fib_convert_metrics() kprobes/x86: Do not modify singlestep buffer while resuming netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain() Revert "sit: reload iphdr in ipip6_rcv" net: phy: micrel: fix crash when statistic requested for KSZ9031 phy ARM: dts: imx6q: Use correct SDMA script for SPI5 core IB/hfi1: Fix user context tail allocation for DMA_RTAIL x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths x86/cpu: Re-apply forced caps every time CPU caps are re-read mm: hugetlb: yield when prepping struct pages tracing: Fix missing return symbol in function_graph output scsi: sg: mitigate read/write abuse s390: Correct register corruption in critical section cleanup drbd: fix access after free cifs: Fix infinite loop when using hard mount option drm/udl: fix display corruption of the last line jbd2: don't mark block as modified if the handle is out of credits ext4: make sure bitmaps and the inode table don't overlap with bg descriptors ext4: always check block group bounds in ext4_init_block_bitmap() ext4: only look at the bg_flags field if it is valid ext4: verify the depth of extent tree in ext4_find_extent() ext4: include the illegal physical block in the bad map ext4_error msg ext4: clear i_data in ext4_inode_info when removing inline data ext4: add more inode number paranoia checks ext4: add more mount time checks of the superblock ext4: check superblock mapped prior to committing mlxsw: spectrum: Forbid linking of VLAN devices to devices that have uppers HID: i2c-hid: Fix "incomplete report" noise HID: hiddev: fix potential Spectre v1 HID: debug: check length before copy_to_user() PM / OPP: Update voltage in case freq == old_freq Kbuild: fix # escaping in .cmd files for future Make media: cx25840: Use subdev host data for PLL override mm, page_alloc: do not break __GFP_THISNODE by zonelist reset dm bufio: avoid sleeping while holding the dm_bufio lock dm bufio: drop the lock when doing GFP_NOIO allocation mtd: rawnand: mxc: set spare area size register explicitly dm bufio: don't take the lock in dm_bufio_shrink_count mtd: cfi_cmdset_0002: Change definition naming to retry write operation mtd: cfi_cmdset_0002: Change erase functions to retry for error mtd: cfi_cmdset_0002: Change erase functions to check chip good only netfilter: nf_log: don't hold nf_log_mutex during user access staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write() Linux 4.9.112 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-07-11 16:40:16 +02:00
Vlastimil Babka	6cfbbdd2bc	mm, page_alloc: do not break __GFP_THISNODE by zonelist reset commit `7810e6781e` upstream. In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for allocations that can ignore memory policies. The zonelist is obtained from current CPU's node. This is a problem for __GFP_THISNODE allocations that want to allocate on a different node, e.g. because the allocating thread has been migrated to a different CPU. This has been observed to break SLAB in our 4.4-based kernel, because there it relies on __GFP_THISNODE working as intended. If a slab page is put on wrong node's list, then further list manipulations may corrupt the list because page_to_nid() is used to determine which node's list_lock should be locked and thus we may take a wrong lock and race. Current SLAB implementation seems to be immune by luck thanks to commit `511e3a0588` ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") but there may be others assuming that __GFP_THISNODE works as promised. We can fix it by simply removing the zonelist reset completely. There is actually no reason to reset it, because memory policies and cpusets don't affect the zonelist choice in the first place. This was different when commit `183f6371aa` ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their own restricted zonelists. We might consider this for 4.17 although I don't know if there's anything currently broken. SLAB is currently not affected, but in kernels older than 4.7 that don't yet have `511e3a0588` ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") it is. That's at least 4.4 LTS. Older ones I'll have to check. So stable backports should be more important, but will have to be reviewed carefully, as the code went through many changes. BTW I think that also the ac->preferred_zoneref reset is currently useless if we don't also reset ac->nodemask from a mempolicy to NULL first (which we probably should for the OOM victims etc?), but I would leave that for a separate patch. Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Fixes: `183f6371aa` ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-07-11 16:26:45 +02:00
Rik van Riel	dbe0f61220	ANDROID: add extra free kbytes tunable Add a userspace visible knob to tell the VM to keep an extra amount of memory free, by increasing the gap between each zone's min and low watermarks. This is useful for realtime applications that call system calls and have a bound on the number of allocations that happen in any short time period. In this application, extra_free_kbytes would be left at an amount equal to or larger than than the maximum number of allocations that happen in any burst. It may also be useful to reduce the memory use of virtual machines (temporarily?), in a way that does not cause memory fragmentation like ballooning does. [ccross] Revived for use on old kernels where no other solution exists. The tunable will be removed on kernels that do better at avoiding direct reclaim. [surenb] Will be reverted as soon as Android framework is reworked to use upstream-supported watermark_scale_factor instead of extra_free_kbytes. Bug: 86445363 Change-Id: I765a42be8e964bfd3e2886d1ca85a29d60c3bb3e Signed-off-by: Rik van Riel<riel@redhat.com> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2018-06-04 15:47:24 +00:00
Victor Wan	810c6dd972	Merge branch 'android-4.9' into amlogic-4.9-dev Signed-off-by: Victor Wan <victor.wan@amlogic.com> Conflicts: arch/arm/configs/bcm2835_defconfig arch/arm/configs/sunxi_defconfig include/linux/cpufreq.h init/main.c	2018-04-24 17:43:19 +08:00
tao zeng	718776ccb5	mm: optimize for CMA allocate time PD#159608: mm: optimize for CMA allocate time 1. Make all amlogic-changed mm code configuarable, which are wrapped by CONFIG_AMLOGIC_CMA/CONFIG_AMLOGIC_MEMORY_EXTEND 2. Implement some core code of CMA to a single file: drivers/amlogic/memory_ext/aml_cma.c 3. detailed imporove steps: a) use NOOP as default IO-scheduler for nand based storage. which can avoid long time wait for page lock found in CFQ scheduler; b) use per-cpu thread to allocate CMA concurrent when driver request large amount CMA memory; these threads have high user nice value to reduce schedule delay; c) increase task user nice of mmc queue and kswapd. d) wake up kswapd if page are hold by kswap shrink list and cma isolated test failed. e) Fobidden low user nice task use CMA, which can avoid priority inversion problem. f) optimize for LRU usage, devide each type of LRU to 2 parts, normal pages are linked after LRU head, CMA pages are linked after cma_list. g) avoid compaction case move cma forbidden pages to cma area. h) Increase strength of lowmemory killer. 4. Improve read speed of /proc/pagetrace, a filter can be set to reduce message which not print functions allocate memory less than filter value: echo filter=xxx > /proc/pagetrace Change-Id: Ie79288b7947aa642e4f7eacc25565559a73660df Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2018-03-05 15:34:36 +08:00
Sami Tolvanen	97d5fd27f7	mm: fix drain_local_pages function type Bug: 67506682 Change-Id: I6ca80f521c880589efe45dc467d494051daae015 Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2018-02-28 15:09:59 -08:00
Johannes Weiner	19a7db1e2e	mm: fix 100% CPU kswapd busyloop on unreclaimable nodes commit `c73322d098` upstream. Patch series "mm: kswapd spinning on unreclaimable nodes - fixes and cleanups". Jia reported a scenario in which the kswapd of a node indefinitely spins at 100% CPU usage. We have seen similar cases at Facebook. The kernel's current method of judging its ability to reclaim a node (or whether to back off and sleep) is based on the amount of scanned pages in proportion to the amount of reclaimable pages. In Jia's and our scenarios, there are no reclaimable pages in the node, however, and the condition for backing off is never met. Kswapd busyloops in an attempt to restore the watermarks while having nothing to work with. This series reworks the definition of an unreclaimable node based not on scanning but on whether kswapd is able to actually reclaim pages in MAX_RECLAIM_RETRIES (16) consecutive runs. This is the same criteria the page allocator uses for giving up on direct reclaim and invoking the OOM killer. If it cannot free any pages, kswapd will go to sleep and leave further attempts to direct reclaim invocations, which will either make progress and re-enable kswapd, or invoke the OOM killer. Patch #1 fixes the immediate problem Jia reported, the remainder are smaller fixlets, cleanups, and overall phasing out of the old method. Patch #6 is the odd one out. It's a nice cleanup to get_scan_count(), and directly related to #5, but in itself not relevant to the series. If the whole series is too ambitious for 4.11, I would consider the first three patches fixes, the rest cleanups. This patch (of 9): Jia He reports a problem with kswapd spinning at 100% CPU when requesting more hugepages than memory available in the system: $ echo 4000 >/proc/sys/vm/nr_hugepages top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01 Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3 At that time, there are no reclaimable pages left in the node, but as kswapd fails to restore the high watermarks it refuses to go to sleep. Kswapd needs to back away from nodes that fail to balance. Up until commit `1d82de618d` ("mm, vmscan: make kswapd reclaim in terms of nodes") kswapd had such a mechanism. It considered zones whose theoretically reclaimable pages it had reclaimed six times over as unreclaimable and backed away from them. This guard was erroneously removed as the patch changed the definition of a balanced node. However, simply restoring this code wouldn't help in the case reported here: there are no reclaimable pages that could be scanned until the threshold is met. Kswapd would stay awake anyway. Introduce a new and much simpler way of backing off. If kswapd runs through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single page, make it back off from the node. This is the same number of shots direct reclaim takes before declaring OOM. Kswapd will go to sleep on that node until a direct reclaimer manages to reclaim some pages, thus proving the node reclaimable again. [hannes@cmpxchg.org: check kswapd failure against the cumulative nr_reclaimed count] Link: http://lkml.kernel.org/r/20170306162410.GB2090@cmpxchg.org [shakeelb@google.com: fix condition for throttle_direct_reclaim] Link: http://lkml.kernel.org/r/20170314183228.20152-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20170228214007.5621-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Jia He <hejianet@gmail.com> Tested-by: Jia He <hejianet@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dmitry Shmidt <dimitrysh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:53 +01:00
Vlastimil Babka	685cce58f1	mm, page_alloc: fix potential false positive in __zone_watermark_ok commit `b050e3769c` upstream. Since commit `97a16fc82a` ("mm, page_alloc: only enforce watermarks for order-0 allocations"), __zone_watermark_ok() check for high-order allocations will shortcut per-migratetype free list checks for ALLOC_HARDER allocations, and return true as long as there's free page of any migratetype. The intention is that ALLOC_HARDER can allocate from MIGRATE_HIGHATOMIC free lists, while normal allocations can't. However, as a side effect, the watermark check will then also return true when there are pages only on the MIGRATE_ISOLATE list, or (prior to CMA conversion to ZONE_MOVABLE) on the MIGRATE_CMA list. Since the allocation cannot actually obtain isolated pages, and might not be able to obtain CMA pages, this can result in a false positive. The condition should be rare and perhaps the outcome is not a fatal one. Still, it's better if the watermark check is correct. There also shouldn't be a performance tradeoff here. Link: http://lkml.kernel.org/r/20171102125001.23708-1-vbabka@suse.cz Fixes: `97a16fc82a` ("mm, page_alloc: only enforce watermarks for order-0 allocations") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:51 +01:00
tao zeng	c2603ea592	mm: fix compile error on other configs PD#153120: mm:fix compile error on other configs Change-Id: I8e72afa508c44149f69c6d8c34698d13848e539c Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2018-01-16 00:03:40 -07:00
tao zeng	5f97fc6b3d	mm: close debug print of cma alloc PD#153120: mm: close debug print of cma alloc These print message may print a lot when video playback And cause it not smooth. Change-Id: If1f18d5e8a0234f1daca2c6e803a23ba90354414 Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2018-01-14 23:12:59 -07:00
Victor Wan	2c95ea743b	Merge branch 'android-4.9' into amlogic-4.9-dev Conflicts: arch/arm/configs/omap2plus_defconfig drivers/Makefile drivers/android/binder.c	2018-01-08 18:44:19 +08:00
tao zeng	4f40b91e06	mm: fix lowmem issue PD#157252: mm: fix lowmem issue 1. add statistics for CMA pages; 2. reduce cma print and not protect cma unless driver called; 3. fix cma usage policy in alloc path; 4. increase file scan ratio in kswapd; 5. using NOOP for default IO-scheduler; 6. change alloc flags of ZRAM to retry harder. Change-Id: If7b0363da03da1682efe3996c69bb9d511299209 Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2017-12-28 20:20:30 -07:00
Michal Hocko	fae478cd93	mm: fix remote numa hits statistics [ Upstream commit `2df26639e7` ] Jia He has noticed that commit `b9f00e147f` ("mm, page_alloc: reduce branches in zone_statistics") has an unintentional side effect that remote node allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are many of these potentially because the flag is used very rarely while we have many users of __alloc_pages_node. Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a follow up patch) and treat all allocations that were satisfied from the preferred zone's node as NUMA_HITS because this is the same node we requested the allocation from in most cases. If this is not the local node then we just account it as NUMA_OTHER rather than NUMA_LOCAL. One downsize would be that an allocation request for a node which is outside of the mempolicy nodemask would be reported as a hit which is a bit weird but that was the case before `b9f00e147f` already. Fixes: `b9f00e147f` ("mm, page_alloc: reduce branches in zone_statistics") Link: http://lkml.kernel.org/r/20170102153057.9451-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Jia He <hejianet@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> # with cbmc[1] superpowers Acked-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-09 22:01:51 +01:00
Mike Kravetz	436f19a2e4	mm/cma: fix alloc_contig_range ret code/potential leak commit `63cd448908` upstream. If the call __alloc_contig_migrate_range() in alloc_contig_range returns -EBUSY, processing continues so that test_pages_isolated() is called where there is a tracepoint to identify the busy pages. However, it is possible for busy pages to become available between the calls to these two routines. In this case, the range of pages may be allocated. Unfortunately, the original return code (ret == -EBUSY) is still set and returned to the caller. Therefore, the caller believes the pages were not allocated and they are leaked. Update the comment to indicate that allocation is still possible even if __alloc_contig_migrate_range returns -EBUSY. Also, clear return code in this case so that it is not accidentally used or returned to caller. Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com Fixes: `8ef5849fa8` ("mm/cma: always check which page caused allocation failure") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:24:31 +01:00
Pavel Tatashin	9980b82783	mm/page_alloc.c: broken deferred calculation commit `d135e57502` upstream. In reset_deferred_meminit() we determine number of pages that must not be deferred. We initialize pages for at least 2G of memory, but also pages for reserved memory in this node. The reserved memory is determined in this function: memblock_reserved_memory_within(), which operates over physical addresses, and returns size in bytes. However, reset_deferred_meminit() assumes that that this function operates with pfns, and returns page count. The result is that in the best case machine boots slower than expected due to initializing more pages than needed in single thread, and in the worst case panics because fewer than needed pages are initialized early. Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com Fixes: `864b9a393d` ("mm: consider memblock reservations for deferred memory initialization sizing") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-24 08:33:42 +01:00
tao zeng	b99ebebc62	mm: close debug print of cma alloc PD#153120: mm: close debug print of cma alloc These print message may print a lot when video playback And cause it not smooth. Change-Id: I77920e902189ab4aa1ec594d351749ee4aa3a3ee Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2017-11-12 22:29:12 -07:00
tao zeng	a998ca2c01	mm: add pagetrace function PD#151104: mm: add pagetrace function 1. implement pagetrace as a driver; you can get information of how many pages allocated by each function by read: cat /proc/pagetrace 2. fix wrong statistics of free memory of each migrate type. Change-Id: Ib2dff4bb5b3dd288ee188007352fc7b353eda100 Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2017-10-23 00:11:33 -07:00
tao zeng	ee76ce9ce0	mm: fix cma allocate fail problem PD#152454: mm: fix cma allocate fail problem boost work shoud return right value of __alloc_contig_migrate_range Change-Id: I8275a8541cd263f5ea5574fba1c053a30e3cbc80 Signed-off-by: tao zeng <tao.zeng@amlogic.com>	2017-10-22 23:32:31 -07:00

1 2 3 4 5 ...

1278 Commits