linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-08 11:50:43 +09:00

Author	SHA1	Message	Date
Miaohe Lin	990e52b17d	hugetlbfs: remove unneeded header file The header file signal.h is unneeded now. Remove it. Link: https://lkml.kernel.org/r/20220726142918.51693-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:19 -07:00
Miaohe Lin	7ec3c362cf	hugetlbfs: remove unneeded hugetlbfs_ops forward declaration The forward declaration for hugetlbfs_ops is unnecessary. Remove it. Link: https://lkml.kernel.org/r/20220726142918.51693-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:19 -07:00
Miaohe Lin	d00365175e	hugetlbfs: use helper macro SZ_1{K,M} Patch series "A few cleanup and fixup patches for hugetlbfs", v2. This series contains a few cleaup patches to remove unneeded forward declaration, use helper macro and so on. More details can be found in the respective changelogs. This patch (of 5): Use helper macro SZ_1K and SZ_1M to do the size conversion. Minor readability improvement. Link: https://lkml.kernel.org/r/20220726142918.51693-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220726142918.51693-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:19 -07:00
Kefeng Wang	bb077c3ffd	mm: cleanup is_highmem() It is unnecessary to add CONFIG_HIGHMEM check in is_highmem(), which has been done in is_highmem_idx(), and move is_highmem() close to is_highmem_idx(). This has no functional impact. Link: https://lkml.kernel.org/r/20220726131816.149075-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:19 -07:00
Ralph Campbell	f6c3e1ae01	mm/hmm: add a test for cross device private faults Add a simple test case for when hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a device private PTE is found for a device other than the hmm_range::dev_private_owner. This should cause the page to be faulted back to system memory from the other device and the PFN returned in the output array. Also, remove a piece of code that unnecessarily unmaps part of the buffer. Link: https://lkml.kernel.org/r/20220727000837.4128709-3-rcampbell@nvidia.com Link: https://lkml.kernel.org/r/20220725183615.4118795-3-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Philip Yang <Philip.Yang@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:18 -07:00
Peter Xu	68deb82a7b	selftests: add soft-dirty into run_vmtests.sh Link: https://lkml.kernel.org/r/20220725142048.30450-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:18 -07:00
Peter Xu	c942f5bd17	selftests: soft-dirty: add test for mprotect Add two soft-dirty test cases for mprotect() on both anon or file. Link: https://lkml.kernel.org/r/20220725142048.30450-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:18 -07:00
Peter Xu	76aefad628	mm/mprotect: fix soft-dirty check in can_change_pte_writable() Patch series "mm/mprotect: Fix soft-dirty checks", v4. This patch (of 3): The check wanted to make sure when soft-dirty tracking is enabled we won't grant write bit by accident, as a page fault is needed for dirty tracking. The intention is correct but we didn't check it right because VM_SOFTDIRTY set actually means soft-dirty tracking disabled. Fix it. There's another thing tricky about soft-dirty is that, we can't check the vma flag !(vma_flags & VM_SOFTDIRTY) directly but only check it after we checked CONFIG_MEM_SOFT_DIRTY because otherwise VM_SOFTDIRTY will be defined as zero, and !(vma_flags & VM_SOFTDIRTY) will constantly return true. To avoid misuse, introduce a helper for checking whether vma has soft-dirty tracking enabled. We can easily verify this with any exclusive anonymous page, like program below: =======8<====== #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <assert.h> #include <inttypes.h> #include <stdint.h> #include <sys/types.h> #include <sys/mman.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #include <fcntl.h> #include <stdbool.h> #define BIT_ULL(nr) (1ULL << (nr)) #define PM_SOFT_DIRTY BIT_ULL(55) unsigned int psize; char page; uint64_t pagemap_read_vaddr(int fd, void vaddr) { uint64_t value; int ret; ret = pread(fd, &value, sizeof(uint64_t), ((uint64_t)vaddr >> 12) * sizeof(uint64_t)); assert(ret == sizeof(uint64_t)); return value; } void clear_refs_write(void) { int fd = open("/proc/self/clear_refs", O_RDWR); assert(fd >= 0); write(fd, "4", 2); close(fd); } #define check_soft_dirty(str, expect) do { \ bool dirty = pagemap_read_vaddr(fd, page) & PM_SOFT_DIRTY; \ if (dirty != expect) { \ printf("ERROR: %s, soft-dirty=%d (expect: %d) ", str, dirty, expect); \ exit(-1); \ } \ } while (0) int main(void) { int fd = open("/proc/self/pagemap", O_RDONLY); assert(fd >= 0); psize = getpagesize(); page = mmap(NULL, psize, PROT_READ\|PROT_WRITE, MAP_ANONYMOUS\|MAP_PRIVATE, -1, 0); assert(page != MAP_FAILED); page = 1; check_soft_dirty("Just faulted in page", 1); clear_refs_write(); check_soft_dirty("Clear_refs written", 0); mprotect(page, psize, PROT_READ); check_soft_dirty("Marked RO", 0); mprotect(page, psize, PROT_READ\|PROT_WRITE); check_soft_dirty("Marked RW", 0); page = 2; check_soft_dirty("Wrote page again", 1); munmap(page, psize); close(fd); printf("Test passed. "); return 0; } =======8<====== Here we attach a Fixes to commit `64fe24a3e0` only for easy tracking, as this patch won't apply to a tree before that point. However the commit wasn't the source of problem, but instead `64e455079e`. It's just that after `64fe24a3e0` anonymous memory will also suffer from this problem with mprotect(). Link: https://lkml.kernel.org/r/20220725142048.30450-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220725142048.30450-2-peterx@redhat.com Fixes: `64e455079e` ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Fixes: `64fe24a3e0` ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:18 -07:00
Tetsuo Handa	68aaee147e	mm: memcontrol: fix potential oom_lock recursion deadlock syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock. Fix this problem by removing the allocation from memory_stat_format() completely, and pass static buffer when calling from memcg OOM path. Note that the caller holding filesystem lock was the trigger for syzbot to report this locking dependency. Doing GFP_KERNEL allocation with filesystem lock held can deadlock the system even without involving OOM situation. Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1] Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp Fixes: `c8713d0b23` ("mm: memcontrol: dump memory.stat during cgroup OOM") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:18 -07:00
Alistair Popple	65974cb910	mm/gup.c: fix formatting in check_and_migrate_movable_page() Commit `b05a79d437` ("mm/gup: migrate device coherent pages when pinning instead of failing") added a badly formatted if statement. Fix it. Link: https://lkml.kernel.org/r/20220721020552.1397598-2-apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:18 -07:00
Shiyang Ruan	35fcd75af3	xfs: fail dax mount if reflink is enabled on a partition Failure notification is not supported on partitions. So, when we mount a reflink enabled xfs on a partition with dax option, let it fail with -EINVAL code. Link: https://lkml.kernel.org/r/20220609143435.393724-1-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:17 -07:00
Jiebin Sun	873f64b791	mm/memcontrol.c: remove the redundant updating of stats_flush_threshold Remove the redundant updating of stats_flush_threshold. If the global var stats_flush_threshold has exceeded the trigger value for __mem_cgroup_flush_stats, further increment is unnecessary. Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads). Score gain: 1.95x Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%) CPU: ICX 8380 x 2 sockets Core number: 40 x 2 physical cores Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads) Link: https://lkml.kernel.org/r/20220722164949.47760-1-jiebin.sun@intel.com Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> Acked-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Amadeusz Sawiski <amadeuszx.slawinski@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:17 -07:00
Axel Rasmussen	914eedcb9b	userfaultfd: don't fail on unrecognized features The basic interaction for setting up a userfaultfd is, userspace issues a UFFDIO_API ioctl, and passes in a set of zero or more feature flags, indicating the features they would prefer to use. Of course, different kernels may support different sets of features (depending on kernel version, kconfig options, architecture, etc). Userspace's expectations may also not match: perhaps it was built against newer kernel headers, which defined some features the kernel it's running on doesn't support. Currently, if userspace passes in a flag we don't recognize, the initialization fails and we return -EINVAL. This isn't great, though. Userspace doesn't have an obvious way to react to this; sure, one of the features I asked for was unavailable, but which one? The only option it has is to turn off things "at random" and hope something works. Instead, modify UFFDIO_API to just ignore any unrecognized feature flags. The interaction is now that the initialization will succeed, and as always we return the subset of feature flags that can actually be used back to userspace. Now userspace has an obvious way to react: it checks if any flags it asked for are missing. If so, it can conclude this kernel doesn't support those, and it can either resign itself to not using them, or fail with an error on its own, or whatever else. Link: https://lkml.kernel.org/r/20220722201513.1624158-1-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:17 -07:00
Miaohe Lin	2727cfe407	hugetlb_cgroup: fix wrong hugetlb cgroup numa stat We forget to set cft->private for numa stat file. As a result, numa stat of hstates[0] is always showed for all hstates. Encode the hstates index into cft->private to fix this issue. Link: https://lkml.kernel.org/r/20220723073804.53035-1-linmiaohe@huawei.com Fixes: `f477619990` ("hugetlb: add hugetlb.*.numa_stat file") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:17 -07:00
Dan Carpenter	360b420dbd	selftest/vm: uninitialized variable in main() Initialize "length" to zero by default. Link: https://lkml.kernel.org/r/YtZzjvHXVXMXxpXO@kili Fixes: `ff712a627f` ("selftests/vm: cleanup hugetlb file after mremap test") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:17 -07:00
Kassey Li	198729c962	mm/cma_debug.c: align the name buffer length as struct cma Avoids truncating the debugfs output to 16 chars. Potentially alters the userspace output, but this is a debugfs interface and there are no stability guarantees. Link: https://lkml.kernel.org/r/20220719091554.27864-1-quic_yingangl@quicinc.com Signed-off-by: Kassey Li <quic_yingangl@quicinc.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:16 -07:00
Dan Carpenter	3d5367a042	tools/testing/selftests/vm/hugetlb-madvise.c: silence uninitialized variable warning This code just reads from memory without caring about the data itself. However static checkers complain that "tmp" is never properly initialized. Initialize it to zero and change the name to "dummy" to show that we don't care about the value stored in it. Link: https://lkml.kernel.org/r/YtZ8mKJmktA2GaHB@kili Fixes: `c4b6cb8840` ("selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:16 -07:00
Miaohe Lin	6d97cf88dd	mm/mempolicy: remove unneeded out label We can use unlock label to unlock ptl and return ret directly to remove the unneeded out label and reduce the size of mempolicy.o. No functional change intended. [Before] text data bss dec hex filename 26702 3972 6168 36842 8fea mm/mempolicy.o [After] text data bss dec hex filename 26662 3972 6168 36802 8fc2 mm/mempolicy.o Link: https://lkml.kernel.org/r/20220719115233.6706-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:16 -07:00
Mark-PK Tsai	189cdcfeef	mm/page_alloc: correct the wrong cpuset file path in comment cpuset.c was moved to kernel/cgroup/ in below commit `201af4c0fa` ("cgroup: move cgroup files under kernel/cgroup/") Correct the wrong path in comment. Link: https://lkml.kernel.org/r/20220718120336.5145-1-mark-pk.tsai@mediatek.com Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:16 -07:00
Miaohe Lin	4d8ff64097	mm: remove unneeded PageAnon check in restore_exclusive_pte() When code reaches here, the page must be !PageAnon. There's no need to check PageAnon again. Remove it. Link: https://lkml.kernel.org/r/20220716081816.10752-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:16 -07:00
Yixuan Cao	9b7a4039d6	tools/vm/page_owner_sort.c: adjust the indent in is_need() I noticed one more indentation than necessary in is_need(). Link: https://lkml.kernel.org/r/20220717195506.7602-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:16 -07:00
Theodore Ts'o	e408e695f5	mm/shmem: support FS_IOC_[SG]ETFLAGS in tmpfs This allows userspace to set flags like FS_APPEND_FL, FS_IMMUTABLE_FL, FS_NODUMP_FL, etc., like all other standard Linux file systems. [akpm@linux-foundation.org: fix CONFIG_TMPFS_XATTR=n warnings] Link: https://lkml.kernel.org/r/20220715015912.2560575-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:15 -07:00
Jianglei Nie	188043c7f4	mm/damon/reclaim: fix potential memory leak in damon_reclaim_init() damon_reclaim_init() allocates a memory chunk for ctx with damon_new_ctx(). When damon_select_ops() fails, ctx is not released, which will lead to a memory leak. We should release the ctx with damon_destroy_ctx() when damon_select_ops() fails to fix the memory leak. Link: https://lkml.kernel.org/r/20220714063746.2343549-1-niejianglei2021@163.com Fixes: `4d69c34578` ("mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations()") Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:15 -07:00
Yosry Ahmed	73b73bac90	mm: vmpressure: don't count proactive reclaim in vmpressure memory.reclaim is a cgroup v2 interface that allows users to proactively reclaim memory from a memcg, without real memory pressure. Reclaim operations invoke vmpressure, which is used: (a) To notify userspace of reclaim efficiency in cgroup v1, and (b) As a signal for a memcg being under memory pressure for networking (see mem_cgroup_under_socket_pressure()). For (a), vmpressure notifications in v1 are not affected by this change since memory.reclaim is a v2 feature. For (b), the effects of the vmpressure signal (according to Shakeel [1]) are as follows: 1. Reducing send and receive buffers of the current socket. 2. May drop packets on the rx path. 3. May throttle current thread on the tx path. Since proactive reclaim is invoked directly by userspace, not by memory pressure, it makes sense not to throttle networking. Hence, this change makes sure that proactive reclaim caused by memory.reclaim does not trigger vmpressure. [1] https://lore.kernel.org/lkml/CALvZod68WdrXEmBpOkadhB5GPYmCXaDZzXH=yyGOCAjFRn4NDQ@mail.gmail.com/ [yosryahmed@google.com: update documentation] Link: https://lkml.kernel.org/r/20220721173015.2643248-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20220714064918.2576464-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:15 -07:00
Hui Zhu	c7e6f17b52	zsmalloc: zs_malloc: return ERR_PTR on failure zs_malloc returns 0 if it fails. zs_zpool_malloc will return -1 when zs_malloc return 0. But -1 makes the return value unclear. For example, when zswap_frontswap_store calls zs_malloc through zs_zpool_malloc, it will return -1 to its caller. The other return value is -EINVAL, -ENODEV or something else. This commit changes zs_malloc to return ERR_PTR on failure. It didn't just let zs_zpool_malloc return -ENOMEM becaue zs_malloc has two types of failure: - size is not OK return -EINVAL - memory alloc fail return -ENOMEM. Link: https://lkml.kernel.org/r/20220714080757.12161-1-teawater@gmail.com Signed-off-by: Hui Zhu <teawater@antgroup.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:15 -07:00
Xiu Jianfeng	fef3e9066d	writeback: remove inode_to_wb_is_valid() inode_to_wb_is_valid() is no longer used since commit `fe55d563d4` ("remove inode_congested()"), remove it. Link: https://lkml.kernel.org/r/20220714084147.140324-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:15 -07:00
Zhou Guanghui	450d0e74d8	memblock,arm64: expand the static memblock memory table In a system(Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error occurs, and the BIOS will mark the corresponding area (for example, 2 MB) as unusable. When the system restarts next time, these areas are not reported or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a larger number of memblocks. For example, if the EFI_UNUSABLE_MEMORY type is reported: ... memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0 memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0 memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0 memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0 memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0 memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0 memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0 memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0 ... The EFI memory map is parsed to construct the memblock arrays before the memblock arrays can be resized. As the result, memory regions beyond INIT_MEMBLOCK_REGIONS are lost. Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace INIT_MEMBLOCK_REGTIONS to define the size of the static memblock.memory array. Allow overriding memblock.memory array size with architecture defined INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled. Link: https://lkml.kernel.org/r/20220615102742.96450-1-zhouguanghui1@huawei.com Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Darren Hart <darren@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Xu Qiang <xuqiang36@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:15 -07:00
Miaohe Lin	0f0b6931ff	mm: remove obsolete comment in do_fault_around() Since commit `7267ec008b` ("mm: postpone page table allocation until we have page to map"), do_fault_around is not called with page table lock held. Cleanup the corresponding comments. Link: https://lkml.kernel.org/r/20220716080359.38791-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:14 -07:00
William Lam	b717d6b93b	mm: compaction: include compound page count for scanning in pageblock isolation The number of scanned pages can be lower than the number of isolated pages when isolating mirgratable or free pageblock. The metric is being reported in trace event and also used in vmstat. some example output from trace where it shows nr_taken can be greater than nr_scanned: Produced by kernel v5.19-rc6 kcompactd0-42 [001] ..... 1210.268022: mm_compaction_isolate_migratepages: range=(0x107ae4 ~ 0x107c00) nr_scanned=265 nr_taken=255 [...] kcompactd0-42 [001] ..... 1210.268382: mm_compaction_isolate_freepages: range=(0x215800 ~ 0x215a00) nr_scanned=13 nr_taken=128 kcompactd0-42 [001] ..... 1210.268383: mm_compaction_isolate_freepages: range=(0x215600 ~ 0x215680) nr_scanned=1 nr_taken=128 mm_compaction_isolate_migratepages does not seem to have this behaviour, but for the reason of consistency, nr_scanned should also be taken care of in that side. This behaviour is confusing since currently the count for isolated pages takes account of compound page but not for the case of scanned pages. And given that the number of isolated pages(nr_taken) reported in mm_compaction_isolate_template trace event is on a single-page basis, the ambiguity when reporting the number of scanned pages can be removed by also including compound page count. Link: https://lkml.kernel.org/r/20220711202806.22296-1-william.lam@bytedance.com Signed-off-by: William Lam <william.lam@bytedance.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:14 -07:00
Adam Sindelar	ac3ced5fc1	selftests/vm: skip 128TBswitch on unsupported arch The test va_128TBswitch.c exercises a feature only supported on PPC and x86_64, but it's run on other 64-bit archs as well. Before this patch, the test did nothing and returned 0 for KSFT_PASS. This patch makes it return the KSFT codes from kselftest.h, including KSFT_SKIP when appropriate. Verified on arm64 and x86_64. Link: https://lkml.kernel.org/r/20220704123813.427625-1-adam@wowsignal.io Signed-off-by: Adam Sindelar <adam@wowsignal.io> Cc: David Vernet <void@manifault.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:14 -07:00
Adam Sindelar	3b8e7f5c42	selftests/vm: fix errno handling in mrelease_test mrelease_test should return KSFT_SKIP when process_mrelease is not defined, but due to a perror call consuming the errno, it returns KSFT_FAIL. This patch decides the exit code before calling perror. [adam@wowsignal.io: fix remaining instances of errno mishandling] Link: https://lkml.kernel.org/r/20220706141602.10159-1-adam@wowsignal.io Link: https://lkml.kernel.org/r/20220704173351.19595-1-adam@wowsignal.io Fixes: `33776141b8` ("selftests: vm: add process_mrelease tests") Signed-off-by: Adam Sindelar <adam@wowsignal.io> Reviewed-by: David Vernet <void@manifault.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:14 -07:00
Roman Gushchin	d6e103a757	mm: memcontrol: do not miss MEMCG_MAX events for enforced allocations Yafang Shao reported an issue related to the accounting of bpf memory: if a bpf map is charged indirectly for memory consumed from an interrupt context and allocations are enforced, MEMCG_MAX events are not raised. It's not/less of an issue in a generic case because consequent allocations from a process context will trigger the direct reclaim and MEMCG_MAX events will be raised. However a bpf map can belong to a dying/abandoned memory cgroup, so there will be no allocations from a process context and no MEMCG_MAX events will be triggered. Link: https://lkml.kernel.org/r/20220702033521.64630-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reported-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:14 -07:00
Miaohe Lin	ccac11da67	filemap: minor cleanup for filemap_write_and_wait_range Restructure the logic in filemap_write_and_wait_range to simplify the code and make it more consistent with file_write_and_wait_range. No functional change intended. Link: https://lkml.kernel.org/r/20220627132351.55680-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:14 -07:00
Miaohe Lin	7f82f92231	mm/mmap.c: fix missing call to vm_unacct_memory in mmap_region Since the beginning, charged is set to 0 to avoid calling vm_unacct_memory twice because vm_unacct_memory will be called by above unmap_region. But since commit `4f74d2c8e8` ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces"), unmap_region doesn't call vm_unacct_memory anymore. So charged shouldn't be set to 0 now otherwise the calling to paired vm_unacct_memory will be missed and leads to imbalanced account. Link: https://lkml.kernel.org/r/20220618082027.43391-1-linmiaohe@huawei.com Fixes: `4f74d2c8e8` ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:13 -07:00
Liam Howlett	b0cab80ecd	android: binder: fix lockdep check on clearing vma When munmapping a vma, the mmap_lock can be degraded to a write before calling close() on the file handle. The binder close() function calls binder_alloc_set_vma() to clear the vma address, which now has a lock dep check for writing on the mmap_lock. Change the lockdep check to ensure the reading lock is held while clearing and keep the write check while writing. Link: https://lkml.kernel.org/r/20220627151857.2316964-1-Liam.Howlett@oracle.com Fixes: 472a68df605b ("android: binder: stop saving a pointer to the VMA") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: syzbot+da54fa8d793ca89c741f@syzkaller.appspotmail.com Acked-by: Todd Kjos <tkjos@google.com> Cc: "Arve Hjønnevåg" <arve@android.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Martijn Coenen <maco@android.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:13 -07:00
Liam R. Howlett	a43cfc87ca	android: binder: stop saving a pointer to the VMA Do not record a pointer to a VMA outside of the mmap_lock for later use. This is unsafe and there are a number of failure paths after the recorded VMA pointer may be freed during setup. There is no callback to the driver to clear the saved pointer from generic mm code. Furthermore, the VMA pointer may become stale if any number of VMA operations end up freeing the VMA so saving it was fragile to being with. Instead, change the binder_alloc struct to record the start address of the VMA and use vma_lookup() to get the vma when needed. Add lockdep mmap_lock checks on updates to the vma pointer to ensure the lock is held and depend on that lock for synchronization of readers and writers - which was already the case anyways, so the smp_wmb()/smp_rmb() was not necessary. [akpm@linux-foundation.org: fix drivers/android/binder_alloc_selftest.c] Link: https://lkml.kernel.org/r/20220621140212.vpkio64idahetbyf@revolver Fixes: `da1b9564e8` ("android: binder: fix the race mmap and alloc_new_buf_locked") Reported-by: syzbot+58b51ac2b04e388ab7b0@syzkaller.appspotmail.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Martijn Coenen <maco@android.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:13 -07:00
Liam R. Howlett	15d2ce7129	mips: rename mt_init to mips_mt_init Move mt_init out of the way for the maple tree. Use mips_mt prefix to match the rest of the functions in the file. Link: https://lkml.kernel.org/r/20220504002554.654642-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:13 -07:00
Tetsuo Handa	14773bfa70	mm: shrinkers: fix double kfree on shrinker name syzbot is reporting double kfree() at free_prealloced_shrinker() [1], for destroy_unused_super() calls free_prealloced_shrinker() even if prealloc_shrinker() returned an error. Explicitly clear shrinker name when prealloc_shrinker() called kfree(). [roman.gushchin@linux.dev: zero shrinker->name in all cases where shrinker->name is freed] Link: https://lkml.kernel.org/r/YtgteTnQTgyuKUSY@castle Link: https://syzkaller.appspot.com/bug?extid=8b481578352d4637f510 [1] Link: https://lkml.kernel.org/r/ffa62ece-6a42-2644-16cf-0d33ef32c676@I-love.SAKURA.ne.jp Fixes: `e33c267ab7` ("mm: shrinkers: provide shrinkers with names") Reported-by: syzbot <syzbot+8b481578352d4637f510@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:13 -07:00
NeilBrown	d6a97d3f58	NFSD: add security label to struct nfsd_attrs nfsd_setattr() now sets a security label if provided, and nfsv4 provides it in the 'open' and 'create' paths and the 'setattr' path. If setting the label failed (including because the kernel doesn't support labels), an error field in 'struct nfsd_attrs' is set, and the caller can respond. The open/create callers clear FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case. The setattr caller returns the error. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:17:00 -04:00
NeilBrown	93adc1e391	NFSD: set attributes when creating symlinks The NFS protocol includes attributes when creating symlinks. Linux does store attributes for symlinks and allows them to be set, though they are not used for permission checking. NFSD currently doesn't set standard (struct iattr) attributes when creating symlinks, but for NFSv4 it does set ACLs and security labels. This is inconsistent. To improve consistency, pass the provided attributes into nfsd_symlink() and call nfsd_create_setattr() to set them. NOTE: this results in a behaviour change for all NFS versions when the client sends non-default attributes with a SYMLINK request. With the Linux client, the only attributes are: attr.ia_mode = S_IFLNK \| S_IRWXUGO; attr.ia_valid = ATTR_MODE; so the final outcome will be unchanged. Other clients might sent different attributes, and if they did they probably expect them to be honoured. We ignore any error from nfsd_create_setattr(). It isn't really clear what should be done if a file is successfully created, but the attributes cannot be set. NFS doesn't allow partial success to be reported. Reporting failure is probably more misleading than reporting success, so the status is ignored. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:17:00 -04:00
NeilBrown	7fe2a71dda	NFSD: introduce struct nfsd_attrs The attributes that nfsd might want to set on a file include 'struct iattr' as well as an ACL and security label. The latter two are passed around quite separately from the first, in part because they are only needed for NFSv4. This leads to some clumsiness in the code, such as the attributes NOT being set in nfsd_create_setattr(). We need to keep the directory locked until all attributes are set to ensure the file is never visibile without all its attributes. This need combined with the inconsistent handling of attributes leads to more clumsiness. As a first step towards tidying this up, introduce 'struct nfsd_attrs'. This is passed (by reference) to vfs.c functions that work with attributes, and is assembled by the various nfs*proc functions which call them. As yet only iattr is included, but future patches will expand this. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:17:00 -04:00
Jeff Layton	876c553cb4	NFSD: verify the opened dentry after setting a delegation Between opening a file and setting a delegation on it, someone could rename or unlink the dentry. If this happens, we do not want to grant a delegation on the open. On a CLAIM_NULL open, we're opening by filename, and we may (in the non-create case) or may not (in the create case) be holding i_rwsem when attempting to set a delegation. The latter case allows a race. After getting a lease, redo the lookup of the file being opened and validate that the resulting dentry matches the one in the open file description. To properly redo the lookup we need an rqst pointer to pass to nfsd_lookup_dentry(), so make sure that is available. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:17:00 -04:00
Jeff Layton	bbf936edd5	NFSD: drop fh argument from alloc_init_deleg Currently, we pass the fh of the opened file down through several functions so that alloc_init_deleg can pass it to delegation_blocked. The filehandle of the open file is available in the nfs4_file however, so there's no need to pass it in a separate argument. Drop the argument from alloc_init_deleg, nfs4_open_delegation and nfs4_set_delegation. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:17:00 -04:00
Chuck Lever	a11ada99ce	NFSD: Move copy offload callback arguments into a separate structure Refactor so that CB_OFFLOAD arguments can be passed without allocating a whole struct nfsd4_copy object. On my system (x86_64) this removes another 96 bytes from struct nfsd4_copy. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:17:00 -04:00
Chuck Lever	e72f9bc006	NFSD: Add nfsd4_send_cb_offload() Refactor for legibility. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:16:59 -04:00
Chuck Lever	ad1e46c9b0	NFSD: Remove kmalloc from nfsd4_do_async_copy() Instead of manufacturing a phony struct nfsd_file, pass the struct file returned by nfs42_ssc_open() directly to nfsd4_do_copy(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:16:59 -04:00
Chuck Lever	3b7bf5933c	NFSD: Refactor nfsd4_do_copy() Refactor: Now that nfsd4_do_copy() no longer calls the cleanup helpers, plumb the use of struct file pointers all the way down to _nfsd_copy_file_range(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:16:59 -04:00
Chuck Lever	478ed7b10d	NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A subsequent patch will modify one of the new call sites to avoid the need to manufacture the phony struct nfsd_file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:16:59 -04:00
Chuck Lever	24d796ea38	NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) The @src parameter is sometimes a pointer to a struct nfsd_file and sometimes a pointer to struct file hiding in a phony struct nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter is always an explicit struct file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:16:59 -04:00
Chuck Lever	1913cdf56c	NFSD: Replace boolean fields in struct nfsd4_copy Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy() with an atomic bitop. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:16:59 -04:00

... 74 75 76 77 78 ...

1123688 Commits