linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-08 11:50:43 +09:00

Author	SHA1	Message	Date
Christian Brauner	28fdf6b165	fs: add i_user_ns() helper commit `a1ec9040a2` upstream. Since we'll be passing the filesystem's idmapping in even more places in the following patches and we do already dereference struct inode to get to the filesystem's idmapping multiple times add a tiny helper. Link: https://lore.kernel.org/r/20211123114227.3124056-10-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-10-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-10-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:53 -08:00
Christian Brauner	9ed9b9a53c	fs: port higher-level mapping helpers commit `209188ce75` upstream. Enable the mapped_fs{g,u}id() helpers to support filesystems mounted with an idmapping. Apart from core mapping helpers that use mapped_fs{g,u}id() to initialize struct inode's i_{g,u}id fields xfs is the only place that uses these low-level helpers directly. The patch only extends the helpers to be able to take the filesystem idmapping into account. Since we don't actually yet pass the filesystem's idmapping in no functional changes happen. This will happen in a final patch. Link: https://lore.kernel.org/r/20211123114227.3124056-9-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-9-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-9-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:53 -08:00
Christian Brauner	aaba2d360a	fs: remove unused low-level mapping helpers commit `02e4079913` upstream. Now that we ported all places to use the new low-level mapping helpers that are able to support filesystems mounted with an idmapping we can remove the old low-level mapping helpers. With the removal of these old helpers we also conclude the renaming of the mapping helpers we started in commit `a65e58e791` ("fs: document and rename fsid helpers"). Link: https://lore.kernel.org/r/20211123114227.3124056-8-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-8-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-8-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:53 -08:00
Christian Brauner	974a249420	fs: use low-level mapping helpers commit `4472071331` upstream. In a few places the vfs needs to interact with bare k{g,u}ids directly instead of struct inode. These are just a few. In previous patches we introduced low-level mapping helpers that are able to support filesystems mounted with an idmapping. This patch simply converts the places to use these new helpers. Link: https://lore.kernel.org/r/20211123114227.3124056-7-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-7-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-7-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:53 -08:00
Christian Brauner	abd4895757	docs: update mapping documentation commit `8cc5c54de4` upstream. Now that we implement the full remapping algorithms described in our documentation, remove the section about short-circuiting them. Link: https://lore.kernel.org/r/20211123114227.3124056-6-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-6-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-6-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:52 -08:00
Christian Brauner	6a1c1d6515	fs: account for filesystem mappings commit `1ac2a41049` upstream. Currently we only support idmapped mounts for filesystems mounted without an idmapping. This was a conscious decision mentioned in multiple places (cf. e.g. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystems mounted with an idmapping should the need arise. The need has been there for some time now. Various container projects in userspace need this to run unprivileged and nested unprivileged containers (cf. [2]). Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts we need to first extend the mapping helpers to account for the filesystem's idmapping. This, again, is explained at length in our documentation at [3] but I'll give an overview here again. Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner. Because we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping the translation step from or into the filesystem idmapping could be skipped. In order to support idmapped mounts of filesystems mountable with an idmapping the translation step we were able to skip before cannot be skipped anymore. A filesystem mounted with an idmapping is very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. More details with examples can be found in [3]. This patch adds a few new and prepares some already existing low-level mapping helpers to perform the full translation algorithm explained in [3]. The low-level helpers can be written in a way that they only perform the additional translation step when the filesystem is indeed mounted with an idmapping. 
If the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant k{g,u}id unchanged; no remapping needs to be performed at all. The no_idmapping() helper detects whether the shortcut can be used. If the low-level helpers detect that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shortcut and can continue to skip the translation step from or into the filesystem's idmapping. These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all. This patch adds the helpers mapped_k{g,u}id_fs() and mapped_k{g,u}id_user(). Following patches will port all places to replace the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() with these two new helpers. After the conversion is done k{g,u}id_into_mnt() and k{g,u}id_from_mnt() will be removed. This also concludes the renaming of the mapping helpers we started in [4]. Now, all mapping helpers will start with the "mapped_" prefix making everything nice and consistent. The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt() helpers. They are to be used when k{g,u}ids are to be mapped from the vfs, e.g. from struct inode's i_{g,u}id. Conversely, the mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers. They are to be used when k{g,u}ids are to be written to disk, e.g. when entering from a system call to change ownership of a file. This patch only introduces the helpers. It doesn't yet convert the relevant places to account for filesystems mounted with an idmapping. 
[1]: commit `2ca4dcc490` ("fs/mount_setattr: tighten permission checks") [2]: https://github.com/containers/podman/issues/10374 [3]: Documentation/filesystems/idmappings.rst [4]: commit `a65e58e791` ("fs: document and rename fsid helpers") Link: https://lore.kernel.org/r/20211123114227.3124056-5-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-5-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-5-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:52 -08:00
Christian Brauner	e896138e71	fs: tweak fsuidgid_has_mapping() commit `476860b3eb` upstream. If the caller's fs{g,u}id aren't mapped in the mount's idmapping we can return early and skip the check whether the mapped fs{g,u}id also have a mapping in the filesystem's idmapping. If the fs{g,u}id aren't mapped in the mount's idmapping they consequently can't be mapped in the filesystem's idmapping. So there's no point in checking that. Link: https://lore.kernel.org/r/20211123114227.3124056-4-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-4-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-4-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:52 -08:00
Christian Brauner	d3880474de	fs: move mapping helpers commit `a793d79ea3` upstream. The low-level mapping helpers were so far crammed into fs.h. They are out of place there. The fs.h header should just contain the higher-level mapping helpers that interact directly with vfs objects such as struct super_block or struct inode and not the bare mapping helpers. Similarly, only vfs and specific fs code shall interact with low-level mapping helpers. And so they won't be made accessible automatically through regular {g,u}id helpers. Link: https://lore.kernel.org/r/20211123114227.3124056-3-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-3-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-3-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:52 -08:00
Christian Brauner	6a373411de	fs: add is_idmapped_mnt() helper commit `bb49e9e730` upstream. Multiple places open-code the same check to determine whether a given mount is idmapped. Introduce a simple helper function that can be used instead. This allows us to get rid of the fragile open-coding. We will later change the check that is used to determine whether a given mount is idmapped. Introducing a helper allows us to do this in a single place instead of doing it for multiple places. Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-04 13:00:52 -08:00
Jaegeuk Kim	fe2d3c2473	Revert "fs: add is_idmapped_mnt() helper" This reverts commit `bac0953c9f`.	2023-01-04 13:00:51 -08:00
Jaegeuk Kim	5094968a00	Revert "fs: move mapping helpers" This reverts commit `184519786e`.	2023-01-04 13:00:51 -08:00
Jaegeuk Kim	c8072d62cb	Revert "fs: tweak fsuidgid_has_mapping()" This reverts commit `47ab4bf156`.	2023-01-04 13:00:51 -08:00
Jaegeuk Kim	ad8f8965ff	Revert "fs: account for filesystem mappings" This reverts commit `791d198574`.	2023-01-04 13:00:51 -08:00
Jaegeuk Kim	93e71abbaa	Revert "docs: update mapping documentation" This reverts commit `bab0eb12d3`.	2023-01-04 13:00:51 -08:00
Jaegeuk Kim	3736f49a0e	Revert "fs: use low-level mapping helpers" This reverts commit `505f38a2fa`.	2023-01-04 13:00:51 -08:00
Jaegeuk Kim	76d110ca40	Revert "fs: remove unused low-level mapping helpers" This reverts commit `d6e05a8024`.	2023-01-04 13:00:50 -08:00
Jaegeuk Kim	adecd8eeae	Revert "fs: add i_user_ns() helper" This reverts commit `cab74ea1a7`.	2023-01-04 13:00:50 -08:00
Jaegeuk Kim	753c5d9322	Revert "fs: account for group membership" This reverts commit `09abecd7e0`.	2023-01-04 13:00:50 -08:00
Quentin Perret	9d79c30f82	ANDROID: KVM: arm64: Keep the pKVM private range under 1GiB The hypervisor memory pool is sized to allow mapping up to 1GiB of data in the 'private' range of the hypervisor. However, this is currently not enforced in any way, which might become a problem as private range mappings are used more and more (e.g. from pKVM modules). Enforce the 1GiB limit at allocation time, and while at it, rename __io_map_base to __private_range_base for consistency. Bug: 244543039 Change-Id: I32c9145ba331309b49428ff461a41c94ea0c1512 Signed-off-by: Quentin Perret <qperret@google.com>	2023-01-04 19:20:05 +00:00
Quentin Perret	a130b36c3c	ANDROID: KVM: arm64: Specify stage-2-protected regions in DT Parse the devicetree during pKVM init to find nodes with the "pkvm,protected-region" compatible string. These nodes specify a physical address range in reg that must always be mapped as invalid in the host stage-2 page table when running under pKVM. Example DT: pkvm_prot_reg: pkvm_prot_reg@80000000 { compatible = "pkvm,protected-region"; reg = <0x00 0x80000000 0x00 0x200000>; }; Bug: 244543039 Bug: 244373730 Change-Id: I102cd16c91d96e5283cdd1a4fa58836cc4834eac Signed-off-by: Quentin Perret <qperret@google.com>	2023-01-04 19:20:05 +00:00
Quentin Perret	fec84bf038	ANDROID: KVM: arm64: Introduce concept of pKVM moveable regions The pKVM memory pool is currently sized to allow page-granularity mapping in the host stage-2 page-table of all the memory as well as up to 1GiB of MMIO range. Indeed, pKVM currently assumes that MMIO regions are completely and solely owned by the host for the entire lifetime of the system. As such, the pages used to map MMIO regions can always be recycled to allow forward progress if the memory pool ran out of pages -- pKVM can unmap MMIO ranges at stage-2 without fearing to lose important information about the state of the underlying page, and those mappings can always be reconstructed later. In order to allow transitioning the ownership of non-memory regions, introduce a concept of pKVM 'moveable' regions, which represents regions of the physical address space which can be 'moved' from an ownership perspective. These moveable regions are used to size the hyp memory pool. In a first step, the list of moveable regions is equal to the memblock list, but it will be extended in subsequent changes. No functional changes intended. Bug: 244543039 Bug: 244373730 Change-Id: I7f451924b1eed9579868e6ff8c7adc7b4a5a0ae1 Signed-off-by: Quentin Perret <qperret@google.com>	2023-01-04 19:20:05 +00:00
Quentin Perret	272164160a	ANDROID: KVM: arm64: Correctly flag MMIO pages as PKVM_PAGE_RESTRICTED_PROT The host_get_page_state() logic currently has a baked-in assumption that it will only be used on memory, and checks against the default memory permissions to flag pages as having a RESTRICTED_PROT state. Add support for correctly flagging non-memory pages to prepare the ground for future patches. Bug: 244543039 Bug: 244373730 Change-Id: Idaaef96cb98c147c8b793059438064cf770af525 Signed-off-by: Quentin Perret <qperret@google.com>	2023-01-04 19:20:05 +00:00
Quentin Perret	4e54619b17	ANDROID: KVM: arm64: Introduce default_host_prot() pKVM uses different default permissions for memory and non-memory regions of the PA space. To avoid scattering this logic around, introduce a default_host_prot() helper function. No functional changes intended. Bug: 244543039 Bug: 244373730 Change-Id: I36cdbb26a2cb0d54b5641f945f6ede4ffe371045 Signed-off-by: Quentin Perret <qperret@google.com>	2023-01-04 19:20:05 +00:00
Quentin Perret	191c0276be	ANDROID: KVM: arm64: Introduce a hyp panic module notifier pKVM modules may need to be notified in case of unexpected same-level EL2 exceptions, which result in a hyp panic. To do so, introduce a new notifier on the hyp_panic path. Bug: 244373730 Change-Id: I144609a933d648ddf2aebcd950e64d6035bf8be3 Signed-off-by: Quentin Perret <qperret@google.com>	2023-01-04 17:53:58 +00:00
Quentin Perret	2529c7a2bd	ANDROID: KVM: arm64: Expose linear map APIs to pKVM modules pKVM modules may need to temporarily map large-ish physically contiguous regions of memory when bootstrapping themselves. In order to support this use-case, introduce two new APIs in the module_ops struct allowing to map and unmap pages in pKVM's linear map range. Since pKVM's page ownership infrastructure relies on linear map PTEs, this needs to be done with special care. To avoid any problem, let's count the number of pages mapped by modules and ensure they have been unmapped before reaching the point of deprivilege. Bug: 244373730 Change-Id: I4aecb93f5c9ba08d9f830d1f0976704688b98509 Signed-off-by: Quentin Perret <qperret@google.com>	2023-01-04 17:53:58 +00:00
Minchan Kim	abafa1328b	BACKPORT: locking: Add missing __sched attributes This patch adds __sched attributes to a few missing places to show blocked function rather than locking function in get_wchan. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220115231657.84828-1-minchan@kernel.org Conflicts: kernel/locking/percpu-rwsem.c 1. conflict <linux/sched/debug.h> Bug: 228243692 Change-Id: Ifb50c13cfdd7484269d9a291a8da515e1cce6a7b (cherry picked from commit `c441e934b6`) Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `da358e264c`)	2023-01-04 02:28:50 +00:00
Minchan Kim	f74aca771c	BACKPORT: mm: don't be stuck to rmap lock on reclaim path The rmap locks (i_mmap_rwsem and anon_vma->root->rwsem) could be contended under memory pressure if processes keep working on their vmas (e.g., fork, mmap, munmap). It makes the reclaim path stuck. In our real workload traces, we see kswapd waiting on the lock for 300ms+ (worst case, a sec) and it makes other processes enter direct reclaim, which were also stuck on the lock. This patch makes the lru aging path use try_lock mode like shrink_page_list so the reclaim context will keep working with the next lru pages without being stuck. If it finds the rmap lock contended, it rotates the page back to the head of the lru in both active/inactive lrus to make their behavior consistent, which is a basic starting point rather than adding more heuristics. Since this patch introduces a new "contended" field as out-param along with try_lock in-param in rmap_walk_control, it's not immutable any longer if the try_lock is set so remove const keywords on rmap related functions. Since rmap walking is already an expensive operation, I doubt the const would bring a sizable benefit (and we didn't have it until 5.17). In a heavy app workload in Android, trace shows following statistics. It almost removes rmap lock contention from reclaim path. 
Martin Liu reported: Before: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function 1632 0 1631 151.542173 31672 209 page_lock_anon_vma_read 601 0 601 145.544681 28817 198 rmap_walk_file After: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function NaN NaN NaN NaN NaN 0.0 NaN 0 0 0 0.127645 1 12 rmap_walk_file [minchan@kernel.org: add comment, per Matthew] Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: John Dias <joaodias@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Martin Liu <liumartin@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: folio->page Conflicts: mm/huge_memory.c was refactored by commit `c5b5a3dd2c` mm: thp: refactor NUMA fault handling (cherry picked from commit `6d4675e601`) Bug: 239681156 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I0c63e0291120c8a1b5f2d83b8a7b210cb56c27a2 Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `b8762fa265`)	2023-01-04 02:28:34 +00:00
Suren Baghdasaryan	abdeb31f26	ANDROID: page_pinner: prevent pp_buffer access before initialization If page_pinner is configured with page_pinner_enabled=false and failure_tracking=true, pp_buffer will be accessed without being initialized. Prevent this by adding page_pinner_inited checks in functions that access it. Fixes: 898cfbf094a2 ("ANDROID: mm: introduce page_pinner") Bug: 259024332 Bug: 260179017 Change-Id: I8f612cae3e74d36e8a4eee5edec25281246cbe5e Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit 23fb3111f63e5fe239a769668275c20493a5849c)	2023-01-04 02:18:52 +00:00
Charan Teja Kalla	8ca606e98b	FROMLIST: mm: fix use-after free of page_ext after race with memory-offline The below is one path where race between page_ext and offline of the respective memory blocks will cause use-after-free on the access of page_ext structure. process1 process2 --------- --------- a)doing /proc/page_owner doing memory offline through offline_pages. b)PageBuddy check is failed thus proceed to get the page_owner information through page_ext access. page_ext = lookup_page_ext(page); migrate_pages(); ................. Since all pages are successfully migrated as part of the offline operation,send MEM_OFFLINE notification where for page_ext it calls: offline_page_ext()--> __free_page_ext()--> free_page_ext()--> vfree(ms->page_ext) mem_section->page_ext = NULL c) Check for the PAGE_EXT flags in the page_ext->flags access results into the use-after-free(leading to the translation faults). As mentioned above, there is really no synchronization between page_ext access and its freeing in the memory_offline. The memory offline steps(roughly) on a memory block is as below: 1) Isolate all the pages 2) while(1) try free the pages to buddy.(->free_list[MIGRATE_ISOLATE]) 3) delete the pages from this buddy list. 4) Then free page_ext.(Note: The struct page is still alive as it is freed only during hot remove of the memory which frees the memmap, which steps the user might not perform). This design leads to the state where struct page is alive but the struct page_ext is freed, where the later is ideally part of the former which just representing the page_flags (check [3] for why this design is chosen). The above mentioned race is just one example __but the problem persists in the other paths too involving page_ext->flags access(eg: page_is_idle())__. 
Fix all the paths where offline races with page_ext access by maintaining synchronization with rcu lock and is achieved in 3 steps: 1) Invalidate all the page_ext's of the sections of a memory block by storing a flag in the LSB of mem_section->page_ext. 2) Wait till all the existing readers to finish working with the ->page_ext's with synchronize_rcu(). Any parallel process that starts after this call will not get page_ext, through lookup_page_ext(), for the block parallel offline operation is being performed. 3) Now safely free all sections ->page_ext's of the block on which offline operation is being performed. Note: If synchronize_rcu() takes time then optimizations can be done in this path through call_rcu()[2]. Thanks to David Hildenbrand for his views/suggestions on the initial discussion[1] and Pavan kondeti for various inputs on this patch. [1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@quicinc.com/ [2] https://lore.kernel.org/all/a26ce299-aed1-b8ad-711e-a49e82bdd180@quicinc.com/T/#u [3] https://lore.kernel.org/all/6fa6b7aa-731e-891c-3efb-a03d6a700efa@redhat.com/ Bug: 236222283 Bug: 240196534 Link: https://lore.kernel.org/all/1661496993-11473-1-git-send-email-quic_charante@quicinc.com/ Change-Id: Ib439ae19c61a557a5c70ea90e3c4b35a5583ba0d Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com> Signed-off-by: Minchan Kim <minchan@google.com> (fixed merge conflicts and still exported lookup_page_ext) (minchan: fixed page_pinner with new page_ext scheme)	2023-01-04 02:18:52 +00:00
Minchan Kim	e12acd3eef	ANDROID: mm: introduce page_pinner For CMA allocation, it's really critical to migrate a page but sometimes it fails. One of the reasons is some driver holds a page refcount for a long time so VM couldn't migrate the page at that time. The concern here is there is no way to find who holds the refcount of the page effectively. This patch introduces a feature to keep tracking a page's pinner. All get_page sites are vulnerable to pin a page for a long time but the cost to keep track of it would be significant since get_page is the most frequent kernel operation. Furthermore, the page could be not a user page but a kernel page which is not related to the page migration failure. Thus, this patch keeps track of only migration failed pages to reduce runtime cost. Once page migration fails in the CMA allocation path, those pages are marked as "migration failure" and for every put_page operation against those pages, the callstack of the put is recorded into the page_pinner buffer. Later, admin can see what pages failed and who released the refcount since the failure. It really helps effectively to find out the longtime refcount holder that prevents the page migration. note: page_pinner doesn't guarantee attributing/unattributing are atomic if they happen at the same time. It's just best effort so false-positive could happen. Bug: 183414571 Bug: 240196534 Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I603d0c0122734c377db6b1eb95848a6f734173a0 (cherry picked from commit 898cfbf094a2fc13c67fab5b5d3c916f0139833a)	2023-01-04 02:18:52 +00:00
Eric Biggers	3c7625f85c	ANDROID: update "fscrypt: add support for hardware-wrapped keys" to v7 The hardware-wrapped key support in this branch is based on my patch "[RFC PATCH v3 3/3] fscrypt: add support for hardware-wrapped keys" (https://lore.kernel.org/r/20211021181608.54127-4-ebiggers@kernel.org) I've since made several updates to that patch and it is now at v7. This commit brings in the updates from v3 to v7, to the extent possible while retaining compatibility with the UAPI and on-disk format used for this feature in Android. This mainly includes some improved log messages, and compatibility with the blk-crypto updates. Bug: 160883801 Link: https://lore.kernel.org/all/20221216203636.81491-5-ebiggers@kernel.org Change-Id: I1c43ca55ec7e95dd06f8f7944100ffd14771d3a7 Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:35 +00:00
Eric Biggers	1006fa2eaa	ANDROID: update "dm: add support for passing through derive_sw_secret" Update this code to be compatible with the updated version of "block: add basic hardware-wrapped key support". Bug: 160883801 Change-Id: Ic6991ad163035870ace3cd468f53b21a824c5359 Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:35 +00:00
Eric Biggers	3918b39c3e	ANDROID: update "block: add basic hardware-wrapped key support" to v7 The hardware-wrapped key support in this branch is based on my patch "[RFC PATCH v3 1/3] block: add basic hardware-wrapped key support" (https://lore.kernel.org/all/20211021181608.54127-2-ebiggers@kernel.org). I've since made several updates to that patch and it is now at v7. This commit brings in the updates from v3 to v7. The main change is making blk_crypto_derive_sw_secret() operate on a struct block_device, and adding blk_crypto_hw_wrapped_keys_compatible(). This aligns with changes upstream in v6.1 and v6.2 that removed block-layer internal structures from the API that blk-crypto exposes to upper layers. There's also a slight change in prototype for ->derive_sw_secret, so a couple out-of-tree drivers will need to be updated, but people maintaining out-of-tree drivers know what they are dealing with anyway. Bug: 160883801 Link: https://lore.kernel.org/r/20221216203636.81491-2-ebiggers@kernel.org Change-Id: I0f285c11c2764064cd4a9d6eac0089099a9601ed Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:35 +00:00
Eric Biggers	35f067ef8d	ANDROID: dm-default-key: update for blk-crypto changes The prototypes of blk_crypto_evict_key() and blk_crypto_start_using_key() changed, so update the callers in dm-default-key which is not upstream. Bug: 160885805 Change-Id: Ie39a298d8aca77c042f11bbfa25fd9bf50593c52 Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:34 +00:00
Bart Van Assche	6e30f5a513	BACKPORT: blk-crypto: Add a missing include directive Allow the compiler to verify consistency of function declarations and function definitions. This patch fixes the following sparse errors: block/blk-crypto-profile.c:241:14: error: no previous prototype for ‘blk_crypto_get_keyslot’ [-Werror=missing-prototypes] 241 \| blk_status_t blk_crypto_get_keyslot(struct blk_crypto_profile profile, \| ^~~~~~~~~~~~~~~~~~~~~~ block/blk-crypto-profile.c:318:6: error: no previous prototype for ‘blk_crypto_put_keyslot’ [-Werror=missing-prototypes] 318 \| void blk_crypto_put_keyslot(struct blk_crypto_keyslot slot) \| ^~~~~~~~~~~~~~~~~~~~~~ block/blk-crypto-profile.c:344:6: error: no previous prototype for ‘__blk_crypto_cfg_supported’ [-Werror=missing-prototypes] 344 \| bool __blk_crypto_cfg_supported(struct blk_crypto_profile profile, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ block/blk-crypto-profile.c:373:5: error: no previous prototype for ‘__blk_crypto_evict_key’ [-Werror=missing-prototypes] 373 \| int __blk_crypto_evict_key(struct blk_crypto_profile profile, \| ^~~~~~~~~~~~~~~~~~~~~~ Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221123172923.434339-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit `85168d416e`) (resolved trivial conflict) Change-Id: I797a99bc00c114dc86e74e1d5b1f7866f7e64a10 Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:34 +00:00
Christoph Hellwig	4909cb8714	UPSTREAM: blk-crypto: move internal only declarations to blk-crypto-internal.h blk_crypto_get_keyslot, blk_crypto_put_keyslot, __blk_crypto_evict_key and __blk_crypto_cfg_supported are only used internally by the blk-crypto code, so move them out of blk-crypto-profile.h, which is included by drivers that supply blk-crypto functionality. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221114042944.1009870-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit `3569788c08`) Change-Id: I80b07a1c3b6e6f41ffe48adbdb27a3ca4480ff75 Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:34 +00:00
Christoph Hellwig	9b46497fa7	BACKPORT: blk-crypto: add a blk_crypto_config_supported_natively helper Add a blk_crypto_config_supported_natively helper that wraps __blk_crypto_cfg_supported to retrieve the crypto_profile from the request queue. With this fscrypt can stop including blk-crypto-profile.h and rely on the public consumer interface in blk-crypto.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221114042944.1009870-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit `6715c98b6c`) (resolved conflicts in blk_crypto_config_supported() and __blk_crypto_bio_prep()) Change-Id: I40c4ab6bd9a108661c40c837227b6aed64685ae7 Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:34 +00:00
Christoph Hellwig	d7bd6ff825	BACKPORT: blk-crypto: don't use struct request_queue for public interfaces Switch all public blk-crypto interfaces to use struct block_device arguments to specify the device they operate on instead of the request_queue, which is a block layer implementation detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221114042944.1009870-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit `fce3caea0f`) (resolved conflict in blk_crypto_config_supported()) Change-Id: Ifde7cf1c8a2a5ddfb2fde4e5fb118269a3bfcdb0 Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-01-03 23:00:34 +00:00
Suren Baghdasaryan	a21b3ffd99	ANDROID: remove unnecessary SPECULATIVE_PAGE_FAULT config dependency After recent fixes [1], speculative page fault walks are performed with disabled interrupts, therefore do not depend on ALLOC_SPLIT_PTLOCKS which would affect them if performed under RCU protection. Remove unnecessary config dependency. [1] 5fcb50b0559a ("ANDROID: mm: fix speculative walk which is unsafe under RCU") Bug: 253557903 Change-Id: Ia1c835c7b08419f8fce61fa4f7e6842fbf786229 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2022-12-29 20:48:14 +00:00
Minchan Kim	98f3cc7ecd	ANDROID: mm: freeing MIGRATE_ISOLATE page instantly Since Android has pcp list for MIGRATE_CMA[1], it could cause CMA allocation latency due to not freeing the MIGRATE_ISOLATE page immediately. Originally, MIGRATE_ISOLATED page is supposed to go buddy list with skipping pcp list. Otherwise, the page could be reallocated from pcp list or staying on the pcp list until the pcp is drained so that CMA keeps retrying since it couldn't find the freed page from buddy list. That worked before since the CMA pfnblocks changed only from MIGRATE_CMA to MIGRATE_ISOLATE and free function logic in page allocator has checked MIGRATE_ISOLATEness on every CMA pages using below. free_unref_page_commit if (migratetype >= MIGRATE_PCPTYPES) if(is_migrate_isolate(migratetype)) free_one_page(page); It worked since enum MIGRATE_CMA was bigger than enum MIGRATE_PCPTYPES but since [1], the enum MIGRATE_CMA is less than MIGRATE_PCPTYPES so the logic above doesn't work any more. It could cause following race CPU 0 CPU 1 free_unref_page migratetype = get_pfnblock_migratetype() set_pcppage_migratetype(MIGRATE_CMA) cma_alloc alloc_contig_range set_migrate_isolate(MIGRATE_ISOLATE) add the page into pcp list the page could be reallocated This patch couldn't fix the race completely due to missing zone->lock in order-0 page free(for performance reason). However, it's not a new problem so we need to deal with the issue separately. [1] ANDROID: mm: add cma pcp list Bug: 218731671 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ibea20085ce5bfb4b74b83b041f9bda9a380120f9 Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `d9e4b67784`)	2022-12-29 09:09:24 +00:00
Isaac J. Manjarres	2535deae80	ANDROID: GKI: Source GKI_BUILD_CONFIG_FRAGMENT after setting all variables build.config.gki sources a GKI_BUILD_CONFIG_FRAGMENT before all of the variables that are considered as part of a GKI kernel build are declared. This reduces the effectiveness of a GKI_BUILD_CONFIG_FRAGMENT, as it is only able to modify a subset of the build variables. Thus, move the logic to source GKI_BUILD_CONFIG_FRAGMENT to the end of the GKI build config files to provide more flexibility for a GKI_BUILD_CONFIG_FRAGMENT. Bug: 262930113 Change-Id: I74abb45f9043acce04cb0052f54fded4340a9366 [isaacmanjarres: Modified build.config.gki.aarch64.fips140, which did not exist on android13-5.15.] Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> (cherry picked from commit 69fefbb3db711e543ff0676526b7d285a4d10a14)	2022-12-27 13:44:09 -08:00
Quentin Perret	99e15725f7	ANDROID: KVM: arm64: Allow trap handling from pKVM modules Introduce a new default trap handler for the host that can be set from modules. Bug: 244543039 Bug: 245034629 Change-Id: Iaabfa44f5f2c41af51f36ed4eec8762e7c951c01 Signed-off-by: Quentin Perret <qperret@google.com>	2022-12-23 09:12:26 +00:00
Quentin Perret	7d67ca44dd	ANDROID: KVM: arm64: Notify pKVM modules of PSCI events Introduce a notifier allowing a pKVM module to be notified for major PSCI events: {CPU,SYSTEM}_SUSPEND, as well as on the resume path. Bug: 244543039 Bug: 245034629 Change-Id: Ia82923445214925fc77e321457c8eab31f9d42e8 Signed-off-by: Quentin Perret <qperret@google.com>	2022-12-23 09:12:25 +00:00
Quentin Perret	84cfedad9f	ANDROID: KVM: arm64: Allow handling illegal aborts from pKVM modules Introduce a new handler allowing to notify pKVM modules when pKVM detects an illegal access from the host. Bug: 244543039 Bug: 245034629 Change-Id: I62133a8d967d91437e5216b307e449f8c83dfab6 Signed-off-by: Quentin Perret <qperret@google.com>	2022-12-23 09:06:01 +00:00
Quentin Perret	5c8793e6f5	ANDROID: KVM: arm64: Allow SMC handling from pKVM modules Introduce a new default SMC handler for the host that can be set from modules. Bug: 244543039 Bug: 245034629 Change-Id: I8481bfb1926a3cb433b15de5c1a99e3550710689 Signed-off-by: Quentin Perret <qperret@google.com>	2022-12-23 09:06:01 +00:00
Eric Biggers	03c9b79ae2	fsverity: simplify fsverity_get_digest() Instead of looking up the algorithm by name in hash_algo_name[] to get its hash_algo ID, just store the hash_algo ID in the fsverity_hash_alg struct. Verify at boot time that every fsverity_hash_alg has a valid hash_algo ID with matching digest size. Remove an unnecessary memset() of the whole digest array to 0 before the digest is copied into it. Finally, remove the pr_debug statement. There is already a pr_debug for the fsverity digest when the file is opened. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/r/20221129045139.69803-1-ebiggers@kernel.org	2022-12-22 10:57:47 -08:00
Eric Biggers	f1f49c3d97	fsverity: stop using PG_error to track error status As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track verity errors. Instead, if a verity error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. f2fs supports compression, which makes the f2fs changes a bit more complicated than desired, but the basic premise still works. Note: there are still a few uses of PageError in f2fs, but they are on the write path, so they are unrelated and this patch doesn't touch them. Reviewed-by: Chao Yu <chao@kernel.org> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org	2022-12-22 10:57:47 -08:00
Eric Biggers	0c8369242e	fs-verity: use kmap_local_page() instead of kmap() Convert the use of kmap() to its recommended replacement kmap_local_page(). This avoids the overhead of doing a non-local mapping, which is unnecessary in this case. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220818224010.43778-1-ebiggers@kernel.org	2022-12-22 10:57:47 -08:00
Fabio M. De Francesco	2070d7da64	highmem: Make __kunmap_{local,atomic}() take const void pointer __kunmap_{local,atomic}() currently take pointers to void. However, this is semantically incorrect, since these functions do not change the memory their arguments point to. Therefore, make these semantics explicit by modifying the __kunmap_{local,atomic}() prototypes to take pointers to const void. As a side effect, compilers may produce more efficient code. Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Helge Deller <deller@gmx.de> # parisc Suggested-by: David Sterba <dsterba@suse.cz> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-22 10:57:47 -08:00
Eric Biggers	b6062e09be	fs-verity: use memcpy_from_page() Replace extract_hash() with the memcpy_from_page() helper function. This is simpler, and it has the side effect of replacing the use of kmap_atomic() with its recommended replacement kmap_local_page(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220818223903.43710-1-ebiggers@kernel.org	2022-12-22 10:57:46 -08:00

1064108 Commits