linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 19:30:30 +09:00

Author	SHA1	Message	Date
Suren Baghdasaryan	d0bf79a102	ANDROID: GKI: enable CONFIG_ANON_VMA_NAME to support anonymous vma names Enable CONFIG_ANON_VMA_NAME to support anonymous vma names. Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I38d547359fa799d48b286582997baeb5de50423f	2022-01-18 14:31:28 -08:00
Arnd Bergmann	b5db68c850	UPSTREAM: mm: move anon_vma declarations to linux/mm_inline.h The patch to add anonymous vma names causes a build failure in some configurations: include/linux/mm_types.h: In function 'is_same_vma_anon_name': include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration] 924 | return name && vma_name && !strcmp(name, vma_name); | ^~~~~~ include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'? This should not really be part of linux/mm_types.h in the first place, as that header is meant to only contain structure definitions and needs a minimum set of indirect includes itself. While the header clearly includes more than it should at this point, let's not make it worse by including string.h as well, which would pull in the expensive (compile-speed wise) fortify-string logic. Move the new functions into a separate header that only needs to be included in a couple of locations. Link: https://lkml.kernel.org/r/20211207125710.2503446-1-arnd@kernel.org Fixes: "mm: add a field to store names for private anonymous memory" Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Colin Cross <ccross@google.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `17fca131ce`) Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I54719d7ea27d3cf53ef7245b2af88d2a2bc9bafe	2022-01-18 14:27:45 -08:00
Suren Baghdasaryan	a145fd90ac	UPSTREAM: mm: add anonymous vma name refcounting While forking a process with a high number (64K) of named anonymous vmas the overhead caused by strdup() is noticeable. Experiments with an ARM64 Android device show up to 40% performance regression when forking a process with 64k unpopulated anonymous vmas using the max name lengths vs the same process with the same number of anonymous vmas having no name. Introduce anon_vma_name refcounted structure to avoid the overhead of copying vma names during fork() and when splitting named anonymous vmas. When a vma is duplicated, instead of copying the name we increment the refcount of this structure. Multiple vmas can point to the same anon_vma_name as long as they increment the refcount. The name member of anon_vma_name structure is assigned at structure allocation time and is never changed. If vma name changes then the refcount of the original structure is dropped, a new anon_vma_name structure is allocated to hold the new name and the vma pointer is updated to point to the new structure. With this approach the fork() performance regression is reduced 3-4x, and with use cases using a more reasonable number of VMAs (a few thousand) the regression is not measurable. Link: https://lkml.kernel.org/r/20211019215511.3771969-3-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Colin Cross <ccross@google.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Glauber <jan.glauber@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Landley <rob@landley.net> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com> Cc: Shaohua Li <shli@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `78db341283`) Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I4b6d63b1aced3813ebb91479f4bcfd0d89e8fa29	2022-01-18 14:14:39 -08:00
Colin Cross	5be683755f	UPSTREAM: mm: add a field to store names for private anonymous memory In many userspace applications, and especially in VM based applications like Android uses heavily, there are multiple different allocators in use. At a minimum there is libc malloc and the stack, and in many cases there are libc malloc, the stack, direct syscalls to mmap anonymous memory, and multiple VM heaps (one for small objects, one for big objects, etc.). Each of these layers usually has its own tools to inspect its usage; malloc by compiling a debug version, the VM through heap inspection tools, and for direct syscalls there is usually no way to track them. On Android we heavily use a set of tools that use an extended version of the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped in userspace and slice their usage by process, shared (COW) vs. unique mappings, backing, etc. This can account for real physical memory usage even in cases like fork without exec (which Android uses heavily to share as many private COW pages as possible between processes), Kernel SamePage Merging, and clean zero pages. It produces a measurement of the pages that only exist in that process (USS, for unique), and a measurement of the physical memory usage of that process with the cost of shared pages being evenly split between processes that share them (PSS). If all anonymous memory is indistinguishable then figuring out the real physical memory usage (PSS) of each heap requires either a pagemap walking tool that can understand the heap debugging of every layer, or for every layer's heap debugging tools to implement the pagemap walking logic, in which case it is hard to get a consistent view of memory across the whole system. Tracking the information in userspace leads to all sorts of problems. 
It either needs to be stored inside the process, which means every process has to have an API to export its current heap information upon request, or it has to be stored externally in a filesystem that somebody needs to clean up on crashes. It needs to be readable while the process is still running, so it has to have some sort of synchronization with every layer of userspace. Efficiently tracking the ranges requires reimplementing something like the kernel vma trees, and linking to it from every layer of userspace. It requires more memory, more syscalls, more runtime cost, and more complexity to separately track regions that the kernel is already tracking. This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a userspace-provided name for anonymous vmas. The names of named anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>]. Userspace can set the name for a region of memory by calling prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name) Setting the name to NULL clears it. The name length limit is 80 bytes including NUL-terminator and is checked to contain only printable ascii characters (including space), except '[',']','\','$' and '`'. Ascii strings are being used to have a descriptive identifiers for vmas, which can be understood by the users reading /proc/pid/maps or /proc/pid/smaps. Names can be standardized for a given system and they can include some variable parts such as the name of the allocator or a library, tid of the thread using it, etc. The name is stored in a pointer in the shared union in vm_area_struct that points to a null terminated string. Anonymous vmas with the same name (equivalent strings) and are otherwise mergeable will be merged. The name pointers are not shared between vmas even if they contain the same name. The name pointer is stored in a union with fields that are only used on file-backed mappings, so it does not increase memory usage. 
CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this feature. It keeps the feature disabled by default to prevent any additional memory overhead and to avoid confusing procfs parsers on systems which are not ready to support named anonymous vmas. The patch is based on the original patch developed by Colin Cross, more specifically on its latest version [1] posted upstream by Sumit Semwal. It used a userspace pointer to store vma names. In that design, name pointers could be shared between vmas. However during the last upstreaming attempt, Kees Cook raised concerns [2] about this approach and suggested to copy the name into kernel memory space, perform validity checks [3] and store as a string referenced from vm_area_struct. One big concern is about fork() performance which would need to strdup anonymous vma names. Dave Hansen suggested experimenting with worst-case scenario of forking a process with 64k vmas having longest possible names [4]. I ran this experiment on an ARM64 Android device and recorded a worst-case regression of almost 40% when forking such a process. This regression is addressed in the followup patch which replaces the pointer to a name with a refcounted structure that allows sharing the name pointer between vmas of the same name. Instead of duplicating the string during fork() or when splitting a vma it increments the refcount. [1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/ [2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/ [3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/ [4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/ Changes for prctl(2) manual page (in the options section): PR_SET_VMA Sets an attribute specified in arg2 for virtual memory areas starting from the address specified in arg3 and spanning the size specified in arg4. arg5 specifies the value of the attribute to be set. 
Note that assigning an attribute to a virtual memory area might prevent it from being merged with adjacent virtual memory areas due to the difference in that attribute's value. Currently, arg2 must be one of: PR_SET_VMA_ANON_NAME Set a name for anonymous virtual memory areas. arg5 should be a pointer to a null-terminated string containing the name. The name length including null byte cannot exceed 80 bytes. If arg5 is NULL, the name of the appropriate anonymous virtual memory areas will be reset. The name can contain only printable ascii characters (including space), except '[',']','\','$' and '`'. This feature is available only if the kernel is built with the CONFIG_ANON_VMA_NAME option enabled. [surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table] Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com [surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy, added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the work here was done by Colin Cross, therefore, with his permission, keeping him as the author] Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com Signed-off-by: Colin Cross <ccross@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Glauber <jan.glauber@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Landley <rob@landley.net> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com> Cc: Shaohua Li <shli@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `9a10064f56`) Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I53d56d551a7d62f75341304751814294b447c04e	2022-01-18 14:08:57 -08:00
Colin Cross	eaf81c4217	UPSTREAM: mm: rearrange madvise code to allow for reuse Patch series "mm: rearrange madvise code to allow for reuse", v11. Avoid performance regression of the new anon vma name field by refcounting it. I checked the image sizes with allnoconfig builds: unpatched Linus' ToT text data bss dec hex filename 1324759 32 73928 1398719 1557bf vmlinux After the first patch is applied (madvise refactoring) text data bss dec hex filename 1322346 32 73928 1396306 154e52 vmlinux >>> 2413 bytes decrease vs ToT <<< After all patches applied with CONFIG_ANON_VMA_NAME=n text data bss dec hex filename 1322337 32 73928 1396297 154e49 vmlinux >>> 2422 bytes decrease vs ToT <<< After all patches applied with CONFIG_ANON_VMA_NAME=y text data bss dec hex filename 1325228 32 73928 1399188 155994 vmlinux >>> 469 bytes increase vs ToT <<< This patch (of 3): Refactor the madvise syscall to allow for parts of it to be reused by a prctl syscall that affects vmas. Move the code that walks vmas in a virtual address range into a function that takes a function pointer as a parameter. The only caller for now is sys_madvise, which uses it to call madvise_vma_behavior on each vma, but the next patch will add an additional caller. Move handling all vma behaviors inside madvise_behavior, and rename it to madvise_vma_behavior. Move the code that updates the flags on a vma, including splitting or merging the vma as necessary, into a new function called madvise_update_vma. The next patch will add support for updating a new anon_name field as well. Link: https://lkml.kernel.org/r/20211019215511.3771969-1-surenb@google.com Signed-off-by: Colin Cross <ccross@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jan Glauber <jan.glauber@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Rob Landley <rob@landley.net> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `ac1e9acc5a`) Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: If96c14ca3acc3795de373d658ba0a940dda68e1c	2022-01-18 13:55:21 -08:00
Suren Baghdasaryan	d99767b97a	Revert "ANDROID: mm: add a field to store names for private anonymous memory" This reverts commit `60500a4228`. Replacing out-of-tree implementation with the upstream one. Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic34c8e16d51ccf9f00cb59d2de341e911bcb2828	2022-01-18 13:44:53 -08:00
Suren Baghdasaryan	ea253a055d	Revert "ANDROID: mm: fix up new call to vma_merge()" This reverts commit `7df9282d8e`. Replacing out-of-tree implementation with the upstream one. Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Iefc8aeeea89d89b4960d51da8625cf5d48b5e98f	2022-01-18 13:44:35 -08:00
Suren Baghdasaryan	bf8d29c109	Revert "ANDROID: fix up `60500a4228` ("ANDROID: mm: add a field to store names for private anonymous memory")" This reverts commit `b5c8a97d50`. Replacing out-of-tree implementation with the upstream one. Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I8f97071b73bfb1af66a8349c9575e7c53af00642	2022-01-18 13:44:19 -08:00
Connor O'Brien	3778d1a75a	ANDROID: GKI: defconfig: enable BTF debug info Build BTF type info into the kernel to enable use of BPF-based tools such as BCC's libbpf-tools. Bug: 203823368 Test: build Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: Ice20d6bbf83b3a2407a553a37a9befff6c6bb66d	2022-01-18 18:01:18 +00:00
Connor O'Brien	44b9bfcd4c	FROMGIT: tools/resolve_btfids: Build with host flags resolve_btfids is built using $(HOSTCC) and $(HOSTLD) but does not pick up the corresponding flags. As a result, host-specific settings (such as a sysroot specified via HOSTCFLAGS=--sysroot=..., or a linker specified via HOSTLDFLAGS=-fuse-ld=...) will not be respected. Fix this by setting CFLAGS to KBUILD_HOSTCFLAGS and LDFLAGS to KBUILD_HOSTLDFLAGS. Also pass the cflags through to libbpf via EXTRA_CFLAGS to ensure that the host libbpf is built with flags consistent with resolve_btfids. Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220112002503.115968-1-connoro@google.com (cherry picked from commit `0e3a1c902f` git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master) Bug: 203823368 Test: build with CONFIG_DEBUG_INFO_BTF=y Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: I09ee10b29b57933653eb1cdd4249bac2d9cebf22	2022-01-18 18:00:57 +00:00
Quentin Perret	5c1e9f311f	ANDROID: ABI: Update the generic symbol list Bug: 207662659 Signed-off-by: Quentin Perret <qperret@google.com> Change-Id: Ia7f7730e14f9a43c5cf3be22960efd552976223c	2022-01-18 17:19:15 +00:00
Kuan-Ying Lee	819223c02c	UPSTREAM: kasan, slub: reset tag when printing address The address still includes the tags when it is printed. With hardware tag-based kasan enabled, we will get a false positive KASAN issue when we access metadata. Reset the tag before we access the metadata. Link: https://lkml.kernel.org/r/20210804090957.12393-3-Kuan-Ying.Lee@mediatek.com Fixes: `aa1ef4d7b3` ("kasan, mm: reset tags when accessing metadata") Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `340caf178d`) Bug: 187129171 Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: I9657a312f7629b1b44f90ec647bb858d78932b4f	2022-01-18 09:09:57 -08:00
Qi Zheng	3b8b51547b	UPSTREAM: mm: fix the deadlock in finish_fault() Commit `63f3655f95` ("mm, memcg: fix reclaim deadlock with writeback") fix the following ABBA deadlock by pre-allocating the pte page table without holding the page lock. lock_page(A) SetPageWriteback(A) unlock_page(A) lock_page(B) lock_page(B) pte_alloc_one shrink_page_list wait_on_page_writeback(A) SetPageWriteback(B) unlock_page(B) # flush A, B to clear the writeback Commit `f9ce0be71d` ("mm: Cleanup faultaround and finish_fault() codepaths") reworked the relevant code but ignored this race. This will cause the deadlock above to appear again, so fix it. Link: https://lkml.kernel.org/r/20210721074849.57004-1-zhengqi.arch@bytedance.com Fixes: `f9ce0be71d` ("mm: Cleanup faultaround and finish_fault() codepaths") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `e4dc348914`) Bug: 187129171 Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: I206746de94a795e41be5593fa512985b5a89aaaf	2022-01-18 09:09:57 -08:00
Hannes Reinecke	d5f74773bf	UPSTREAM: scsi: virtio_scsi: Do not overwrite SCSI status When a sense code is present we should not override the SAM status; the driver already sets it based on the response from the hypervisor. In addition we should only copy the sense buffer if one is actually provided by the hypervisor. Link: https://lore.kernel.org/r/20210622091153.29231-1-hare@suse.de Fixes: `464a00c9e0` ("scsi: core: Kill DRIVER_SENSE") Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit `c43ddbf97f`) Bug: 187129171 Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: I6a42c80e2cbd6786f2e08ebe4226f2cddfbb8e97	2022-01-18 09:09:57 -08:00
Robin Peng	973230a4ca	ANDROID: Update the ABI symbol list Update the generic symbol list. Bug: 211546634 Signed-off-by: Robin Peng <robinpeng@google.com> Change-Id: I27d2f29b7afd5ec47e911053a15158e1a560cdf2	2022-01-17 15:53:42 +08:00
Daniel Rosenberg	b074149d20	ANDROID: bpf-fuse: Fix Setattr Setattr implementation was mixing up some flags, and missing some of them. Test: atest android.appsecurity.cts.ExternalStorageHostTest Bug: 202785178 Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: Id41fa30881766faad5858b658f5b6871c0ae46b3	2022-01-16 15:27:50 -08:00
Ramji Jiyani	049a8b54d3	ANDROID: GKI: Disable security lockdown for unsigned modules By default, with SELinux enabled, the behavior for unsigned module loading is the same as sig_enforce=1. This causes loading of unsigned modules to fail. All modules in Android GKI are unsigned except GKI modules. Do not prevent module loading in case of CONFIG_SIG_MODULE_PROTECT, which was introduced to change the behavior of sig_enforce to allow unsigned modules but not access to protected symbols. Bug: 200082547 Bug: 214445388 Fixes: `9ab6a24225` ("ANDROID: GKI: Add module load time protected symbol lookup") Test: TreeHugger Signed-off-by: Ramji Jiyani <ramjiyani@google.com> Change-Id: Iab3113d706cbd7db7a5684897bcafd5671a6d424	2022-01-14 19:59:14 +00:00
Ramji Jiyani	2da3b12bc0	ANDROID: GKI: Enable system_dlkm build for gki Update GKI build configs to build system_dlkm.img. Add an empty system_dlkm modules list file at: android/gki_system_dlkm_modules Bug: 200082547 Bug: 214445388 Test: TH Signed-off-by: Ramji Jiyani <ramjiyani@google.com> Change-Id: Ia11b48d6033a39479d71c90159c74809a874893d	2022-01-14 19:58:15 +00:00
Ramji Jiyani	39220e855b	ANDROID: GKI: Enable config for module signing Enabled signed module and Android gki module symbol protection support. Bug: 200082547 Bug: 214445388 Test: TH Signed-off-by: Ramji Jiyani <ramjiyani@google.com> Change-Id: I0ecfb1df8437c67c00a5bb9bf813d27ff153a2cf	2022-01-14 19:57:39 +00:00
Ramji Jiyani	254d999798	ANDROID: GKI: Do not force select MODULE_SIG_ALL CONFIG_MODULE_SIG_ALL needs to be set for gki_defconfig, but will require an override via device fragments to avoid signing the vendor modules at build-time. It defaults to 'y', so there is no need to set it explicitly for gki_defconfig. Bug: 200082547 Bug: 214445388 Fixes: `9ab6a24225` ("ANDROID: GKI: Add module load time protected symbol lookup") Test: TH, manual builds including P21 mainline Signed-off-by: Ramji Jiyani <ramjiyani@google.com> Change-Id: Iafc0936b5e7bfb781b28642d1ec233a7fcf85f09	2022-01-14 19:57:16 +00:00
David Brazdil	d1109f05c3	BACKPORT: FROMLIST: misc: open-dice: Add driver to expose DICE data to userspace Open Profile for DICE is an open protocol for measured boot compatible with the Trusted Computing Group's Device Identifier Composition Engine (DICE) specification. The generated Compound Device Identifier (CDI) certificates represent the hardware/software combination measured by DICE, and can be used for remote attestation and sealing. Add a driver that exposes reserved memory regions populated by firmware with DICE CDIs and exposes them to userspace via a character device. Userspace obtains the memory region's size from read() and calls mmap() to create a mapping of the memory region in its address space. The mapping is not allowed to be write+shared, giving userspace a guarantee that the data were not overwritten by another process. Userspace can also call write(), which triggers a wipe of the DICE data by the driver. Because both the kernel and userspace mappings use write-combine semantics, all clients observe the memory as zeroed after the syscall has returned. Acked-by: Rob Herring <robh@kernel.org> Cc: Andrew Scull <ascull@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: David Brazdil <dbrazdil@google.com> Link: https://lore.kernel.org/r/20220104100645.1810028-3-dbrazdil@google.com Bug: 198197082 [willdeacon@: Fixed context conflicts in reserved_mem_matches[] and Makefile] Signed-off-by: Will Deacon <willdeacon@google.com> Bug: 209580772 Change-Id: If1160c8cc3a39ea822e089d1b80c837aec8075fa Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:28 +00:00
David Brazdil	3d914125b2	FROMLIST: dt-bindings: reserved-memory: Open Profile for DICE Add DeviceTree bindings for Open Profile for DICE, an open protocol for measured boot. Firmware uses DICE to measure the hardware/software combination and generates Compound Device Identifier (CDI) certificates. These are stored in memory and the buffer is described in the DT as a reserved memory region compatible with 'google,open-dice'. Signed-off-by: David Brazdil <dbrazdil@google.com> Link: https://lore.kernel.org/r/20220104100645.1810028-2-dbrazdil@google.com Bug: 198197082 Bug: 209580772 Signed-off-by: Will Deacon <willdeacon@google.com> Change-Id: If318ad91ef1ae26ff639f99a4349e8c737d286b6 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:28 +00:00
Will Deacon	c02620fb25	ANDROID: KVM: arm64: relay entropy requests from protected guests directly to secure As pKVM does not trust the host, it should not be involved in the handling of, or be able to observe the response to, entropy requests issued by protected guests. When an SMC-based implementation of the ARM SMCCC TRNG interface is present, pass any HVC-based requests directly on to the secure firmware. Co-developed-by: Ard Biesheuvel <ardb@google.com> Signed-off-by: Ard Biesheuvel <ardb@google.com> Signed-off-by: Will Deacon <will@kernel.org> Bug: 209580772 Change-Id: Ica492ce49fd059a62ecc31bb7ac13c9adb773a08 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:28 +00:00
Will Deacon	d01a90c858	ANDROID: KVM: arm64: Create EL2 copy of __icache_flags for pKVM instead of alias Using an alias of the host's `__icache_flags` variable at EL2 for pKVM is risky, as it provides the host with a mechanism to elide cache maintenance of guest pages by causing functions such as icache_is_vpipt() to erroneously return false. Create a private copy of the __icache_flags variable at EL2 and initialise it using the host's version during pKVM init. Signed-off-by: Will Deacon <will@kernel.org> Bug: 209580772 Change-Id: I595f11d1e336dadae0eb82222e4da79a1069012a Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:28 +00:00
Marc Zyngier	ec2585f588	ANDROID: arm64: Register earlycon fixmap with the MMIO guard On initialising the MMIO guard infrastructure, register the earlycon mapping if present. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I379387253d08e2414fa386a3360a45391da7d90d Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:27 +00:00
Marc Zyngier	f6edd58a0c	ANDROID: arm64: Add a helper to retrieve the PTE of a fixmap In order to transfer the early mapping state into KVM's MMIO guard infrastructure, provide a small helper that will retrieve the associated PTE. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: Iefc1c57d5e9476b718a8a68f60e562a57b09fb6a Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:27 +00:00
Marc Zyngier	8e744844c6	ANDROID: arm64: Enroll into KVM's MMIO guard if required Should a guest desire to enroll into the MMIO guard, allow it to do so with a command-line option. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: Ia9a77f693531740500739693c52b4959abacafd4 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:27 +00:00
Marc Zyngier	f89d2055a3	ANDROID: arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard Implement the previously defined ioremap/iounmap hooks for arm64, calling into KVM's MMIO guard if available. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I86a78f8941fb60078fb873a34c5eb32830a00259 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:27 +00:00
Marc Zyngier	bd1474cd4c	ANDROID: BACKPORT: mm/vmalloc: Add arch-specific callbacks to track io{remap,unmap} physical pages Add a pair of hooks (ioremap_phys_range_hook/iounmap_phys_range_hook) that can be implemented by an architecture. Contrary to the existing arch_sync_kernel_mappings(), this one tracks things at the physical address level. This is especially useful in virtualised environments where the guest has to tell the host whether (and how) it intends to use an MMIO device. Signed-off-by: Marc Zyngier <maz@kernel.org> [willdeacon@: Hook ioremap_page_range() in mm/ioremap.c] Bug: 209580772 Change-Id: I970c2e632cb2b01060d5e66e4194fa9248188f43 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:23 +00:00
Marc Zyngier	be179b64d0	ANDROID: KVM: arm64: Add some documentation for the MMIO guard feature Document the hypercalls used for the MMIO guard infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I927bcd6c5e3ef932265d817288ff2b46b0e0db66 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:23 +00:00
Marc Zyngier	a0841f8b50	ANDROID: KVM: arm64: Plumb MMIO checking into the fault handling Plumb the MMIO checking code into the MMIO fault handling code. Any fault hitting outside of an MMIO region will now report an invalid syndrome, and won't leak any data from the guest. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I68bef2d0211a804aa1e598aeaa0c85dc4098f61e Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:23 +00:00
Marc Zyngier	1b72ff723d	ANDROID: KVM: arm64: pkvm: Wire MMIO guard hypercalls Plumb in the hypercall interface to allow a guest to discover, enroll, map and unmap MMIO regions. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I0390456ffde8ceca351d3d8e82fd1dddeb747fac Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:23 +00:00
Marc Zyngier	0cf8258626	ANDROID: KVM: arm64: pkvm: Add MMIO guard infrastructure Introduce the infrastructure required to identify an IPA region that is expected to be used as an MMIO window. This includes mapping, unmapping and checking the regions. Nothing calls into it yet, so no functional change is expected. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I227eaa28b98e067e3daae4f9e1071eb37a6761cc Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:23 +00:00
Marc Zyngier	b7bef27e71	ANDROID: KVM: arm64: Introduce KVM_ARCH_FLAG_MMIO_GUARD flag Add a per-VM flag indicating that the guest has bought into the MMIO guard enforcement framework. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: If60b2b38a419a9f44ebe9029f55dd016fd2444b5 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:22 +00:00
Marc Zyngier	fc38ca626f	ANDROID: KVM: arm64: Expose topup_hyp_memcache() to the rest of KVM In order to simplify the implementation of an EL2-only version of MMIO guard, expose topup_hyp_memcache() and simplify its usage by only requiring a vcpu. While we're at it, make free_hyp_memcache() visible in kvm_host.h Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I4f54c57a9693cf7a3450f99fedc15ae32af09a31 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:22 +00:00
Marc Zyngier	77deb98705	ANDROID: KVM: arm64: Define MMIO guard hypercalls Define the handful of hypercalls that MMIO guard will require. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: Iac312b2327c31a1532fdb38e8fa8066291d9f611 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:22 +00:00
Marc Zyngier	19b510d531	ANDROID: KVM: arm64: Check for PTE valitity when checking for executable/cacheable Don't blindly assume that the PTE is valid when checking whether it describes an executable or cacheable mapping. This makes sure that we don't issue CMOs for invalid mappings. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I5b271c91aa6ceb23f7b1e6a571e30d080866d5c9 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:22 +00:00
Marc Zyngier	d760740101	ANDROID: KVM: arm64: Generalise VM features into a set of flags We currently deal with a set of booleans for VM features, while they could be better represented as a set of flags contained in an unsigned long, similarly to what we are doing on the CPU side. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I86be6bab12287c3eb21bbe03f255e2899edbdffb Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:22 +00:00
Marc Zyngier	000f0c90c4	ANDROID: KVM: arm64: pkvm: Plug in cache invalidation for non-protected guests Since we must still support the dreaded set/way CMOs for non-protected VMs (as well as the equivalent operation when vcpus switch their MMU on), perform an invalidation that will iterate over all the pages that have been donated to the guest, one after the other. This requires a minor change to the locking used for donation so that all donated pages can be seen by a concurrent invalidation. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Signed-off-by: Will Deacon <willdeacon@google.com> Change-Id: I1780127722bda7bdc884bb4e68db6ae47d042822	2022-01-14 16:43:22 +00:00
Marc Zyngier	21b5ab1b19	ANDROID: KVM: arm64: pkvm: Allow the shadows to be destroyed on teardown There is no difference between protected and non-protected guests when it comes to shadow structures, and we want these shadow structures to have the same life cycle. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I7e9bf366aae6bd0542d0038d24e2350a9dd23cd0 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:22 +00:00
Marc Zyngier	70f68991b3	ANDROID: KVM: arm64: pkvm: Don't init pvm traps non non-protected guests We want the host to handle everything as usual. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: Icf8ee146917e886bca258815cf948a1b12540353 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:22 +00:00
Marc Zyngier	3336b8073a	ANDROID: KVM: arm64: pkvm: Share memory with non-protected guests Instead of donating memory to non-pVMs, share the memory, which gives us a good enough approximation of the usual behaviour. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I47213754613110a6fb8157806eb96ddf92ead346 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:21 +00:00
Marc Zyngier	4fe4fd6746	ANDROID: KVM: arm64: pkvm: Manage the non-protected guest dirty state from EL1 In order to deal with state synchronisation between EL1 and EL2, we use the following setup: - On exit from EL2, the state is forcefully marked clean. - Should a trap be handled, the state is synchronised and immediately marked dirty - On vcpu_put(), the state is also marked dirty, since it can be modified by userspace Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I47a889ca5432566f236de4630d81753348632f8a Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:21 +00:00
Marc Zyngier	3e7b59fb60	ANDROID: KVM: arm64: pkvm: State sync primitives for non-protected guests In order for a non-protected guest to be functional, userspace has to be able to query its state, which means that the host view of the vcpu has to be kept up to date. In order to achieve this, we establish the following scheme for EL2: - On entering vcpu_run(), we check for the KVM_ARM64_PKVM_STATE_DIRTY flag in the host vcpu. If set, we sync the state from the host to the shadow version. - On exiting vcpu_run(), we don't do anything, but let the host issue a sync hypercall if required. - On vcpu_put(), we force a synchronisation to the host. The EL1 host will have a complementary approach in the following patches. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I42811a25d2e176d6c7d9a66ade6e9149a96e9256 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:21 +00:00
Marc Zyngier	520514c341	ANDROID: KVM: arm64: pkvm: Introduce entry/exit handlers for non-protected guests A non-protected guest requires a lot less handling than a protected one when dealing with entries/exits from/to EL2. Since we already indirect those, introduce new entry/exit tables for non-pVMs. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I66602bc491a4a87d6482b12e4eaf7aa53a7dbfd9 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:21 +00:00
Marc Zyngier	a1ab3b544a	ANDROID: KVM: arm64: pkvm: Make {flush,sync}_shadow_state() take the full state As we're about to need to copy some state back and forth for non-protected guests, pass the full loaded state to the flush/sync functions. No functional change. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I7ad6a00a7500e91237fcc0981261c819b2224ee0 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:21 +00:00
Marc Zyngier	4262892c41	ANDROID: KVM: arm64: pkvm: Replace pkvm_loaded_state.is_shadow with is_protected When pKVM is enabled, all the vcpus must have a shadow structure managed by the hypervisor, irrespective of their protection status. This field thus represents the wrong abstraction. Replace it with 'pkvm_loaded_state.is_protected', which tracks whether a vcpu is part of a protected VM. pkvm_loaded_state also gets moved around for convenience with the following patches. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: Ic9876fde543abb350fe8969d5b4661e30092f553 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:21 +00:00
Marc Zyngier	5106d405c6	ANDROID: KVM: arm64: Generate hyp-constants.o as an nVHE object A number of KVM definitions are keyed on __KVM_NVHE_HYPERVISOR__ being defined or not. Make sure we advertise this #define when compiling hyp-constants.o, so that we get the right stuff. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: Ied191c0a18274258cffede72b06b0fb5bba5604e Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:21 +00:00
Marc Zyngier	c0dc717dca	ANDROID: KVM: arm64: Introduce vcpu_is_protected() helper Instead of poking into the internals of the host KVM structure, stick to the shadow structures when trying to work out whether a vcpu is part of a protected VM or not. Take this opportunity to sprinkle a couple of unlikely(), just because. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I22a096e1e3cfe34cd2658684b02d8bac486416c4 Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:20 +00:00
Marc Zyngier	c496a48855	ANDROID: KVM: arm64: pkvm: Update the shadow view of pkvm.enabled at creation time As we can't really rely on the host side for the protection status, snapshot the expected status at VM creation time. Signed-off-by: Marc Zyngier <maz@kernel.org> Bug: 209580772 Change-Id: I0943eadba25e6c9fe718f29e749b9fcc8fbb79ba Signed-off-by: Will Deacon <willdeacon@google.com>	2022-01-14 16:43:20 +00:00


986025 Commits