Our current ID register filtering is starting to be a mess of if()
statements, and isn't going to get any saner.
Let's turn it into a switch(), which has a chance of being more
readable, and introduce a FEATURE() macro that allows easy generation
of feature masks.
No functionnal change intended.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit c885793558
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Idec30e79f2ecf90ab26337c01b7b6239db61a670
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Despite advertising support for AArch32 PMUv3p1, we fail to handle
the PMCEID{2,3} registers, which conveniently alias with the top
bits of PMCEID{0,1}_EL1.
Implement these registers with the usual AA32(HI/LO) aliasing
mechanism.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 99b6a4013f
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I4f5777a65a12f4e3c66b6511707478d0f5d3508c
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
We shouldn't expose *any* PMU capability when no PMU has been
configured for this VM.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit cb95914685
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I59a410fc2390f407ff8dd5ecf6883e0a4428cb9f
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
The AArch32 CP14 DBGDIDR has bit 15 set to RES1, which our current
emulation doesn't set. Just add the missing bit.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bea7e97fef
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2d85bf57a967ad089a0b1cda47fb49e275379a62
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
gen-hyprel is, for better or worse, a native-endian program:
it assumes that the ELF data structures are in the host's
endianness, and even assumes that the compiled kernel is
little-endian in one particular case.
None of these assumptions hold true though: people actually build
(use?) BE arm64 kernels, and seem to avoid doing so on BE hosts.
Madness!
In order to solve this, wrap each access to the ELF data structures
with the required byte-swapping magic. This requires to obtain
the kernel data structure, and provide per-endianess wrappers.
This result in a kernel that links and even boots in a model.
Fixes: 8c49b5d43d ("KVM: arm64: Generate hyp relocation data")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bc93763f17
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ib2b332ee2bf25c39d34fda6c9f50668491a1defd
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Provide a hypervisor implementation of the ARM architected TRNG firmware
interface described in ARM spec DEN0098. All function IDs are implemented,
including both 32-bit and 64-bit versions of the TRNG_RND service, which
is the centerpiece of the API.
The API is backed by the kernel's entropy pool only, to avoid guests
draining more precious direct entropy sources.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
[Andre: minor fixes, drop arch_get_random() usage]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210106103453.152275-6-andre.przywara@arm.com
(cherry picked from commit a8e190cdae
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2d634c3b4f96c952e8b935d2a86e258e9a08925a
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
We now set the pfn dirty and mark the page dirty before calling fault
handlers in user_mem_abort(), so we might end up having spurious dirty
pages if update of permissions or mapping has failed. Let's move these
two operations after the fault handlers, and they will be done only if
the fault has been handled successfully.
When an -EAGAIN errno is returned from the map handler, we hope to the
vcpu to enter guest directly instead of exiting back to userspace, so
adjust the return value at the end of function.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210114121350.123684-4-wangyanan55@huawei.com
(cherry picked from commit 509552e65a
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ic34f97c2a71d49109544552be884a3b9483cd320
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
(1) During running time of a a VM with numbers of vCPUs, if some vCPUs
access the same GPA almost at the same time and the stage-2 mapping of
the GPA has not been built yet, as a result they will all cause
translation faults. The first vCPU builds the mapping, and the followed
ones end up updating the valid leaf PTE. Note that these vCPUs might
want different access permissions (RO, RW, RX, RWX, etc.).
(2) It's inevitable that we sometimes will update an existing valid leaf
PTE in the map path, and we perform break-before-make in this case.
Then more unnecessary translation faults could be caused if the
*break stage* of BBM is just catched by other vCPUS.
With (1) and (2), something unsatisfactory could happen: vCPU A causes
a translation fault and builds the mapping with RW permissions, vCPU B
then update the valid leaf PTE with break-before-make and permissions
are updated back to RO. Besides, *break stage* of BBM may trigger more
translation faults. Finally, some useless small loops could occur.
We can make some optimization to solve above problems: When we need to
update a valid leaf PTE in the map path, let's filter out the case where
this update only change access permissions, and don't update the valid
leaf PTE here in this case. Instead, let the vCPU enter back the guest
and it will exit next time to go through the relax_perms path without
break-before-make if it still wants more permissions.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210114121350.123684-3-wangyanan55@huawei.com
(cherry picked from commit 694d071f8d
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iac22cc9c455f10795c6d938f21d4837c22bcdadd
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Procedures of hyp stage-1 map and guest stage-2 map are quite different,
but they are tied closely by function kvm_set_valid_leaf_pte().
So adjust the relative code for ease of code maintenance in the future.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210114121350.123684-2-wangyanan55@huawei.com
(cherry picked from commit 8ed80051c8
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8fdaed094feec5228239ba23d57fd6c46ae7857e
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
The arguments for __do_hyp_init are now passed with a pointer to a
struct which means there are scratch registers available for use. Thanks
to this, we no longer need to use clever, but hard to read, tricks that
avoid the need for scratch registers when checking for the
__kvm_hyp_init HVC.
Tested-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210125145415.122439-2-ascull@google.com
(cherry picked from commit 87b26801f0
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I581c3b13067d4a40008e2c6ffd4a3e2edf98515e
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Hyp code used the hyp_symbol_addr helper to force PC-relative addressing
because absolute addressing results in kernel VAs due to the way hyp
code is linked. This is not true anymore, so remove the helper and
update all of its users.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-9-dbrazdil@google.com
(cherry picked from commit 247bc166e6
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ibcd569ae01199097aaff45ffa7833ffecae33b9e
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Storing a function pointer in hyp now generates relocation information
used at early boot to convert the address to hyp VA. The existing
alternative-based conversion mechanism is therefore obsolete. Remove it
and simplify its users.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-8-dbrazdil@google.com
(cherry picked from commit 537db4af26
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ife8468fc2062fc47a093a82edd41558fc1b76205
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Hyp code uses absolute addressing to obtain a kimg VA of a small number
of kernel symbols. Since the kernel now converts constant pool addresses
to hyp VAs, this trick does not work anymore.
Change the helpers to convert from hyp VA back to kimg VA or PA, as
needed and rework the callers accordingly.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-7-dbrazdil@google.com
(cherry picked from commit 97cbd2fc02
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I43420e1c0414370a0ca22c0129c933188df89dea
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
KVM nVHE code runs under a different VA mapping than the kernel, hence
so far it avoided using absolute addressing because the VA in a constant
pool is relocated by the linker to a kernel VA (see hyp_symbol_addr).
Now the kernel has access to a list of positions that contain a kimg VA
but will be accessed only in hyp execution context. These are generated
by the gen-hyprel build-time tool and stored in .hyp.reloc.
Add early boot pass over the entries and convert the kimg VAs to hyp VAs.
Note that this requires for .hyp* ELF sections to be mapped read-write
at that point.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-6-dbrazdil@google.com
(cherry picked from commit 6ec6259d70
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I3e2ec6a53c8c2850a359eed172024d660f92b01a
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Add a post-processing step to compilation of KVM nVHE hyp code which
calls a custom host tool (gen-hyprel) on the partially linked object
file (hyp sections' names prefixed).
The tool lists all R_AARCH64_ABS64 data relocations targeting hyp
sections and generates an assembly file that will form a new section
.hyp.reloc in the kernel binary. The new section contains an array of
32-bit offsets to the positions targeted by these relocations.
Since these addresses of those positions will not be determined until
linking of `vmlinux`, each 32-bit entry carries a R_AARCH64_PREL32
relocation with addend <section_base_sym> + <r_offset>. The linker of
`vmlinux` will therefore fill the slot accordingly.
This relocation data will be used at runtime to convert the kernel VAs
at those positions to hyp VAs.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-5-dbrazdil@google.com
(cherry picked from commit 8c49b5d43d
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2e3d7856cd8baaa0ed96444cb30812496203ac25
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Generating hyp relocations will require referencing positions at a given
offset from the beginning of hyp sections. Since the final layout will
not be determined until the linking of `vmlinux`, modify the hyp linker
script to insert a symbol at the first byte of each hyp section to use
as an anchor. The linker of `vmlinux` will place the symbols together
with the sections.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-4-dbrazdil@google.com
(cherry picked from commit f7a4825d95
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ia541c6f15641a9b016744f973f179a54f6db6920
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
We will need to recognize pointers in .rodata specific to hyp, so
establish a .hyp.rodata ELF section. Merge it with the existing
.hyp.data..ro_after_init as they are treated the same at runtime.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-3-dbrazdil@google.com
(cherry picked from commit 16174eea2e
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I541778e5e57208ba0cadc9622e276679ef6d5c88
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
So far hyp-init.S created a .hyp.idmap.text section directly, without
relying on the hyp linker script to prefix its name. Change it to create
.idmap.text and add a HYP_SECTION entry to hyp.lds.S. This way all .hyp*
sections go through the linker script and can be instrumented there.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-2-dbrazdil@google.com
(cherry picked from commit eceaf38f52
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2ed935af7932326ceca9ef3371723cc8a4fdf9b6
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
The ARM architected TRNG firmware interface, described in ARM spec
DEN0098, define an ARM SMCCC based interface to a true random number
generator, provided by firmware.
Add the definitions of the SMCCC functions as defined by the spec.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20210106103453.152275-2-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 67c6bb56b6
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I3a7551e4777a3ccf0ad291f877f4415a6d6e5965
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
GCC 4.9 seems to have a problem with the "S" asm constraint
when the symbol lives in the same compilation unit, and pretends
the constraint is impossible:
$ cat x.c
void *foo(void)
{
static int x;
int *addr;
asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
return addr;
}
$ ~/Work/gcc-linaro-aarch64-linux-gnu-4.9-2014.09_linux/bin/aarch64-linux-gnu-gcc -S -x c -O2 x.c
x.c: In function ‘foo’:
x.c:5:2: error: impossible constraint in ‘asm’
asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
^
Boo. Following revisions of the compiler work just fine, though.
We can fallback to the "i" constraint for GCC version prior to 5.0,
which *seems* to do the right thing. Hopefully we will be able to
remove this at some point, but in the meantime this gets us going.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20201217111135.1536658-1-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 9fd339a45b)
[will: Fixed trivial conflict due to removal of __smccc_workaround_1_smc]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8ec0842eb50ee32eefaed947bf5dbf7a4892f3b4
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3.
This patchset adds a tracepoint, error_repor_end, that is to be used by
KFENCE, KASAN, and potentially other bug detection tools, when they print
an error report. One of the possible use cases is userspace collection of
kernel error reports: interested parties can subscribe to the tracing
event via tracefs, and get notified when an error report occurs.
This patch (of 3):
Introduce error_report_end tracepoint. It can be used in debugging tools
like KASAN, KFENCE, etc. to provide extensions to the error reporting
mechanisms (e.g. allow tests hook into error reporting, ease error report
collection from production kernels). Another benefit would be making use
of ftrace for debugging or benchmarking the tools themselves.
Should we need it, the tracepoint name leaves us with the possibility to
introduce a complementary error_report_start tracepoint in the future.
Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com
Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 177201466
(cherry picked from commit ba7612c00686f204f7bca4ceb7394a9e705e84bd
https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ic86e29982c04dad4b3b7889a424f37b22cc5f22b
Instead of removing the fault handling portion of the stack trace based on
the fault handler's name, just use struct pt_regs directly.
Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it
through to kfence_report_error() for out-of-bounds, use-after-free, or
invalid access errors, where pt_regs is used to generate the stack trace.
If the kernel is a DEBUG_KERNEL, also show registers for more information.
Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 177201466
(cherry picked from commit 54a5abe9b5d542ee71836439cc662efe178c8211
https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I3a60060b24f0efb4faee2e6c953973bc1263e8d1
For certain usecases, specifically where the sample interval is always
set to a very low value such as 1ms, it can make sense to use a dynamic
branch instead of static branches due to the overhead of toggling a
static branch.
Therefore, add a new Kconfig option to remove the static branches and
instead check kfence_allocation_gate if a KFENCE allocation should be
set up.
Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Jörn Engel <joern@purestorage.com>
Reviewed-by: Jörn Engel <joern@purestorage.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 177201466
(cherry picked from commit c01761611b325c1e4ec7d3e236cc9db003cb82fd
https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I68a112a8ff68fa24742b198e036f130a9757c27f
Lockdep reports that we may deadlock when calling wake_up() in
__kfence_alloc(), because we may already hold base->lock. This can happen
if debug objects are enabled:
...
__kfence_alloc+0xa0/0xbc0 mm/kfence/core.c:710
kfence_alloc include/linux/kfence.h:108 [inline]
...
kmem_cache_zalloc include/linux/slab.h:672 [inline]
fill_pool+0x264/0x5c0 lib/debugobjects.c:171
__debug_object_init+0x7a/0xd10 lib/debugobjects.c:560
debug_object_init lib/debugobjects.c:615 [inline]
debug_object_activate+0x32c/0x3e0 lib/debugobjects.c:701
debug_timer_activate kernel/time/timer.c:727 [inline]
__mod_timer+0x77d/0xe30 kernel/time/timer.c:1048
...
Therefore, switch to an open-coded wait loop. The difference to before is
that the waiter wakes up and rechecks the condition after 1 jiffy;
however, given the infrequency of kfence allocations, the difference is
insignificant.
Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
Reported-by: syzbot+8983d6d4f7df556be565@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Hillf Danton <hdanton@sina.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 177201466
(cherry picked from commit c5fb1ab1a3c6d0ee02d1054a10d51ffcac57aed5
https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iee40e9f216afbc3fce8e43c0e2a4bc807fdddf39
To toggle the allocation gates, we set up a delayed work that calls
toggle_allocation_gate(). Here we use wait_event() to await an allocation
and subsequently disable the static branch again. However, if the kernel
has stopped doing allocations entirely, we'd wait indefinitely, and stall
the worker task. This may also result in the appropriate warnings if
CONFIG_DETECT_HUNG_TASK=y.
Therefore, introduce a 1 second timeout and use wait_event_timeout(). If
the timeout is reached, the static branch is disabled and a new delayed
work is scheduled to try setting up an allocation at a later time.
Note that, this scenario is very unlikely during normal workloads once the
kernel has booted and user space tasks are running. It can, however,
happen during early boot after KFENCE has been enabled, when e.g. running
tests that do not result in any allocations.
Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 177201466
(cherry picked from commit 80d4693491f6f20de01437319b081fdda2079e67
https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2332ff8144b8bce5c4574b01ea2863e0e71e6124
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors. This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.
KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.
Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.
To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.
The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).
We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.
KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].
For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:
https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst
[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence
This patch (of 9):
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.
KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:
---+-----------+-----------+-----------+-----------+-----------+---
| xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx |
| xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx |
| x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x |
| xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx |
| xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx |
| xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx |
---+-----------+-----------+-----------+-----------+-----------+---
Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.
To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.
For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).
Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[glider: resolved minor conflict in init/main.c]
Bug: 177201466
(cherry picked from commit 2a8dede73c3496bbd917644657f3735a4f508cb9
https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I6b474675cc9732c31118df53fa06c3997f577218
If the system doesn't have enough memory when fuse_passthrough_read_iter
is requested in asynchronous IO, an error is directly returned without
restoring the caller's credentials.
Fix by always ensuring credentials are restored.
Fixes: aa29f32988 ("FROMLIST: fuse: Use daemon creds in passthrough mode")
Link: https://lore.kernel.org/lkml/YB0qPHVORq7bJy6G@google.com/
Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I4aff43f5dd8ddab2cc8871cd9f81438963ead5b6
This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.
Signed-off-by: Daniel Colascione <dancol@google.com>
[LG: Remove owner inode from userfaultfd_ctx]
[LG: Use anon_inode_getfd_secure() in userfaultfd syscall]
[LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit b537900f15)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ib2973ca3650a8defe15eded13294a3fb25356b9d
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.
A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".
Example:
type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };
(The next patch in this series is necessary for making userfaultfd
support this new interface. The example above is just
for exposition.)
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit 29cd6591ab)
Conflicts:
security/selinux/include/classmap.h
(1. Removed 'lockdown' mapping to be in sync with d9cb255af3)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Iaa9f236f43bf225f089f00ead17e64326adbb328
This change adds a new function, anon_inode_getfd_secure, that creates
anonymous-node file with individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.
The new function accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules.
For example, in case of userfaultfd, the created inode is a 'logical child'
of the context_inode (userfaultfd inode of the parent process) in the sense
that it provides the security context required during creation of the child
process' userfaultfd inode.
Signed-off-by: Daniel Colascione <dancol@google.com>
[LG: Delete obsolete comments to alloc_anon_inode()]
[LG: Add context_inode description in comments to anon_inode_getfd_secure()]
[LG: Remove definition of anon_inode_getfile_secure() as there are no callers]
[LG: Make __anon_inode_getfile() static]
[LG: Use correct error cast in __anon_inode_getfile()]
[LG: Fix error handling in __anon_inode_getfile()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit e7e832ce6f)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I3061c599f2951368914a2ca9f56ea60387d42a1d
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows/denies
its creation and assigns a security context to the inode.
The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit 215b674b84)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I2bbbb7a5c2371103c5b632b791c5c397ae228e0b