Commit Graph

973222 Commits

Author SHA1 Message Date
Marc Zyngier
a6fd168d79 FROMGIT: KVM: arm64: Refactor filtering of ID registers
Our current ID register filtering is starting to be a mess of if()
statements, and isn't going to get any saner.

Let's turn it into a switch(), which has a chance of being more
readable, and introduce a FEATURE() macro that allows easy generation
of feature masks.

No functionnal change intended.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit c885793558
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Idec30e79f2ecf90ab26337c01b7b6239db61a670
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:36 +00:00
Marc Zyngier
250fe462bd FROMGIT: KVM: arm64: Add handling of AArch32 PCMEID{2,3} PMUv3 registers
Despite advertising support for AArch32 PMUv3p1, we fail to handle
the PMCEID{2,3} registers, which conveniently alias with the top
bits of PMCEID{0,1}_EL1.

Implement these registers with the usual AA32(HI/LO) aliasing
mechanism.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 99b6a4013f
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I4f5777a65a12f4e3c66b6511707478d0f5d3508c
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:36 +00:00
Marc Zyngier
81f982e5cd FROMGIT: KVM: arm64: Fix AArch32 PMUv3 capping
We shouldn't expose *any* PMU capability when no PMU has been
configured for this VM.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit cb95914685
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I59a410fc2390f407ff8dd5ecf6883e0a4428cb9f
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:36 +00:00
Marc Zyngier
fe1ebbf9a5 FROMGIT: KVM: arm64: Fix missing RES1 in emulation of DBGBIDR
The AArch32 CP14 DBGDIDR has bit 15 set to RES1, which our current
emulation doesn't set. Just add the missing bit.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bea7e97fef
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2d85bf57a967ad089a0b1cda47fb49e275379a62
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:35 +00:00
Marc Zyngier
1967d5b8cb FROMGIT: KVM: arm64: Make gen-hyprel endianness agnostic
gen-hyprel is, for better or worse, a native-endian program:
it assumes that the ELF data structures are in the host's
endianness, and even assumes that the compiled kernel is
little-endian in one particular case.

None of these assumptions hold true though: people actually build
(use?) BE arm64 kernels, and seem to avoid doing so on BE hosts.
Madness!

In order to solve this, wrap each access to the ELF data structures
with the required byte-swapping magic. This requires to obtain
the kernel data structure, and provide per-endianess wrappers.

This result in a kernel that links and even boots in a model.

Fixes: 8c49b5d43d ("KVM: arm64: Generate hyp relocation data")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bc93763f17
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ib2b332ee2bf25c39d34fda6c9f50668491a1defd
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:35 +00:00
Ard Biesheuvel
801a536597 FROMGIT: KVM: arm64: Implement the TRNG hypervisor call
Provide a hypervisor implementation of the ARM architected TRNG firmware
interface described in ARM spec DEN0098. All function IDs are implemented,
including both 32-bit and 64-bit versions of the TRNG_RND service, which
is the centerpiece of the API.

The API is backed by the kernel's entropy pool only, to avoid guests
draining more precious direct entropy sources.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
[Andre: minor fixes, drop arch_get_random() usage]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210106103453.152275-6-andre.przywara@arm.com
(cherry picked from commit a8e190cdae
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2d634c3b4f96c952e8b935d2a86e258e9a08925a
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:35 +00:00
Yanan Wang
744c46b8dd FROMGIT: KVM: arm64: Mark the page dirty only if the fault is handled successfully
We now set the pfn dirty and mark the page dirty before calling fault
handlers in user_mem_abort(), so we might end up having spurious dirty
pages if update of permissions or mapping has failed. Let's move these
two operations after the fault handlers, and they will be done only if
the fault has been handled successfully.

When an -EAGAIN errno is returned from the map handler, we hope to the
vcpu to enter guest directly instead of exiting back to userspace, so
adjust the return value at the end of function.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210114121350.123684-4-wangyanan55@huawei.com
(cherry picked from commit 509552e65a
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ic34f97c2a71d49109544552be884a3b9483cd320
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:35 +00:00
Yanan Wang
57166e07d9 FROMGIT: KVM: arm64: Filter out the case of only changing permissions from stage-2 map path
(1) During running time of a a VM with numbers of vCPUs, if some vCPUs
access the same GPA almost at the same time and the stage-2 mapping of
the GPA has not been built yet, as a result they will all cause
translation faults. The first vCPU builds the mapping, and the followed
ones end up updating the valid leaf PTE. Note that these vCPUs might
want different access permissions (RO, RW, RX, RWX, etc.).

(2) It's inevitable that we sometimes will update an existing valid leaf
PTE in the map path, and we perform break-before-make in this case.
Then more unnecessary translation faults could be caused if the
*break stage* of BBM is just catched by other vCPUS.

With (1) and (2), something unsatisfactory could happen: vCPU A causes
a translation fault and builds the mapping with RW permissions, vCPU B
then update the valid leaf PTE with break-before-make and permissions
are updated back to RO. Besides, *break stage* of BBM may trigger more
translation faults. Finally, some useless small loops could occur.

We can make some optimization to solve above problems: When we need to
update a valid leaf PTE in the map path, let's filter out the case where
this update only change access permissions, and don't update the valid
leaf PTE here in this case. Instead, let the vCPU enter back the guest
and it will exit next time to go through the relax_perms path without
break-before-make if it still wants more permissions.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210114121350.123684-3-wangyanan55@huawei.com
(cherry picked from commit 694d071f8d
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iac22cc9c455f10795c6d938f21d4837c22bcdadd
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:35 +00:00
Yanan Wang
6ffc840533 FROMGIT: KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map
Procedures of hyp stage-1 map and guest stage-2 map are quite different,
but they are tied closely by function kvm_set_valid_leaf_pte().
So adjust the relative code for ease of code maintenance in the future.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210114121350.123684-2-wangyanan55@huawei.com
(cherry picked from commit 8ed80051c8
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8fdaed094feec5228239ba23d57fd6c46ae7857e
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:35 +00:00
Andrew Scull
5fe36b53b8 FROMGIT: KVM: arm64: Simplify __kvm_hyp_init HVC detection
The arguments for __do_hyp_init are now passed with a pointer to a
struct which means there are scratch registers available for use. Thanks
to this, we no longer need to use clever, but hard to read, tricks that
avoid the need for scratch registers when checking for the
__kvm_hyp_init HVC.

Tested-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210125145415.122439-2-ascull@google.com
(cherry picked from commit 87b26801f0
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I581c3b13067d4a40008e2c6ffd4a3e2edf98515e
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:35 +00:00
David Brazdil
4017191f41 FROMGIT: KVM: arm64: Remove hyp_symbol_addr
Hyp code used the hyp_symbol_addr helper to force PC-relative addressing
because absolute addressing results in kernel VAs due to the way hyp
code is linked. This is not true anymore, so remove the helper and
update all of its users.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-9-dbrazdil@google.com
(cherry picked from commit 247bc166e6
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ibcd569ae01199097aaff45ffa7833ffecae33b9e
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:34 +00:00
David Brazdil
475e739737 FROMGIT: KVM: arm64: Remove patching of fn pointers in hyp
Storing a function pointer in hyp now generates relocation information
used at early boot to convert the address to hyp VA. The existing
alternative-based conversion mechanism is therefore obsolete. Remove it
and simplify its users.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-8-dbrazdil@google.com
(cherry picked from commit 537db4af26
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ife8468fc2062fc47a093a82edd41558fc1b76205
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:34 +00:00
David Brazdil
8d26de221a FROMGIT: KVM: arm64: Fix constant-pool users in hyp
Hyp code uses absolute addressing to obtain a kimg VA of a small number
of kernel symbols. Since the kernel now converts constant pool addresses
to hyp VAs, this trick does not work anymore.

Change the helpers to convert from hyp VA back to kimg VA or PA, as
needed and rework the callers accordingly.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-7-dbrazdil@google.com
(cherry picked from commit 97cbd2fc02
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I43420e1c0414370a0ca22c0129c933188df89dea
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:34 +00:00
David Brazdil
eb2cb82148 FROMGIT: KVM: arm64: Apply hyp relocations at runtime
KVM nVHE code runs under a different VA mapping than the kernel, hence
so far it avoided using absolute addressing because the VA in a constant
pool is relocated by the linker to a kernel VA (see hyp_symbol_addr).

Now the kernel has access to a list of positions that contain a kimg VA
but will be accessed only in hyp execution context. These are generated
by the gen-hyprel build-time tool and stored in .hyp.reloc.

Add early boot pass over the entries and convert the kimg VAs to hyp VAs.
Note that this requires for .hyp* ELF sections to be mapped read-write
at that point.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-6-dbrazdil@google.com
(cherry picked from commit 6ec6259d70
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I3e2ec6a53c8c2850a359eed172024d660f92b01a
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:34 +00:00
David Brazdil
9bb0581174 FROMGIT: KVM: arm64: Generate hyp relocation data
Add a post-processing step to compilation of KVM nVHE hyp code which
calls a custom host tool (gen-hyprel) on the partially linked object
file (hyp sections' names prefixed).

The tool lists all R_AARCH64_ABS64 data relocations targeting hyp
sections and generates an assembly file that will form a new section
.hyp.reloc in the kernel binary. The new section contains an array of
32-bit offsets to the positions targeted by these relocations.

Since these addresses of those positions will not be determined until
linking of `vmlinux`, each 32-bit entry carries a R_AARCH64_PREL32
relocation with addend <section_base_sym> + <r_offset>. The linker of
`vmlinux` will therefore fill the slot accordingly.

This relocation data will be used at runtime to convert the kernel VAs
at those positions to hyp VAs.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-5-dbrazdil@google.com
(cherry picked from commit 8c49b5d43d
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2e3d7856cd8baaa0ed96444cb30812496203ac25
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:34 +00:00
David Brazdil
70f07edcb2 FROMGIT: KVM: arm64: Add symbol at the beginning of each hyp section
Generating hyp relocations will require referencing positions at a given
offset from the beginning of hyp sections. Since the final layout will
not be determined until the linking of `vmlinux`, modify the hyp linker
script to insert a symbol at the first byte of each hyp section to use
as an anchor. The linker of `vmlinux` will place the symbols together
with the sections.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-4-dbrazdil@google.com
(cherry picked from commit f7a4825d95
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ia541c6f15641a9b016744f973f179a54f6db6920
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:34 +00:00
David Brazdil
065a448eac FROMGIT: KVM: arm64: Set up .hyp.rodata ELF section
We will need to recognize pointers in .rodata specific to hyp, so
establish a .hyp.rodata ELF section. Merge it with the existing
.hyp.data..ro_after_init as they are treated the same at runtime.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-3-dbrazdil@google.com
(cherry picked from commit 16174eea2e
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I541778e5e57208ba0cadc9622e276679ef6d5c88
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:34 +00:00
David Brazdil
f035468f02 FROMGIT: KVM: arm64: Rename .idmap.text in hyp linker script
So far hyp-init.S created a .hyp.idmap.text section directly, without
relying on the hyp linker script to prefix its name. Change it to create
.idmap.text and add a HYP_SECTION entry to hyp.lds.S. This way all .hyp*
sections go through the linker script and can be instrumented there.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-2-dbrazdil@google.com
(cherry picked from commit eceaf38f52
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2ed935af7932326ceca9ef3371723cc8a4fdf9b6
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:33 +00:00
Ard Biesheuvel
0aaf7769ba FROMGIT: firmware: smccc: Add SMCCC TRNG function call IDs
The ARM architected TRNG firmware interface, described in ARM spec
DEN0098, define an ARM SMCCC based interface to a true random number
generator, provided by firmware.

Add the definitions of the SMCCC functions as defined by the spec.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20210106103453.152275-2-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 67c6bb56b6
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I3a7551e4777a3ccf0ad291f877f4415a6d6e5965
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:33 +00:00
Marc Zyngier
eda0251413 BACKPORT: arm64: Work around broken GCC 4.9 handling of "S" constraint
GCC 4.9 seems to have a problem with the "S" asm constraint
when the symbol lives in the same compilation unit, and pretends
the constraint is impossible:

$ cat x.c
void *foo(void)
{
	static int x;
	int *addr;
	asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
	return addr;
}

$ ~/Work/gcc-linaro-aarch64-linux-gnu-4.9-2014.09_linux/bin/aarch64-linux-gnu-gcc -S -x c -O2 x.c
x.c: In function ‘foo’:
x.c:5:2: error: impossible constraint in ‘asm’
  asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
  ^

Boo. Following revisions of the compiler work just fine, though.

We can fallback to the "i" constraint for GCC version prior to 5.0,
which *seems* to do the right thing. Hopefully we will be able to
remove this at some point, but in the meantime this gets us going.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20201217111135.1536658-1-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 9fd339a45b)
[will: Fixed trivial conflict due to removal of __smccc_workaround_1_smc]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8ec0842eb50ee32eefaed947bf5dbf7a4892f3b4
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
2021-02-05 17:37:33 +00:00
Alexander Potapenko
94a6873e86 FROMGIT: kasan: use error_report_end tracepoint
Make it possible to trace KASAN error reporting.  A good usecase is
watching for trace events from the userspace to detect and process memory
corruption reports from the kernel.

Link: https://lkml.kernel.org/r/20210121131915.1331302-4-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 99bbc9fba9b0c8956b817dd21ecb510da8b676dc
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I60ca062373d13f08ed4ed05bee341a8c28619407
2021-02-05 09:20:54 -08:00
Alexander Potapenko
fc3ec019c8 FROMGIT: kfence: use error_report_end tracepoint
Make it possible to trace KFENCE error reporting.  A good usecase is
watching for trace events from the userspace to detect and process memory
corruption reports from the kernel.

Link: https://lkml.kernel.org/r/20210121131915.1331302-3-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 2eb9559621418b844b29d2b56d9962ac019f600b
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I991cd1eee48d107bcdf5e1c0d9194e6a386b9bdf
2021-02-05 09:20:54 -08:00
Alexander Potapenko
8d1fea4a24 FROMGIT: tracing: add error_report_end trace point
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3.

This patchset adds a tracepoint, error_repor_end, that is to be used by
KFENCE, KASAN, and potentially other bug detection tools, when they print
an error report.  One of the possible use cases is userspace collection of
kernel error reports: interested parties can subscribe to the tracing
event via tracefs, and get notified when an error report occurs.

This patch (of 3):

Introduce error_report_end tracepoint.  It can be used in debugging tools
like KASAN, KFENCE, etc.  to provide extensions to the error reporting
mechanisms (e.g.  allow tests hook into error reporting, ease error report
collection from production kernels).  Another benefit would be making use
of ftrace for debugging or benchmarking the tools themselves.

Should we need it, the tracepoint name leaves us with the possibility to
introduce a complementary error_report_start tracepoint in the future.

Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com
Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit ba7612c00686f204f7bca4ceb7394a9e705e84bd
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ic86e29982c04dad4b3b7889a424f37b22cc5f22b
2021-02-05 09:20:54 -08:00
Marco Elver
2da503f43b FROMGIT: kfence: show access type in report
Show the access type in KFENCE reports by plumbing through read/write
information from the page fault handler.  Update the documentation and
test accordingly.

Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Jörn Engel <joern@purestorage.com>
Reviewed-by: Jörn Engel <joern@purestorage.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit e29117c1fbf30d27d5afe41cf34263e1fd8e4f04
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2e9bb224292cf92ac828232c51cd57024ac56d7d
2021-02-05 09:20:54 -08:00
Marco Elver
be3b5ae235 FROMGIT: kfence: fix typo in test
Fix a typo/accidental copy-paste that resulted in the obviously
incorrect 'GFP_KERNEL * 2' expression.

Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 161c8770c371ca3992565d8f1db1ad4a88a562e9
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I4c26617c64fd2e6d410e6e793bb0eebf8fc87e55
2021-02-05 09:20:54 -08:00
Marco Elver
d15b326fe3 FROMGIT: kfence: add test suite
Add KFENCE test suite, testing various error detection scenarios. Makes
use of KUnit for test organization. Since KFENCE's interface to obtain
error reports is via the console, the test verifies that KFENCE outputs
expected reports to the console.

Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit d6364119849bb0432e9a46e9699519ea9ff1bb77
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I733090d4109a795c078fe8090c46b19cdfe9413f
2021-02-05 09:20:54 -08:00
Marco Elver
7e77808675 FROMGIT: kfence: add missing copyright header to documentation
Add missing copyright header to KFENCE documentation.

Link: https://lkml.kernel.org/r/20210118092159.145934-4-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 14c8d0148b7b44c549c8892a93a800a9650fa31c
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iaf953c9bf1f220054b1ec4068f2ba0bd9a7e9fc4
2021-02-05 09:20:54 -08:00
Marco Elver
a6c0e21733 FROMGIT: kfence, Documentation: add KFENCE documentation
Add KFENCE documentation in dev-tools/kfence.rst, and add to index.

Link: https://lkml.kernel.org/r/20201103175841.3495947-8-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit f63eeaffb09b57d4c9cb9c92c4925ee3bd3df457
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iaedafbf20944d68c87229f6c8b2a6e5155fcee60
2021-02-05 09:20:53 -08:00
Alexander Potapenko
263969e007 FROMGIT: kfence, kasan: make KFENCE compatible with KASAN
Make KFENCE compatible with KASAN. Currently this helps test KFENCE
itself, where KASAN can catch potential corruptions to KFENCE state, or
other corruptions that may be a result of freepointer corruptions in the
main allocators.

Link: https://lkml.kernel.org/r/20201103175841.3495947-7-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 8ab944ae627dc9fb165bff68acc465751a0b8de2
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2f862c2e514e7fcff50a019048c8f0d22f46e6c4
2021-02-05 09:20:53 -08:00
Alexander Potapenko
2019e66b4e FROMGIT: mm, kfence: insert KFENCE hooks for SLUB
Inserts KFENCE hooks into the SLUB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
uses size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if that object is not equal to the kmalloc-size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Link: https://lkml.kernel.org/r/20201103175841.3495947-6-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 5aa4d4f541f500cd63f814ae39bea55024ed5c6b
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iacfc22088087cb920107a595f71b09e21d5f2f47
2021-02-05 09:20:53 -08:00
Alexander Potapenko
6ba57f3a0c BACKPORT: mm, kfence: insert KFENCE hooks for SLAB
Inserts KFENCE hooks into the SLAB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
uses size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if that object is not equal to the kmalloc-size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

[glider: resolved minor API change in mm/slab_common.c]
Bug: 177201466
(cherry picked from commit 840c0553e89413319d67971a321bcc07114da9b8
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iab2ba9c7b06b9a234d93ba892be639941861f8ab
2021-02-05 09:20:53 -08:00
Alexander Popov
1e95fcd132 FROMGIT: mm/slab: rerform init_on_free earlier
Currently in CONFIG_SLAB init_on_free happens too late, and heap objects
go to the heap quarantine not being erased.

Lets move init_on_free clearing before calling kasan_slab_free().  In that
case heap quarantine will store erased objects, similarly to CONFIG_SLUB=y
behavior.

Link: https://lkml.kernel.org/r/20201210183729.1261524-1-alex.popov@linux.com
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 177201466
(cherry picked from commit a32d654db5
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2bf5e70c1524619526efd792bbdd959b813af1e4
2021-02-05 09:20:53 -08:00
Marco Elver
1ac855fd1f FROMGIT: kfence: use pt_regs to generate stack trace on faults
Instead of removing the fault handling portion of the stack trace based on
the fault handler's name, just use struct pt_regs directly.

Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it
through to kfence_report_error() for out-of-bounds, use-after-free, or
invalid access errors, where pt_regs is used to generate the stack trace.

If the kernel is a DEBUG_KERNEL, also show registers for more information.

Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 54a5abe9b5d542ee71836439cc662efe178c8211
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I3a60060b24f0efb4faee2e6c953973bc1263e8d1
2021-02-05 09:20:53 -08:00
Marco Elver
d92230e20b FROMGIT: kfence, arm64: add missing copyright and description header
Add missing copyright and description header to KFENCE source file.

Link: https://lkml.kernel.org/r/20210118092159.145934-3-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit ce19707fadfe62b7441f2a9455b6866161fbc1f0
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I75c06e57e4bb906626c13318d00cfc7d47b76bde
2021-02-05 09:20:53 -08:00
Marco Elver
16591e4945 FROMGIT: arm64, kfence: enable KFENCE for ARM64
Add architecture specific implementation details for KFENCE and enable
KFENCE for the arm64 architecture. In particular, this implements the
required interface in <asm/kfence.h>.

KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the entire linear map to be mapped
at page granularity. Doing so may result in extra memory allocated for
page tables in case rodata=full is not set; however, currently
CONFIG_RODATA_FULL_DEFAULT_ENABLED=y is the default, and the common case
is therefore not affected by this change.

Link: https://lkml.kernel.org/r/20201103175841.3495947-4-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit d3cb0555da9b469c3f9ebe663d4c0d6265757175
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ibe9d26c2c9088862ba8855c18e0c5210a68c7f50
2021-02-05 09:20:53 -08:00
Marco Elver
ea098a1eb9 FROMGIT: kfence, x86: add missing copyright and description header
Add missing copyright and description header to KFENCE source file.

Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 8470d53956b7e1b6f3f2dd17f277b9d2bb1472ae
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I90aef6f80e7b5a46f5168d1a24c02a5737785831
2021-02-05 09:20:53 -08:00
Alexander Potapenko
2eac397390 FROMGIT: x86, kfence: enable KFENCE for x86
Add architecture specific implementation details for KFENCE and enable
KFENCE for the x86 architecture. In particular, this implements the
required interface in <asm/kfence.h> for setting up the pool and
providing helper functions for protecting and unprotecting pages.

For x86, we need to ensure that the pool uses 4K pages, which is done
using the set_memory_4k() helper function.

Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit cb99fcd83140d0d58ea36db6c1c2034abc95f983
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I111caffb0b88c34ed9ff57b95f127b08eacedcb9
2021-02-05 09:20:53 -08:00
Marco Elver
752081e03f FROMGIT: kfence: add missing copyright and description headers
Add missing copyright and description headers to KFENCE source files.

Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit b86d1ed1155ce1d2420057bfbdcc62b9fd53c1d6
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I95eb6756baaa0d8ca1dbc656708667372cff546d
2021-02-05 09:20:53 -08:00
Marco Elver
33ad66179a FROMGIT: kfence: add option to use KFENCE without static keys
For certain usecases, specifically where the sample interval is always
set to a very low value such as 1ms, it can make sense to use a dynamic
branch instead of static branches due to the overhead of toggling a
static branch.

Therefore, add a new Kconfig option to remove the static branches and
instead check kfence_allocation_gate if a KFENCE allocation should be
set up.

Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Jörn Engel <joern@purestorage.com>
Reviewed-by: Jörn Engel <joern@purestorage.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit c01761611b325c1e4ec7d3e236cc9db003cb82fd
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I68a112a8ff68fa24742b198e036f130a9757c27f
2021-02-05 09:20:52 -08:00
Marco Elver
97d9142889 FROMGIT: kfence: fix potential deadlock due to wake_up()
Lockdep reports that we may deadlock when calling wake_up() in
__kfence_alloc(), because we may already hold base->lock.  This can happen
if debug objects are enabled:

    ...
    __kfence_alloc+0xa0/0xbc0 mm/kfence/core.c:710
    kfence_alloc include/linux/kfence.h:108 [inline]
    ...
    kmem_cache_zalloc include/linux/slab.h:672 [inline]
    fill_pool+0x264/0x5c0 lib/debugobjects.c:171
    __debug_object_init+0x7a/0xd10 lib/debugobjects.c:560
    debug_object_init lib/debugobjects.c:615 [inline]
    debug_object_activate+0x32c/0x3e0 lib/debugobjects.c:701
    debug_timer_activate kernel/time/timer.c:727 [inline]
    __mod_timer+0x77d/0xe30 kernel/time/timer.c:1048
    ...

Therefore, switch to an open-coded wait loop.  The difference to before is
that the waiter wakes up and rechecks the condition after 1 jiffy;
however, given the infrequency of kfence allocations, the difference is
insignificant.

Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
Reported-by: syzbot+8983d6d4f7df556be565@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Hillf Danton <hdanton@sina.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit c5fb1ab1a3c6d0ee02d1054a10d51ffcac57aed5
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iee40e9f216afbc3fce8e43c0e2a4bc807fdddf39
2021-02-05 09:20:52 -08:00
Marco Elver
ad318a7b61 FROMGIT: kfence: avoid stalling work queue task without allocations
To toggle the allocation gates, we set up a delayed work that calls
toggle_allocation_gate().  Here we use wait_event() to await an allocation
and subsequently disable the static branch again.  However, if the kernel
has stopped doing allocations entirely, we'd wait indefinitely, and stall
the worker task.  This may also result in the appropriate warnings if
CONFIG_DETECT_HUNG_TASK=y.

Therefore, introduce a 1 second timeout and use wait_event_timeout().  If
the timeout is reached, the static branch is disabled and a new delayed
work is scheduled to try setting up an allocation at a later time.

Note that, this scenario is very unlikely during normal workloads once the
kernel has booted and user space tasks are running.  It can, however,
happen during early boot after KFENCE has been enabled, when e.g.  running
tests that do not result in any allocations.

Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 80d4693491f6f20de01437319b081fdda2079e67
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2332ff8144b8bce5c4574b01ea2863e0e71e6124
2021-02-05 09:20:52 -08:00
Marco Elver
b4e7724e69 FROMGIT: kfence: Fix parameter description for kfence_object_start()
Describe parameter @addr correctly by delimiting with ':'.

Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 65f5b471bfd099a54862e14a895724e982a381c9
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ieb71f57f82351eaaabd3c63cd52e97fbfbfca2f1
2021-02-05 09:20:52 -08:00
Alexander Potapenko
adb54c78ab BACKPORT: mm: add Kernel Electric-Fence infrastructure
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.  This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

	https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

This patch (of 9):

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:

  ---+-----------+-----------+-----------+-----------+-----------+---
     | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
     | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
     | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
     | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
     | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
     | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
  ---+-----------+-----------+-----------+-----------+-----------+---

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.

For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).

Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

[glider: resolved minor conflict in init/main.c]
Bug: 177201466
(cherry picked from commit 2a8dede73c3496bbd917644657f3735a4f508cb9
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I6b474675cc9732c31118df53fa06c3997f577218
2021-02-05 09:20:52 -08:00
Alessio Balsini
8a0e4c2b94 FROMLIST: fuse: Fix crediantials leak in passthrough read_iter
If the system doesn't have enough memory when fuse_passthrough_read_iter
is requested in asynchronous IO, an error is directly returned without
restoring the caller's credentials.
Fix by always ensuring credentials are restored.

Fixes: aa29f32988 ("FROMLIST: fuse: Use daemon creds in passthrough mode")
Link: https://lore.kernel.org/lkml/YB0qPHVORq7bJy6G@google.com/
Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I4aff43f5dd8ddab2cc8871cd9f81438963ead5b6
2021-02-05 14:15:09 +00:00
Lokesh Gidra
6a6bc06393 UPSTREAM: userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that page faults from only user-mode can be handled.  In this mode, an
unprivileged user (without SYS_CAP_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0.  This is required for
correct functioning of pipe mutex.  However, this will fail postcopy live
migration, which will be unnoticeable to the VM guests.  To avoid this,
set 'vm.userfault = 1' in /sys/sysctl.conf.

The main reason this change is desirable as in the short term is that the
Android userland will behave as with the sysctl set to zero.  So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland.  For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: <calin@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d0d4730ac2)

Bug: 160737021
Bug: 169683130
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Change-Id: I08b7080b49ca626c5ab41bb2621fa21fa9a928a2
2021-02-05 13:14:08 +00:00
Lokesh Gidra
b8af1f96cc UPSTREAM: userfaultfd: add UFFD_USER_MODE_ONLY
Patch series "Control over userfaultfd kernel-fault handling", v6.

This patch series is split from [1].  The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.

It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel.  For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3].  Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object.  It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

This patch (of 2):

userfaultfd handles page faults from both user and kernel code.  Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.

A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.

Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 37cd0575b8)

Bug: 160737021
Bug: 169683130
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Change-Id: I19ff309b616c7a4a247e8c8427a87caffb1b2df9
2021-02-05 13:14:00 +00:00
Daniel Colascione
dbc935c62b UPSTREAM: userfaultfd: use secure anon inodes for userfaultfd
This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione <dancol@google.com>
[LG: Remove owner inode from userfaultfd_ctx]
[LG: Use anon_inode_getfd_secure() in userfaultfd syscall]
[LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit b537900f15)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ib2973ca3650a8defe15eded13294a3fb25356b9d
2021-02-05 11:04:21 +00:00
Daniel Colascione
924b494bac BACKPORT: selinux: teach SELinux about anonymous inodes
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit 29cd6591ab)

Conflicts:
    security/selinux/include/classmap.h

(1. Removed 'lockdown' mapping to be in sync with d9cb255af3)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Iaa9f236f43bf225f089f00ead17e64326adbb328
2021-02-05 11:04:12 +00:00
Daniel Colascione
d7848abe40 UPSTREAM: fs: add LSM-supporting anon-inode interface
This change adds a new function, anon_inode_getfd_secure, that creates
anonymous-node file with individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules.
For example, in case of userfaultfd, the created inode is a 'logical child'
of the context_inode (userfaultfd inode of the parent process) in the sense
that it provides the security context required during creation of the child
process' userfaultfd inode.

Signed-off-by: Daniel Colascione <dancol@google.com>
[LG: Delete obsolete comments to alloc_anon_inode()]
[LG: Add context_inode description in comments to anon_inode_getfd_secure()]
[LG: Remove definition of anon_inode_getfile_secure() as there are no callers]
[LG: Make __anon_inode_getfile() static]
[LG: Use correct error cast in __anon_inode_getfile()]
[LG: Fix error handling in __anon_inode_getfile()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit e7e832ce6f)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I3061c599f2951368914a2ca9f56ea60387d42a1d
2021-02-05 11:03:56 +00:00
Lokesh Gidra
4e8b67477e UPSTREAM: security: add inode_init_security_anon() LSM hook
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows/denies
its creation and assigns a security context to the inode.

The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit 215b674b84)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I2bbbb7a5c2371103c5b632b791c5c397ae228e0b
2021-02-05 11:03:49 +00:00