Patch series "kasan, vmalloc, arm64: add vmalloc tagging support for SW/HW_TAGS", v6.
This patchset adds vmalloc tagging support for SW_TAGS and HW_TAGS
KASAN modes.
About half of the patches are cleanups I made along the way. None of them
seem important enough to go through stable, so I decided not to
split them out into separate patches/series.
The patchset is partially based on an early version of the HW_TAGS
patchset by Vincenzo that had vmalloc support. Thus, I added a
Co-developed-by tag into a few patches.
SW_TAGS vmalloc tagging support is straightforward. It reuses all of the
generic KASAN machinery, but uses shadow memory to store tags instead of
magic values. Naturally, vmalloc tagging requires adding a few
kasan_reset_tag() annotations to the vmalloc code.
HW_TAGS vmalloc tagging support stands out. HW_TAGS KASAN is based on Arm
MTE, which can only assign tags to physical memory. As a result, HW_TAGS
KASAN only tags vmalloc() allocations, which are backed by page_alloc
memory. It ignores vmap() and others.
This patch (of 39):
Currently, should_skip_kasan_poison() has two definitions: one for when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, one for when it's not.
Instead of duplicating the checks, add a deferred_pages_enabled() helper
and use it in a single should_skip_kasan_poison() definition.
Also move should_skip_kasan_poison() closer to its caller and clarify all
conditions in the comment.
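The consolidation described above can be sketched in plain C as follows. This is a simplified user-space model, not the kernel code: struct poison_ctx and its fields are hypothetical stand-ins for the kernel state, and the exact set of skip conditions is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real definitions. */
struct poison_ctx {
	bool deferred_init_active; /* models deferred struct page init still running */
	bool generic_kasan;        /* models the generic KASAN mode being enabled */
	bool fpi_skip_flag;        /* models the caller asking to skip poisoning */
	bool page_skip_flag;       /* models a per-page "skip poison" flag */
};

/* One helper hides whether deferred init is configured and active. */
static bool deferred_pages_enabled(const struct poison_ctx *c)
{
	return c->deferred_init_active;
}

/* A single definition covers both configurations. */
static bool should_skip_kasan_poison(const struct poison_ctx *c)
{
	return deferred_pages_enabled(c) ||
	       (!c->generic_kasan && c->fpi_skip_flag) ||
	       c->page_skip_flag;
}
```

With the helper in place, the configuration-dependent part shrinks to one small function and the policy lives in a single expression.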
Link: https://lkml.kernel.org/r/cover.1643047180.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/658b79f5fb305edaf7dc16bc52ea870d3220d4a8.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 64748cf67a6b75dde881e282529dbf6a22979786
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 217222520
Change-Id: Ie47a5a877f515e6174a2a5db97f3e5d7a9a3245b
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks using __builtin_object_size(ptr), which panic the kernel
when they fail.
Because the KASAN test deliberately performs out-of-bounds operations,
the kernel panics with FORTIFY_SOURCE, for example:
| kernel BUG at lib/string_helpers.c:910!
| invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
| CPU: 1 PID: 137 Comm: kunit_try_catch Tainted: G B 5.16.0-rc3+ #3
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
| RIP: 0010:fortify_panic+0x19/0x1b
| ...
| Call Trace:
| kmalloc_oob_in_memset.cold+0x16/0x16
| ...
Fix it by also hiding `ptr` from the optimizer, which will ensure that
__builtin_object_size() does not return a valid size, preventing
fortified string functions from panicking.
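A minimal user-space sketch of the technique, assuming GCC or Clang: an empty asm statement that claims to rewrite the pointer makes the compiler forget which object it points to, so __builtin_object_size() can only report "unknown". The macro name HIDE_VAR and both functions are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Empty asm that pretends to modify var, so its origin is lost. */
#define HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* May report the real size of buf when optimizing, or "unknown". */
size_t visible_size(void)
{
	char buf[8];
	char *p = buf;

	return __builtin_object_size(p, 0);
}

/* After hiding, the compiler cannot connect p to buf any more. */
size_t hidden_size(void)
{
	char buf[8];
	char *p = buf;

	HIDE_VAR(p);
	return __builtin_object_size(p, 0);
}
```

When the size is unknown, __builtin_object_size() returns (size_t)-1 for type 0, which is what keeps a fortified string function from concluding that an access is out of bounds.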
Link: https://lkml.kernel.org/r/20220124160744.1244685-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nico Pache <npache@redhat.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 09c6304e38)
Bug: 217222520
Change-Id: I51ef8a2fc61e4e86916cd4d83c1ca0d2d980a81d
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
The non-interrupt portion of interrupt stack traces before interrupt
entry is usually arbitrary. Therefore, saving stack traces of
interrupts (that include entries before interrupt entry) to stack depot
leads to unbounded stackdepot growth.
As such, use of filter_irq_stacks() is a requirement to ensure
stackdepot can efficiently deduplicate interrupt stacks.
Looking through all current users of stack_depot_save(), none (except
KASAN) pass the stack trace through filter_irq_stacks() before passing
it on to stack_depot_save().
Rather than adding filter_irq_stacks() to all current users of
stack_depot_save(), it became clear that stack_depot_save() should
simply do filter_irq_stacks().
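To illustrate why filtering helps deduplication, here is a simplified model in C. It assumes entries are ordered innermost frame first and that interrupt entry code lives in one address range; the range constants and the _sketch function names are hypothetical and do not reflect the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address range standing in for the IRQ entry text section. */
#define IRQENTRY_START 0x1000UL
#define IRQENTRY_END   0x2000UL

static bool in_irqentry_text_sketch(unsigned long addr)
{
	return addr >= IRQENTRY_START && addr < IRQENTRY_END;
}

/*
 * Keep frames up to and including the first interrupt entry frame and
 * drop the arbitrary interrupted context that follows it.
 * Returns the number of entries to keep.
 */
static size_t filter_irq_stacks_sketch(const unsigned long *entries, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (in_irqentry_text_sketch(entries[i]))
			return i + 1;
	}
	return nr;
}
```

Two traces of the same interrupt handler that interrupted different code become identical after filtering, so a deduplicating store needs only one copy.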
Link: https://lkml.kernel.org/r/20211130095727.2378739-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e940066089)
Bug: 217222520
Change-Id: I3176c61b5a1170096db036f3b7bda081bc6f838e
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
(Backport: adjacent lines changed in kmem_cache_destroy.)
Because mm/slab_common.c is not instrumented with software KASAN modes,
it is not possible to detect use-after-free of the kmem_cache passed
into kmem_cache_destroy(). In particular, because of the s->refcount--
and subsequent early return if non-zero, KASAN would never be able to
see the double-free via kmem_cache_free(kmem_cache, s). To be able to
detect a double-kmem_cache_destroy(), check accessibility of the
kmem_cache, and in case of failure return early.
While KASAN_HW_TAGS is able to detect such bugs, by checking
accessibility and returning early we fail more gracefully and also avoid
corrupting reused objects (where tags mismatch).
A recent case of a double-kmem_cache_destroy() was detected by KFENCE:
https://lkml.kernel.org/r/0000000000003f654905c168b09d@google.com, which
was not detectable by software KASAN modes.
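The early-return idea can be modeled in user space roughly as below. The accessible flag is a hypothetical stand-in for what an accessibility check on the object would report; struct cache_model, the enum, and cache_destroy_sketch() are illustrative names, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

struct cache_model {
	int refcount;
	bool accessible; /* models the result of an accessibility check */
};

enum destroy_result {
	DESTROY_IGNORED,      /* NULL or already freed: returned early */
	DESTROY_STILL_SHARED, /* refcount dropped but other users remain */
	DESTROY_RELEASED      /* last reference gone, object freed */
};

static enum destroy_result cache_destroy_sketch(struct cache_model *s)
{
	/* Check before touching the refcount, so a second destroy is caught. */
	if (s == NULL || !s->accessible)
		return DESTROY_IGNORED;

	if (--s->refcount > 0)
		return DESTROY_STILL_SHARED;

	s->accessible = false; /* freeing makes the object inaccessible */
	return DESTROY_RELEASED;
}
```

Because the check runs first, a second destroy of the same cache neither decrements the refcount again nor frees the object twice.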
Link: https://lkml.kernel.org/r/20211119142219.1519617-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit bed0a9b591)
Bug: 217222520
Change-Id: I6a3cbe5b92ea806c7c1e447a64c5f36abd679593
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Add a test checking that KASAN generic can also detect out-of-bounds
accesses to the left of globals.
Unfortunately it seems that GCC doesn't catch this (tested GCC 10, 11).
The main difference between GCC's globals redzoning and Clang's is that
GCC relies on increased alignment to produce padding, whereas
Clang's redzoning implementation actually adds real data after the
global and doesn't rely on alignment to produce padding. I believe this
is the main reason why GCC can't reliably catch globals out-of-bounds in
this case.
Given this is now a known issue, to avoid failing the whole test suite,
skip this test case with GCC.
Link: https://lkml.kernel.org/r/20211117130714.135656-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Kaiwan N Billimoria <kaiwan.billimoria@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kaiwan N Billimoria <kaiwan.billimoria@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e5f4728767)
Bug: 217222520
Change-Id: Iba1aee23d8a63bdc68cd219b8fd35e40734d65aa
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
As done in commit d73dad4eb5 ("kasan: test: bypass __alloc_size
checks") for __write_overflow warnings, also silence some more cases
that trip the __read_overflow warnings seen in 5.16-rc1[1]:
In file included from include/linux/string.h:253,
from include/linux/bitmap.h:10,
from include/linux/cpumask.h:12,
from include/linux/mm_types_task.h:14,
from include/linux/mm_types.h:5,
from include/linux/page-flags.h:13,
from arch/arm64/include/asm/mte.h:14,
from arch/arm64/include/asm/pgtable.h:12,
from include/linux/pgtable.h:6,
from include/linux/kasan.h:29,
from lib/test_kasan.c:10:
In function 'memcmp',
inlined from 'kasan_memcmp' at lib/test_kasan.c:897:2:
include/linux/fortify-string.h:263:25: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter)
263 | __read_overflow();
| ^~~~~~~~~~~~~~~~~
In function 'memchr',
inlined from 'kasan_memchr' at lib/test_kasan.c:872:2:
include/linux/fortify-string.h:277:17: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter)
277 | __read_overflow();
| ^~~~~~~~~~~~~~~~~
[1] http://kisskb.ellerman.id.au/kisskb/buildresult/14660585/log/
Link: https://lkml.kernel.org/r/20211116004111.3171781-1-keescook@chromium.org
Fixes: d73dad4eb5 ("kasan: test: bypass __alloc_size checks")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cab71f7495)
Bug: 217222520
Change-Id: I1e521864e42b993ed5f3815c0a640ef86b8818e6
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
With HW tag-based KASAN, error checks are performed implicitly by the
load and store instructions in the memcpy implementation. A failed
check results in tag checks being disabled and execution will keep
going. As a result, under HW tag-based KASAN, prior to commit
1b0668be62 ("kasan: test: disable kmalloc_memmove_invalid_size for
HW_TAGS"), this memcpy would end up corrupting memory until it hits an
inaccessible page and causes a kernel panic.
This is a pre-existing issue that was revealed by commit 285133040e
("arm64: Import latest memcpy()/memmove() implementation") which changed
the memcpy implementation from using signed comparisons (incorrectly,
resulting in the memcpy being terminated early for negative sizes) to
using unsigned comparisons.
It is unclear how this could be handled by memcpy itself in a reasonable
way. One possibility would be to add an exception handler that would
force memcpy to return if a tag check fault is detected -- this would
make the behavior roughly similar to generic and SW tag-based KASAN.
However, this wouldn't solve the problem for asynchronous mode and also
makes memcpy behavior inconsistent with manually copying data.
This test was added as a part of a series that taught KASAN to detect
negative sizes in memory operations, see commit 8cceeff48f ("kasan:
detect negative size in memory operation function"). Therefore we
should keep testing for negative sizes with generic and SW tag-based
KASAN. But there is some value in testing small memcpy overflows, so
let's add another test with memcpy that does not destabilize the kernel
by performing out-of-bounds writes, and run it in all modes.
Link: https://linux-review.googlesource.com/id/I048d1e6a9aff766c4a53f989fb0c83de68923882
Link: https://lkml.kernel.org/r/20210910211356.3603758-1-pcc@google.com
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 758cabae31)
Bug: 217222520
Change-Id: I2cde4d706018889c635e290c8f94884688cba172
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT,
which in turn can then call alloc_pages(GFP_NOWAIT, ...). In general, however,
it is not even possible to use either GFP_ATOMIC or GFP_NOWAIT in certain
non-preemptive contexts/RT kernels, including raw_spin_locks (see gfp.h and commit ab00db216c).
Fix it by instructing stackdepot to not expand stack storage via alloc_pages()
in case it runs out by using kasan_record_aux_stack_noalloc().
Jianwei Hu reported:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
softirqs last enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 6 PID: 15319 Comm: python3 Tainted: G W O 5.15-rc7-preempt-rt #1
Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018
Call Trace:
show_stack+0x52/0x58
dump_stack+0xa1/0xd6
___might_sleep.cold+0x11c/0x12d
rt_spin_lock+0x3f/0xc0
rmqueue+0x100/0x1460
rmqueue+0x100/0x1460
mark_usage+0x1a0/0x1a0
ftrace_graph_ret_addr+0x2a/0xb0
rmqueue_pcplist.constprop.0+0x6a0/0x6a0
__kasan_check_read+0x11/0x20
__zone_watermark_ok+0x114/0x270
get_page_from_freelist+0x148/0x630
is_module_text_address+0x32/0xa0
__alloc_pages_nodemask+0x2f6/0x790
__alloc_pages_slowpath.constprop.0+0x12d0/0x12d0
create_prof_cpu_mask+0x30/0x30
alloc_pages_current+0xb1/0x150
stack_depot_save+0x39f/0x490
kasan_save_stack+0x42/0x50
kasan_save_stack+0x23/0x50
kasan_record_aux_stack+0xa9/0xc0
__call_rcu+0xff/0x9c0
call_rcu+0xe/0x10
put_object+0x53/0x70
__delete_object+0x7b/0x90
kmemleak_free+0x46/0x70
slab_free_freelist_hook+0xb4/0x160
kfree+0xe5/0x420
kfree_const+0x17/0x30
kobject_cleanup+0xaa/0x230
kobject_put+0x76/0x90
netdev_queue_update_kobjects+0x17d/0x1f0
... ...
ksys_write+0xd9/0x180
__x64_sys_write+0x42/0x50
do_syscall_64+0x38/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642
Fixes: 84109ab585 ("rcu: Record kvfree_call_rcu() call stack for KASAN")
Fixes: 26e760c9a7 ("rcu: kasan: record and print call_rcu() call stack")
Reported-by: Jianwei Hu <jianwei.hu@windriver.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Jun Miao <jun.miao@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 300c0c5e72)
Bug: 217222520
Change-Id: I5b7d4f9dfe5da290627599d8d6de3278debb2a13
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Shuah Khan reported:
| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
| it tries to allocate memory attempting to acquire spinlock in page
| allocation code while holding workqueue pool raw_spinlock.
|
| There are several instances of this problem when block layer tries
| to __queue_work(). Call trace from one of these instances is below:
|
| kblockd_mod_delayed_work_on()
| mod_delayed_work_on()
| __queue_delayed_work()
| __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
| insert_work()
| kasan_record_aux_stack()
| kasan_save_stack()
| stack_depot_save()
| alloc_pages()
| __alloc_pages()
| get_page_from_freelist()
| rm_queue()
| rm_queue_pcplist()
| local_lock_irqsave(&pagesets.lock, flags);
| [ BUG: Invalid wait context triggered ]
The default kasan_record_aux_stack() calls stack_depot_save() with
GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
In general, however, it is not even possible to use either GFP_ATOMIC
or GFP_NOWAIT in certain non-preemptive contexts, including
raw_spin_locks (see gfp.h and commit ab00db216c).
Fix it by instructing stackdepot to not expand stack storage via
alloc_pages() in case it runs out by using
kasan_record_aux_stack_noalloc().
While there is an increased risk of failing to insert the stack trace,
this is typically unlikely, especially if the same insertion had already
succeeded previously (stack depot hit).
For frequent calls from the same location, it therefore becomes
extremely unlikely that kasan_record_aux_stack_noalloc() fails.
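The trade-off can be seen in a toy model of a deduplicating store with a can_alloc flag. Everything here is hypothetical (struct depot, POOL_SLOTS, depot_save_sketch(), and using a single key per trace); it only mirrors the behavior described above, not the stackdepot code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACES 8
#define POOL_SLOTS 2 /* hypothetical capacity gained per expansion */

struct depot {
	unsigned long traces[MAX_TRACES]; /* one key per stored trace */
	size_t used;
	size_t capacity;
};

/* Returns a 1-based handle, or 0 if the trace could not be stored. */
static unsigned int depot_save_sketch(struct depot *d, unsigned long key,
				      bool can_alloc)
{
	/* A previously stored trace is a hit and never needs memory. */
	for (size_t i = 0; i < d->used; i++) {
		if (d->traces[i] == key)
			return (unsigned int)(i + 1);
	}

	if (d->used == d->capacity) {
		/* Out of space: expanding models the page allocation. */
		if (!can_alloc || d->capacity + POOL_SLOTS > MAX_TRACES)
			return 0;
		d->capacity += POOL_SLOTS;
	}

	d->traces[d->used] = key;
	d->used++;
	return (unsigned int)d->used;
}
```

In this model a non-allocating save fails only when the store is full and the trace is new; repeated saves from the same location keep succeeding.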
Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
Link: https://lkml.kernel.org/r/20210913112609.2651084-7-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f70da745be)
Bug: 217222520
Change-Id: I81052dba582dbcb492fd7efcef8d56bac4766503
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Patch series "stackdepot, kasan, workqueue: Avoid expanding stackdepot
slabs when holding raw_spin_lock", v2.
Shuah Khan reported [1]:
| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
| it tries to allocate memory attempting to acquire spinlock in page
| allocation code while holding workqueue pool raw_spinlock.
|
| There are several instances of this problem when block layer tries
| to __queue_work(). Call trace from one of these instances is below:
|
| kblockd_mod_delayed_work_on()
| mod_delayed_work_on()
| __queue_delayed_work()
| __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
| insert_work()
| kasan_record_aux_stack()
| kasan_save_stack()
| stack_depot_save()
| alloc_pages()
| __alloc_pages()
| get_page_from_freelist()
| rm_queue()
| rm_queue_pcplist()
| local_lock_irqsave(&pagesets.lock, flags);
| [ BUG: Invalid wait context triggered ]
PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
rules are being violated. More generally, memory is being allocated
from a non-preemptive context (raw_spin_lock'd c-s) where it is not
allowed.
To properly fix this, we must prevent stackdepot from replenishing its
"stack slab" pool if memory allocations cannot be done in the current
context: it's a bug to use either GFP_ATOMIC or GFP_NOWAIT in certain
non-preemptive contexts, including raw_spin_locks (see gfp.h and commit
ab00db216c).
The only downside is that saving a stack trace may fail if: stackdepot
runs out of space AND the same stack trace has not been recorded before.
I expect this to be unlikely, and a simple experiment (boot the kernel)
didn't result in any failure to record stack trace from insert_work().
The series includes a few minor fixes to stackdepot that I noticed in
preparing the series. It then introduces __stack_depot_save(), which
exposes the option to force stackdepot to not allocate any memory.
Finally, KASAN is changed to use the new stackdepot interface and
provide kasan_record_aux_stack_noalloc(), which is then used by
workqueue code.
[1] https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
This patch (of 6):
<linux/stackdepot.h> refers to gfp_t, but doesn't include gfp.h.
Fix it by including <linux/gfp.h>.
Link: https://lkml.kernel.org/r/20210913112609.2651084-1-elver@google.com
Link: https://lkml.kernel.org/r/20210913112609.2651084-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Taras Madan <tarasmadan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7857ccdf94)
Bug: 217222520
Change-Id: Idba812b3e8c92211d1b6d359618793c8738e6a0d
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
After switching the page size from 64KB to 4KB on several arm64 servers here,
kmemleak started to run out of its early memory pool due to a huge number of
early_pgtable_alloc() calls:
kmemleak_alloc_phys()
memblock_alloc_range_nid()
memblock_phys_alloc_range()
early_pgtable_alloc()
init_pmd()
alloc_init_pud()
__create_pgd_mapping()
__map_memblock()
paging_init()
setup_arch()
start_kernel()
Increasing the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE fourfold
would not be enough for a server with 200GB+ memory. There is little
value in checking those early page tables for memory leaks, and those
early memory mappings should not reference other memory. Hence, skipping
them causes no kmemleak false positives, and we can safely stop tracking
those early allocations in kmemleak, as we did in commit fed84c7852
("mm/memblock.c: skip kmemleak for kasan_init()"), without needing to
introduce complications to automatically scale the value depending on the
runtime memory size etc. After the patch, the default value of
DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20211105150509.7826-1-quic_qiancai@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit c6975d7cab)
Change-Id: Ie2a33b4219185948cbbc599df76973d547c78dbb
Bug: 217222520
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Architectures supported by KASAN_HW_TAGS can provide an asymmetric mode
of execution. On MTE-enabled arm64 hardware, for example, this
corresponds to the asymmetric tagging mode of execution. In particular,
when such a mode is present, the CPU triggers a fault on a tag mismatch
during a load operation and asynchronously updates a register when a tag
mismatch is detected during a store operation.
Extend the KASAN HW execution mode kernel command line parameter to
support asymmetric mode.
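A rough sketch of what accepting a third mode value looks like, in user-space C. The enum, the function name, and the value strings "sync", "async" and "asymm" are assumptions for illustration; consult the KASAN documentation for the actual parameter syntax.

```c
#include <string.h>

enum kasan_mode_arg {
	MODE_DEFAULT,
	MODE_SYNC,
	MODE_ASYNC,
	MODE_ASYMM /* the newly accepted asymmetric mode */
};

/* Returns 0 on success, -1 for a missing or unrecognized value. */
static int parse_kasan_mode_sketch(const char *arg, enum kasan_mode_arg *mode)
{
	if (arg == NULL)
		return -1;

	if (strcmp(arg, "sync") == 0)
		*mode = MODE_SYNC;
	else if (strcmp(arg, "async") == 0)
		*mode = MODE_ASYNC;
	else if (strcmp(arg, "asymm") == 0)
		*mode = MODE_ASYMM;
	else
		return -1;

	return 0;
}
```

An unrecognized value leaves the previously selected mode untouched, so a typo on the command line does not silently change behavior in this sketch.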
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20211006154751.4463-6-vincenzo.frascino@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 2d27e58514)
Bug: 217221156
Change-Id: I5284fd8a4e8c2ddb1e06ca65bed133e35d70eb7f
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
MTE provides an asymmetric mode for detecting tag exceptions. In
particular, when such a mode is present, the CPU triggers a fault
on a tag mismatch during a load operation and asynchronously updates
a register when a tag mismatch is detected during a store operation.
Add support for MTE asymmetric mode.
Note: If the CPU does not support MTE asymmetric mode, the kernel falls
back on synchronous mode, which is the default for kasan=on.
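The mode semantics and the fallback can be summarized in a small model. The enum and both functions are hypothetical names used only for this sketch; they describe the behavior stated above rather than the arm64 implementation.

```c
#include <stdbool.h>

enum mte_mode { MTE_SYNC, MTE_ASYNC, MTE_ASYMM };

/* Fall back to synchronous mode when asymmetric mode is unsupported. */
static enum mte_mode effective_mode(enum mte_mode requested, bool cpu_has_asymm)
{
	if (requested == MTE_ASYMM && !cpu_has_asymm)
		return MTE_SYNC;
	return requested;
}

/* True if a tag mismatch on this access raises a fault immediately. */
static bool faults_synchronously(enum mte_mode mode, bool is_store)
{
	switch (mode) {
	case MTE_SYNC:
		return true;
	case MTE_ASYNC:
		return false;
	case MTE_ASYMM:
		return !is_store; /* loads fault, stores are recorded asynchronously */
	}
	return true;
}
```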
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20211006154751.4463-5-vincenzo.frascino@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit ec0288369f)
Bug: 217221156
Change-Id: I6ed463f3df90f7cb5fb7ac11bbb6345a0770d7fc
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Right now, define_common_kernels() uses the expression
kmi_symbol_lists = glob(["android/abi_gki_aarch64*"])
as the list of KMI symbols for aarch64 non debug builds.
If the list differs from
KMI_SYMBOL_LIST and ADDITIONAL_KMI_SYMBOL_LISTS,
the value needs to be manually overridden.
In addition, define_common_kernels() sets
trim_nonlisted_kmi = not kmi_symbol_lists.empty()
for aarch64 non debug builds. If this value differs from
TRIM_NONLISTED_KMI, the value needs to be manually overridden.
To ensure that they don't get out of sync, add a note
in both places to keep them in sync.
In the future, we can load values from build.config like we
did for CLANG_VERSION in build.config.common. Then, this note
can be deleted.
Bug: 215745244
Test: none
Change-Id: I7e2c62e7dd97c6b06f4d628c3c8672922e99aaee
Signed-off-by: Yifan Hong <elsk@google.com>
Changes in 5.15.23
moxart: fix potential use-after-free on remove path
arm64: Add Cortex-A510 CPU part definition
KVM: s390: Return error on SIDA memop on normal guest
ksmbd: fix SMB 3.11 posix extension mount failure
crypto: api - Move cryptomgr soft dependency into algapi
tipc: improve size validations for received domain records
Linux 5.15.23
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib071a172997afc03a06eaee09d1f19086f8682aa
commit 9aa422ad32 upstream.
The function tipc_mon_rcv() allows a node to receive and process
domain_record structs from peer nodes to track their views of the
network topology.
This patch verifies that the number of members in a received domain
record does not exceed the limit defined by MAX_MON_DOMAIN, something
that may otherwise lead to a stack overflow.
tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where
we are reading a 32-bit message data length field into a uint16. To
avert any risk of overflow, we add an extra sanity check for this in
that function. We cannot see that happen with the current code, but
future designers who are unaware of this risk may introduce it by
allowing delivery of very large (> 64k) sk buffers from the bearer
layer. This potential problem was identified by Eric Dumazet.
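The two checks can be sketched as a single predicate. This is an illustration only: the function name is hypothetical, and the value 64 for MAX_MON_DOMAIN is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_MON_DOMAIN 64 /* assumed limit, for illustration */

/*
 * data_len is the 32-bit length field from the message header;
 * member_cnt is the member count claimed by the received record.
 */
static bool domain_record_ok(uint32_t data_len, uint16_t member_cnt)
{
	/* Reject lengths that would be truncated when stored in 16 bits. */
	if (data_len > UINT16_MAX)
		return false;

	/* Reject records that claim more members than the local array holds. */
	if (member_cnt > MAX_MON_DOMAIN)
		return false;

	return true;
}
```

Validating the count before copying is what prevents a peer-supplied record from overrunning a fixed-size buffer on the receiver.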
This fixes CVE-2022-0435
Reported-by: Samuel Page <samuel.page@appgate.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 35c55c9877 ("tipc: add neighbor monitoring framework")
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Samuel Page <samuel.page@appgate.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6ce9c5831 upstream.
The soft dependency on cryptomgr is only needed in algapi because
if algapi isn't present then no algorithms can be loaded. This
also fixes the case where api is built-in but algapi is built as
a module as the soft dependency would otherwise get lost.
Fixes: 8ab23d547f ("crypto: api - Add softdep on cryptomgr")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ca8581e79 upstream.
The cifs client sets the DataLength of the create_posix context to 4,
which means that only the Mode field of the create_posix context is
available. So the buffer validation in ksmbd should check only the size
of Mode, excluding the size of the Reserved field.
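A minimal sketch of the relaxed check, using a hypothetical and heavily simplified payload layout (the real create_posix context contains more fields). The point is that the required length stops at the end of the mode field instead of covering the whole structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified payload layout for illustration only. */
struct posix_payload {
	uint32_t mode;
	uint32_t reserved;
};

static bool payload_len_ok(uint32_t data_length)
{
	/* Require only the bytes before reserved, not sizeof the struct. */
	return data_length >= offsetof(struct posix_payload, reserved);
}
```

With this rule a client that sends 4 bytes passes validation, whereas a check against the full structure size would demand 8 bytes and reject it.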
Fixes: 8f77150c15 ("ksmbd: add buffer validation for SMB2_CREATE_CONTEXT")
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Steve French <smfrench@gmail.com>
Tested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c212e1bae upstream.
Refuse SIDA memops on guests which are not protected.
For normal guests, the secure instruction data address designation,
which determines the location we access, is not under control of KVM.
Fixes: 19e1227768 (KVM: S390: protvirt: Introduce instruction data area bounce buffer)
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8250_core registers 4 ISA uart ports by default, which can cause
problems on some devices which don't have them. This change doesn't
break earlycon=uart8250, but it will cause the 8250_of and 8250_pci sub
drivers to be unable to register ports. Boards that really need the full
8250 driver to take over from earlycon can use the "8250.nr_uarts=X"
kernel command line option to restore the ports allocation.
Bug: 216312411
Signed-off-by: Alistair Delva <adelva@google.com>
Change-Id: I04715394b32bd98544657101de4537df34554ea9
If a task with a restricted possible CPU mask and PF_FROZEN or
PF_FREEZER_SKIP set blocks, then we must not put it back on the runqueue
to handle a signal because this could lead to migration failures later
on if the suspending CPU is not capable of running it.
Return such a task to the runqueue only if a fatal signal is pending,
and otherwise allow the task to block.
Bug: 202918514
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I04cc9e65751f2bffc556c4da9ef02fe386764324
Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.
Although we take care to prevent explicit hot-unplug of all 32-bit
capable CPUs on such a system, this is required when suspending on some
SoCs where the firmware mandates that the suspend/resume operation is
handled by CPU 0, which may not be capable of running 32-bit tasks.
Consequently, there is a window on the resume path where no 32-bit
capable CPUs are available for scheduling and waking up a 32-bit task
will result in a scheduler BUG() due to failure of select_fallback_rq():
| kernel BUG at kernel/sched/core.c:2858!
| Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
| ...
| Call trace:
| select_fallback_rq+0x4b0/0x4e4
| try_to_wake_up.llvm.4388853297126348405+0x460/0x5b0
| default_wake_function+0x1c/0x30
| autoremove_wake_function+0x1c/0x60
| __wake_up_common.llvm.11763074518265335900+0x100/0x1b8
| __wake_up+0x78/0xc4
| ep_poll_callback+0x20c/0x3fc
Prevent wakeups of unschedulable frozen tasks in ttwu() and instead
defer the wakeup to __thaw_tasks(), which runs only once all the
secondary CPUs are back online.
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-arch/20210525151432.16875-17-will@kernel.org/
Bug: 186372082
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I5a0531b48d537a79e1926289b5a87edcd7dd78ad
(cherry picked from commit 94155f60a5)
Occasionally it is necessary to see if a task is either frozen or
sleeping in the PF_FREEZER_SKIP state. In preparation for adding
additional users of this check, introduce a frozen_or_skipped() helper
function and convert the hung task detector over to using it.
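As a rough illustration (not the code from this patch), a helper of this
kind can be a single flag test. The struct and the flag values below are
stand-ins chosen so that the sketch compiles on its own; the kernel's
own definitions should be used in real code.

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration; the kernel defines its own values. */
#define PF_FROZEN	0x00010000u
#define PF_FREEZER_SKIP	0x40000000u

/* Minimal stand-in for struct task_struct: only the flags word is needed. */
struct task_struct {
	unsigned int flags;
};

/* True if the task is frozen or sleeping in the PF_FREEZER_SKIP state. */
static inline bool frozen_or_skipped(const struct task_struct *p)
{
	return (p->flags & (PF_FROZEN | PF_FREEZER_SKIP)) != 0;
}
```

A caller such as the hung task detector would then test
frozen_or_skipped(p) instead of open-coding the two flag checks.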
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-arch/20210525151432.16875-16-will@kernel.org/
Bug: 186372082
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I138ffe2fae5a2da96df6f30d50d3a8a0dc61724c
(cherry picked from commit 9c12d36117)
This reverts commit 9d6e741783.
When BTF generation is enabled for modules, we can have module load
failures due to BTF mismatches even when ABI is unchanged. Currently
Kconfig does not allow disabling BTF for modules only, so turn it off
entirely for now.
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I0a7251397f95d6c02ba999bcf2ab9377ac0d76c3
The newly added ffa_compatible_version_find() function causes a
build warning because of a variable that is never used:
drivers/firmware/arm_ffa/driver.c:180:6: error: unused variable 'compat_version' [-Werror,-Wunused-variable]
u32 compat_version;
Link: https://lore.kernel.org/r/20211026083400.3444946-1-arnd@kernel.org
Fixes: 8e3f9da608 ("firmware: arm_ffa: Handle compatibility with different firmware versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit 01537a078b)
Change-Id: Ia6eea859046c07d58b32d24113e4b0ea13509364
Bug: 168585974
Signed-off-by: Will Deacon <willdeacon@google.com>
As part of the FF-A spec, an endpoint is allowed to transfer access of,
or lend, a memory region to one or more borrowers.
Extend the existing memory sharing implementation to support
FF-A MEM_LEND functionality and expose this to other kernel drivers.
Note that upon a successful MEM_LEND request the caller must ensure that
the memory region specified is not accessed until a successful
MEM_RECLAIM call has been made. On systems with a hypervisor present
this will be enforced; on systems without a hypervisor, however, the
responsibility falls to the calling kernel driver to prevent access.
Link: https://lore.kernel.org/r/20211015165742.2513065-1-marc.bonnici@arm.com
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Marc Bonnici <marc.bonnici@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit 82a8daaecf)
Change-Id: I45e0376785904580ca6585225b7b63cc49f66bf1
Bug: 168585974
Signed-off-by: Will Deacon <willdeacon@google.com>
The driver currently supports only v1.0 of the Arm FFA specification. It
also expects the firmware implementation to match that version and bails
out if it doesn't. This causes issues when running with a higher version
of the firmware implementation (e.g. v1.1, which will be released soon).
In order to support compatibility with different firmware versions, let
us add additional checks and find the compatible version the driver can
work with.
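One plausible selection policy is sketched below as standalone C. It is
an illustration under stated assumptions, not the driver's code: the
macro names and pick_compatible_version() are hypothetical, the version
word is assumed to carry the major number in bits 30:16 and the minor
number in bits 15:0, and any firmware version at or below the driver's
is assumed to be usable as-is.

```c
#include <stdint.h>

/* Assumed encoding: major in bits 30:16, minor in bits 15:0. */
#define SKETCH_PACK_VERSION(major, minor) \
	((((uint32_t)(major) & 0x7fffu) << 16) | ((uint32_t)(minor) & 0xffffu))
#define SKETCH_MAJOR(v)	(((uint32_t)(v) >> 16) & 0x7fffu)
#define SKETCH_MINOR(v)	((uint32_t)(v) & 0xffffu)

/* Version this sketch pretends the driver implements. */
#define SKETCH_DRIVER_VERSION	SKETCH_PACK_VERSION(1, 0)

/*
 * Pick the version to operate at: keep the firmware's version when it
 * is not newer than the driver's, otherwise fall back to the driver's.
 */
static uint32_t pick_compatible_version(uint32_t fw_version)
{
	uint32_t fw_major = SKETCH_MAJOR(fw_version);
	uint32_t fw_minor = SKETCH_MINOR(fw_version);
	uint32_t drv_major = SKETCH_MAJOR(SKETCH_DRIVER_VERSION);
	uint32_t drv_minor = SKETCH_MINOR(SKETCH_DRIVER_VERSION);

	if (fw_major < drv_major ||
	    (fw_major == drv_major && fw_minor <= drv_minor))
		return fw_version;

	/* Firmware is newer than the driver: use the driver's version. */
	return SKETCH_DRIVER_VERSION;
}
```

With this policy a v1.1 firmware is handled at v1.0 rather than being
rejected outright.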
Link: https://lore.kernel.org/r/20211013091127.990992-1-sudeep.holla@arm.com
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
(cherry picked from commit 8e3f9da608)
Change-Id: I7bc9a3b172a9067bfd4e9bb9d50b4729e915b5a5
Bug: 168585974
Signed-off-by: Will Deacon <willdeacon@google.com>