(Upstream commit 2813b9c029).
Tag-based KASAN doesn't check memory accesses through pointers tagged with
0xff. When page_address is used to get pointer to memory that corresponds
to some page, the tag of the resulting pointer gets set to 0xff, even
though the allocated memory might have been tagged differently.
For slab pages it's impossible to recover the correct tag to return from
page_address, since the page might contain multiple slab objects tagged
with different values, and we can't know in advance which one of them is
going to get accessed. For non slab pages however, we can recover the tag
in page_address, since the whole page was marked with the same tag.
This patch adds tagging to non slab memory allocated with pagealloc. To
set the tag of the pointer returned from page_address, the tag gets stored
to page->flags when the memory gets allocated.
Link: http://lkml.kernel.org/r/d758ddcef46a5abc9970182b9137e2fbee202a2c.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I500bdf42462fee0ee14495a6be51815e7e44460f
(Upstream commit 7f94ffbc4c).
This commit adds tag-based KASAN specific hooks implementation and
adjusts common generic and tag-based KASAN ones.
1. When a new slab cache is created, tag-based KASAN rounds up the size of
the objects in this cache to KASAN_SHADOW_SCALE_SIZE (== 16).
2. On each kmalloc tag-based KASAN generates a random tag, sets the shadow
memory, that corresponds to this object to this tag, and embeds this
tag value into the top byte of the returned pointer.
3. On each kfree tag-based KASAN poisons the shadow memory with a random
tag to allow detection of use-after-free bugs.
The rest of the logic of the hook implementation is very much similar to
the one provided by generic KASAN. Tag-based KASAN saves allocation and
free stack metadata to the slab object the same way generic KASAN does.
Link: http://lkml.kernel.org/r/bda78069e3b8422039794050ddcb2d53d053ed41.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I23c40472207cba40dc962bb182225e378322249e
(Upstream commit 5b7c414822).
While with SLUB we can actually preassign tags for caches with contructors
and store them in pointers in the freelist, SLAB doesn't allow that since
the freelist is stored as an array of indexes, so there are no pointers to
store the tags.
Instead we compute the tag twice, once when a slab is created before
calling the constructor and then again each time when an object is
allocated with kmalloc. Tag is computed simply by taking the lowest byte
of the index that corresponds to the object. However in kasan_kmalloc we
only have access to the objects pointer, so we need a way to find out
which index this object corresponds to.
This patch moves obj_to_index from slab.c to include/linux/slab_def.h to
be reused by KASAN.
Link: http://lkml.kernel.org/r/c02cd9e574cfd93858e43ac94b05e38f891fef64.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I6cb3332ea05e6152ab8de0490171ed5d4947def3
(Upstream commit 4d176711ea).
An object constructor can initialize pointers within this objects based on
the address of the object. Since the object address might be tagged, we
need to assign a tag before calling constructor.
The implemented approach is to assign tags to objects with constructors
when a slab is allocated and call constructors once as usual. The
downside is that such object would always have the same tag when it is
reallocated, so we won't catch use-after-frees on it.
Also pressign tags for objects from SLAB_TYPESAFE_BY_RCU caches, since
they can be validy accessed after having been freed.
Link: http://lkml.kernel.org/r/f158a8a74a031d66f0a9398a5b0ed453c37ba09a.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I48b1de0516b9998f3b3e3917a15dd0bde27897cf
The conflict during backport is caused by the
include/linux/compiler_attributes.h file not being present.
(Upstream commit 2bd926b439).
This commit splits the current CONFIG_KASAN config option into two:
1. CONFIG_KASAN_GENERIC, that enables the generic KASAN mode (the one
that exists now);
2. CONFIG_KASAN_SW_TAGS, that enables the software tag-based KASAN mode.
The name CONFIG_KASAN_SW_TAGS is chosen as in the future we will have
another hardware tag-based KASAN mode, that will rely on hardware memory
tagging support in arm64.
With CONFIG_KASAN_SW_TAGS enabled, compiler options are changed to
instrument kernel files with -fsantize=kernel-hwaddress (except the ones
for which KASAN_SANITIZE := n is set).
Both CONFIG_KASAN_GENERIC and CONFIG_KASAN_SW_TAGS support both
CONFIG_KASAN_INLINE and CONFIG_KASAN_OUTLINE instrumentation modes.
This commit also adds empty placeholder (for now) implementation of
tag-based KASAN specific hooks inserted by the compiler and adjusts
common hooks implementation.
While this commit adds the CONFIG_KASAN_SW_TAGS config option, this option
is not selectable, as it depends on HAVE_ARCH_KASAN_SW_TAGS, which we will
enable once all the infrastracture code has been added.
Link: http://lkml.kernel.org/r/b2550106eb8a68b10fefbabce820910b115aa853.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Id95c0c0b6857c6b30f2bea4597aea6c90273ef89
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
(Upstream commit 0116523cff).
Patch series "kasan: add software tag-based mode for arm64", v13.
This patchset adds a new software tag-based mode to KASAN [1]. (Initially
this mode was called KHWASAN, but it got renamed, see the naming rationale
at the end of this section).
The plan is to implement HWASan [2] for the kernel with the incentive,
that it's going to have comparable to KASAN performance, but in the same
time consume much less memory, trading that off for somewhat imprecise bug
detection and being supported only for arm64.
The underlying ideas of the approach used by software tag-based KASAN are:
1. By using the Top Byte Ignore (TBI) arm64 CPU feature, we can store
pointer tags in the top byte of each kernel pointer.
2. Using shadow memory, we can store memory tags for each chunk of kernel
memory.
3. On each memory allocation, we can generate a random tag, embed it into
the returned pointer and set the memory tags that correspond to this
chunk of memory to the same value.
4. By using compiler instrumentation, before each memory access we can add
a check that the pointer tag matches the tag of the memory that is being
accessed.
5. On a tag mismatch we report an error.
With this patchset the existing KASAN mode gets renamed to generic KASAN,
with the word "generic" meaning that the implementation can be supported
by any architecture as it is purely software.
The new mode this patchset adds is called software tag-based KASAN. The
word "tag-based" refers to the fact that this mode uses tags embedded into
the top byte of kernel pointers and the TBI arm64 CPU feature that allows
to dereference such pointers. The word "software" here means that shadow
memory manipulation and tag checking on pointer dereference is done in
software. As it is the only tag-based implementation right now, "software
tag-based" KASAN is sometimes referred to as simply "tag-based" in this
patchset.
A potential expansion of this mode is a hardware tag-based mode, which
would use hardware memory tagging support (announced by Arm [3]) instead
of compiler instrumentation and manual shadow memory manipulation.
Same as generic KASAN, software tag-based KASAN is strictly a debugging
feature.
[1] https://www.kernel.org/doc/html/latest/dev-tools/kasan.html
[2] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[3] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a
====== Rationale
On mobile devices generic KASAN's memory usage is significant problem.
One of the main reasons to have tag-based KASAN is to be able to perform a
similar set of checks as the generic one does, but with lower memory
requirements.
Comment from Vishwath Mohan <vishwath@google.com>:
I don't have data on-hand, but anecdotally both ASAN and KASAN have proven
problematic to enable for environments that don't tolerate the increased
memory pressure well. This includes
(a) Low-memory form factors - Wear, TV, Things, lower-tier phones like Go,
(c) Connected components like Pixel's visual core [1].
These are both places I'd love to have a low(er) memory footprint option at
my disposal.
Comment from Evgenii Stepanov <eugenis@google.com>:
Looking at a live Android device under load, slab (according to
/proc/meminfo) + kernel stack take 8-10% available RAM (~350MB). KASAN's
overhead of 2x - 3x on top of it is not insignificant.
Not having this overhead enables near-production use - ex. running
KASAN/KHWASAN kernel on a personal, daily-use device to catch bugs that do
not reproduce in test configuration. These are the ones that often cost
the most engineering time to track down.
CPU overhead is bad, but generally tolerable. RAM is critical, in our
experience. Once it gets low enough, OOM-killer makes your life
miserable.
[1] https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/
====== Technical details
Software tag-based KASAN mode is implemented in a very similar way to the
generic one. This patchset essentially does the following:
1. TCR_TBI1 is set to enable Top Byte Ignore.
2. Shadow memory is used (with a different scale, 1:16, so each shadow
byte corresponds to 16 bytes of kernel memory) to store memory tags.
3. All slab objects are aligned to shadow scale, which is 16 bytes.
4. All pointers returned from the slab allocator are tagged with a random
tag and the corresponding shadow memory is poisoned with the same value.
5. Compiler instrumentation is used to insert tag checks. Either by
calling callbacks or by inlining them (CONFIG_KASAN_OUTLINE and
CONFIG_KASAN_INLINE flags are reused).
6. When a tag mismatch is detected in callback instrumentation mode
KASAN simply prints a bug report. In case of inline instrumentation,
clang inserts a brk instruction, and KASAN has it's own brk handler,
which reports the bug.
7. The memory in between slab objects is marked with a reserved tag, and
acts as a redzone.
8. When a slab object is freed it's marked with a reserved tag.
Bug detection is imprecise for two reasons:
1. We won't catch some small out-of-bounds accesses, that fall into the
same shadow cell, as the last byte of a slab object.
2. We only have 1 byte to store tags, which means we have a 1/256
probability of a tag match for an incorrect access (actually even
slightly less due to reserved tag values).
Despite that there's a particular type of bugs that tag-based KASAN can
detect compared to generic KASAN: use-after-free after the object has been
allocated by someone else.
====== Testing
Some kernel developers voiced a concern that changing the top byte of
kernel pointers may lead to subtle bugs that are difficult to discover.
To address this concern deliberate testing has been performed.
It doesn't seem feasible to do some kind of static checking to find
potential issues with pointer tagging, so a dynamic approach was taken.
All pointer comparisons/subtractions have been instrumented in an LLVM
compiler pass and a kernel module that would print a bug report whenever
two pointers with different tags are being compared/subtracted (ignoring
comparisons with NULL pointers and with pointers obtained by casting an
error code to a pointer type) has been used. Then the kernel has been
booted in QEMU and on an Odroid C2 board and syzkaller has been run.
This yielded the following results.
The two places that look interesting are:
is_vmalloc_addr in include/linux/mm.h
is_kernel_rodata in mm/util.c
Here we compare a pointer with some fixed untagged values to make sure
that the pointer lies in a particular part of the kernel address space.
Since tag-based KASAN doesn't add tags to pointers that belong to rodata
or vmalloc regions, this should work as is. To make sure debug checks to
those two functions that check that the result doesn't change whether we
operate on pointers with or without untagging has been added.
A few other cases that don't look that interesting:
Comparing pointers to achieve unique sorting order of pointee objects
(e.g. sorting locks addresses before performing a double lock):
tty_ldisc_lock_pair_timeout in drivers/tty/tty_ldisc.c
pipe_double_lock in fs/pipe.c
unix_state_double_lock in net/unix/af_unix.c
lock_two_nondirectories in fs/inode.c
mutex_lock_double in kernel/events/core.c
ep_cmp_ffd in fs/eventpoll.c
fsnotify_compare_groups fs/notify/mark.c
Nothing needs to be done here, since the tags embedded into pointers
don't change, so the sorting order would still be unique.
Checks that a pointer belongs to some particular allocation:
is_sibling_entry in lib/radix-tree.c
object_is_on_stack in include/linux/sched/task_stack.h
Nothing needs to be done here either, since two pointers can only belong
to the same allocation if they have the same tag.
Overall, since the kernel boots and works, there are no critical bugs.
As for the rest, the traditional kernel testing way (use until fails) is
the only one that looks feasible.
Another point here is that tag-based KASAN is available under a separate
config option that needs to be deliberately enabled. Even though it might
be used in a "near-production" environment to find bugs that are not found
during fuzzing or running tests, it is still a debug tool.
====== Benchmarks
The following numbers were collected on Odroid C2 board. Both generic and
tag-based KASAN were used in inline instrumentation mode.
Boot time [1]:
* ~1.7 sec for clean kernel
* ~5.0 sec for generic KASAN
* ~5.0 sec for tag-based KASAN
Network performance [2]:
* 8.33 Gbits/sec for clean kernel
* 3.17 Gbits/sec for generic KASAN
* 2.85 Gbits/sec for tag-based KASAN
Slab memory usage after boot [3]:
* ~40 kb for clean kernel
* ~105 kb (~260% overhead) for generic KASAN
* ~47 kb (~20% overhead) for tag-based KASAN
KASAN memory overhead consists of three main parts:
1. Increased slab memory usage due to redzones.
2. Shadow memory (the whole reserved once during boot).
3. Quaratine (grows gradually until some preset limit; the more the limit,
the more the chance to detect a use-after-free).
Comparing tag-based vs generic KASAN for each of these points:
1. 20% vs 260% overhead.
2. 1/16th vs 1/8th of physical memory.
3. Tag-based KASAN doesn't require quarantine.
[1] Time before the ext4 driver is initialized.
[2] Measured as `iperf -s & iperf -c 127.0.0.1 -t 30`.
[3] Measured as `cat /proc/meminfo | grep Slab`.
====== Some notes
A few notes:
1. The patchset can be found here:
https://github.com/xairy/kasan-prototype/tree/khwasan
2. Building requires a recent Clang version (7.0.0 or later).
3. Stack instrumentation is not supported yet and will be added later.
This patch (of 25):
Tag-based KASAN changes the value of the top byte of pointers returned
from the kernel allocation functions (such as kmalloc). This patch
updates KASAN hooks signatures and their usage in SLAB and SLUB code to
reflect that.
Link: http://lkml.kernel.org/r/aec2b5e3973781ff8a6bb6760f8543643202c451.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I62e554e732ec79ffd195e2269c8a50aed14381c0
(Upstream commit 386b3c7bda).
So that we can export symbols directly from assembly files, let's make
use of the generic <asm/export.h>. We have a few symbols that we'll want
to conditionally export for !KASAN kernel builds, so we add a helper for
that in <asm/assembler.h>.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I7cfd1b717c9b172487f7a872a3b9a4b4485e454a
(Upstream commit 163c8d54a9).
The __no_sanitize_address_or_inline and __no_kasan_or_inline defines
are almost identical. The only difference is that __no_kasan_or_inline
does not have the 'notrace' attribute.
To be able to replace __no_sanitize_address_or_inline with the older
definition, add 'notrace' to __no_kasan_or_inline and change to two
users of __no_sanitize_address_or_inline in the s390 code.
The 'notrace' option is necessary for e.g. the __load_psw_mask function
in arch/s390/include/asm/processor.h. Without the option it is possible
to trace __load_psw_mask which leads to kernel stack overflow.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pointed-out-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I27af631729f8ea52e55f31c02f584c01a0918073
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
(Upstream commit 026d1eaf5e).
The static lock quarantine_lock is used in quarantine.c to protect the
quarantine queue datastructures. It is taken inside quarantine queue
manipulation routines (quarantine_put(), quarantine_reduce() and
quarantine_remove_cache()), with IRQs disabled. This is not a problem on
a stock kernel but is problematic on an RT kernel where spin locks are
sleeping spinlocks, which can sleep and can not be acquired with disabled
interrupts.
Convert the quarantine_lock to a raw spinlock_t. The usage of
quarantine_lock is confined to quarantine.c and the work performed while
the lock is held is used for debug purpose.
[bigeasy@linutronix.de: slightly altered the commit message]
Link: http://lkml.kernel.org/r/20181010214945.5owshc3mlrh74z4b@linutronix.de
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I12f35246b81b23cad5ce8b90407c00b86bf90cc0
(Upstream commit dde709d136).
Due to conflict between kasan instrumentation and inlining
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368 functions which are
defined as inline could not be called from functions defined with
__no_sanitize_address.
Introduce __no_sanitize_address_or_inline which would expand to
__no_sanitize_address when the kernel is built with kasan support and
to inline otherwise. This helps to avoid disabling kasan
instrumentation for entire files.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: If81ec5a63ae788bfe1a31a1678f9509daa76b01f
(Upstream commit 0293c8ba80).
"bellow" -> "below"
The recommendation from kegel.com/kerspell is to only fix the howlers.
"Bellow" is a synonym of "howl" so this should be appropriate.
Signed-off-by: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 128674696
Change-Id: I1c4a983d3d08ab1194ae541bfdaa7e5074ddea0e
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Leaf changes summary: 6 artifacts changed
Changed leaf types summary: 4 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 1 Changed, 0 Added function (156 filtered out)
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable (4 filtered out)
1 function with some sub-type change:
'struct fb_info at fb.h:464:1' changed:
type size changed from 6144 to 6912 (in bits)
2 data member insertions:
'delayed_work fb_info::deferred_work', at offset 5376 (in bits) at fb.h:496:1
'fb_deferred_io* fb_info::fbdefio', at offset 6080 (in bits) at fb.h:497:1
there are data member changes:
'fb_ops* fb_info::fbops' offset changed from 5376 to 6144 (in bits) (by +768 bits)
'device* fb_info::device' offset changed from 5440 to 6208 (in bits) (by +768 bits)
'device* fb_info::dev' offset changed from 5504 to 6272 (in bits) (by +768 bits)
'int fb_info::class_flag' offset changed from 5568 to 6336 (in bits) (by +768 bits)
while looking at anonymous data member 'union {char* screen_base; char* screen_buffer;}':
the internal name of that anonymous data memberchanged from:
__anonymous_union__4
to:
__anonymous_union__1
This is usually due to an anonymous member type being added or removed from the containing type
offset changed from 5632 to 6400 (in bits) (by +768 bits)
'unsigned long int fb_info::screen_size' offset changed from 5696 to 6464 (in bits) (by +768 bits)
'void* fb_info::pseudo_palette' offset changed from 5760 to 6528 (in bits) (by +768 bits)
'u32 fb_info::state' offset changed from 5824 to 6592 (in bits) (by +768 bits)
'void* fb_info::fbcon_par' offset changed from 5888 to 6656 (in bits) (by +768 bits)
'void* fb_info::par' offset changed from 5952 to 6720 (in bits) (by +768 bits)
'apertures_struct* fb_info::apertures' offset changed from 6016 to 6784 (in bits) (by +768 bits)
'bool fb_info::skip_vt_switch' offset changed from 6080 to 6848 (in bits) (by +768 bits)
411 impacted interfaces:
'struct net at net_namespace.h:51:1' changed:
type size hasn't changed
1 data member insertion:
'sk_buff_head net::wext_nlevents', at offset 33984 (in bits) at net_namespace.h:145:1
there are data member changes:
'net_generic* net::gen' offset changed from 33984 to 34176 (in bits) (by +192 bits)
1495 impacted interfaces:
'struct net_device at netdevice.h:1745:1' changed:
type size hasn't changed
2 data member insertions:
'const iw_handler_def* net_device::wireless_handlers', at offset 3904 (in bits) at netdevice.h:1800:1
'iw_public_data* net_device::wireless_data', at offset 3968 (in bits) at netdevice.h:1801:1
there are data member changes:
'const net_device_ops* net_device::netdev_ops' offset changed from 3904 to 4032 (in bits) (by +128 bits)
'const ethtool_ops* net_device::ethtool_ops' offset changed from 3968 to 4096 (in bits) (by +128 bits)
'const ndisc_ops* net_device::ndisc_ops' offset changed from 4032 to 4160 (in bits) (by +128 bits)
'const header_ops* net_device::header_ops' offset changed from 4096 to 4224 (in bits) (by +128 bits)
'unsigned int net_device::flags' offset changed from 4160 to 4288 (in bits) (by +128 bits)
'unsigned int net_device::priv_flags' offset changed from 4192 to 4320 (in bits) (by +128 bits)
'unsigned short int net_device::gflags' offset changed from 4224 to 4352 (in bits) (by +128 bits)
'unsigned short int net_device::padded' offset changed from 4240 to 4368 (in bits) (by +128 bits)
'unsigned char net_device::operstate' offset changed from 4256 to 4384 (in bits) (by +128 bits)
'unsigned char net_device::link_mode' offset changed from 4264 to 4392 (in bits) (by +128 bits)
'unsigned char net_device::if_port' offset changed from 4272 to 4400 (in bits) (by +128 bits)
'unsigned char net_device::dma' offset changed from 4280 to 4408 (in bits) (by +128 bits)
'unsigned int net_device::mtu' offset changed from 4288 to 4416 (in bits) (by +128 bits)
'unsigned int net_device::min_mtu' offset changed from 4320 to 4448 (in bits) (by +128 bits)
'unsigned int net_device::max_mtu' offset changed from 4352 to 4480 (in bits) (by +128 bits)
'unsigned short int net_device::type' offset changed from 4384 to 4512 (in bits) (by +128 bits)
'unsigned short int net_device::hard_header_len' offset changed from 4400 to 4528 (in bits) (by +128 bits)
'unsigned char net_device::min_header_len' offset changed from 4416 to 4544 (in bits) (by +128 bits)
'unsigned short int net_device::needed_headroom' offset changed from 4432 to 4560 (in bits) (by +128 bits)
'unsigned short int net_device::needed_tailroom' offset changed from 4448 to 4576 (in bits) (by +128 bits)
'unsigned char net_device::perm_addr[32]' offset changed from 4464 to 4592 (in bits) (by +128 bits)
'unsigned char net_device::addr_assign_type' offset changed from 4720 to 4848 (in bits) (by +128 bits)
'unsigned char net_device::addr_len' offset changed from 4728 to 4856 (in bits) (by +128 bits)
'unsigned short int net_device::neigh_priv_len' offset changed from 4736 to 4864 (in bits) (by +128 bits)
'unsigned short int net_device::dev_id' offset changed from 4752 to 4880 (in bits) (by +128 bits)
'unsigned short int net_device::dev_port' offset changed from 4768 to 4896 (in bits) (by +128 bits)
'spinlock_t net_device::addr_list_lock' offset changed from 4800 to 4928 (in bits) (by +128 bits)
'unsigned char net_device::name_assign_type' offset changed from 4832 to 4960 (in bits) (by +128 bits)
'bool net_device::uc_promisc' offset changed from 4840 to 4968 (in bits) (by +128 bits)
'netdev_hw_addr_list net_device::uc' offset changed from 4864 to 4992 (in bits) (by +128 bits)
'netdev_hw_addr_list net_device::mc' offset changed from 5056 to 5184 (in bits) (by +128 bits)
'netdev_hw_addr_list net_device::dev_addrs' offset changed from 5248 to 5376 (in bits) (by +128 bits)
'kset* net_device::queues_kset' offset changed from 5440 to 5568 (in bits) (by +128 bits)
'unsigned int net_device::promiscuity' offset changed from 5504 to 5632 (in bits) (by +128 bits)
'unsigned int net_device::allmulti' offset changed from 5536 to 5664 (in bits) (by +128 bits)
'tipc_bearer* net_device::tipc_ptr' offset changed from 5568 to 5696 (in bits) (by +128 bits)
'in_device* net_device::ip_ptr' offset changed from 5632 to 5760 (in bits) (by +128 bits)
'inet6_dev* net_device::ip6_ptr' offset changed from 5696 to 5824 (in bits) (by +128 bits)
'wireless_dev* net_device::ieee80211_ptr' offset changed from 5760 to 5888 (in bits) (by +128 bits)
'wpan_dev* net_device::ieee802154_ptr' offset changed from 5824 to 5952 (in bits) (by +128 bits)
'unsigned char* net_device::dev_addr' offset changed from 5888 to 6016 (in bits) (by +128 bits)
'netdev_rx_queue* net_device::_rx' offset changed from 5952 to 6080 (in bits) (by +128 bits)
'unsigned int net_device::num_rx_queues' offset changed from 6016 to 6144 (in bits) (by +128 bits)
'unsigned int net_device::real_num_rx_queues' offset changed from 6048 to 6176 (in bits) (by +128 bits)
'bpf_prog* net_device::xdp_prog' offset changed from 6080 to 6208 (in bits) (by +128 bits)
'unsigned long int net_device::gro_flush_timeout' offset changed from 6144 to 6272 (in bits) (by +128 bits)
'rx_handler_func_t* net_device::rx_handler' offset changed from 6208 to 6336 (in bits) (by +128 bits)
'void* net_device::rx_handler_data' offset changed from 6272 to 6400 (in bits) (by +128 bits)
'mini_Qdisc* net_device::miniq_ingress' offset changed from 6336 to 6464 (in bits) (by +128 bits)
'netdev_queue* net_device::ingress_queue' offset changed from 6400 to 6528 (in bits) (by +128 bits)
'nf_hook_entries* net_device::nf_hooks_ingress' offset changed from 6464 to 6592 (in bits) (by +128 bits)
'unsigned char net_device::broadcast[32]' offset changed from 6528 to 6656 (in bits) (by +128 bits)
'cpu_rmap* net_device::rx_cpu_rmap' offset changed from 6784 to 6912 (in bits) (by +128 bits)
'hlist_node net_device::index_hlist' offset changed from 6848 to 6976 (in bits) (by +128 bits)
1332 impacted interfaces:
'struct pinctrl_dev at core.h:43:1' changed:
type size changed from 1152 to 1536 (in bits)
4 data member insertions:
'radix_tree_root pinctrl_dev::pin_group_tree', at offset 320 (in bits) at core.h:48:1
'unsigned int pinctrl_dev::num_groups', at offset 448 (in bits) at core.h:49:1
'radix_tree_root pinctrl_dev::pin_function_tree', at offset 512 (in bits) at core.h:52:1
'unsigned int pinctrl_dev::num_functions', at offset 640 (in bits) at core.h:53:1
there are data member changes:
'list_head pinctrl_dev::gpio_ranges' offset changed from 320 to 704 (in bits) (by +384 bits)
'device* pinctrl_dev::dev' offset changed from 448 to 832 (in bits) (by +384 bits)
'module* pinctrl_dev::owner' offset changed from 512 to 896 (in bits) (by +384 bits)
'void* pinctrl_dev::driver_data' offset changed from 576 to 960 (in bits) (by +384 bits)
'pinctrl* pinctrl_dev::p' offset changed from 640 to 1024 (in bits) (by +384 bits)
'pinctrl_state* pinctrl_dev::hog_default' offset changed from 704 to 1088 (in bits) (by +384 bits)
'pinctrl_state* pinctrl_dev::hog_sleep' offset changed from 768 to 1152 (in bits) (by +384 bits)
'mutex pinctrl_dev::mutex' offset changed from 832 to 1216 (in bits) (by +384 bits)
'dentry* pinctrl_dev::device_root' offset changed from 1088 to 1472 (in bits) (by +384 bits)
29 impacted interfaces:
function pinctrl_dev* devm_pinctrl_register(device*, pinctrl_desc*, void*)
function int devm_pinctrl_register_and_init(device*, pinctrl_desc*, void*, pinctrl_dev**)
function void devm_pinctrl_unregister(device*, pinctrl_dev*)
function bool pin_is_valid(pinctrl_dev*, int)
function void pinconf_generic_dt_free_map(pinctrl_dev*, pinctrl_map*, unsigned int)
function int pinconf_generic_dt_node_to_map(pinctrl_dev*, device_node*, pinctrl_map**, unsigned int*, pinctrl_map_type)
function int pinconf_generic_dt_subnode_to_map(pinctrl_dev*, device_node*, pinctrl_map**, unsigned int*, unsigned int*, pinctrl_map_type)
function void pinconf_generic_dump_config(pinctrl_dev*, seq_file*, unsigned long int)
function void pinctrl_add_gpio_range(pinctrl_dev*, pinctrl_gpio_range*)
function void pinctrl_add_gpio_ranges(pinctrl_dev*, pinctrl_gpio_range*, unsigned int)
function const char* pinctrl_dev_get_devname(pinctrl_dev*)
function void* pinctrl_dev_get_drvdata(pinctrl_dev*)
function const char* pinctrl_dev_get_name(pinctrl_dev*)
function int pinctrl_enable(pinctrl_dev*)
function pinctrl_dev* pinctrl_find_and_add_gpio_range(const char*, pinctrl_gpio_range*)
function pinctrl_gpio_range* pinctrl_find_gpio_range_from_pin(pinctrl_dev*, unsigned int)
function pinctrl_gpio_range* pinctrl_find_gpio_range_from_pin_nolock(pinctrl_dev*, unsigned int)
function int pinctrl_force_default(pinctrl_dev*)
function int pinctrl_force_sleep(pinctrl_dev*)
function int pinctrl_get_group_pins(pinctrl_dev*, const char*, const unsigned int**, unsigned int*)
function pinctrl_dev* pinctrl_register(pinctrl_desc*, device*, void*)
function int pinctrl_register_and_init(pinctrl_desc*, device*, void*, pinctrl_dev**)
function void pinctrl_remove_gpio_range(pinctrl_dev*, pinctrl_gpio_range*)
function void pinctrl_unregister(pinctrl_dev*)
function int pinctrl_utils_add_config(pinctrl_dev*, unsigned long int**, unsigned int*, unsigned long int)
function int pinctrl_utils_add_map_configs(pinctrl_dev*, pinctrl_map**, unsigned int*, unsigned int*, const char*, unsigned long int*, unsigned int, pinctrl_map_type)
function int pinctrl_utils_add_map_mux(pinctrl_dev*, pinctrl_map**, unsigned int*, unsigned int*, const char*, const char*)
function void pinctrl_utils_free_map(pinctrl_dev*, pinctrl_map*, unsigned int)
function int pinctrl_utils_reserve_map(pinctrl_dev*, pinctrl_map**, unsigned int*, unsigned int*, unsigned int)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I940293527b5aa1875b49fad611bac9fc01c2931a
Changes in 4.19.75
netfilter: nf_flow_table: set default timeout after successful insertion
HID: wacom: generic: read HID_DG_CONTACTMAX from any feature report
RDMA/restrack: Release task struct which was hold by CM_ID object
Input: elan_i2c - remove Lenovo Legion Y7000 PnpID
powerpc/mm/radix: Use the right page size for vmemmap mapping
USB: usbcore: Fix slab-out-of-bounds bug during device reset
media: tm6000: double free if usb disconnect while streaming
phy: renesas: rcar-gen3-usb2: Disable clearing VBUS in over-current
ip6_gre: fix a dst leak in ip6erspan_tunnel_xmit
udp: correct reuseport selection with connected sockets
xen-netfront: do not assume sk_buff_head list is empty in error handling
net_sched: let qdisc_put() accept NULL pointer
KVM: coalesced_mmio: add bounds checking
firmware: google: check if size is valid when decoding VPD data
serial: sprd: correct the wrong sequence of arguments
tty/serial: atmel: reschedule TX after RX was started
mwifiex: Fix three heap overflow at parsing element in cfg80211_ap_settings
nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds
ieee802154: hwsim: Fix error handle path in hwsim_init_module
ieee802154: hwsim: unregister hw while hwsim_subscribe_all_others fails
ARM: dts: am57xx: Disable voltage switching for SD card
ARM: OMAP2+: Fix missing SYSC_HAS_RESET_STATUS for dra7 epwmss
bus: ti-sysc: Fix using configured sysc mask value
s390/bpf: fix lcgr instruction encoding
ARM: OMAP2+: Fix omap4 errata warning on other SoCs
ARM: dts: dra74x: Fix iodelay configuration for mmc3
ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
s390/bpf: use 32-bit index for tail calls
selftests/bpf: fix "bind{4, 6} deny specific IP & port" on s390
tools: bpftool: close prog FD before exit on showing a single program
fpga: altera-ps-spi: Fix getting of optional confd gpio
netfilter: ebtables: Fix argument order to ADD_COUNTER
netfilter: nft_flow_offload: missing netlink attribute policy
netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_info
NFSv4: Fix return values for nfs4_file_open()
NFSv4: Fix return value in nfs_finish_open()
NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
Kconfig: Fix the reference to the IDT77105 Phy driver in the description of ATM_NICSTAR_USE_IDT77105
xdp: unpin xdp umem pages in error path
qed: Add cleanup in qed_slowpath_start()
ARM: 8874/1: mm: only adjust sections of valid mm structures
batman-adv: Only read OGM2 tvlv_len after buffer len check
bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
r8152: Set memory to all 0xFFs on failed reg reads
x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
netfilter: xt_physdev: Fix spurious error message in physdev_mt_check
netfilter: nf_conntrack_ftp: Fix debug output
NFSv2: Fix eof handling
NFSv2: Fix write regression
kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol
cifs: set domainName when a domain-key is used in multiuser
cifs: Use kzfree() to zero out the password
usb: host: xhci-tegra: Set DMA mask correctly
ARM: 8901/1: add a criteria for pfn_valid of arm
ibmvnic: Do not process reset during or after device removal
sky2: Disable MSI on yet another ASUS boards (P6Xxxx)
i2c: designware: Synchronize IRQs when unregistering slave client
perf/x86/intel: Restrict period on Nehalem
perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
amd-xgbe: Fix error path in xgbe_mod_init()
tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2
tools/power x86_energy_perf_policy: Fix argument parsing
tools/power turbostat: fix buffer overrun
net: aquantia: fix out of memory condition on rx side
net: seeq: Fix the function used to release some memory in an error handling path
dmaengine: ti: dma-crossbar: Fix a memory leak bug
dmaengine: ti: omap-dma: Add cleanup in omap_dma_probe()
x86/uaccess: Don't leak the AC flags into __get_user() argument evaluation
x86/hyper-v: Fix overflow bug in fill_gva_list()
keys: Fix missing null pointer check in request_key_auth_describe()
iommu/amd: Flush old domains in kdump kernel
iommu/amd: Fix race in increase_address_space()
PCI: kirin: Fix section mismatch warning
ovl: fix regression caused by overlapping layers detection
floppy: fix usercopy direction
binfmt_elf: move brk out of mmap when doing direct loader exec
arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
media: technisat-usb2: break out of loop at end of buffer
Linux 4.19.75
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1dd841f112ee81497cd085b102979f45ee5e6b9d
commit 2a355ec257 upstream.
While the CSV3 field of the ID_AA64_PFR0 CPU ID register can be checked
to see if a CPU is susceptible to Meltdown and therefore requires kpti
to be enabled, existing CPUs do not implement this field.
We therefore whitelist all unaffected Cortex-A CPUs that do not implement
the CSV3 field.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bbdc6076d2 upstream.
Commmit eab09532d4 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"),
made changes in the rare case when the ELF loader was directly invoked
(e.g to set a non-inheritable LD_LIBRARY_PATH, testing new versions of
the loader), by moving into the mmap region to avoid both ET_EXEC and
PIE binaries. This had the effect of also moving the brk region into
mmap, which could lead to the stack and brk being arbitrarily close to
each other. An unlucky process wouldn't get its requested stack size
and stack allocations could end up scribbling on the heap.
This is illustrated here. In the case of using the loader directly, brk
(so helpfully identified as "[heap]") is allocated with the _loader_ not
the binary. For example, with ASLR entirely disabled, you can see this
more clearly:
$ /bin/cat /proc/self/maps
555555554000-55555555c000 r-xp 00000000 ... /bin/cat
55555575b000-55555575c000 r--p 00007000 ... /bin/cat
55555575c000-55555575d000 rw-p 00008000 ... /bin/cat
55555575d000-55555577e000 rw-p 00000000 ... [heap]
...
7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 ...
7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]
$ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps
...
7ffff7bcc000-7ffff7bd4000 r-xp 00000000 ... /bin/cat
7ffff7bd4000-7ffff7dd3000 ---p 00008000 ... /bin/cat
7ffff7dd3000-7ffff7dd4000 r--p 00007000 ... /bin/cat
7ffff7dd4000-7ffff7dd5000 rw-p 00008000 ... /bin/cat
7ffff7dd5000-7ffff7dfc000 r-xp 00000000 ... /lib/x86_64-linux-gnu/ld-2.27.so
7ffff7fb2000-7ffff7fd6000 rw-p 00000000 ...
7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
7ffff7ffe000-7ffff8020000 rw-p 00000000 ... [heap]
7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]
The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since
nothing is there in the direct loader case (and ET_EXEC is still far
away at 0x400000). Anything that ran before should still work (i.e.
the ultimately-launched binary already had the brk very far from its
text, so this should be no different from a COMPAT_BRK standpoint). The
only risk I see here is that if someone started to suddenly depend on
the entire memory space lower than the mmap region being available when
launching binaries via a direct loader execs which seems highly
unlikely, I'd hope: this would mean a binary would _not_ work when
exec()ed normally.
(Note that this is only done under CONFIG_ARCH_HAS_ELF_RANDOMIZATION
when randomization is turned on.)
Link: http://lkml.kernel.org/r/20190422225727.GA21011@beast
Link: https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=Omfhx_p0nCKPSjA@mail.gmail.com
Fixes: eab09532d4 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Ali Saidi <alisaidi@amazon.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0be0bfd2de upstream.
Once upon a time, commit 2cac0c00a6 ("ovl: get exclusive ownership on
upper/work dirs") in v4.13 added some sanity checks on overlayfs layers.
This change caused a docker regression. The root cause was mount leaks
by docker, which as far as I know, still exist.
To mitigate the regression, commit 85fdee1eef ("ovl: fix regression
caused by exclusive upper/work dir protection") in v4.14 turned the
mount errors into warnings for the default index=off configuration.
Recently, commit 146d62e5a5 ("ovl: detect overlapping layers") in
v5.2, re-introduced exclusive upper/work dir checks regardless of
index=off configuration.
This changes the status quo and mount leak related bug reports have
started to re-surface. Restore the status quo to fix the regressions.
To clarify, index=off does NOT relax overlapping layers check for this
ovelayfs mount. index=off only relaxes exclusive upper/work dir checks
with another overlayfs mount.
To cover the part of overlapping layers detection that used the
exclusive upper/work dir checks to detect overlap with self upper/work
dir, add a trap also on the work base dir.
Link: https://github.com/moby/moby/issues/34672
Link: https://lore.kernel.org/linux-fsdevel/20171006121405.GA32700@veci.piliscsaba.szeredi.hu/
Link: https://github.com/containers/libpod/issues/3540
Fixes: 146d62e5a5 ("ovl: detect overlapping layers")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Colin Walters <walters@verbum.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6870b67350 upstream.
The PCI kirin driver compilation produces the following section mismatch
warning:
WARNING: vmlinux.o(.text+0x4758cc): Section mismatch in reference from
the function kirin_pcie_probe() to the function
.init.text:kirin_add_pcie_port()
The function kirin_pcie_probe() references
the function __init kirin_add_pcie_port().
This is often because kirin_pcie_probe lacks a __init
annotation or the annotation of kirin_add_pcie_port is wrong.
Remove '__init' from kirin_add_pcie_port() to fix it.
Fixes: fc5165db24 ("PCI: kirin: Add HiSilicon Kirin SoC PCIe controller driver")
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 754265bcab ]
After the conversion to lock-less dma-api call the
increase_address_space() function can be called without any
locking. Multiple CPUs could potentially race for increasing
the address space, leading to invalid domain->mode settings
and invalid page-tables. This has been happening in the wild
under high IO load and memory pressure.
Fix the race by locking this operation. The function is
called infrequently so that this does not introduce
a performance regression in the dma-api path again.
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 256e4621c2 ('iommu/amd: Make use of the generic IOVA allocator')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36b7200f67 ]
When devices are attached to the amd_iommu in a kdump kernel, the old device
table entries (DTEs), which were copied from the crashed kernel, will be
overwritten with a new domain number. When the new DTE is written, the IOMMU
is told to flush the DTE from its internal cache--but it is not told to flush
the translation cache entries for the old domain number.
Without this patch, AMD systems using the tg3 network driver fail when kdump
tries to save the vmcore to a network system, showing network timeouts and
(sometimes) IOMMU errors in the kernel log.
This patch will flush IOMMU translation cache entries for the old domain when
a DTE gets overwritten with a new domain number.
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Fixes: 3ac3e5ee5e ('iommu/amd: Copy old trans table from old kernel')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d41a3effbb ]
If a request_key authentication token key gets revoked, there's a window in
which request_key_auth_describe() can see it with a NULL payload - but it
makes no check for this and something like the following oops may occur:
BUG: Kernel NULL pointer dereference at 0x00000038
Faulting instruction address: 0xc0000000004ddf30
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [...] request_key_auth_describe+0x90/0xd0
LR [...] request_key_auth_describe+0x54/0xd0
Call Trace:
[...] request_key_auth_describe+0x54/0xd0 (unreliable)
[...] proc_keys_show+0x308/0x4c0
[...] seq_read+0x3d0/0x540
[...] proc_reg_read+0x90/0x110
[...] __vfs_read+0x3c/0x70
[...] vfs_read+0xb4/0x1b0
[...] ksys_read+0x7c/0x130
[...] system_call+0x5c/0x70
Fix this by checking for a NULL pointer when describing such a key.
Also make the read routine check for a NULL pointer to be on the safe side.
[DH: Modified to not take already-held rcu lock and modified to also check
in the read routine]
Fixes: 04c567d931 ("[PATCH] Keys: Fix race between two instantiators of a key")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4030b4c585 ]
When the 'start' parameter is >= 0xFF000000 on 32-bit
systems, or >= 0xFFFFFFFF'FF000000 on 64-bit systems,
fill_gva_list() gets into an infinite loop.
With such inputs, 'cur' overflows after adding HV_TLB_FLUSH_UNIT
and always compares as less than end. Memory is filled with
guest virtual addresses until the system crashes.
Fix this by never incrementing 'cur' to be larger than 'end'.
Reported-by: Jong Hyun Park <park.jonghyun@yonsei.ac.kr>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2ffd9e33ce ("x86/hyper-v: Use hypercall for remote TLB flush")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e1e54ec7fb ]
In commit 99cd149efe ("sgiseeq: replace use of dma_cache_wback_inv"),
a call to 'get_zeroed_page()' has been turned into a call to
'dma_alloc_coherent()'. Only the remove function has been updated to turn
the corresponding 'free_page()' into 'dma_free_attrs()'.
The error hndling path of the probe function has not been updated.
Fix it now.
Rename the corresponding label to something more in line.
Fixes: 99cd149efe ("sgiseeq: replace use of dma_cache_wback_inv")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be6cef69ba ]
On embedded environments with hard memory limits it is a normal although
rare case when skb can't be allocated on rx part under high traffic.
In such OOM cases napi_complete_done() was not called.
So the napi object became in an invalid state like it is "scheduled".
Kernel do not re-schedules the poll of that napi object.
Consequently, kernel can not remove that object the system hangs on
`ifconfig down` waiting for a poll.
We are fixing this by gracefully closing napi poll routine with correct
invocation of napi_complete_done.
This was reproduced with artificially failing the allocation of skb to
simulate an "out of memory" error case and check that traffic does
not get stuck.
Fixes: 970a2e9864 ("net: ethernet: aquantia: Vector operations")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eeb71c950b ]
turbostat could be terminated by general protection fault on some latest
hardwares which (for example) support 9 levels of C-states and show 18
"tADDED" lines. That bloats the total output and finally causes buffer
overrun. So let's extend the buffer to avoid this.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>