commit dd86e373e0 upstream.
The package management code in RAPL relies on package mapping being
available before a CPU is started. This changed with:
9d85eb9119 ("x86/smpboot: Make logical package management more robust")
because the ACPI/BIOS information turned out to be unreliable, but that
left RAPL in broken state. This was not noticed because on a regular boot
all CPUs are online before RAPL is initialized.
A possible fix would be to reintroduce the mess which allocates a package
data structure in CPU prepare and when it turns out to already exist in
starting throw it away later in the CPU online callback. But that's a
horrible hack and not required at all because RAPL becomes functional for
perf only in the CPU online callback. That's correct because user space is
not yet informed about the CPU being onlined, so nothing caan rely on RAPL
being available on that particular CPU.
Move the allocation to the CPU online callback and simplify the hotplug
handling. At this point the package mapping is established and correct.
This also adds a missing check for available package data in the
event_init() function.
Reported-by: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 9d85eb9119 ("x86/smpboot: Make logical package management more robust")
Link: http://lkml.kernel.org/r/20170131230141.212593966@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ jwang: backport to 4.9 fix Null pointer deref during hotplug cpu.]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64aee2a965 upstream.
Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully intialised) groups.
Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.
For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating thier (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.
This can be triggered with the below test case, resulting in warnings
from arch backends.
#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
int group_fd, unsigned long flags)
{
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
char watched_char;
struct perf_event_attr wp_attr = {
.type = PERF_TYPE_BREAKPOINT,
.bp_type = HW_BREAKPOINT_RW,
.bp_addr = (unsigned long)&watched_char,
.bp_len = 1,
.size = sizeof(wp_attr),
};
int main(int argc, char *argv[])
{
int leader, ret;
cpu_set_t cpus;
/*
* Force use of CPU0 to ensure our CPU0-bound events get scheduled.
*/
CPU_ZERO(&cpus);
CPU_SET(0, &cpus);
ret = sched_setaffinity(0, sizeof(cpus), &cpus);
if (ret) {
printf("Unable to set cpu affinity\n");
return 1;
}
/* open leader event, bound to this task, CPU0 only */
leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
if (leader < 0) {
printf("Couldn't open leader: %d\n", leader);
return 1;
}
/*
* Open a follower event that is bound to the same task, but a
* different CPU. This means that the group should never be possible to
* schedule.
*/
ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
if (ret < 0) {
printf("Couldn't open mismatched follower: %d\n", ret);
return 1;
} else {
printf("Opened leader/follower with mismastched CPUs\n");
}
/*
* Open as many independent events as we can, all bound to the same
* task, CPU0 only.
*/
do {
ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
} while (ret >= 0);
/*
* Force enable/disble all events to trigger the erronoeous
* installation of the follower event.
*/
printf("Opened all events. Toggling..\n");
for (;;) {
prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
}
return 0;
}
Fix this by validating this requirement regardless of whether we're
moving events.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhou Chengming <zhouchengming1@huawei.com>
Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8f0f9e499 upstream.
There's a small race when function graph shutsdown and the calling of the
registered function graph entry callback. The callback must not reference
the task's ret_stack without first checking that it is not NULL. Note, when
a ret_stack is allocated for a task, it stays allocated until the task exits.
The problem here, is that function_graph is shutdown, and a new task was
created, which doesn't have its ret_stack allocated. But since some of the
functions are still being traced, the callbacks can still be called.
The normal function_graph code handles this, but starting with commit
8861dd303c ("ftrace: Access ret_stack->subtime only in the function
profiler") the profiler code references the ret_stack on function entry, but
doesn't check if it is NULL first.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611
Fixes: 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler")
Reported-by: lilydjwg@gmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc788f64f1 upstream.
When processing an NFSv4 WRITE operation, argp->end should never
point past the end of the data in the final page of the page list.
Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
More critical, nfsd4_decode_write is failing to increment argp->pagelen
when it increments argp->pagelist. This can cause later xdr decoders
to assume more data is available than really is, which can cause server
crashes on malformed requests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3edede29f upstream.
Add checking for the path component length and verify it is <= the maximum
that the server advertizes via FileFsAttributeInformation.
With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT
when users to access an overlong path.
To test this, try to cd into a (non-existing) directory on a CIFS share
that has a too long name:
cd /mnt/aaaaaaaaaaaaaaa...
and it now should show a good error message from the shell:
bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long
rh bz 1153996
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42bec214d8 upstream.
The df for a SMB2 share triggers a GetInfo call for
FS_FULL_SIZE_INFORMATION. The values returned are used to populate
struct statfs.
The problem is that none of the information returned by the call
contains the total blocks available on the filesystem. Instead we use
the blocks available to the user ie. quota limitation when filling out
statfs.f_blocks. The information returned does contain Actual free units
on the filesystem and is used to populate statfs.f_bfree. For users with
quota enabled, it can lead to situations where the total free space
reported is more than the total blocks on the system ending up with df
reports like the following
# df -h /mnt/a
Filesystem Size Used Avail Use% Mounted on
//192.168.22.10/a 2.5G -2.3G 2.5G - /mnt/a
To fix this problem, we instead populate both statfs.f_bfree with the
same value as statfs.f_bavail ie. CallerAvailableAllocationUnits. This
is similar to what is done already in the code for cifs and df now
reports the quota information for the user used to mount the share.
# df --si /mnt/a
Filesystem Size Used Avail Use% Mounted on
//192.168.22.10/a 2.7G 101M 2.6G 4% /mnt/a
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Pierguido Lambri <plambri@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb87481ee8 upstream.
The .data and .bss sections were modified in the generic linker script to
pull in sections named .data.<C identifier>, which are generated by gcc with
-ffunction-sections and -fdata-sections options.
The problem with this pattern is it can also match section names that Linux
defines explicitly, e.g., .data.unlikely. This can cause Linux sections to
get moved into the wrong place.
The way to avoid this is to use ".." separators for explicit section names
(the dot character is valid in a section name but not a C identifier).
However currently there are sections which don't follow this rule, so for
now just disable the wild card by default.
Example: http://marc.info/?l=linux-arm-kernel&m=150106824024221&w=2
Fixes: b67067f117 ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b0db1a5bd upstream.
Performing the following task with kmemleak enabled:
# cd /sys/kernel/tracing/events/irq/irq_handler_entry/
# echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger
# echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800b9290308 (size 32):
comm "bash", pid 1114, jiffies 4294848451 (age 141.139s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0
[<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290
[<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940
[<ffffffff812639c9>] create_filter+0xa9/0x160
[<ffffffff81263bdc>] create_event_filter+0xc/0x10
[<ffffffff812655e5>] set_trigger_filter+0xe5/0x210
[<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490
[<ffffffff812652e2>] event_trigger_write+0x1a2/0x260
[<ffffffff8138cf87>] __vfs_write+0xd7/0x380
[<ffffffff8138f421>] vfs_write+0x101/0x260
[<ffffffff8139187b>] SyS_write+0xab/0x130
[<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe
[<ffffffffffffffff>] 0xffffffffffffffff
The function create_filter() is passed a 'filterp' pointer that gets
allocated, and if "set_str" is true, it is up to the caller to free it, even
on error. The problem is that the pointer is not freed by create_filter()
when set_str is false. This is a bug, and it is not up to the caller to free
the filter on error if it doesn't care about the string.
Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com
Fixes: 38b78eb85 ("tracing: Factorize filter creation")
Reported-by: Chunyu Hu <chuhu@redhat.com>
Tested-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bb0f0e73c upstream.
The clear_boot_tracer function is used to reset the default_bootup_tracer
string to prevent it from being accessed after boot, as it originally points
to init data. But since clear_boot_tracer() is called via the
init_lateinit() call, it races with the initcall for registering the hwlat
tracer. If someone adds "ftrace=hwlat" to the kernel command line, depending
on how the linker sets up the text, the saved command line may be cleared,
and the hwlat tracer never is initialized.
Simply have the clear_boot_tracer() be called by initcall_lateinit_sync() as
that's for tasks to be called after lateinit.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196551
Fixes: e7c15cd8a ("tracing: Added hardware latency tracer")
Reported-by: Zamir SUN <sztsian@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05ee29e94a upstream.
When an encoder fails to initialize the driver prints an error message
to the kernel log. The message contains the name of the encoder's DT
node, which is NULL for internal encoders. Use the of_node_full_name()
macro to avoid dereferencing a NULL pointer, print the output number to
add more context to the error, and make sure we still own a reference to
the encoder's DT node by delaying the of_node_put() call.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Thong Ho <thong.ho.px@rvc.renesas.com>
Signed-off-by: Nhan Nguyen <nhan.nguyen.yb@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b7e8665b4 upstream.
Commit 7c05126793 ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its ->mmap_sem for write, in dup_mmap().
However, it was overlooked that this introduced an new error path before
a reference is taken on the mm_struct's ->exe_file. Since the
->exe_file of the new mm_struct was already set to the old ->exe_file by
the memcpy() in dup_mm(), it was possible for the mmput() in the error
path of dup_mm() to drop a reference to ->exe_file which was never
taken.
This caused the struct file to later be freed prematurely.
Fix it by updating mm_init() to NULL out the ->exe_file, in the same
place it clears other things like the list of mmaps.
This bug was found by syzkaller. It can be reproduced using the
following C program:
#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
static void *mmap_thread(void *_arg)
{
for (;;) {
mmap(NULL, 0x1000000, PROT_READ,
MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
}
}
static void *fork_thread(void *_arg)
{
usleep(rand() % 10000);
fork();
}
int main(void)
{
fork();
fork();
fork();
for (;;) {
if (fork() == 0) {
pthread_t t;
pthread_create(&t, NULL, mmap_thread, NULL);
pthread_create(&t, NULL, fork_thread, NULL);
usleep(rand() % 10000);
syscall(__NR_exit_group, 0);
}
wait(NULL);
}
}
No special kernel config options are needed. It usually causes a NULL
pointer dereference in __remove_shared_vm_struct() during exit, or in
dup_mmap() (which is usually inlined into copy_process()) during fork.
Both are due to a vm_area_struct's ->vm_file being used after it's
already been freed.
Google Bug Id: 64772007
Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com
Fixes: 7c05126793 ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a23318feef upstream.
The commit 8503ff1665 ("i2c: designware: Avoid unnecessary resuming
during system suspend"), may suggest to the PM core to try out the so
called direct_complete path for system sleep. In this path, the PM core
treats a runtime suspended device as it's already in a proper low power
state for system sleep, which makes it skip calling the system sleep
callbacks for the device, except for the ->prepare() and the ->complete()
callbacks.
However, the PM core may unset the direct_complete flag for a parent
device, in case its child device are being system suspended before. In this
scenario, the PM core invokes the system sleep callbacks, no matter if the
device is runtime suspended or not.
Particularly in cases of an existing i2c slave device, the above path is
triggered, which breaks the assumption that the i2c device is always
runtime resumed whenever the dw_i2c_plat_suspend() is being called.
More precisely, dw_i2c_plat_suspend() calls clk_core_disable() and
clk_core_unprepare(), for an already disabled/unprepared clock, leading to
a splat in the log about clocks calls being wrongly balanced and breaking
system sleep.
To still allow the direct_complete path in cases when it's possible, but
also to keep the fix simple, let's runtime resume the i2c device in the
->suspend() callback, before continuing to put the device into low power
state.
Note, in cases when the i2c device is attached to the ACPI PM domain, this
problem doesn't occur, because ACPI's ->suspend() callback, assigned to
acpi_subsys_suspend(), already calls pm_runtime_resume() for the device.
It should also be noted that this change does not fix commit 8503ff1665
("i2c: designware: Avoid unnecessary resuming during system suspend").
Because for the non-ACPI case, the system sleep support was already broken
prior that point.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d79cee2c6 upstream.
It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1
which hold MSB bits of the physical address correspondingly of region start
and end otherwise SLC region operation is executed in unpredictable manner
Without this patch, SLC flushes on HSDK (IOC disabled) were taking
seconds.
Reported-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: PAR40 regs only written if PAE40 exist]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c264af7be upstream.
When calling 'iso_resource_free()' for uninitialized data, this function
causes NULL pointer dereference due to its 'unit' member. This occurs when
unplugging audio and music units on IEEE 1394 bus at failure of card
registration.
This commit fixes the bug. The bug exists since kernel v4.5.
Fixes: 324540c4e0 ('ALSA: fireface: postpone sound card registration') at v4.12
Fixes: 8865a31e0f ('ALSA: firewire-motu: postpone sound card registration') at v4.12
Fixes: b610386c8a ('ALSA: firewire-tascam: deleyed registration of sound card') at v4.7
Fixes: 86c8dd7f4d ('ALSA: firewire-digi00x: delayed registration of sound card') at v4.7
Fixes: 6c29230e2a ('ALSA: oxfw: delayed registration of sound card') at v4.7
Fixes: 7d3c1d5901 ('ALSA: fireworks: delayed registration of sound card') at v4.7
Fixes: 04a2c73c97 ('ALSA: bebob: delayed registration of sound card') at v4.7
Fixes: b59fb1900b ('ALSA: dice: postpone card registration') at v4.5
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88c54cdf61 upstream.
When user tries to replace the user-defined control TLV, the kernel
checks the change of its content via memcmp(). The problem is that
the kernel passes the return value from memcmp() as is. memcmp()
gives a non-zero negative value depending on the comparison result,
and this shall be recognized as an error code.
The patch covers that corner-case, return 1 properly for the changed
TLV.
Fixes: 8aa9b586e4 ("[ALSA] Control API - more robust TLV implementation")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07b3b5e9ed upstream.
These headsets reports a lot of: cannot set freq 44100 to ep 0x81
and need a small delay between sample rate settings, just like
Zoom R16/24. Add both headsets to the Zoom R16/24 quirk for
a 1 ms delay between control msgs.
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c469268cd5 upstream.
If the host has protection keys disabled, we cannot read and write the
guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0. Block
the PKU cpuid bit in that case.
This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.
Fixes: 1be0e61c1f
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 857b8de967 upstream.
sthyi should only generate a specification exception if the function
code is zero and the response buffer is not on a 4k boundary.
The current code would also test for unknown function codes if the
response buffer, that is currently only defined for function code 0,
is not on a 4k boundary and incorrectly inject a specification
exception instead of returning with condition code 3 and return code 4
(unsupported function code).
Fix this by moving the boundary check.
Fixes: 95ca2cb579 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a4eefcd0e upstream.
The sthyi inline assembly misses register r3 within the clobber
list. The sthyi instruction will always write a return code to
register "R2+1", which in this case would be r3. Due to that we may
have register corruption and see host crashes or data corruption
depending on how gcc decided to allocate and use registers during
compile time.
Fixes: 95ca2cb579 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a646580f7 upstream.
Fixed the issue that two finger scroll does not work correctly
on V8 protocol. The cause is that V8 protocol X-coordinate decode
is wrong at SS4 PLUS device. I added SS4 PLUS X decode definition.
Mote notes:
the problem manifests itself by the commit e7348396c6 ("Input: ALPS
- fix V8+ protocol handling (73 03 28)"), where a fix for the V8+
protocol was applied. Although the culprit must have been present
beforehand, the two-finger scroll worked casually even with the
wrongly reported values by some reason. It got broken by the commit
above just because it changed x_max value, and this made libinput
correctly figuring the MT events. Since the X coord is reported as
falsely doubled, the events on the right-half side go outside the
boundary, thus they are no longer handled. This resulted as a broken
two-finger scroll.
One finger event is decoded differently, and it didn't suffer from
this problem. The problem was only about MT events. --tiwai
Fixes: e7348396c6 ("Input: ALPS - fix V8+ protocol handling (73 03 28)")
Signed-off-by: Masaki Ota <masaki.ota@jp.alps.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Paul Donohue <linux-kernel@PaulSD.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec667683c5 upstream.
Synaptics add new TP firmware ID: 0x2 and 0x3, for now both lower 2 bits
are indicated as TP. Change the constant to bitwise values.
This makes trackpoint to be recognized on Lenovo Carbon X1 Gen5 instead
of it being identified as "PS/2 Generic Mouse".
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9305706c2e ]
We have to subtract the src max from the dst min, and vice-versa, since
(e.g.) the smallest result comes from the largest subtrahend.
Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4cabc5b186 ]
Edward reported that there's an issue in min/max value bounds
tracking when signed and unsigned compares both provide hints
on limits when having unknown variables. E.g. a program such
as the following should have been rejected:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff8a94cda93400
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+7
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = -1
10: (2d) if r1 > r2 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
11: (65) if r1 s> 0x1 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
12: (0f) r0 += r1
13: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
14: (b7) r0 = 0
15: (95) exit
What happens is that in the first part ...
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = -1
10: (2d) if r1 > r2 goto pc+3
... r1 carries an unsigned value, and is compared as unsigned
against a register carrying an immediate. Verifier deduces in
reg_set_min_max() that since the compare is unsigned and operation
is greater than (>), that in the fall-through/false case, r1's
minimum bound must be 0 and maximum bound must be r2. Latter is
larger than the bound and thus max value is reset back to being
'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
'R1=inv,min_value=0'. The subsequent test ...
11: (65) if r1 s> 0x1 goto pc+2
... is a signed compare of r1 with immediate value 1. Here,
verifier deduces in reg_set_min_max() that since the compare
is signed this time and operation is greater than (>), that
in the fall-through/false case, we can deduce that r1's maximum
bound must be 1, meaning with prior test, we result in r1 having
the following state: R1=inv,min_value=0,max_value=1. Given that
the actual value this holds is -8, the bounds are wrongly deduced.
When this is being added to r0 which holds the map_value(_adj)
type, then subsequent store access in above case will go through
check_mem_access() which invokes check_map_access_adj(), that
will then probe whether the map memory is in bounds based
on the min_value and max_value as well as access size since
the actual unknown value is min_value <= x <= max_value; commit
fce366a9dd ("bpf, verifier: fix alu ops against map_value{,
_adj} register types") provides some more explanation on the
semantics.
It's worth to note in this context that in the current code,
min_value and max_value tracking are used for two things, i)
dynamic map value access via check_map_access_adj() and since
commit 06c1c04972 ("bpf: allow helpers access to variable memory")
ii) also enforced at check_helper_mem_access() when passing a
memory address (pointer to packet, map value, stack) and length
pair to a helper and the length in this case is an unknown value
defining an access range through min_value/max_value in that
case. The min_value/max_value tracking is /not/ used in the
direct packet access case to track ranges. However, the issue
also affects case ii), for example, the following crafted program
based on the same principle must be rejected as well:
0: (b7) r2 = 0
1: (bf) r3 = r10
2: (07) r3 += -512
3: (7a) *(u64 *)(r10 -16) = -8
4: (79) r4 = *(u64 *)(r10 -16)
5: (b7) r6 = -1
6: (2d) if r4 > r6 goto pc+5
R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
7: (65) if r4 s> 0x1 goto pc+4
R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
R10=fp
8: (07) r4 += 1
9: (b7) r5 = 0
10: (6a) *(u16 *)(r10 -512) = 0
11: (85) call bpf_skb_load_bytes#26
12: (b7) r0 = 0
13: (95) exit
Meaning, while we initialize the max_value stack slot that the
verifier thinks we access in the [1,2] range, in reality we
pass -7 as length which is interpreted as u32 in the helper.
Thus, this issue is relevant also for the case of helper ranges.
Resetting both bounds in check_reg_overflow() in case only one
of them exceeds limits is also not enough as similar test can be
created that uses values which are within range, thus also here
learned min value in r1 is incorrect when mixed with later signed
test to create a range:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff880ad081fa00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+7
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = 2
10: (3d) if r2 >= r1 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
11: (65) if r1 s> 0x4 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
12: (0f) r0 += r1
13: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
14: (b7) r0 = 0
15: (95) exit
This leaves us with two options for fixing this: i) to invalidate
all prior learned information once we switch signed context, ii)
to track min/max signed and unsigned boundaries separately as
done in [0]. (Given latter introduces major changes throughout
the whole verifier, it's rather net-next material, thus this
patch follows option i), meaning we can derive bounds either
from only signed tests or only unsigned tests.) There is still the
case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
operations, meaning programs like the following where boundaries
on the reg get mixed in context later on when bounds are merged
on the dst reg must get rejected, too:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff89b2bf87ce00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+6
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = 2
10: (3d) if r2 >= r1 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
11: (b7) r7 = 1
12: (65) if r7 s> 0x0 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
13: (b7) r0 = 0
14: (95) exit
from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
15: (0f) r7 += r1
16: (65) if r7 s> 0x4 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
17: (0f) r0 += r7
18: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
19: (b7) r0 = 0
20: (95) exit
Meaning, in adjust_reg_min_max_vals() we must also reset range
values on the dst when src/dst registers have mixed signed/
unsigned derived min/max value bounds with one unbounded value
as otherwise they can be added together deducing false boundaries.
Once both boundaries are established from either ALU ops or
compare operations w/o mixing signed/unsigned insns, then they
can safely be added to other regs also having both boundaries
established. Adding regs with one unbounded side to a map value
where the bounded side has been learned w/o mixing ops is
possible, but the resulting map value won't recover from that,
meaning such op is considered invalid on the time of actual
access. Invalid bounds are set on the dst reg in case i) src reg,
or ii) in case dst reg already had them. The only way to recover
would be to perform i) ALU ops but only 'add' is allowed on map
value types or ii) comparisons, but these are disallowed on
pointers in case they span a range. This is fine as only BPF_JEQ
and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
which potentially turn them into PTR_TO_MAP_VALUE type depending
on the branch, so only here min/max value cannot be invalidated
for them.
In terms of state pruning, value_from_signed is considered
as well in states_equal() when dealing with adjusted map values.
With regards to breaking existing programs, there is a small
risk, but use-cases are rather quite narrow where this could
occur and mixing compares probably unlikely.
Joint work with Josef and Edward.
[0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
Fixes: 484611357c ("bpf: allow access into map value arrays")
Reported-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fce366a9dd ]
While looking into map_value_adj, I noticed that alu operations
directly on the map_value() resp. map_value_adj() register (any
alu operation on a map_value() register will turn it into a
map_value_adj() typed register) are not sufficiently protected
against some of the operations. Two non-exhaustive examples are
provided that the verifier needs to reject:
i) BPF_AND on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xbf842a00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (57) r0 &= 8
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
9: (95) exit
from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns
ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xc24eee00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (04) (u32) r0 += (u32) 0
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
9: (95) exit
from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns
Issue is, while min_value / max_value boundaries for the access
are adjusted appropriately, we change the pointer value in a way
that cannot be sufficiently tracked anymore from its origin.
Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact,
all the test cases coming with 484611357c ("bpf: allow access
into map value arrays") perform BPF_ADD only on the destination
register that is PTR_TO_MAP_VALUE_ADJ.
Only for UNKNOWN_VALUE register types such operations make sense,
f.e. with unknown memory content fetched initially from a constant
offset from the map value memory into a register. That register is
then later tested against lower / upper bounds, so that the verifier
can then do the tracking of min_value / max_value, and properly
check once that UNKNOWN_VALUE register is added to the destination
register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the
original use-case is solving. Note, tracking on what is being
added is done through adjust_reg_min_max_vals() and later access
to the map value enforced with these boundaries and the given offset
from the insn through check_map_access_adj().
Tests will fail for non-root environment due to prohibited pointer
arithmetic, in particular in check_alu_op(), we bail out on the
is_pointer_value() check on the dst_reg (which is false in root
case as we allow for pointer arithmetic via env->allow_ptr_leaks).
Similarly to PTR_TO_PACKET, one way to fix it is to restrict the
allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit
mode BPF_ADD. The test_verifier suite runs fine after the patch
and it also rejects mentioned test cases.
Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3c2ce60bdd ]
Current limits with regards to processing program paths do not
really reflect today's needs anymore due to programs becoming
more complex and verifier smarter, keeping track of more data
such as const ALU operations, alignment tracking, spilling of
PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
smarter matching of what LLVM generates.
This also comes with the side-effect that we result in fewer
opportunities to prune search states and thus often need to do
more work to prove safety than in the past due to different
register states and stack layout where we mismatch. Generally,
it's quite hard to determine what caused a sudden increase in
complexity, it could be caused by something as trivial as a
single branch somewhere at the beginning of the program where
LLVM assigned a stack slot that is marked differently throughout
other branches and thus causing a mismatch, where verifier
then needs to prove safety for the whole rest of the program.
Subsequently, programs with even less than half the insn size
limit can get rejected. We noticed that while some programs
load fine under pre 4.11, they get rejected due to hitting
limits on more recent kernels. We saw that in the vast majority
of cases (90+%) pruning failed due to register mismatches. In
case of stack mismatches, majority of cases failed due to
different stack slot types (invalid, spill, misc) rather than
differences in spilled registers.
This patch makes pruning more aggressive by also adding markers
that sit at conditional jumps as well. Currently, we only mark
jump targets for pruning. For example in direct packet access,
these are usually error paths where we bail out. We found that
adding these markers, it can reduce number of processed insns
by up to 30%. Another option is to ignore reg->id in probing
PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
slightly as well by up to 7% observed complexity reduction as
stand-alone. Meaning, if a previous path with register type
PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
the same map X must be safe as well. Last but not least the
patch also adds a scheduling point and bumps the current limit
for instructions to be processed to a more adequate value.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 43188702b3 ]
Currently the verifier does not track imm across alu operations when
the source register is of unknown type. This adds additional pattern
matching to catch this and track imm. We've seen LLVM generating this
pattern while working on cilium.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 68a66d149a ]
This important to call qdisc_tree_reduce_backlog() after changing queue
length. Parent qdisc should deactivate class in ->qlen_notify() called from
qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero.
Missed class deactivations leads to crashes/warnings at picking packets
from empty qdisc and corrupting state at reactivating this class in future.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 86a7996cc8 ("net_sched: introduce qdisc_replace() helper")
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4f8a881acc ]
As we know in some target's checkentry it may dereference par.entryinfo
to check entry stuff inside. But when sched action calls xt_check_target,
par.entryinfo is set with NULL. It would cause kernel panic when calling
some targets.
It can be reproduce with:
# tc qd add dev eth1 ingress handle ffff:
# tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
-j ECN --ecn-tcp-remove
It could also crash kernel when using target CLUSTERIP or TPROXY.
By now there's no proper value for par.entryinfo in ipt_init_target,
but it can not be set with NULL. This patch is to void all these
panics by setting it with an ipt_entry obj with all members = 0.
Note that this issue has been there since the very beginning.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b024d949a3 ]
list.dev has not been initialized and so the copy_to_user is copying
data from the stack back to user space which is a potential
information leak. Fix this ensuring all of list is initialized to
zero.
Detected by CoverityScan, CID#1357894 ("Uninitialized scalar variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ca3d89a3eb ]
enable_4k_uar module parameter was added in patch cited below to
address the backward compatibility issue in SRIOV when the VM has
system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
implementation.
The above compatibility issue does not exist in the non SRIOV case.
In this patch, we always enable 4k uar implementation if SRIOV
is not enabled on mlx4's supported cards.
Fixes: 76e39ccf9c ("net/mlx4_core: Fix backward compatibility on VFs")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cdbeb633ca ]
In some situations tcp_send_loss_probe() can realize that it's unable
to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
realizes that the RTO was eligible to fire immediately or at some
point in the past (delta_us <= 0). Previously in such cases
tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
icsk_rto, which caused needless delays of hundreds of milliseconds
(and non-linear behavior that made reproducible testing
difficult). This commit changes the logic to schedule "overdue" RTOs
ASAP, rather than at now + icsk_rto.
Fixes: 6ba8a3b19e ("tcp: Tail loss probe (TLP)")
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 348a400272 ]
In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.
Here is the syzkaller reported general protection fault in fib6_locate:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
RSP: 0018:ffff8801d01a36a8 EFLAGS: 00010202
RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
FS: 00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
Call Trace:
[<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
[<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
[<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
[<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
[<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
[<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
[<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
[<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
[<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
[<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
[<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
[<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
[<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
[<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
[<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
[<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
[<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17
Note: there is no "Fixes" tag as this seems to be a bug introduced
very early.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 383143f31d ]
syzcaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
[<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
[<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
[<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
[<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
[<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
[<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
[<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
[<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
[<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
[<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
[<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
[<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
[<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
[<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
[<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
[<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
[<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
[<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
[<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
[<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
[<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
[<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().
Fixes: 2759647247 ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5bfd37b4de ]
syszkaller reported use-after-free in tipc [1]
When msg->rep skb is freed, set the pointer to NULL,
so that caller does not free it again.
[1]
==================================================================
BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115
CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x24e/0x340 mm/kasan/report.c:409
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
skb_push+0xd4/0xe0 net/core/skbuff.c:1466
tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000
Allocated by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
__alloc_skb+0xf1/0x740 net/core/skbuff.c:219
alloc_skb include/linux/skbuff.h:903 [inline]
tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kmem_cache_free+0x77/0x280 mm/slab.c:3763
kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
__kfree_skb net/core/skbuff.c:682 [inline]
kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
The buggy address belongs to the object at ffff8801c6e71dc0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 208 bytes inside of
224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
The buggy address belongs to the page:
page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eac2c68d66 ]
The while loop that performs the dma page unmapping never decrements
index counter f and hence loops forever. Fix this with a pre-decrement
on f.
Detected by CoverityScan, CID#1357309 ("Infinite loop")
Fixes: 4c3523623d ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c780a049f9 ]
While working on yet another syzkaller report, I found
that our IP_MAX_MTU enforcements were not properly done.
gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
final result can be bigger than IP_MAX_MTU :/
This is a problem because device mtu can be changed on other cpus or
threads.
While this patch does not fix the issue I am working on, it is
probably worth addressing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 81fbfe8ada ]
As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.
Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.
Fixes: 2e0ab8ca83 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 494bea39f3 ]
For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
convertions, so it's true size may become bigger than the actions_len.
But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<ffffffff8148be82>] skb_put+0x43/0x44
[<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
[<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
[<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
[<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
[<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
[<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
[<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
[<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
[<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
[<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
[...]
Also we can find that the actions_len is much little than the orig_len:
crash> struct sw_flow_actions 0xffff8812f539d000
struct sw_flow_actions {
rcu = {
next = 0xffff8812f5398800,
func = 0xffffe3b00035db32
},
orig_len = 1384,
actions_len = 592,
actions = 0xffff8812f539d01c
}
So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.
Last, this oops happened on our system running a relative old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.
Fixes: ccea74457b ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>