Commit Graph

22407 Commits

Author SHA1 Message Date
Tao Huang
e7f55d159a printk: add support show process information on printks
Change-Id: I34cf76388ceb2e1f6b6417638c82bf774641ebac
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
2018-02-02 19:26:07 +08:00
Tao Huang
640193f76b Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git
* linux-linaro-lsk-v4.4-android: (733 commits)
  LSK-ANDROID: memcg: Remove wrong ->attach callback
  LSK-ANDROID: arm64: mm: Fix __create_pgd_mapping() call
  ANDROID: sdcardfs: Move default_normal to superblock
  blkdev: Refactoring block io latency histogram codes
  FROMLIST: arm64: kpti: Fix the interaction between ASID switching and software PAN
  FROMLIST: arm64: Move post_ttbr_update_workaround to C code
  FROMLIST: arm64: mm: Rename post_ttbr0_update_workaround
  sched: EAS: Initialize push_task as NULL to avoid direct reference on out_unlock path
  fscrypt: updates on 4.15-rc4
  ANDROID: uid_sys_stats: fix the comment
  BACKPORT: tee: indicate privileged dev in gen_caps
  BACKPORT: tee: optee: sync with new naming of interrupts
  BACKPORT: tee: tee_shm: Constify dma_buf_ops structures.
  BACKPORT: tee: optee: interruptible RPC sleep
  BACKPORT: tee: optee: add const to tee_driver_ops and tee_desc structures
  BACKPORT: tee.txt: standardize document format
  BACKPORT: tee: add forward declaration for struct device
  BACKPORT: tee: optee: fix uninitialized symbol 'parg'
  BACKPORT: tee: add ARM_SMCCC dependency
  BACKPORT: selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables
  ...

Conflicts:
	arch/arm64/kernel/vdso.c
	drivers/usb/host/xhci-plat.c
	include/drm/drmP.h
	include/linux/kasan.h
	kernel/time/timekeeping.c
	mm/kasan/kasan.c
	security/selinux/nlmsgtab.c

Also add this commit:
0bcdc0987c ("time: Fix ktime_get_raw() incorrect base accumulation")
2018-01-26 19:26:47 +08:00
Ke Wang
204bbd5268 sched: EAS: Initialize push_task as NULL to avoid direct reference on out_unlock path
After applying up-migrate patches(dc626b2 sched: avoid pushing
tasks to an offline CPU, 2da014c sched: Extend active balance
to accept 'push_task' argument), leaving EAS disabled and doing
a stability test which includes some random cpu plugin/plugout.
There are two types crashes happened as below:

TYPE 1:
[ 2072.653091] c1 ------------[ cut here ]------------
[ 2072.653133] c1 WARNING: CPU: 1 PID: 13 at kernel/fork.c:252 __put_task_struct+0x30/0x124()
[ 2072.653173] c1 CPU: 1 PID: 13 Comm: migration/1 Tainted: G        W  O    4.4.83-01066-g04c5403-dirty #17
[ 2072.653215] c1 [<c011141c>] (unwind_backtrace) from [<c010ced8>] (show_stack+0x20/0x24)
[ 2072.653235] c1 [<c010ced8>] (show_stack) from [<c043d7f8>] (dump_stack+0xa8/0xe0)
[ 2072.653255] c1 [<c043d7f8>] (dump_stack) from [<c012be04>] (warn_slowpath_common+0x98/0xc4)
[ 2072.653273] c1 [<c012be04>] (warn_slowpath_common) from [<c012beec>] (warn_slowpath_null+0x2c/0x34)
[ 2072.653291] c1 [<c012beec>] (warn_slowpath_null) from [<c01293b4>] (__put_task_struct+0x30/0x124)
[ 2072.653310] c1 [<c01293b4>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314)
[ 2072.653331] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144)
[ 2072.653352] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270)
[ 2072.653370] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c)
[ 2072.653388] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24)
[ 2072.653400] c1 ---[ end trace 49c3d154890763fc ]---
[ 2072.653418] c1 Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
[ 2072.832804] c1 [<c01ba00c>] (put_css_set) from [<c01be870>] (cgroup_free+0x6c/0x78)
[ 2072.832823] c1 [<c01be870>] (cgroup_free) from [<c01293f8>] (__put_task_struct+0x74/0x124)
[ 2072.832844] c1 [<c01293f8>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314)
[ 2072.832860] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144)
[ 2072.832879] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270)
[ 2072.832896] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c)
[ 2072.832914] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24)
[ 2072.832930] c1 Code: f57ff05b f590f000 e3e02000 e3a03001 (e1941f9f)
[ 2072.839208] c1 ---[ end trace 49c3d154890763fd ]---

TYPE 2:
[  214.742695] c1 ------------[ cut here ]------------
[  214.742709] c1 kernel BUG at kernel/smpboot.c:136!
[  214.742718] c1 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[  214.748785] c1 CPU: 1 PID: 18 Comm: migration/2 Tainted: G        W  O    4.4.83-00912-g370f62c #1
[  214.748805] c1 task: ef2d9680 task.stack: ee862000
[  214.748821] c1 PC is at smpboot_thread_fn+0x168/0x270
[  214.748832] c1 LR is at smpboot_thread_fn+0xe4/0x270
...
[  214.821339] c1 [<c014d71c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c)
[  214.821363] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24)
[  214.821378] c1 Code: e5950000 e5943010 e1500003 0a000000 (e7f001f2)
[  214.827676] c1 ---[ end trace da87539f59bab8de ]---

For the first type crash, the root cause is the push_task pointer will be
used without initialization on the out_lock path. And maybe cpu hotplug in/out
make this happen more easily.

For the second type crash, it hits 'BUG_ON(td->cpu != smp_processor_id());' in
smpboot_thread_fn(). It seems that OOPS was caused by migration/2 which actually
running on cpu1. And I haven't found what actually happened.

However, after this fix, the second type crash seems gone too.

Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2018-01-22 13:16:20 +05:30
Dmitry Vyukov
123b7a5eb3 UPSTREAM: kcov: fix comparison callback signature
Fix a silly copy-paste bug.  We truncated u32 args to u16.

Link: http://lkml.kernel.org/r/20171207101134.107168-1-dvyukov@google.com
Fixes: ded97d2c2b ("kcov: support comparison operands collection")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: syzkaller@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 64145065
(cherry-picked from 8c0431ec452de79ef3fe998c1fbb1e3d3ac13ddd)
Change-Id: Ic3872c33d03a456640dd6fdcce3b0795765dc1c0
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Victor Chibotaru
4b4c3b015e UPSTREAM: kcov: support comparison operands collection
Enables kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize=trace-cmp instrumentation
(currently not available for GCC).

The comparison operands help a lot in fuzz testing.  E.g.  they are used
in Syzkaller to cover the interiors of conditional statements with way
less attempts and thus make previously unreachable code reachable.

To allow separate collection of coverage and comparison operands two
different work modes are implemented.  Mode selection is now done via a
KCOV_ENABLE ioctl call with corresponding argument value.

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com
Signed-off-by: Victor Chibotaru <tchibo@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from ded97d2c2b)
Change-Id: Iaba700a3f4786048be14a5e764ccabceae114eb7
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Andrey Ryabinin
c00b857401 UPSTREAM: kcov: remove pointless current != NULL check
__sanitizer_cov_trace_pc() is a hot code, so it's worth to remove
pointless '!current' check.  Current is never NULL.

Link: http://lkml.kernel.org/r/20170929162221.32500-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from fcf4edac04)
Change-Id: Ia76e8c6cc0dc3fb796d8e8b92430fcf659b52eee
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Dmitry Vyukov
78c0349eeb UPSTREAM: kcov: support compat processes
Support compat processes in KCOV by providing compat_ioctl callback.
Compat mode uses the same ioctl callback: we have 2 commands that do not
use the argument and 1 that already checks that the arg does not overflow
INT_MAX.  This allows to use KCOV-guided fuzzing in compat processes.

Link: http://lkml.kernel.org/r/20170823100553.55812-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 7483e5d420)
Change-Id:I74b62f01941091649ce6e88b3130e4ca4274a8de
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Dmitry Vyukov
42bc836aac UPSTREAM: kcov: simplify interrupt check
in_interrupt() semantics are confusing and wrong for most users as it
also returns true when bh is disabled.  Thus we open coded a proper
check for interrupts in __sanitizer_cov_trace_pc() with a lengthy
explanatory comment.

Use the new in_task() predicate instead.

Link: http://lkml.kernel.org/r/20170321091026.139655-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: James Morse <james.morse@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from f61e869d51)
Change-Id: Ice260535314238c8f82ddc578ecaeea6177d28fc
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Alexander Popov
4f2df4ac63 UPSTREAM: kcov: make kcov work properly with KASLR enabled
Subtract KASLR offset from the kernel addresses reported by kcov.
Tested on x86_64 and AArch64 (Hikey LeMaker).

Link: http://lkml.kernel.org/r/1481417456-28826-3-git-send-email-alex.popov@linux.com
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Jon Masters <jcm@redhat.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 4983f0ab7f)
Change-Id: Ib19d7fb559f7db28314cd13a3e33e061d1dfdec9
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Kefeng Wang
96932cfd9f UPSTREAM: kcov: add more missing includes
It is fragile that some definitions acquired via transitive
dependencies, as shown in below:

atomic_*        (<linux/atomic.h>)
ENOMEM/EN*      (<linux/errno.h>)
EXPORT_SYMBOL   (<linux/export.h>)
device_initcall (<linux/init.h>)
preempt_*       (<linux/preempt.h>)

Include them to prevent possible issues.

Link: http://lkml.kernel.org/r/1481163221-40170-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from db862358a4)
Change-Id: Ia529631d2072cc795c46ae0276e51592318cd40f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Kefeng Wang
36b8316195 UPSTREAM: kcov: add missing #include <linux/sched.h>
In __sanitizer_cov_trace_pc we use task_struct and fields within it, but
as we haven't included <linux/sched.h>, it is not guaranteed to be
defined.  While we usually happen to acquire the definition through a
transitive include, this is fragile (and hasn't been true in the past,
causing issues with backports).

Include <linux/sched.h> to avoid any fragility.

[mark.rutland@arm.com: rewrote changelog]
Link: http://lkml.kernel.org/r/1481007384-27529-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 166ad0e1e2)
Change-Id: Id5a06b927687ace8788f623fc91cc8305fde5f2d
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Andrey Konovalov
9779e195be UPSTREAM: kcov: properly check if we are in an interrupt
in_interrupt() returns a nonzero value when we are either in an
interrupt or have bh disabled via local_bh_disable().  Since we are
interested in only ignoring coverage from actual interrupts, do a proper
check instead of just calling in_interrupt().

As a result of this change, kcov will start to collect coverage from
within local_bh_disable()/local_bh_enable() sections.

Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morse <james.morse@arm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from b274c0bb39)
Change-Id: I91364ef699f9af0a57959caf65372a6559d7a6b0
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Andrey Ryabinin
928e925573 UPSTREAM: kcov: don't profile branches in kcov
Profiling 'if' statements in __sanitizer_cov_trace_pc() leads to
unbound recursion and crash:

	__sanitizer_cov_trace_pc() ->
		ftrace_likely_update ->
			__sanitizer_cov_trace_pc() ...

Define DISABLE_BRANCH_PROFILING to disable this tracer.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 36f05ae8bc)
Change-Id: I53cea027ba86b89016df3944374bdb119a2ca9dd
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
James Morse
318c10553b UPSTREAM: kcov: don't trace the code coverage code
Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in
every basic block.  Ftrace patches in a call to _mcount() to each
function it has annotated.

Letting these mechanisms annotate each other is a bad thing.  Break the
loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace
won't try to patch this code.

This patch lets arm64 with KCOV and STACK_TRACER boot.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from bdab42dfc9)
Change-Id: If7708ca761f81e0645d709b263e61493fc016e01
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Dmitry Vyukov
8c8a5ee21a BACKPORT: kernel: add kcov code coverage
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g.  scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side.  The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage can be just a dozen of basic blocks (e.g.  an invalid
input).  In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M).  Cost of
kcov depends only on number of executed basic blocks/edges.  On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 5c9a8750a6)
Change-Id: I17b5e04f6e89b241924e78ec32ead79c38b860ce
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Alexander Potapenko
cc19018a12 UPSTREAM: arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>.  Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from be7635e728)
Change-Id: Ib321eb9c2b76ef4785cf3fd522169f524348bd9a
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Ke Wang
2cc4764f3f sched: EAS/WALT: Don't take into account of running task's util
For upmigrating misfit running task case, the currently running
task's util has been counted into cpu_util(). Thus currently
__cpu_overutilized() which add task's uitl twice is overestimated.

Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2018-01-22 13:15:43 +05:30
Viresh Kumar
56c4ae908c BACKPORT: schedutil: Reset cached freq if it is not in sync with next_freq
'cached_raw_freq' is used to get the next frequency quickly but should
always be in sync with sg_policy->next_freq. There are cases where it is
not and in such cases it should be reset to avoid switching to incorrect
frequencies.

Consider this case for example:
- policy->cur is 1.2 GHz (Max)
- New request comes for 780 MHz and we store that in cached_raw_freq.
- Based on 780 MHz, we calculate the effective frequency as 800 MHz.
- We then decide not to update the frequency as
  sugov_up_down_rate_limit() return true.
- Here cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz.
- Now if the utilization doesn't change in next request, then the next
  target frequency will still be 780 MHz and it will match with
  cached_raw_freq and so we will directly return 1.2 GHz instead of 800
  MHz.

BACKPORT of upstream commit 07458f6a51 ("cpufreq: schedutil: Reset
cached_raw_freq when not in sync with next_freq").

This also updates sugov_update_commit() for handling up/down tunables, which
aren't present in mainline.

Change-Id: I70bca2c5dfdb545a0471d1c9e4c5addb30ab5494
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-01-22 13:15:43 +05:30
John Stultz
aafcb0f503 BACKPORT: time: Clean up CLOCK_MONOTONIC_RAW time handling
(cherry pick from commit fc6eead7c1)

Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
remove the duplicitive tk->raw_time.tv_nsec, which can be
stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
for monotonic time).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Tested-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 20045882
Bug: 63737556
Change-Id: I243827d21b08703a09d2d2fe738a9258be224582
2018-01-22 13:15:43 +05:30
John Stultz
581a337227 BACKPORT: time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
(cherry pick from commit 3d88d56c58)

Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.

This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.

Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Daniel Mentz <danielmentz@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "stable #4 . 8+" <stable@vger.kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bug: 20045882
Bug: 63737556
Change-Id: I6c55dd7685f6bd212c6af9d09c527528e1dd5fa1
2018-01-22 13:15:43 +05:30
Amit Pundir
395ae9f4e6 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

Conflicts:
    kernel/fork.c
        Conflict due to Kaiser implementation in LTS 4.4.110.
    net/ipv4/raw.c
        Minor conflict due to LTS commit
        be27b620a8 ("net: ipv4: fix for a race condition in raw_sendmsg")
2018-01-22 11:50:22 +05:30
Alex Shi
13b48cd3ac Merge tag 'v4.4.112' into linux-linaro-lsk-v4.4
This is the 4.4.112 stable release
2018-01-18 12:02:00 +08:00
Daniel Borkmann
095b0ba360 bpf, array: fix overflow in max_entries and undefined behavior in index_mask
commit bbeb6e4323 upstream.

syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.

Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:31 +01:00
Alexei Starovoitov
9a7fad4c0e bpf: prevent out-of-bounds speculation
commit b2157399cc upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:31 +01:00
Alexei Starovoitov
648064515d bpf: adjust insn_aux_data when patching insns
commit 8041902dae upstream.

convert_ctx_accesses() replaces single bpf instruction with a set of
instructions. Adjust corresponding insn_aux_data while patching.
It's needed to make sure subsequent 'for(all insn)' loops
have matching insn and insn_aux_data.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:31 +01:00
Alexei Starovoitov
19614eee06 bpf: refactor fixup_bpf_calls()
commit 79741b3bde upstream.

reduce indent and make it iterate over instructions similar to
convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:30 +01:00
Alexei Starovoitov
14c7c55f45 bpf: move fixup_bpf_calls() function
commit e245c5c6a5 upstream.

no functional change.
move fixup_bpf_calls() to verifier.c
it's being refactored in the next patch

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:30 +01:00
Jakub Kicinski
0748b80e43 bpf: don't (ab)use instructions to store state
commit 3df126f35f upstream.

Storing state in reserved fields of instructions makes
it impossible to run verifier on programs already
marked as read-only. Allocate and use an array of
per-instruction state instead.

While touching the error path rename and move existing
jump target.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:30 +01:00
Daniel Borkmann
087a92287d bpf: add bpf_patch_insn_single helper
commit c237ee5eb3 upstream.

Move the functionality to patch instructions out of the verifier
code and into the core as the new bpf_patch_insn_single() helper
will be needed later on for blinding as well. No changes in
functionality.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:30 +01:00
Davidlohr Bueso
bd44e3f19d locking/mutex: Allow next waiter lockless wakeup
commit 1329ce6fbb upstream.

Make use of wake-queues and enable the wakeup to occur after releasing the
wait_lock. This is similar to what we do with rtmutex top waiter,
slightly shortening the critical region and allow other waiters to
acquire the wait_lock sooner. In low contention cases it can also help
the recently woken waiter to find the wait_lock available (fastpath)
when it continues execution.

Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <Will.Deacon@arm.com>
Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:27 +01:00
Jianyu Zhan
1920b8a6a6 futex: Replace barrier() in unqueue_me() with READ_ONCE()
commit 29b75eb2d5 upstream.

Commit e91467ecd1 ("bug in futex unqueue_me") introduced a barrier() in
unqueue_me() to prevent the compiler from rereading the lock pointer which
might change after a check for NULL.

Replace the barrier() with a READ_ONCE() for the following reasons:

1) READ_ONCE() is a weaker form of barrier() that affects only the specific
   load operation, while barrier() is a general compiler level memory barrier.
   READ_ONCE() was not available at the time when the barrier was added.

2) Aside of that READ_ONCE() is descriptive and self explainatory while a
   barrier without comment is not clear to the casual reader.

No functional change.

[ tglx: Massaged changelog ]

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: dave@stgolabs.net
Cc: peterz@infradead.org
Cc: linux@rasmusvillemoes.dk
Cc: akpm@linux-foundation.org
Cc: fengguang.wu@intel.com
Cc: bigeasy@linutronix.de
Link: http://lkml.kernel.org/r/1457314344-5685-1-git-send-email-nasa4836@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:27 +01:00
Alex Shi
921884d92b Merge tag 'v4.4.111' into linux-linaro-lsk-v4.4
This is the 4.4.111 stable release
2018-01-11 12:01:16 +08:00
Libor Pechacek
c819a67f7e module: Issue warnings when tainting kernel
commit 3205c36cf7 upstream.

While most of the locations where a kernel taint bit is set are accompanied
with a warning message, there are two which set their bits silently.  If
the tainting module gets unloaded later on, it is almost impossible to tell
what was the reason for setting the flag.

Signed-off-by: Libor Pechacek <lpechacek@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:14 +01:00
Miroslav Benes
7e35bc655e module: keep percpu symbols in module's symtab
commit e022441851 upstream.

Currently, percpu symbols from .data..percpu ELF section of a module are
not copied over and stored in final symtab array of struct module.
Consequently such symbol cannot be returned via kallsyms API (for
example kallsyms_lookup_name). This can be especially confusing when the
percpu symbol is exported. Only its __ksymtab et al. are present in its
symtab.

The culprit is in layout_and_allocate() function where SHF_ALLOC flag is
dropped for .data..percpu section. There is in fact no need to copy the
section to final struct module, because kernel module loader allocates
extra percpu section by itself. Unfortunately only symbols from
SHF_ALLOC sections are copied due to a check in is_core_symbol().

The patch changes is_core_symbol() function to copy over also percpu
symbols (their st_shndx points to .data..percpu ELF section). We do it
only if CONFIG_KALLSYMS_ALL is set to be consistent with the rest of the
function (ELF section is SHF_ALLOC but !SHF_EXECINSTR). Finally
elf_type() returns type 'a' for a percpu symbol because its address is
absolute.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:13 +01:00
Oleg Nesterov
5f1aa83c58 kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
commit 426915796c upstream.

complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.

If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.

After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.

This should hopefully fix the problem reported by Dmitry.  This
test-case

	static int init(void *arg)
	{
		for (;;)
			pause();
	}

	int main(void)
	{
		char stack[16 * 1024];

		for (;;) {
			int pid = clone(init, stack + sizeof(stack)/2,
					CLONE_NEWPID | SIGCHLD, NULL);
			assert(pid > 0);

			assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
			assert(waitpid(-1, NULL, WSTOPPED) == pid);

			assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
			assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
			assert(pid == wait(NULL));
		}
	}

triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop().  do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.

And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.

Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kyle Huey <me@kylehuey.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:11 +01:00
Oleg Nesterov
7a7f54f8e3 kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
commit ac25385089 upstream.

Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T.  This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.

Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:11 +01:00
Oleg Nesterov
be95f1308f kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
commit 628c1bcba2 upstream.

The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.

Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().

SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work.  Perhaps we will add
another ->ptrace check later, we will see.

Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00
Thiago Rafael Becker
58330ec2fe kernel: make groups_sort calling a responsibility group_info allocators
commit bdcf0a423e upstream.

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00
Oleg Nesterov
83875f5825 kernel/acct.c: fix the acct->needcheck check in check_free_space()
commit 4d9570158b upstream.

As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.

Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.

In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.

This was broken by commit 32dc730860 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8
("acct() should honour the limits from the very beginning") made the
problem more visible.

Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc730860 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:08 +01:00
Alex Shi
9c95ff4010 Merge tag 'v4.4.110' into linux-linaro-lsk-v4.4
This is the 4.4.110 stable release
2018-01-08 12:01:03 +08:00
Hugh Dickins
003e476716 kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Dave Hansen
bed9bb7f3e kaiser: merged update
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Richard Fellner
8a43ddfb93 KAISER: Kernel Address Isolation
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Alex Shi
95d4cbea02 Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4
Conflicts:
	early call high_memory in arch/arm64/mm/init.c
2018-01-05 14:45:31 +08:00
Thomas Gleixner
458ed31799 nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
commit 5d62c183f9 upstream.

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

  if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.

Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....

Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.

Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Steven Rostedt (VMware)
9dc9648942 ring-buffer: Mask out the info bits when returning buffer page length
commit 45d8b80c2a upstream.

Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.

What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.

Fixes: 66a8cb95ed ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:22 +01:00
Jing Xia
25fade614b tracing: Fix crash when it fails to alloc ring buffer
commit 24f2aaf952 upstream.

Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:

instance_mkdir()
    |-allocate_trace_buffers()
        |-allocate_trace_buffer(tr, &tr->trace_buffer...)
	|-allocate_trace_buffer(tr, &tr->max_buffer...)

          // allocate fail(-ENOMEM),first free
          // and the buffer pointer is not set to null
        |-ring_buffer_free(tr->trace_buffer.buffer)

       // out_free_tr
    |-free_trace_buffers()
        |-free_trace_buffer(&tr->trace_buffer);

	      //if trace_buffer is not null, free again
	    |-ring_buffer_free(buf->buffer)
                |-rb_free_cpu_buffer(buffer->buffers[cpu])
                    // ring_buffer_per_cpu is null, and
                    // crash in ring_buffer_per_cpu->pages

Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com

Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia@spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:22 +01:00
Steven Rostedt (VMware)
c2a62f84d4 tracing: Fix possible double free on failure of allocating trace buffer
commit 4397f04575 upstream.

Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.

Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com

Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia@spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:22 +01:00
Steven Rostedt (VMware)
0752421016 tracing: Remove extra zeroing out of the ring buffer page
commit 6b7e633fe9 upstream.

The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.

Fixes: 2711ca237a ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:21 +01:00
Juergen Gross
e41d630c8c UPSTREAM: genirq/PM: Properly pretend disabled state when force resuming interrupts
Interrupts with the IRQF_FORCE_RESUME flag set have also the
IRQF_NO_SUSPEND flag set. They are not disabled in the suspend path, but
must be forcefully resumed. That's used by XEN to keep IPIs enabled beyond
the suspension of device irqs. Force resume works by pretending that the
interrupt was disabled and then calling __irq_enable().

Incrementing the disabled depth counter was enough to do that, but with the
recent changes which use state flags to avoid unnecessary hardware access,
this is not longer sufficient. If the state flags are not set, then the
hardware callbacks are not invoked and the interrupt line stays disabled in
"hardware".

Set the disabled and masked state when pretending that an interrupt got
disabled by suspend.

Fixes: bf22ff45be ("genirq: Avoid unnecessary low level irq function calls")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/20170717174703.4603-2-jgross@suse.com
(cherry picked from commit a696712c3d)

Conflicts:
	kernel/irq/internals.h
	[due to missing upstream patch: b354286eff irq: Privatize irq_common_data::state_use_accessors]

Change-Id: I94d230ff1a7ccac1234432384685b5ef9d116b0a
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
2017-12-26 11:12:46 +08:00