Commit Graph

22005 Commits

Author SHA1 Message Date
Daniel Borkmann
a0b2c35580 bpf: don't let ldimm64 leak map addresses on unprivileged
The patch fixes two things at once:

1) It checks the env->allow_ptr_leaks and only prints the map address
to the log if we have the privileges to do so, otherwise it just dumps
0 as we would when kptr_restrict is enabled on %pK. Given the latter
is off by default and not every distro sets it, I don't want to rely
on this, hence the 0 by default for unprivileged.

2) Printing of ldimm64 in the verifier log is currently broken in that
   we don't print the full immediate, but only the 32 bit part of the
   first insn part for ldimm64. Thus, fix this up as well; it's okay
   to access, since we verified all ldimm64 earlier already (including
   just constants) through replace_map_fd_with_map_ptr().

Fixes: 1be7f75d16 ("bpf: enable non-root eBPF programs")
Fixes: cbd3570086 ("bpf: verifier (add ability to receive verification log)")
Change-Id: Icfc2f72c98470b106f6972fea3eaa26d5489c234
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-23 11:56:24 +09:00
Huang, Tao
77bab04357 Merge tag 'lsk-v4.4-17.03-android' of git://git.linaro.org/kernel/linux-linaro-stable.git
LSK 17.03 v4.4-android

* tag 'lsk-v4.4-17.03-android': (166 commits)
  Linux 4.4.55
  ext4: don't BUG when truncating encrypted inodes on the orphan list
  dm: flush queued bios when process blocks to avoid deadlock
  nfit, libnvdimm: fix interleave set cookie calculation
  s390/kdump: Use "LINUX" ELF note name instead of "CORE"
  KVM: s390: Fix guest migration for huge guests resulting in panic
  mvsas: fix misleading indentation
  serial: samsung: Continue to work if DMA request fails
  USB: serial: io_ti: fix information leak in completion handler
  USB: serial: io_ti: fix NULL-deref in interrupt callback
  USB: iowarrior: fix NULL-deref in write
  USB: iowarrior: fix NULL-deref at probe
  USB: serial: omninet: fix reference leaks at open
  USB: serial: safe_serial: fix information leak in completion handler
  usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers
  usb: host: xhci-dbg: HCIVERSION should be a binary number
  usb: gadget: function: f_fs: pass companion descriptor along
  usb: dwc3: gadget: make Set Endpoint Configuration macros safe
  usb: gadget: dummy_hcd: clear usb_gadget region before registration
  powerpc: Emulation support for load/store instructions on LE
  ...

Change-Id: I4db95bbe5b2523e19ddf22b3f65863f7f6d46632
2017-03-31 11:43:47 +08:00
Alex Shi
e0d60977f2 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-03-20 12:03:10 +08:00
Alex Shi
1c563c0006 Merge tag 'v4.4.55' into linux-linaro-lsk-v4.4
This is the 4.4.55 stable release
2017-03-20 12:03:07 +08:00
Mathieu Desnoyers
c0ef1f537a Fix: Disable sys_membarrier when nohz_full is enabled
commit 907565337e upstream.

Userspace applications should be allowed to expect the membarrier system
call with MEMBARRIER_CMD_SHARED command to issue memory barriers on
nohz_full CPUs, but synchronize_sched() does not take those into
account.

Given that we do not want unrelated processes to be able to affect
real-time sensitive nohz_full CPUs, simply return ENOSYS when membarrier
is invoked on a kernel with enabled nohz_full CPUs.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:37:26 +01:00
Huang, Tao
5ed6b099c8 Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git
* linux-linaro-lsk-v4.4-android: (434 commits)
  Linux 4.4.52
  kvm: vmx: ensure VMCS is current while enabling PML
  Revert "usb: chipidea: imx: enable CI_HDRC_SET_NON_ZERO_TTHA"
  rtlwifi: rtl_usb: Fix for URB leaking when doing ifconfig up/down
  block: fix double-free in the failure path of cgwb_bdi_init()
  goldfish: Sanitize the broken interrupt handler
  x86/platform/goldfish: Prevent unconditional loading
  USB: serial: ark3116: fix register-accessor error handling
  USB: serial: opticon: fix CTS retrieval at open
  USB: serial: spcp8x5: fix modem-status handling
  USB: serial: ftdi_sio: fix line-status over-reporting
  USB: serial: ftdi_sio: fix extreme low-latency setting
  USB: serial: ftdi_sio: fix modem-status error handling
  USB: serial: cp210x: add new IDs for GE Bx50v3 boards
  USB: serial: mos7840: fix another NULL-deref at open
  tty: serial: msm: Fix module autoload
  net: socket: fix recvmmsg not returning error from sock_error
  ip: fix IP_CHECKSUM handling
  irda: Fix lockdep annotations in hashbin_delete().
  dccp: fix freeing skb too early for IPV6_RECVPKTINFO
  ...

Conflicts:
	drivers/mmc/core/mmc.c
	drivers/usb/dwc3/ep0.c
	drivers/usb/host/xhci.h

Change-Id: Icf331a68162ab686d01996a3f43fa2e97543f62e
2017-03-01 18:40:28 +08:00
Alex Shi
ba0cfa06c5 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-02-24 12:03:21 +08:00
Alex Shi
aedb4a24b9 Merge tag 'v4.4.51' into linux-linaro-lsk-v4.4
This is the 4.4.51 stable release
2017-02-24 12:03:18 +08:00
Sergey Senozhatsky
efa061998d printk: use rcuidle console tracepoint
commit fc98c3c8c9 upstream.

Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:

  hw-breakpoint: Failed to enable monitor mode on CPU 0.
  hw-breakpoint: CPU 0 failed to disable vector catch

  ===============================
  [ ERR: suspicious RCU usage.  ]
  4.10.0-rc8-next-20170215+ #119 Not tainted
  -------------------------------
  ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  RCU used illegally from idle CPU!
  rcu_scheduler_active = 2, debug_locks = 0
  RCU used illegally from extended quiescent state!
  2 locks held by swapper/0/0:
   #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
   #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474

  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
  Hardware name: Generic OMAP4 (Flattened Device Tree)
    console_unlock
    vprintk_emit
    vprintk_default
    printk
    reset_ctrl_regs
    dbg_cpu_pm_notify
    notifier_call_chain
    cpu_pm_exit
    omap_enter_idle_coupled
    cpuidle_enter_state
    cpuidle_enter_state_coupled
    do_idle
    cpu_startup_entry
    start_kernel

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".

Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-23 17:43:10 +01:00
Yang Yang
e6394c7d1c futex: Move futex_init() to core_initcall
commit 25f71d1c3e upstream.

The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.

The user mode helper is triggered by device init calls and the executable
might use the futex syscall.

futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.

If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.

Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.

[ tglx: Rewrote changelog ]

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-23 17:43:09 +01:00
Alex Shi
106bdd9b95 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-02-10 12:01:01 +08:00
Alex Shi
fd0d0fd17f Merge tag 'v4.4.48' into linux-linaro-lsk-v4.4
This is the 4.4.48 stable release
2017-02-10 12:00:58 +08:00
Peter Zijlstra
d49d465d17 perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory
commit 0b3589be9b upstream.

Andres reported that MMAP2 records for anonymous memory always have
their protection field 0.

Turns out, someone daft put the prot/flags generation code in the file
branch, leaving them unset for anonymous memory.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Don Zickus <dzickus@redhat.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: anton@ozlabs.org
Cc: namhyung@kernel.org
Fixes: f972eb63b1 ("perf: Pass protection and flags bits through mmap2 interface")
Link: http://lkml.kernel.org/r/20170126221508.GF6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-09 08:02:45 +01:00
Alex Shi
5bee6fb934 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-02-04 12:11:17 +08:00
Alex Shi
efa59a01f7 Merge tag 'v4.4.46' into linux-linaro-lsk-v4.4
This is the 4.4.46 stable release
2017-02-04 12:11:15 +08:00
Eric Dumazet
d1b232c2ce sysctl: fix proc_doulongvec_ms_jiffies_minmax()
commit ff9f8a7cf9 upstream.

We perform the conversion between kernel jiffies and ms only when
exporting kernel value to user space.

We need to do the opposite operation when value is written by user.

Only matters when HZ != 1000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:30:52 +01:00
Alex Shi
b4bbeeb816 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-01-22 12:01:43 +08:00
Alex Shi
261e8dbdb9 Merge tag 'v4.4.44' into linux-linaro-lsk-v4.4
This is the 4.4.44 stable release
2017-01-22 12:01:41 +08:00
David Matlack
3d27cd4b25 jump_labels: API for flushing deferred jump label updates
commit b6416e6101 upstream.

Modules that use static_key_deferred need a way to synchronize with
any delayed work that is still pending when the module is unloaded.
Introduce static_key_deferred_flush() which flushes any pending
jump label updates.

Signed-off-by: David Matlack <dmatlack@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19 20:17:19 +01:00
Dan Williams
70429b970b mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
commit f931ab479d upstream.

Both arch_add_memory() and arch_remove_memory() expect a single threaded
context.

For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
not hold any locks over this check and branch:

    if (pgd_val(*pgd)) {
    	pud = (pud_t *)pgd_page_vaddr(*pgd);
    	paddr_last = phys_pud_init(pud, __pa(vaddr),
    				   __pa(vaddr_end),
    				   page_size_mask);
    	continue;
    }

    pud = alloc_low_page();
    paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
    			   page_size_mask);

The result is that two threads calling devm_memremap_pages()
simultaneously can end up colliding on pgd initialization.  This leads
to crash signatures like the following where the loser of the race
initializes the wrong pgd entry:

    BUG: unable to handle kernel paging request at ffff888ebfff0000
    IP: memcpy_erms+0x6/0x10
    PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
    task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
    RIP: memcpy_erms+0x6/0x10
    [..]
    Call Trace:
      ? pmem_do_bvec+0x205/0x370 [nd_pmem]
      ? blk_queue_enter+0x3a/0x280
      pmem_rw_page+0x38/0x80 [nd_pmem]
      bdev_read_page+0x84/0xb0

Hold the standard memory hotplug mutex over calls to
arch_{add,remove}_memory().

Fixes: 41e94a8513 ("add devm_memremap_pages")
Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19 20:17:18 +01:00
Brendan Jackman
ee620ddd65 DEBUG: sched/fair: Fix sched_load_avg_cpu events for task_groups
The current sched_load_avg_cpu event traces the load for any cfs_rq that is
updated. This is not representative of the CPU load - instead we should only
trace this event when the cfs_rq being updated is in the root_task_group.

Change-Id: I345c2f13f6b5718cb4a89beb247f7887ce97ed6b
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-01-16 15:03:08 +05:30
Brendan Jackman
52a2ef75c3 DEBUG: sched/fair: Fix missing sched_load_avg_cpu events
update_cfs_rq_load_avg is called from update_blocked_averages without triggering
the sched_load_avg_cpu event. Move the event trigger to inside
update_cfs_rq_load_avg to avoid this missing event.

Change-Id: I6c4f66f687a644e4e7f798db122d28a8f5919b7b
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-01-16 15:03:08 +05:30
Alex Shi
e30546378e Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-01-13 12:01:52 +08:00
Alex Shi
99d4c5fe0b Merge tag 'v4.4.42' into linux-linaro-lsk-v4.4
This is the 4.4.42 stable release
2017-01-13 12:01:45 +08:00
Thomas Gleixner
6053479cbb tick/broadcast: Prevent NULL pointer dereference
commit c1a9eeb938 upstream.

When a disfunctional timer, e.g. dummy timer, is installed, the tick core
tries to setup the broadcast timer.

If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.

Reported-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-12 11:22:51 +01:00
Arnd Bergmann
56ef587b77 stable-fixup: hotplug: fix unused function warning
[resolves a messed up backport, so no matching upstream commit]

The backport of upstream commit 777c6e0dae ("hotplug: Make
register and unregister notifier API symmetric") to linux-4.4.y
introduced a harmless warning in 'allnoconfig' builds as spotted by
kernelci.org:

kernel/cpu.c:226:13: warning: 'cpu_notify_nofail' defined but not used [-Wunused-function]

So far, this is the only stable tree that is affected, as linux-4.6 and
higher contain commit 984581728e ("cpu/hotplug: Split out cpu down functions")
that makes the function used in all configurations, while older longterm
releases so far don't seem to have a backport of 777c6e0dae.

The fix for the warning is trivial: move the unused function back
into the #ifdef section where it was before.

Link: https://kernelci.org/build/id/586fcacb59b514049ef6c3aa/logs/
Fixes: 1c0f4e0ebb ("hotplug: Make register and unregister notifier API symmetric") in v4.4.y
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-12 11:22:48 +01:00
Huang, Tao
7b2f394bb8 Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git
* linux-linaro-lsk-v4.4-android: (199 commits)
  Linux 4.4.41
  net: mvpp2: fix dma unmapping of TX buffers for fragments
  sg_write()/bsg_write() is not fit to be called under KERNEL_DS
  kconfig/nconf: Fix hang when editing symbol with a long prompt
  target/user: Fix use-after-free of tcmu_cmds if they are expired
  powerpc: Convert cmp to cmpd in idle enter sequence
  powerpc/ps3: Fix system hang with GCC 5 builds
  nfs_write_end(): fix handling of short copies
  libceph: verify authorize reply on connect
  PCI: Check for PME in targeted sleep state
  Input: drv260x - fix input device's parent assignment
  media: solo6x10: fix lockup by avoiding delayed register write
  IB/cma: Fix a race condition in iboe_addr_get_sgid()
  IB/multicast: Check ib_find_pkey() return value
  IPoIB: Avoid reading an uninitialized member variable
  IB/mad: Fix an array index check
  fgraph: Handle a case where a tracer ignores set_graph_notrace
  platform/x86: asus-nb-wmi.c: Add X45U quirk
  ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
  kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)
  ...

Change-Id: I8c8467700d5563d9a1121c982737ff0ab6d9cdc9
2017-01-10 16:07:06 +08:00
Alex Shi
7785301d92 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-01-10 12:01:14 +08:00
Alex Shi
f02e043c5e Merge tag 'v4.4.41' into linux-linaro-lsk-v4.4
This is the 4.4.41 stable release
2017-01-10 12:01:08 +08:00
Steven Rostedt (Red Hat)
e945df4c6b fgraph: Handle a case where a tracer ignores set_graph_notrace
commit 794de08a16 upstream.

Both the wakeup and irqsoff tracers can use the function graph tracer when
the display-graph option is set. The problem is that they ignore the notrace
file, and record the entry of functions that would be ignored by the
function_graph tracer. This causes the trace->depth to be recorded into the
ring buffer. The set_graph_notrace uses a trick by adding a large negative
number to the trace->depth when a graph function is to be ignored.

On trace output, the graph function uses the depth to record a stack of
functions. But since the depth is negative, it accesses the array with a
negative number and causes an out of bounds access that can cause a kernel
oops or corrupt data.

Have the print functions handle cases where a tracer still records functions
even when they are in set_graph_notrace.

Also add warnings if the depth is below zero before accessing the array.

Note, the function graph logic will still prevent the return of these
functions from being recorded, which means that they will be left hanging
without a return. For example:

   # echo '*spin*' > set_graph_notrace
   # echo 1 > options/display-graph
   # echo wakeup > current_tracer
   # cat trace
   [...]
      _raw_spin_lock() {
        preempt_count_add() {
        do_raw_spin_lock() {
      update_rq_clock();

Where it should look like:

      _raw_spin_lock() {
        preempt_count_add();
        do_raw_spin_lock();
      }
      update_rq_clock();

Cc: Namhyung Kim <namhyung.kim@lge.com>
Fixes: 29ad23b004 ("ftrace: Add set_graph_notrace filter")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:50 +01:00
Thomas Gleixner
e01b04be3e timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
commit 9c1645727b upstream.

The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:

 - Time jumps backwards

 - __iter_div_u64_rem() which is used in one of the calling code pathes
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case.  I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag.  I think it was something in the vicinity of 10 minutes
  stopped."

 When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09e1
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a89 ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:

   s64 nsec;

-  nsec = (d * m + n) >> s:
+  nsec = d * m + n;
+  nsec >>= s;

d being type of cycle_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.

There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.

Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.

Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.

Fixes: 6bd58f09e1 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:43 +01:00
Alex Shi
19192a140a Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-01-09 12:01:35 +08:00
Alex Shi
eaa88578f2 Merge tag 'v4.4.40' into linux-linaro-lsk-v4.4
This is the 4.4.40 stable release
2017-01-09 12:01:31 +08:00
Douglas Anderson
f93777c915 kernel/debug/debug_core.c: more properly delay for secondary CPUs
commit 2d13bb6494 upstream.

We've got a delay loop waiting for secondary CPUs.  That loop uses
loops_per_jiffy.  However, loops_per_jiffy doesn't actually mean how
many tight loops make up a jiffy on all architectures.  It is quite
common to see things like this in the boot log:

  Calibrating delay loop (skipped), value calculated using timer
  frequency.. 48.00 BogoMIPS (lpj=24000)

In my case I was seeing lots of cases where other CPUs timed out
entering the debugger only to print their stack crawls shortly after the
kdb> prompt was written.

Elsewhere in kgdb we already use udelay(), so that should be safe enough
to use to implement our timeout.  We'll delay 1 ms for 1000 times, which
should give us a full second of delay (just like the old code wanted)
but allow us to notice that we're done every 1 ms.

[akpm@linux-foundation.org: simplifications, per Daniel]
Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Brian Norris <briannorris@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 11:16:16 +01:00
Konstantin Khlebnikov
f2b8b3455b kernel/watchdog: use nmi registers snapshot in hardlockup handler
commit 4d1f0fb096 upstream.

NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ.
Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP
pointing to the code interrupted by IRQ which was interrupted by NMI.
NULL isn't a problem: in this case watchdog calls dump_stack() and
prints full stack trace including NMI.  But if we're stuck in IRQ
handler then NMI watchlog will print stack trace without IRQ part at
all.

This patch uses registers snapshot passed into NMI handler as arguments:
these registers point exactly to the instruction interrupted by NMI.

Fixes: 55537871ef ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup")
Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 11:16:16 +01:00
Eric W. Biederman
b35f34f669 exec: Ensure mm->user_ns contains the execed files
commit f84df2a6f2 upstream.

When the user namespace support was merged the need to prevent
ptrace from revealing the contents of an unreadable executable
was overlooked.

Correct this oversight by ensuring that the executed file
or files are in mm->user_ns, by adjusting mm->user_ns.

Use the new function privileged_wrt_inode_uidgid to see if
the executable is a member of the user namespace, and as such
if having CAP_SYS_PTRACE in the user namespace should allow
tracing the executable.  If not update mm->user_ns to
the parent user namespace until an appropriate parent is found.

Reported-by: Jann Horn <jann@thejh.net>
Fixes: 9e4a36ece6 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 11:16:14 +01:00
Eric W. Biederman
1c1f15f8eb ptrace: Capture the ptracer's creds not PT_PTRACE_CAP
commit 64b875f7ac upstream.

When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked.  This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.

Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in.  This has already allowed one mistake through
insufficient granulariy.

I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.

This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid.  More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.

Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 11:16:11 +01:00
Eric W. Biederman
03eed7afbc mm: Add a user_ns owner to mm_struct and fix ptrace permission checks
commit bfedb58925 upstream.

During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file.  A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
This ensures that if the task changes it's cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task->mm->user_ns.  The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue.  task->cred->user_ns is always the same
as or descendent of mm->user_ns.  Which guarantees that having
CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
credentials.

To prevent regressions mm->dumpable and mm->user_ns are not considered
when a task has no mm.  As simply failing ptrace_may_attach causes
regressions in privileged applications attempting to read things
such as /proc/<pid>/stat

Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
Fixes: 8409cca705 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 11:16:11 +01:00
Juri Lelli
ede03b5d80 sched/walt: kill {min,max}_capacity
{min,max}_capacity are static variables that are only updated from
__update_min_max_capacity(), but not used anywhere else.

Remove them together with the function updating them. This has also
the nice side effect of fixing a LOCKDEP warning related to locking
all CPUs in update_min_max_capacity(), as reported by Ke Wang:

[    2.853595] c0 =============================================
[    2.859219] c0 [ INFO: possible recursive locking detected ]
[    2.864852] c0 4.4.6+ #5 Tainted: G        W
[    2.869604] c0 ---------------------------------------------
[    2.875230] c0 swapper/0/1 is trying to acquire lock:
[    2.880248]  (&rq->lock){-.-.-.}, at: [<ffffff80081241cc>] cpufreq_notifier_policy+0x2e8/0x37c
[    2.888815] c0
[    2.888815] c0 but task is already holding lock:
[    2.895132]  (&rq->lock){-.-.-.}, at: [<ffffff80081241cc>] cpufreq_notifier_policy+0x2e8/0x37c
[    2.903700] c0
[    2.903700] c0 other info that might help us debug this:
[    2.910710] c0  Possible unsafe locking scenario:
[    2.910710] c0
[    2.917112] c0        CPU0
[    2.919795] c0        ----
[    2.922478]   lock(&rq->lock);
[    2.925507]   lock(&rq->lock);
[    2.928536] c0
[    2.928536] c0  *** DEADLOCK ***
[    2.928536] c0
[    2.935200] c0  May be due to missing lock nesting notation
[    2.935200] c0
[    2.942471] c0 7 locks held by swapper/0/1:
[    2.946623]  #0:  (&dev->mutex){......}, at: [<ffffff800850e118>] __driver_attach+0x64/0xb8
[    2.954931]  #1:  (&dev->mutex){......}, at: [<ffffff800850e128>] __driver_attach+0x74/0xb8
[    2.963239]  #2:  (cpu_hotplug.lock){++++++}, at: [<ffffff80080cb218>] get_online_cpus+0x48/0xa8
[    2.971979]  #3:  (subsys mutex#6){+.+.+.}, at: [<ffffff800850bed4>] subsys_interface_register+0x44/0xc0
[    2.981411]  #4:  (&policy->rwsem){+.+.+.}, at: [<ffffff8008720338>] cpufreq_online+0x330/0x76c
[    2.990065]  #5:  ((cpufreq_policy_notifier_list).rwsem){.+.+..}, at: [<ffffff80080f3418>] blocking_notifier_call_chain+0x38/0xc4
[    3.001661]  #6:  (&rq->lock){-.-.-.}, at: [<ffffff80081241cc>] cpufreq_notifier_policy+0x2e8/0x37c
[    3.010661] c0
[    3.010661] c0 stack backtrace:
[    3.015514] c0 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W 4.4.6+ #5
[    3.022864] c0 Hardware name: Spreadtrum SP9860g Board (DT)
[    3.028402] c0 Call trace:
[    3.031092] c0 [<ffffff800808b50c>] dump_backtrace+0x0/0x210
[    3.036716] c0 [<ffffff800808b73c>] show_stack+0x20/0x28
[    3.041994] c0 [<ffffff8008433310>] dump_stack+0xa8/0xe0
[    3.047273] c0 [<ffffff80081349e0>] __lock_acquire+0x1e0c/0x2218
[    3.053243] c0 [<ffffff80081353c0>] lock_acquire+0xe0/0x280
[    3.058784] c0 [<ffffff8008abfdfc>] _raw_spin_lock+0x44/0x58
[    3.064407] c0 [<ffffff80081241cc>] cpufreq_notifier_policy+0x2e8/0x37c
[    3.070983] c0 [<ffffff80080f3458>] blocking_notifier_call_chain+0x78/0xc4
[    3.077820] c0 [<ffffff8008720294>] cpufreq_online+0x28c/0x76c
[    3.083618] c0 [<ffffff80087208a4>] cpufreq_add_dev+0x98/0xdc
[    3.089331] c0 [<ffffff800850bf14>] subsys_interface_register+0x84/0xc0
[    3.095907] c0 [<ffffff800871fa0c>] cpufreq_register_driver+0x168/0x28c
[    3.102486] c0 [<ffffff80087272f8>] sprd_cpufreq_probe+0x134/0x19c
[    3.108629] c0 [<ffffff8008510768>] platform_drv_probe+0x58/0xd0
[    3.114599] c0 [<ffffff800850de2c>] driver_probe_device+0x1e8/0x470
[    3.120830] c0 [<ffffff800850e168>] __driver_attach+0xb4/0xb8
[    3.126541] c0 [<ffffff800850b750>] bus_for_each_dev+0x6c/0xac
[    3.132339] c0 [<ffffff800850d6c0>] driver_attach+0x2c/0x34
[    3.137877] c0 [<ffffff800850d234>] bus_add_driver+0x210/0x298
[    3.143676] c0 [<ffffff800850f1f4>] driver_register+0x7c/0x114
[    3.149476] c0 [<ffffff8008510654>] __platform_driver_register+0x60/0x6c
[    3.156139] c0 [<ffffff8008f49f40>] sprd_cpufreq_platdrv_init+0x18/0x20
[    3.162714] c0 [<ffffff8008082a64>] do_one_initcall+0xd0/0x1d8
[    3.168514] c0 [<ffffff8008f0bc58>] kernel_init_freeable+0x1fc/0x29c
[    3.174834] c0 [<ffffff8008ab554c>] kernel_init+0x20/0x12c
[    3.180281] c0 [<ffffff8008086290>] ret_from_fork+0x10/0x40

Reported-by: Ke Wang <ke.wang@spreadtrum.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2017-01-02 13:54:08 +05:30
Alex Shi
35194db1b7 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-12-20 13:48:11 +08:00
Alex Shi
6381499f58 Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4
Conflicts:
	replaced with _ASM_EXTABLE() in arch/arm64/include/asm/futex.h
2016-12-20 13:47:14 +08:00
Michal Hocko
1c0f4e0ebb hotplug: Make register and unregister notifier API symmetric
commit 777c6e0dae upstream.

Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following

[  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[  145.122628] Call Trace:
[  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17

This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.

Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.

Fixes: 47e627bc8c ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-15 08:49:23 -08:00
Thomas Gleixner
c6a5bf4cda locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()
commit 1be5d4fa0a upstream.

While debugging the rtmutex unlock vs. dequeue race Will suggested to use
READ_ONCE() in rt_mutex_owner() as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().

Will: "It's a minor thing which will most likely not matter in practice"

Careful search did not unearth an actual problem in todays code, but it's
better to be safe than surprised.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-15 08:49:22 -08:00
Thomas Gleixner
b27d9147f2 locking/rtmutex: Prevent dequeue vs. unlock race
commit dbb26055de upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l->owner=T1
		rt_mutex_lock(l)
		lock(l->wait_lock)
		l->owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l->wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l->wait_lock)
				l->owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l->wait_lock)
				schedule()
		signal(->T2)	signal(->T3)
		lock(l->wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l->wait_lock)
				lock(l->wait_lock)
				dequeue(T3)
				  ===> wait list is now empty
				deboost()
				 unlock(l->wait_lock)
		lock(l->wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l->owner & ~HAS_WAITERS;
		    l->owner = owner
		     ==> l->owner = T1
		  }

				lock(l->wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
				    l->owner = owner
				     ==> l->owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutexes rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex is set as the owner again when the write takes place after the
successfull cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain that this got not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which allowed to decode this subtle problem.

Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-15 08:49:22 -08:00
Alex Shi
2f0de5192a Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-12-12 22:17:37 +08:00
Alex Shi
a057484ab4 Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4
Conflicts:
	also change cpu_enable_uao in arch/arm64/include/asm/processor.h
	comment unmatch fixed in arch/arm64/kernel/suspend.c
2016-12-12 22:16:26 +08:00
Ding Tianhong
dfb704f96c rcu: Fix soft lockup for rcu_nocb_kthread
commit bedc196915 upstream.

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not reliquishing the
CPU while doing so.  This commit therefore adds an cond_resched_rcu_qs()
within the loop to allow other tasks to run.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-08 07:15:24 +01:00
Huang, Tao
ef179e79e9 Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git
* linux-linaro-lsk-v4.4-android: (61 commits)
  Linux 4.4.36
  scsi: mpt3sas: Unblock device after controller reset
  flow_dissect: call init_default_flow_dissectors() earlier
  mei: fix return value on disconnection
  mei: me: fix place for kaby point device ids.
  mei: me: disable driver on SPT SPS firmware
  drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on
  mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
  parisc: Also flush data TLB in flush_icache_page_asm
  parisc: Fix race in pci-dma.c
  parisc: Fix races in parisc_setup_cache_timing()
  NFSv4.x: hide array-bounds warning
  apparmor: fix change_hat not finding hat after policy replacement
  cfg80211: limit scan results cache size
  tile: avoid using clocksource_cyc2ns with absolute cycle count
  scsi: mpt3sas: Fix secure erase premature termination
  Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
  USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad
  USB: serial: cp210x: add ID for the Zone DPMX
  usb: chipidea: move the lock initialization to core file
  ...
2016-12-06 20:58:56 +08:00
Huang, Tao
45cd824a30 Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git
* linux-linaro-lsk-v4.4-android: (315 commits)
  Linux 4.4.35
  netfilter: nft_dynset: fix element timeout for HZ != 1000
  IB/cm: Mark stale CM id's whenever the mad agent was unregistered
  IB/uverbs: Fix leak of XRC target QPs
  IB/core: Avoid unsigned int overflow in sg_alloc_table
  IB/mlx5: Fix fatal error dispatching
  IB/mlx5: Use cache line size to select CQE stride
  IB/mlx4: Fix create CQ error flow
  IB/mlx4: Check gid_index return value
  PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails
  PM / sleep: fix device reference leak in test_suspend
  uwb: fix device reference leaks
  mfd: core: Fix device reference leak in mfd_clone_cell
  iwlwifi: pcie: fix SPLC structure parsing
  rtc: omap: Fix selecting external osc
  clk: mmp: mmp2: fix return value check in mmp2_clk_init()
  clk: mmp: pxa168: fix return value check in pxa168_clk_init()
  clk: mmp: pxa910: fix return value check in pxa910_clk_init()
  drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)
  crypto: caam - do not register AES-XTS mode on LP units
  ...

Change-Id: Ic14c01a22a5e8a0356d6c0ef6bcca7bc6cad6b4b
2016-12-02 20:31:31 +08:00
Ke Wang
a8575ee86e sched: tune: Fix lacking spinlock initialization
The spinlock used by boost_groups in sched tune must be initialized.
This commit fixes this lack and the following errors:

[    0.384739] c2 BUG: spinlock bad magic on CPU#2, swapper/2/0
[    0.390313] c2  lock: 0xffffffc15fe1fc80, .magic:00000000, .owner: <none>/-1, .owner_cpu: 0
[    0.398739] c2 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.6+ #4
[    0.404816] c2 Hardware name: Spreadtrum SP9860gBoard (DT)
[    0.410462] c2 Call trace:
[    0.413159] c2 [<ffffff800808b50c>] dump_backtrace+0x0/0x210
[    0.418803] c2 [<ffffff800808b73c>] show_stack+0x20/0x28
[    0.424100] c2 [<ffffff8008433310>] dump_stack+0xa8/0xe0
[    0.429398] c2 [<ffffff8008139398>] spin_dump+0x78/0x9c
[    0.434608] c2 [<ffffff80081393ec>] spin_bug+0x30/0x3c
[    0.439644] c2 [<ffffff80081394e4>] do_raw_spin_lock+0xac/0x1b4
[    0.445639] c2 [<ffffff8008abffe4>] _raw_spin_lock_irqsave+0x58/0x68
[    0.451977] c2 [<ffffff800812a560>] schedtune_enqueue_task+0x84/0x3bc
[    0.458320] c2 [<ffffff8008111678>] enqueue_task_fair+0x438/0x208c
[    0.464487] c2 [<ffffff80080feeec>] activate_task+0x70/0xd0
[    0.470130] c2 [<ffffff80080ff4a4>] ttwu_do_activate.constprop.131+0x4c/0x98
[    0.477079] c2 [<ffffff80081005d0>] try_to_wake_up+0x254/0x54c
[    0.482899] c2 [<ffffff80081009d4>] default_wake_function+0x30/0x3c
[    0.489154] c2 [<ffffff8008122464>] autoremove_wake_function+0x3c/0x6c
[    0.495754] c2 [<ffffff8008121b70>] __wake_up_common+0x64/0xa4
[    0.501574] c2 [<ffffff8008121e9c>] __wake_up+0x48/0x60
[    0.506788] c2 [<ffffff8008150fac>] rcu_gp_kthread_wake+0x50/0x5c
[    0.512866] c2 [<ffffff8008151fec>] note_gp_changes+0xac/0xd4
[    0.518597] c2 [<ffffff8008153044>] rcu_process_callbacks+0xe8/0x93c
[    0.524940] c2 [<ffffff80080d0b84>] __do_softirq+0x24c/0x5b8
[    0.530584] c2 [<ffffff80080d1284>] irq_exit+0xc0/0xec
[    0.535623] c2 [<ffffff8008144208>] __handle_domain_irq+0x94/0xf8
[    0.541789] c2 [<ffffff8008082554>] gic_handle_irq+0x64/0xc0

Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2016-12-01 15:18:44 +05:30