Commit Graph

637355 Commits

Author SHA1 Message Date
Anand Moon
c2e92581da hwmon: pwm-fan: add sys attribute to control active cooling
update fan_speed sys attribute to update the cooling level
for pwm-fan.

Change-Id: I67b0d047dbe0cb0cfa81896ec6226e8e6d79fd1e
2017-04-13 11:44:58 +05:30
OtherCrashOverride
f602b09e52 ODROID-XU4: Never blend the base layer. jenkins-deb_kernel_5422_4.9-26 2017-04-09 17:31:25 +00:00
Mauro Ribeiro
e9913196a7 Merge "Revert "uas: Reset device on reboot"" into odroidxu4-4.9.y jenkins-deb_kernel_5422_4.9-25 2017-04-07 20:35:17 +09:00
Brian Kim
7565b9406b Revert "uas: Reset device on reboot"
This reverts commit da65c2bb99.

That commit is for specific x86 hardware and the shutdown function
caused the system hang problem while the software reboot. Calling
usb_reset_device() function in the shutdown method reprobes the UAS
storage device again especially.

Change-Id: If9859b1b9dd88c7b1d4ec7ba94bdb386354f47c4
Signed-off-by: Brian Kim <brian.kim@hardkernel.com>
2017-04-07 13:25:43 +09:00
Joy Cho
e6169efe37 drm/exynos: Add new hdmi modes, 480x800p60hz/1920x1200p60hz
Change-Id: Idbfd34015c56f5715ca893d91fffdb0e0af8eaec
(cherry picked from commit 9c60de878944dc7169452adfc6abf284e9b92f52)
jenkins-deb_kernel_5422_4.9-24
2017-04-07 11:12:13 +09:00
Mauro Ribeiro
2be9d240d1 Merge "ODROID-XU4 : s2mps11 manual reset function enable for hardware watchdog." into odroidxu4-4.9.y jenkins-deb_kernel_5422_4.9-23 2017-04-06 21:35:53 +09:00
Mauro (mdrjr) Ribeiro
66a3c19c9e [mali] update gpu integration 2017-04-06 09:33:00 -03:00
charles.park
59efc896cc ODROID-XU4 : s2mps11 manual reset function enable for hardware watchdog.
Change-Id: Ib22723438ee02b77c3e75a00f2e7dfbb84c01618
2017-04-06 17:31:00 +09:00
Joy Cho
b67bca1ecc ODROID-XU4: Add LIRC interface driver (CONFIG_LIRC)
Change-Id: I3650557a73d55de3b769dee54ea025f104a0695b
jenkins-deb_kernel_5422_4.9-22
2017-04-06 11:02:08 +09:00
popcornmix
601dc03b76 Allow mac address to be set in smsc95xx
Signed-off-by: popcornmix <popcornmix@gmail.com>
Signed-off-by: Mauro (mdrjr) Ribeiro <mauro.ribeiro@hardkernel.com>
jenkins-deb_kernel_5422_4.9-21 jenkins-deb_kernel_5422_4.9-20
2017-04-03 15:20:08 -03:00
Mauro (mdrjr) Ribeiro
72312f238e Merge tag 'v4.9.20' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidxu4-4.9.y
This is the 4.9.20 stable release
2017-04-03 14:57:01 -03:00
Mauro (mdrjr) Ribeiro
d2c6906655 Merge tag 'v4.9.19' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidxu4-4.9.y
This is the 4.9.19 stable release
2017-04-03 14:56:53 -03:00
Mauro (mdrjr) Ribeiro
88204d3733 Merge tag 'v4.9.18' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidxu4-4.9.y
This is the 4.9.18 stable release
2017-04-03 14:56:38 -03:00
Mauro (mdrjr) Ribeiro
6a8c174666 Merge tag 'v4.9.17' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidxu4-4.9.y
This is the 4.9.17 stable release
2017-04-03 14:56:29 -03:00
Mauro (mdrjr) Ribeiro
95318e275c Merge tag 'v4.9.16' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidxu4-4.9.y
This is the 4.9.16 stable release
2017-04-03 14:56:20 -03:00
Mauro (mdrjr) Ribeiro
ad668d5992 Merge tag 'v4.9.15' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidxu4-4.9.y
This is the 4.9.15 stable release
2017-04-03 14:56:11 -03:00
Mauro (mdrjr) Ribeiro
cf836c25a8 Merge tag 'v4.9.14' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidxu4-4.9.y
This is the 4.9.14 stable release
2017-04-03 14:55:46 -03:00
Greg Kroah-Hartman
f676772746 Linux 4.9.20 2017-03-31 10:32:02 +02:00
Bin Liu
1dc3a068cc usb: musb: fix possible spinlock deadlock
commit bc1e215454 upstream.

The DSPS glue calls del_timer_sync() in its musb_platform_disable()
implementation, which requires the caller to not hold a lock. But
musb_remove() calls musb_platform_disable() will musb->lock held. This
could causes spinlock deadlock.

So change musb_remove() to call musb_platform_disable() without holds
musb->lock. This doesn't impact the musb_platform_disable implementation
in other glue drivers.

root@am335x-evm:~# modprobe -r musb-dsps
[  126.134879] musb-hdrc musb-hdrc.1: remove, state 1
[  126.140465] usb usb2: USB disconnect, device number 1
[  126.146178] usb 2-1: USB disconnect, device number 2
[  126.416985] musb-hdrc musb-hdrc.1: USB bus 2 deregistered
[  126.423943]
[  126.425525] ======================================================
[  126.431997] [ INFO: possible circular locking dependency detected ]
[  126.438564] 4.11.0-rc1-00003-g1557f13bca04-dirty #77 Not tainted
[  126.444852] -------------------------------------------------------
[  126.451414] modprobe/778 is trying to acquire lock:
[  126.456523]  (((&glue->timer))){+.-...}, at: [<c01b8788>] del_timer_sync+0x0/0xd0
[  126.464403]
[  126.464403] but task is already holding lock:
[  126.470511]  (&(&musb->lock)->rlock){-.-...}, at: [<bf30b7f8>] musb_remove+0x50/0x1
30 [musb_hdrc]
[  126.479965]
[  126.479965] which lock already depends on the new lock.
[  126.479965]
[  126.488531]
[  126.488531] the existing dependency chain (in reverse order) is:
[  126.496368]
[  126.496368] -> #1 (&(&musb->lock)->rlock){-.-...}:
[  126.502968]        otg_timer+0x80/0xec [musb_dsps]
[  126.507990]        call_timer_fn+0xb4/0x390
[  126.512372]        expire_timers+0xf0/0x1fc
[  126.516754]        run_timer_softirq+0x80/0x178
[  126.521511]        __do_softirq+0xc4/0x554
[  126.525802]        irq_exit+0xe8/0x158
[  126.529735]        __handle_domain_irq+0x58/0xb8
[  126.534583]        __irq_usr+0x54/0x80
[  126.538507]
[  126.538507] -> #0 (((&glue->timer))){+.-...}:
[  126.544636]        del_timer_sync+0x40/0xd0
[  126.549066]        musb_remove+0x6c/0x130 [musb_hdrc]
[  126.554370]        platform_drv_remove+0x24/0x3c
[  126.559206]        device_release_driver_internal+0x14c/0x1e0
[  126.565225]        bus_remove_device+0xd8/0x108
[  126.569970]        device_del+0x1e4/0x308
[  126.574170]        platform_device_del+0x24/0x8c
[  126.579006]        platform_device_unregister+0xc/0x20
[  126.584394]        dsps_remove+0x14/0x30 [musb_dsps]
[  126.589595]        platform_drv_remove+0x24/0x3c
[  126.594432]        device_release_driver_internal+0x14c/0x1e0
[  126.600450]        driver_detach+0x38/0x6c
[  126.604740]        bus_remove_driver+0x4c/0xa0
[  126.609407]        SyS_delete_module+0x11c/0x1e4
[  126.614252]        __sys_trace_return+0x0/0x10

Fixes: ea2f35c01d ("usb: musb: Fix sleeping function called from invalid context for hdrc glue")
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Sebastian Andrzej Siewior
916c5cfeab sched/rt: Add a missing rescheduling point
commit 619bd4a718 upstream.

Since the change in commit:

  fd7a4bed18 ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks")

... we don't reschedule a task under certain circumstances:

Lets say task-A, SCHED_OTHER, is running on CPU0 (and it may run only on
CPU0) and holds a PI lock. This task is removed from the CPU because it
used up its time slice and another SCHED_OTHER task is running. Task-B on
CPU1 runs at RT priority and asks for the lock owned by task-A. This
results in a priority boost for task-A. Task-B goes to sleep until the
lock has been made available. Task-A is already runnable (but not active),
so it receives no wake up.

The reality now is that task-A gets on the CPU once the scheduler decides
to remove the current task despite the fact that a high priority task is
enqueued and waiting. This may take a long time.

The desired behaviour is that CPU0 immediately reschedules after the
priority boost which made task-A the task with the lowest priority.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fd7a4bed18 ("sched, rt: Convert switched_{from, to}_rt() prio_changed_rt() to balance callbacks")
Link: http://lkml.kernel.org/r/20170124144006.29821-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Eric Biggers
2984e52c75 fscrypt: remove broken support for detecting keyring key revocation
commit 1b53cf9815 upstream.

Filesystem encryption ostensibly supported revoking a keyring key that
had been used to "unlock" encrypted files, causing those files to become
"locked" again.  This was, however, buggy for several reasons, the most
severe of which was that when key revocation happened to be detected for
an inode, its fscrypt_info was immediately freed, even while other
threads could be using it for encryption or decryption concurrently.
This could be exploited to crash the kernel or worse.

This patch fixes the use-after-free by removing the code which detects
the keyring key having been revoked, invalidated, or expired.  Instead,
an encrypted inode that is "unlocked" now simply remains unlocked until
it is evicted from memory.  Note that this is no worse than the case for
block device-level encryption, e.g. dm-crypt, and it still remains
possible for a privileged user to evict unused pages, inodes, and
dentries by running 'sync; echo 3 > /proc/sys/vm/drop_caches', or by
simply unmounting the filesystem.  In fact, one of those actions was
already needed anyway for key revocation to work even somewhat sanely.
This change is not expected to break any applications.

In the future I'd like to implement a real API for fscrypt key
revocation that interacts sanely with ongoing filesystem operations ---
waiting for existing operations to complete and blocking new operations,
and invalidating and sanitizing key material and plaintext from the VFS
caches.  But this is a hard problem, and for now this bug must be fixed.

This bug affected almost all versions of ext4, f2fs, and ubifs
encryption, and it was potentially reachable in any kernel configured
with encryption support (CONFIG_EXT4_ENCRYPTION=y,
CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
CONFIG_UBIFS_FS_ENCRYPTION=y).  Note that older kernels did not use the
shared fs/crypto/ code, but due to the potential security implications
of this bug, it may still be worthwhile to backport this fix to them.

Fixes: b7236e21d5 ("ext4 crypto: reorganize how we store keys in the inode")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Dave Martin
21c95eca62 metag/ptrace: Reject partial NT_METAG_RPIPE writes
commit 7195ee3120 upstream.

It's not clear what behaviour is sensible when doing partial write of
NT_METAG_RPIPE, so just don't bother.

This patch assumes that userspace will never rely on a partial SETREGSET
in this case, since it's not clear what should happen anyway.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Dave Martin
2d6532ceb4 metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
commit 5fe81fe981 upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill TXSTATUS, a well-defined default value is used, based on the
task's current value.

Suggested-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Dave Martin
2739b48741 metag/ptrace: Preserve previous registers for short regset write
commit a78ce80d2c upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Dave Martin
84b94c4356 sparc/ptrace: Preserve previous registers for short regset write
commit d3805c546b upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Dave Martin
0ba34c87f7 mips/ptrace: Preserve previous registers for short regset write
commit d614fd58a2 upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Dave Martin
616fe993f4 h8300/ptrace: Fix incorrect register transfer count
commit 502585c755 upstream.

regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun
if CONFIG_CPU_H8S is set, since this adds an extra entry to
register_offset[] but not to user_regs_struct.

So, iterate over user_regs_struct based on its actual size, not based on
the length of register_offset[].

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Dave Martin
d77bd2d6a4 c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
commit fb411b837b upstream.

gpr_set won't work correctly and can never have been tested, and the
correct behaviour is not clear due to the endianness-dependent task
layout.

So, just remove it.  The core code will now return -EOPNOTSUPPORT when
trying to set NT_PRSTATUS on this architecture until/unless a correct
implementation is supplied.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:46 +02:00
Bjorn Andersson
dd6054234b pinctrl: qcom: Don't clear status bit on irq_unmask
commit a6566710ad upstream.

Clearing the status bit on irq_unmask will discard any pending interrupt
that did arrive after the irq_ack, i.e. while the IRQ handler function
was executing.

Fixes: f365be0925 ("pinctrl: Add Qualcomm TLMM driver")
Cc: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:45 +02:00
Ladi Prosek
b82b2f930c virtio_balloon: init 1st buffer in stats vq
commit fc8653228c upstream.

When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.

This patch updates the stats before pushing the initial buffer.

Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
  virtio implementation and violates the spec "Driver MUST supply the
  same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
  spec clause, plus "invalid tag" is not really defined.

Note: the spec says:
	When using the legacy interface, the device SHOULD ignore all values in
	the first buffer in the statsq supplied by the driver after device
	initialization. Note: Historically, drivers supplied an uninitialized
	buffer in the first buffer.

Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:45 +02:00
Paolo Bonzini
3e8c6bd9db KVM: x86: cleanup the page tracking SRCU instance
commit 2beb6dad2e upstream.

SRCU uses a delayed work item.  Skip cleaning it up, and
the result is use-after-free in the work item callbacks.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 0eb05bf290
Reviewed-by: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:45 +02:00
Andy Whitcroft
79191ea36d xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
commit f843ee6dd0 upstream.

Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues.  To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:45 +02:00
Andy Whitcroft
64a5465799 xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
commit 677e806da4 upstream.

When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer.  However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents.  We do
not at this point check that the replay_window is within the allocated
memory.  This leads to out-of-bounds reads and writes triggered by
netlink packets.  This leads to memory corruption and the potential for
priviledge escalation.

We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn.  It however does not check the replay_window
remains within that buffer.  Add validation of the contained
replay_window.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:45 +02:00
Florian Westphal
f68a09c794 xfrm: policy: init locks early
commit c282222a45 upstream.

Dmitry reports following splat:
 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1
[..]
 spin_lock_bh include/linux/spinlock.h:304 [inline]
 xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963
 xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041
 xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091
 ops_init+0x10a/0x530 net/core/net_namespace.c:115
 setup_net+0x2ed/0x690 net/core/net_namespace.c:291
 copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
 SYSC_unshare kernel/fork.c:2281 [inline]

Problem is that when we get error during xfrm_net_init we will call
xfrm_policy_fini which will acquire xfrm_policy_lock before it was
initialized.  Just move it around so locks get set up first.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 283bc9f35b ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 10:31:45 +02:00
Greg Kroah-Hartman
c8e131605d Linux 4.9.19 2017-03-30 09:41:57 +02:00
Jiri Slaby
bc959a402d crypto: algif_hash - avoid zero-sized array
commit 6207119444 upstream.

With this reproducer:
  struct sockaddr_alg alg = {
          .salg_family = 0x26,
          .salg_type = "hash",
          .salg_feat = 0xf,
          .salg_mask = 0x5,
          .salg_name = "digest_null",
  };
  int sock, sock2;

  sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
  bind(sock, (struct sockaddr *)&alg, sizeof(alg));
  sock2 = accept(sock, NULL, NULL);
  setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
  accept(sock2, NULL, NULL);

==== 8< ======== 8< ======== 8< ======== 8< ====

one can immediatelly see an UBSAN warning:
UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
variable length array bound value 0 <= 0
CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default #1
...
Call Trace:
...
 [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
 [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
 [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
 [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
 [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
 [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40

It is a correct warning, as hash state is propagated to accept as zero,
but creating a zero-length variable array is not allowed in C.

Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
similar happens in the code there, so we just allocate one byte even
though we do not use the array.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:28 +02:00
Takashi Iwai
3fd37725c4 fbcon: Fix vc attr at deinit
commit 8aac7f3436 upstream.

fbcon can deal with vc_hi_font_mask (the upper 256 chars) and adjust
the vc attrs dynamically when vc_hi_font_mask is changed at
fbcon_init().  When the vc_hi_font_mask is set, it remaps the attrs in
the existing console buffer with one bit shift up (for 9 bits), while
it remaps with one bit shift down (for 8 bits) when the value is
cleared.  It works fine as long as the font gets updated after fbcon
was initialized.

However, we hit a bizarre problem when the console is switched to
another fb driver (typically from vesafb or efifb to drmfb).  At
switching to the new fb driver, we temporarily rebind the console to
the dummy console, then rebind to the new driver.  During the
switching, we leave the modified attrs as is.  Thus, the new fbcon
takes over the old buffer as if it were to contain 8 bits chars
(although the attrs are still shifted for 9 bits), and effectively
this results in the yellow color texts instead of the original white
color, as found in the bugzilla entry below.

An easy fix for this is to re-adjust the attrs before leaving the
fbcon at con_deinit callback.  Since the code to adjust the attrs is
already present in the current fbcon code, in this patch, we simply
factor out the relevant code, and call it from fbcon_deinit().

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000619
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:28 +02:00
Daniel Vetter
c75fe78995 drm: reference count event->completion
commit 24835e442f upstream.

When writing the generic nonblocking commit code I assumed that
through clever lifetime management I can assure that the completion
(stored in drm_crtc_commit) only gets freed after it is completed. And
that worked.

I also wanted to make nonblocking helpers resilient against driver
bugs, by having timeouts everywhere. And that worked too.

Unfortunately taking boths things together results in oopses :( Well,
at least sometimes: What seems to happen is that the drm event hangs
around forever stuck in limbo land. The nonblocking helpers eventually
time out, move on and release it. Now the bug I tested all this
against is drivers that just entirely fail to deliver the vblank
events like they should, and in those cases the event is simply
leaked. But what seems to happen, at least sometimes, on i915 is that
the event is set up correctly, but somohow the vblank fails to fire in
time. Which means the event isn't leaked, it's still there waiting for
eventually a vblank to fire. That tends to happen when re-enabling the
pipe, and then the trap springs and the kernel oopses.

The correct fix here is simply to refcount the crtc commit to make
sure that the event sticks around even for drivers which only
sometimes fail to deliver vblanks for some arbitrary reasons. Since
crtc commits are already refcounted that's easy to do.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96781
Cc: Jim Rees <rees@umich.edu>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161221102331.31033-1-daniel.vetter@ffwll.ch
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:28 +02:00
Johannes Berg
56769e7a05 nl80211: fix dumpit error path RTNL deadlocks
commit ea90e0dc8c upstream.

Sowmini pointed out Dmitry's RTNL deadlock report to me, and it turns out
to be perfectly accurate - there are various error paths that miss unlock
of the RTNL.

To fix those, change the locking a bit to not be conditional in all those
nl80211_prepare_*_dump() functions, but make those require the RTNL to
start with, and fix the buggy error paths. This also let me use sparse
(by appropriately overriding the rtnl_lock/rtnl_unlock functions) to
validate the changes.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:28 +02:00
Marek Szyprowski
7b3c8b2a2e drm/bridge: analogix dp: Fix runtime PM state on driver bind
commit f0a8b49c03 upstream.

Analogix_dp_bind() can be called from component framework, which doesn't
guarantee proper runtime PM state of the device during bind operation,
so ensure that device is runtime active before doing any register access.
This ensures that the power domain, to which DP module belongs, is turned
on. While at it, also fix the unbalanced call to phy_power_on() in
analogix_dp_bind() function.

This patch solves the following kernel oops on Samsung Exynos5250 Snow
board:

Unhandled fault: imprecise external abort (0x406) at 0x00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: : 406 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 75 Comm: kworker/0:2 Not tainted 4.9.0 #1046
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: events deferred_probe_work_func
task: ee272300 task.stack: ee312000
PC is at analogix_dp_enable_sw_function+0x18/0x2c
LR is at analogix_dp_init_dp+0x2c/0x50
...
[<c03fcb38>] (analogix_dp_enable_sw_function) from [<c03fa9c4>] (analogix_dp_init_dp+0x2c/0x50)
[<c03fa9c4>] (analogix_dp_init_dp) from [<c03fab6c>] (analogix_dp_bind+0x184/0x42c)
[<c03fab6c>] (analogix_dp_bind) from [<c03fdb84>] (component_bind_all+0xf0/0x218)
[<c03fdb84>] (component_bind_all) from [<c03ed64c>] (exynos_drm_load+0x134/0x200)
[<c03ed64c>] (exynos_drm_load) from [<c03d5058>] (drm_dev_register+0xa0/0xd0)
[<c03d5058>] (drm_dev_register) from [<c03d66b8>] (drm_platform_init+0x58/0xb0)
[<c03d66b8>] (drm_platform_init) from [<c03fe0c4>] (try_to_bring_up_master+0x14c/0x188)
[<c03fe0c4>] (try_to_bring_up_master) from [<c03fe188>] (component_add+0x88/0x138)
[<c03fe188>] (component_add) from [<c0403a38>] (platform_drv_probe+0x50/0xb0)
[<c0403a38>] (platform_drv_probe) from [<c0402470>] (driver_probe_device+0x1f0/0x2a8)
[<c0402470>] (driver_probe_device) from [<c0400a54>] (bus_for_each_drv+0x44/0x8c)
[<c0400a54>] (bus_for_each_drv) from [<c04021f8>] (__device_attach+0x9c/0x100)
[<c04021f8>] (__device_attach) from [<c04018e8>] (bus_probe_device+0x84/0x8c)
[<c04018e8>] (bus_probe_device) from [<c0401d1c>] (deferred_probe_work_func+0x60/0x8c)
[<c0401d1c>] (deferred_probe_work_func) from [<c012fc14>] (process_one_work+0x120/0x318)
[<c012fc14>] (process_one_work) from [<c012fe34>] (process_scheduled_works+0x28/0x38)
[<c012fe34>] (process_scheduled_works) from [<c0130048>] (worker_thread+0x204/0x4ac)
[<c0130048>] (worker_thread) from [<c01352c4>] (kthread+0xd8/0xf4)
[<c01352c4>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
Code: e59035f0 e5935018 f57ff04f e3c55001 (f57ff04e)
---[ end trace 3d1d0d87796de344 ]---

Reviewed-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1483091866-1088-1-git-send-email-m.szyprowski@samsung.com
Cc: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:28 +02:00
Dave Jiang
eae72468c4 device-dax: fix pmd/pte fault fallback handling
commit 0134ed4fb9 upstream.

Jeff Moyer reports:

    With a device dax alignment of 4KB or 2MB, I get sigbus when running
    the attached fio job file for the current kernel (4.11.0-rc1+).  If
    I specify an alignment of 1GB, it works.

    I turned on debug output, and saw that it was failing in the huge
    fault code.

     dax dax1.0: dax_open
     dax dax1.0: dax_mmap
     dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
     dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60
     dax dax1.0: dax_release

    fio config for reproduce:
    [global]
    ioengine=dev-dax
    direct=0
    filename=/dev/dax0.0
    bs=2m

    [write]
    rw=write

    [read]
    stonewall
    rw=read

The driver fails to fallback when taking a fault that is larger than
the device alignment, or handling a larger fault when a smaller
mapping is already established. While we could support larger
mappings for a device with a smaller alignment, that change is
too large for the immediate fix. The simplest change is to force
fallback until the fault size matches the alignment.

Fixes: dee4107924 ("/dev/dax, core: file operations and dax-mmap")
Cc: <stable@vger.kernel.org>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
Ilya Dryomov
81ec3dc1de libceph: don't set weight to IN when OSD is destroyed
commit b581a5854e upstream.

Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
This changes the result of applying an incremental for clients, not
just OSDs.  Because CRUSH computations are obviously affected,
pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
object placement, resulting in misdirected requests.

Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.

Fixes: 930c532869 ("libceph: apply new_state before new_up_client on incrementals")
Link: http://tracker.ceph.com/issues/19122
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
K. Y. Srinivasan
df1fe6c9ad Drivers: hv: vmbus: Don't leak memory when a channel is rescinded
commit 5e030d5ce9 upstream.

When we close a channel that has been rescinded, we will leak memory since
vmbus_teardown_gpadl() returns an error. Fix this so that we can properly
cleanup the memory allocated to the ring buffers.

Fixes: ccb61f8a99 ("Drivers: hv: vmbus: Fix a rescind handling bug")

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
K. Y. Srinivasan
b1f6b0a5a0 Drivers: hv: vmbus: Don't leak channel ids
commit 9a5476020a upstream.

If we cannot allocate memory for the channel, free the relid
associated with the channel.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
Alexander Shishkin
3076066bb5 intel_th: Don't leak module refcount on failure to activate
commit e609ccef52 upstream.

Output 'activation' may fail for the reasons of the output driver,
for example, if msc's buffer is not allocated. We forget, however,
to drop the module reference in this case. So each attempt at
activation in this case leaks a reference, preventing the module
from ever unloading.

This patch adds the missing module_put() in the activation error
path.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
Eric Biggers
b176a6eed3 jbd2: don't leak memory if setting up journal fails
commit cd9cb405e0 upstream.

In journal_init_common(), if we failed to allocate the j_wbuf array, or
if we failed to create the buffer_head for the journal superblock, we
leaked the memory allocated for the revocation tables.  Fix this.

Fixes: f0c9fd5458
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
Dmitry Torokhov
90f39ad2ce auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches
commit abda288bb2 upstream.

The OF device table must be terminated, otherwise we'll be walking past it
and into areas unknown.

Fixes: 0cad855fbd ("auxdisplay: img-ascii-lcd: driver for simple ASCII...")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
Alex Deucher
9740abe0fd drm/amdgpu: reinstate oland workaround for sclk
commit e11ddff68a upstream.

Higher sclks seem to be unstable on some boards.

bug: https://bugs.freedesktop.org/show_bug.cgi?id=100222

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
Ming Lei
21d17f1b53 blk-mq: don't complete un-started request in timeout handler
commit 95a4960370 upstream.

When iterating busy requests in timeout handler,
if the STARTED flag of one request isn't set, that means
the request is being processed in block layer or driver, and
isn't submitted to hardware yet.

In current implementation of blk_mq_check_expired(),
if the request queue becomes dying, un-started requests are
handled as being completed/freed immediately. This way is
wrong, and can cause rq corruption or double allocation[1][2],
when doing I/O and removing&resetting NVMe device at the sametime.

This patch fixes several issues reported by Yi Zhang.

[1]. oops log 1
[  581.789754] ------------[ cut here ]------------
[  581.789758] kernel BUG at block/blk-mq.c:374!
[  581.789760] invalid opcode: 0000 [#1] SMP
[  581.789761] Modules linked in: vfat fat ipmi_ssif intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvme
irqbypass crct10dif_pclmul nvme_core crc32_pclmul ghash_clmulni_intel
intel_cstate ipmi_si mei_me ipmi_devintf intel_uncore sg ipmi_msghandler
intel_rapl_perf iTCO_wdt mei iTCO_vendor_support mxm_wmi lpc_ich dcdbas shpchp
pcspkr acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd dm_multipath grace
sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci
crc32c_intel tg3 libata megaraid_sas i2c_core ptp fjes pps_core dm_mirror
dm_region_hash dm_log dm_mod
[  581.789796] CPU: 1 PID: 1617 Comm: kworker/1:1H Not tainted 4.10.0.bz1420297+ #4
[  581.789797] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  581.789804] Workqueue: kblockd blk_mq_timeout_work
[  581.789806] task: ffff8804721c8000 task.stack: ffffc90006ee4000
[  581.789809] RIP: 0010:blk_mq_end_request+0x58/0x70
[  581.789810] RSP: 0018:ffffc90006ee7d50 EFLAGS: 00010202
[  581.789811] RAX: 0000000000000001 RBX: ffff8802e4195340 RCX: ffff88028e2f4b88
[  581.789812] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
[  581.789813] RBP: ffffc90006ee7d60 R08: 0000000000000003 R09: ffff88028e2f4b00
[  581.789814] R10: 0000000000001000 R11: 0000000000000001 R12: 00000000fffffffb
[  581.789815] R13: ffff88042abe5780 R14: 000000000000002d R15: ffff88046fbdff80
[  581.789817] FS:  0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
[  581.789818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  581.789819] CR2: 00007f64f403a008 CR3: 000000014d078000 CR4: 00000000001406e0
[  581.789820] Call Trace:
[  581.789825]  blk_mq_check_expired+0x76/0x80
[  581.789828]  bt_iter+0x45/0x50
[  581.789830]  blk_mq_queue_tag_busy_iter+0xdd/0x1f0
[  581.789832]  ? blk_mq_rq_timed_out+0x70/0x70
[  581.789833]  ? blk_mq_rq_timed_out+0x70/0x70
[  581.789840]  ? __switch_to+0x140/0x450
[  581.789841]  blk_mq_timeout_work+0x88/0x170
[  581.789845]  process_one_work+0x165/0x410
[  581.789847]  worker_thread+0x137/0x4c0
[  581.789851]  kthread+0x101/0x140
[  581.789853]  ? rescuer_thread+0x3b0/0x3b0
[  581.789855]  ? kthread_park+0x90/0x90
[  581.789860]  ret_from_fork+0x2c/0x40
[  581.789861] Code: 48 85 c0 74 0d 44 89 e6 48 89 df ff d0 5b 41 5c 5d c3 48
8b bb 70 01 00 00 48 85 ff 75 0f 48 89 df e8 7d f0 ff ff 5b 41 5c 5d c3 <0f>
0b e8 71 f0 ff ff 90 eb e9 0f 1f 40 00 66 2e 0f 1f 84 00 00
[  581.789882] RIP: blk_mq_end_request+0x58/0x70 RSP: ffffc90006ee7d50
[  581.789889] ---[ end trace bcaf03d9a14a0a70 ]---

[2]. oops log2
[ 6984.857362] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 6984.857372] IP: nvme_queue_rq+0x6e6/0x8cd [nvme]
[ 6984.857373] PGD 0
[ 6984.857374]
[ 6984.857376] Oops: 0000 [#1] SMP
[ 6984.857379] Modules linked in: ipmi_ssif vfat fat intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si iTCO_wdt
iTCO_vendor_support mxm_wmi ipmi_devintf intel_cstate sg dcdbas intel_uncore
mei_me intel_rapl_perf mei pcspkr lpc_ich ipmi_msghandler shpchp
acpi_power_meter wmi nfsd auth_rpcgss dm_multipath nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect crc32c_intel sysimgblt fb_sys_fops ttm nvme drm nvme_core ahci
libahci i2c_core tg3 libata ptp megaraid_sas pps_core fjes dm_mirror
dm_region_hash dm_log dm_mod
[ 6984.857416] CPU: 7 PID: 1635 Comm: kworker/7:1H Not tainted
4.10.0-2.el7.bz1420297.x86_64 #1
[ 6984.857417] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[ 6984.857427] Workqueue: kblockd blk_mq_run_work_fn
[ 6984.857429] task: ffff880476e3da00 task.stack: ffffc90002e90000
[ 6984.857432] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme]
[ 6984.857433] RSP: 0018:ffffc90002e93c50 EFLAGS: 00010246
[ 6984.857434] RAX: 0000000000000000 RBX: ffff880275646600 RCX: 0000000000001000
[ 6984.857435] RDX: 0000000000000fff RSI: 00000002fba2a000 RDI: ffff8804734e6950
[ 6984.857436] RBP: ffffc90002e93d30 R08: 0000000000002000 R09: 0000000000001000
[ 6984.857437] R10: 0000000000001000 R11: 0000000000000000 R12: ffff8804741d8000
[ 6984.857438] R13: 0000000000000040 R14: ffff880475649f80 R15: ffff8804734e6780
[ 6984.857439] FS:  0000000000000000(0000) GS:ffff88047fcc0000(0000) knlGS:0000000000000000
[ 6984.857440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6984.857442] CR2: 0000000000000010 CR3: 0000000001c09000 CR4: 00000000001406e0
[ 6984.857443] Call Trace:
[ 6984.857451]  ? mempool_free+0x2b/0x80
[ 6984.857455]  ? bio_free+0x4e/0x60
[ 6984.857459]  blk_mq_dispatch_rq_list+0xf5/0x230
[ 6984.857462]  blk_mq_process_rq_list+0x133/0x170
[ 6984.857465]  __blk_mq_run_hw_queue+0x8c/0xa0
[ 6984.857467]  blk_mq_run_work_fn+0x12/0x20
[ 6984.857473]  process_one_work+0x165/0x410
[ 6984.857475]  worker_thread+0x137/0x4c0
[ 6984.857478]  kthread+0x101/0x140
[ 6984.857480]  ? rescuer_thread+0x3b0/0x3b0
[ 6984.857481]  ? kthread_park+0x90/0x90
[ 6984.857489]  ret_from_fork+0x2c/0x40
[ 6984.857490] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44
89 95 60 ff ff ff e8 b7 dd 12 e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c>
8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff
[ 6984.857511] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: ffffc90002e93c50
[ 6984.857512] CR2: 0000000000000010
[ 6984.895359] ---[ end trace 2d7ceb528432bf83 ]---

Reported-by: Yi Zhang <yizhan@redhat.com>
Tested-by: Yi Zhang <yizhan@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00
Tejun Heo
62f6341c85 cgroup, net_cls: iterate the fds of only the tasks which are being migrated
commit a05d4fd917 upstream.

The net_cls controller controls the classid field of each socket which
is associated with the cgroup.  Because the classid is per-socket
attribute, when a task migrates to another cgroup or the configured
classid of the cgroup changes, the controller needs to walk all
sockets and update the classid value, which was implemented by
3b13758f51 ("cgroups: Allow dynamically changing net_classid").

While the approach is not scalable, migrating tasks which have a lot
of fds attached to them is rare and the cost is born by the ones
initiating the operations.  However, for simplicity, both the
migration and classid config change paths call update_classid() which
scans all fds of all tasks in the target css.  This is an overkill for
the migration path which only needs to cover a much smaller subset of
tasks which are actually getting migrated in.

On cgroup v1, this can lead to unexpected scalability issues when one
tries to migrate a task or process into a net_cls cgroup which already
contains a lot of fds.  Even if the migration traget doesn't have many
to get scanned, update_classid() ends up scanning all fds in the
target cgroup which can be extremely numerous.

Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
even worse.  Before bfc2cf6f61 ("cgroup: call subsys->*attach() only
for subsystems which are actually affected by migration"), cgroup core
would call the ->css_attach callback even for controllers which don't
see actual migration to a different css.

As net_cls is always disabled but still mounted on cgroup v2, whenever
a process is migrated on the cgroup v2 hierarchy, net_cls sees
identity migration from root to root and cgroup core used to call
->css_attach callback for those.  The net_cls ->css_attach ends up
calling update_classid() on the root net_cls css to which all
processes on the system belong to as the controller isn't used.  This
makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
which is horrible and easily leads to noticeable stalls triggering RCU
stall warnings and so on.

The worst symptom is already fixed in upstream by bfc2cf6f61
("cgroup: call subsys->*attach() only for subsystems which are
actually affected by migration"); however, backporting that commit is
too invasive and we want to avoid other cases too.

This patch updates net_cls's cgrp_attach() to iterate fds of only the
processes which are actually getting migrated.  This removes the
surprising migration cost which is dependent on the total number of
fds in the target cgroup.  As this leaves write_classid() the only
user of update_classid(), open-code the helper into write_classid().

Reported-by: David Goode <dgoode@fb.com>
Fixes: 3b13758f51 ("cgroups: Allow dynamically changing net_classid")
Cc: Nina Schiff <ninasc@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30 09:41:27 +02:00