Commit Graph

2990 Commits

Author SHA1 Message Date
Mauro (mdrjr) Ribeiro
026b455712 Merge tag 'v4.9.228' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.228 stable release
2020-07-13 21:26:47 -03:00
Mauro (mdrjr) Ribeiro
61b0ff0d89 Merge tag 'v4.9.224' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.224 stable release

Change-Id: I0e07ea572bc1a980b42dea56ab1fcb1069640ac1
2020-07-13 17:58:04 -03:00
Mauro (mdrjr) Ribeiro
9658d9b4d6 Merge tag 'v4.9.219' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.219 stable release

Change-Id: I1cb302600e983a1dddadf6236ea8b76f6511a177
2020-07-13 13:53:45 -03:00
Giuliano Procida
6754baabb8 blk-mq: move blk_mq_update_nr_hw_queues synchronize_rcu call
This fixes the 4.9 backport commit f530afb974, which was upstream commit
f5bbbbe4d6.

The upstream commit added a call to synchronize_rcu to
__blk_mq_update_nr_hw_queues, just after freezing queues.

In the backport this landed (in blk_mq_update_nr_hw_queues instead),
just after unfreezing queues.

This commit moves the call to its intended place.

Fixes: f530afb974 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")
Signed-off-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20 10:24:19 +02:00
Waiman Long
b488092565 blktrace: Fix potential deadlock between delete & sysfs ops
commit 5acb3cc2c2 upstream.

The lockdep code had reported the following unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(s_active#228);
                               lock(&bdev->bd_mutex/1);
                               lock(s_active#228);
  lock(&bdev->bd_mutex);

 *** DEADLOCK ***

The deadlock may happen when one task (CPU1) is trying to delete a
partition in a block device and another task (CPU0) is accessing
tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
partition.

The s_active isn't an actual lock. It is a reference count (kn->count)
on the sysfs (kernfs) file. Removal of a sysfs file, however, requires
a wait until all the references are gone. The reference count is
treated like a rwsem using lockdep instrumentation code.

The fact that a thread is in the sysfs callback method or in the
ioctl call means there is a reference to the opened sysfs or device
file. That should prevent the underlying block structure from being
removed.

Instead of using bd_mutex in the block_device structure, a new
blk_trace_mutex is now added to the request_queue structure to protect
access to the blk_trace structure.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Fix typo in patch subject line, and prune a comment detailing how
the code used to work.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20 08:15:30 +02:00
Keith Busch
d4d7444936 blk-mq: Allow blocking queue tag iter callbacks
commit 530ca2c9bd upstream.

A recent commit runs tag iterator callbacks under the rcu read lock,
but existing callbacks do not satisfy the non-blocking requirement.
The commit intended to prevent an iterator from accessing a queue that's
being modified. This patch fixes the original issue by taking a queue
reference instead of reading it, which allows callbacks to make blocking
calls.

Fixes: f5bbbbe4d6 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")
Acked-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13 10:32:52 +02:00
Jianchao Wang
f530afb974 blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter
commit f5bbbbe4d6 upstream.

For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to
account the inflight requests. It will access the queue_hw_ctx and
nr_hw_queues w/o any protection. When updating nr_hw_queues and
blk_mq_in_flight/rw occur concurrently, panic comes up.

Before update nr_hw_queues, the q will be frozen. So we could use
q_usage_counter to avoid the race. percpu_ref_is_zero is used here
so that we will not miss any in-flight request. The access to
nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
under rcu critical section, __blk_mq_update_nr_hw_queues could use
synchronize_rcu to ensure the zeroed q_usage_counter to be globally
visible.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Giuliano Procida <gprocida@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13 10:32:52 +02:00
Mauro (mdrjr) Ribeiro
e6b94853ef Merge tag 'v4.9.212' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.212 stable release
2020-04-07 21:26:32 -03:00
Mauro (mdrjr) Ribeiro
6122ff4d83 Merge tag 'v4.9.211' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.211 stable release
2020-04-07 21:24:11 -03:00
Mauro (mdrjr) Ribeiro
1e2f3136bb Merge tag 'v4.9.209' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.209 stable release
2020-04-07 21:21:21 -03:00
Mauro (mdrjr) Ribeiro
ef076b4c70 Merge tag 'v4.9.207' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.207 stable release
2020-04-07 21:21:08 -03:00
Mauro (mdrjr) Ribeiro
4199c257f6 Merge tag 'v4.9.189' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.189 stable release
2020-04-07 20:16:11 -03:00
Mauro (mdrjr) Ribeiro
33417c4f2f Merge tag 'v4.9.187' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.187 stable release
2020-04-07 20:10:04 -03:00
Mauro (mdrjr) Ribeiro
d1f3b5e15b Merge tag 'v4.9.169' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.169 stable release
2020-04-07 14:57:38 -03:00
Mauro (mdrjr) Ribeiro
f95d762cbe Merge tag 'v4.9.148' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.148 stable release
2020-04-07 14:42:47 -03:00
Mauro (mdrjr) Ribeiro
649073e44f Merge tag 'v4.9.128' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.128 stable release
2020-04-07 13:26:06 -03:00
Mauro (mdrjr) Ribeiro
a4e561af29 Merge tag 'v4.9.127' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.127 stable release
2020-04-07 13:21:25 -03:00
Mauro (mdrjr) Ribeiro
e5a0a1f8cd Merge tag 'v4.9.115' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.115 stable release
2020-04-06 22:31:40 -03:00
Ming Lei
beb0a21d2c block: don't use bio->bi_vcnt to figure out segment number
[ Upstream commit 1a67356e9a ]

It is wrong to use bio->bi_vcnt to figure out how many segments
there are in the bio even though CLONED flag isn't set on this bio,
because this bio may be split or advanced.

So always use bio_segments() in blk_recount_segments(), and it shouldn't
cause any performance loss now because the physical segment number is figured
out in blk_queue_split() and BIO_SEG_VALID is set meantime since
bdced438ac ("block: setup bi_phys_segments after splitting").

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 76d8137a31 ("blk-merge: recaculate segment if it isn't less than max segments")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-29 10:24:15 +01:00
Mikulas Patocka
5dbde467cc block: fix an integer overflow in logical block size
commit ad6bf88a6c upstream.

Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.

For example (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector
access:
  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.
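
The truncation described above can be reproduced outside the kernel. This is a minimal Python sketch, not kernel code: `ctypes` fixed-width integers stand in for the C types, the helper name `store_block_size` is hypothetical, and it assumes a 16-bit `unsigned short` and a 32-bit `unsigned int`.

```python
import ctypes

def store_block_size(size, ctype):
    """Store `size` in a fixed-width C integer and return what reads back."""
    return ctype(size).value

# A 64k logical block size does not fit in 16 bits and wraps around.
old = store_block_size(65536, ctypes.c_ushort)
# The same value survives in a 32-bit unsigned int.
new = store_block_size(65536, ctypes.c_uint)
print(old, new)
```

Under those assumptions the 16-bit field reads back a different value from the one stored, while the wider field keeps the 64k size intact.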

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:19:38 +01:00
Yang Yingliang
b45244f40c block: fix memleak when __blk_rq_map_user_iov() is failed
[ Upstream commit 3b7995a98a ]

When doing fuzz testing, I got this memleak report:

BUG: memory leak
unreferenced object 0xffff88837af80000 (size 4096):
  comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00   ...............
  backtrace:
    [<000000001c894df8>] bio_alloc_bioset+0x393/0x590
    [<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
    [<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
    [<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
    [<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
    [<00000000064bb208>] sg_write.part.25+0x5d9/0x950
    [<000000004fc670f6>] sg_write+0x5f/0x8c
    [<00000000b0d05c7b>] __vfs_write+0x7c/0x100
    [<000000008e177714>] vfs_write+0x1c3/0x500
    [<0000000087d23f34>] ksys_write+0xf9/0x200
    [<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
    [<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

If __blk_rq_map_user_iov() fails in blk_rq_map_user_iov(),
the bio(s) allocated before the failure will leak. The
refcount of each bio is initialized to 1 and increased to 2 by calling
bio_get(), but __blk_rq_unmap_user() only decreases it to 1, so
the bio cannot be freed. Fix it by calling blk_rq_unmap_user().
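
The reference counting in that paragraph can be followed with a toy model. The Python sketch below is illustrative only: the `Bio` class and the helpers `inner_unmap`, `full_unmap` and `map_and_fail` are hypothetical names, and it assumes that blk_rq_unmap_user() amounts to the inner unmap followed by one more put.

```python
class Bio:
    """Toy stand-in for a bio that tracks only its reference count."""

    def __init__(self):
        self.refcount = 1      # set at allocation
        self.freed = False

    def get(self):             # models bio_get()
        self.refcount += 1

    def put(self):             # models bio_put()
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True

def inner_unmap(bio):
    """Models __blk_rq_unmap_user(): drops a single reference."""
    bio.put()

def full_unmap(bio):
    """Models blk_rq_unmap_user() (assumed): inner unmap plus a final put."""
    inner_unmap(bio)
    bio.put()

def map_and_fail(cleanup):
    """Allocate a bio, take the extra reference, then run the error path."""
    bio = Bio()
    bio.get()        # refcount 1 -> 2
    cleanup(bio)
    return bio

leaked = map_and_fail(inner_unmap)   # old error path
fixed = map_and_fail(full_unmap)     # error path after the fix
```

In this model the old error path leaves one reference outstanding, so the object is never freed, while the fixed path drops both.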

Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 11:24:24 +01:00
Arnd Bergmann
075bc9872a compat_ioctl: block: handle Persistent Reservations
commit b2c0fcd287 upstream.

These were added to blkdev_ioctl() in linux-4.4 but not
blkdev_compat_ioctl, so add them now.

Cc: <stable@vger.kernel.org> # v4.4+
Fixes: bbd3e06436 ("block: add an API for Persistent Reservations")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Fold in followup patch from Arnd with missing pr.h header include.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-12 11:24:13 +01:00
Ming Lei
382a2f0030 blk-mq: make sure that line break can be printed
commit d2c9be89f8 upstream.

8962842ca5 ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids sysfs buffer overflow, and reserves one character for line break.
However, the last snprintf() isn't passed the correct 'size' parameter,
so fix it.

Fixes: 8962842ca5 ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21 10:42:18 +01:00
Ming Lei
433e1ee850 blk-mq: avoid sysfs buffer overflow with too many CPU cores
commit 8962842ca5 upstream.

It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores (>841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.

Use snprintf to avoid the potential buffer overflow.

This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.
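
The size limit and the truncating behaviour can be checked with a small model. This Python sketch is not the kernel implementation: it assumes every CPU maps to a single hardware queue, that the list is printed as comma-separated decimal numbers followed by a newline, and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4K pages, as in the commit message

def unbounded_len(ncpus):
    """Bytes an unchecked sprintf loop would write: "0, 1, ..., n-1" + newline."""
    return len(", ".join(str(i) for i in range(ncpus))) + 1

def bounded_cpu_list(ncpus, page_size=PAGE_SIZE):
    """Format the list but stop before the buffer would overflow."""
    size = page_size - 1          # keep one byte back for the line break
    out = ""
    for cpu in range(ncpus):
        piece = str(cpu) if cpu == 0 else ", " + str(cpu)
        if len(piece) >= size - len(out):
            break                 # truncated result, same as snprintf reporting overflow
        out += piece
    return out + "\n"

# Smallest CPU count whose unchecked output no longer fits in one page.
first_overflow = next(n for n in range(1, 5000) if unbounded_len(n) > PAGE_SIZE)
```

Under these assumptions the first CPU count that overflows one page is 842, which is consistent with the ">841" figure above, and the bounded version always ends with a line break and stays inside the page.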

Cc: stable@vger.kernel.org
Fixes: 676141e48af7 ("blk-mq: don't dump CPU -> hw queue map on driver load")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21 10:42:02 +01:00
xiao jin
c19199167c block: blk_init_allocated_queue() set q->fq as NULL in the fail case
commit 54648cf1ec upstream.

We found a memory use-after-free issue in __blk_drain_queue()
on kernel 4.14. After reading the latest kernel, 4.18-rc6, we
think it has the same problem.

Memory is allocated for q->fq in blk_init_allocated_queue().
If the elevator init function returns an error, it will
run into the fail case and free q->fq.

__blk_drain_queue() then uses the same memory after q->fq has been
freed, which leads to unpredictable behavior.

The patch is to set q->fq as NULL in the fail case of
blk_init_allocated_queue().

Fixes: commit 7c94e1c157 ("block: introduce blk_flush_queue to drive flush machinery")
Cc: <stable@vger.kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[groeck: backport to v4.4.y/v4.9.y (context change)]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-11 12:22:18 +02:00
Al Viro
06f9e7be05 take floppy compat ioctls to sodding floppy.c
[ Upstream commit 229b53c9bf ]

all other drivers recognizing those ioctls are very much *not*
biarch.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:33:29 +02:00
Jérôme Glisse
056066d8a7 block: do not leak memory in bio_copy_user_iov()
commit a3761c3c91 upstream.

When bio_add_pc_page() fails in bio_copy_user_iov() we should free
the page we just allocated otherwise we are leaking it.

Cc: linux-block@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17 08:36:46 +02:00
Mikulas Patocka
5af2d106ca block: fix infinite loop if the device loses discard capability
[ Upstream commit b88aef36b8 ]

If __blkdev_issue_discard is in progress and a device mapper device is
reloaded with a table that doesn't support discard,
q->limits.max_discard_sectors is set to zero. This results in infinite
loop in __blkdev_issue_discard.

This patch checks if max_discard_sectors is zero and aborts with
-EOPNOTSUPP.
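
The loop condition can be sketched as follows. This is a simplified Python model of the chunking logic, not the kernel function: `issue_discard` is a hypothetical name, sizes are plain sector counts, and alignment and granularity handling are left out.

```python
import errno

def issue_discard(nr_sects, max_discard_sectors):
    """Split a discard of nr_sects sectors into chunks; return the chunk sizes."""
    chunks = []
    while nr_sects:
        req_sects = min(nr_sects, max_discard_sectors)
        if req_sects == 0:
            # Without this check nr_sects never shrinks and the loop never ends.
            raise OSError(errno.EOPNOTSUPP, "discard not supported")
        chunks.append(req_sects)
        nr_sects -= req_sects
    return chunks
```

With a non-zero limit the request is split into bounded chunks; with a limit of zero the model fails fast instead of spinning.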

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Zdenek Kabelac <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29 13:40:14 +01:00
Jens Axboe
f5cecc0550 block: break discard submissions into the user defined size
[ Upstream commit af097f5d19 ]

Don't build discards bigger than what the user asked for, if the
user decided to limit the size by writing to 'discard_max_bytes'.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29 13:40:14 +01:00
Ruixuan Li
1407ac534a emmc: modify device node name [1/1]
PD#SWPL-2719

Problem:
Can't identify correctly when a removable disk has multiple
partitions

Solution:
Remove the function of using the partition name as
the device node name

Verify:
test pass on tl1 ref board

Change-Id: I113e63f209c529149fb94b0bb10b0b254717c2bf
Signed-off-by: Ruixuan Li <ruixuan.li@amlogic.com>
2018-12-11 14:31:06 +08:00
Mauricio Faria de Oliveira
a4b41559e5 partitions/aix: fix usage of uninitialized lv_info and lvname structures
[ Upstream commit 14cb2c8a6c ]

The if-block that sets a successful return value in aix_partition()
uses 'lvip[].pps_per_lv' and 'n[].name' potentially uninitialized.

For example, if 'numlvs' is zero or alloc_lvn() fails, neither is
initialized, but are used anyway if alloc_pvd() succeeds after it.

So, make the alloc_pvd() call conditional on their initialization.

This has been hit when attaching an apparently corrupted/stressed
AIX LUN, misleading the kernel to pr_warn() invalid data and hang.

    [...] partition (null) (11 pp's found) is not contiguous
    [...] partition (null) (2 pp's found) is not contiguous
    [...] partition (null) (3 pp's found) is not contiguous
    [...] partition (null) (64 pp's found) is not contiguous

Fixes: 6ceea22bbb ("partitions: add aix lvm partition support files")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19 22:47:15 +02:00
Mauricio Faria de Oliveira
6bccef1e75 partitions/aix: append null character to print data from disk
[ Upstream commit d43fdae7ba ]

Even if properly initialized, the lvname array (i.e., strings)
is read from disk, and might contain corrupt data (e.g., lack
the null terminating character for strings).

So, make sure the partition name string used in pr_warn() has
the null terminating character.

Fixes: 6ceea22bbb ("partitions: add aix lvm partition support files")
Suggested-by: Daniel J. Axtens <daniel.axtens@canonical.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19 22:47:15 +02:00
Tejun Heo
07a6134db0 block,blkcg: use __GFP_NOWARN for best-effort allocations in blkcg
commit e00f4f4d0f upstream.

blkcg allocates some per-cgroup data structures with GFP_NOWAIT and
when that fails falls back to operations which aren't specific to the
cgroup.  Occasional failures are expected under pressure and falling
back to non-cgroup operation is the right thing to do.

Unfortunately, I forgot to add __GFP_NOWARN to these allocations and
these expected failures end up creating a lot of noise.  Add
__GFP_NOWARN.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Marc MERLIN <marc@merlins.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19 22:47:11 +02:00
Ritesh Harjani
a4187e923f cfq: Give a chance for arming slice idle timer in case of group_idle
commit b3193bc0dc upstream.

In below scenario blkio cgroup does not work as per their assigned
weights :-
1. When the underlying device is nonrotational with a single HW queue
with depth of >= CFQ_HW_QUEUE_MIN
2. When the use case is forming two blkio cgroups cg1(weight 1000) &
cg2(weight 100) and two processes(file1 and file2) doing sync IO in
their respective blkio cgroups.

For above usecase result of fio (without this patch):-
file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan  1 19:41:49 1970
  write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
<...>
file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan  1 19:41:49 1970
  write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
<...>
// the BW of both processes is equal even though they belong to different
cgroups with weights of 1000 (cg1) and 100 (cg2)

In above case (for non rotational NCQ devices),
as soon as the request from cg1 is completed and even
though it is provided with higher set_slice=10, because of CFQ
algorithm when the driver tries to fetch the request, CFQ expires
this group without providing any idle time nor weight priority
and schedules another cfq group (in this case cg2).
And thus both cfq groups (cg1 & cg2) keep alternating to get the
disk time and hence lose the cgroup weight based scheduling.

Below patch gives a chance to cfq algorithm (cfq_arm_slice_timer)
to arm the slice timer in case group_idle is enabled.
In case if group_idle is also not required (including for nonrotational
NCQ drives), we need to explicitly set group_idle = 0 from sysfs for
such cases.

With this patch result of fio(for above usecase) :-
file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan  1 00:06:08 1970
  write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
<..>
file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan  1 00:06:08 1970
  write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
<..>
// Here each process's BW follows its cgroup's weight.

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19 22:47:10 +02:00
Greg Edwards
bbab86e891 block: bvec_nr_vecs() returns value for wrong slab
[ Upstream commit d6c02a9beb ]

In commit ed996a52c8 ("block: simplify and cleanup bvec pool
handling"), the value of the slab index is incremented by one in
bvec_alloc() after the allocation is done to indicate an index value of
0 does not need to be later freed.

bvec_nr_vecs() was not updated accordingly, and thus returns the wrong
value.  Decrement idx before performing the lookup.
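
The off-by-one can be seen in a small model. This Python sketch is illustrative rather than the kernel code: the slab capacity table and the function names are assumptions made for the example.

```python
# Assumption: vector capacity of each slab, smallest to largest.
BVEC_SLAB_NR_VECS = [1, 4, 16, 64, 128, 256]

def bvec_alloc_idx(nr_vecs):
    """Pick the smallest slab that fits and return its index plus one."""
    for slab, capacity in enumerate(BVEC_SLAB_NR_VECS):
        if nr_vecs <= capacity:
            return slab + 1       # 0 is reserved for "nothing to free"
    raise ValueError("too many vectors")

def bvec_nr_vecs_buggy(idx):
    """Lookup that ignores the +1 and reads the next slab up."""
    return BVEC_SLAB_NR_VECS[idx]

def bvec_nr_vecs_fixed(idx):
    """Lookup that undoes the +1 before indexing."""
    return BVEC_SLAB_NR_VECS[idx - 1]

idx = bvec_alloc_idx(10)          # a 10-vector request lands in the 16-entry slab
```

In this model the uncorrected lookup reports the capacity of the next larger slab, and the corrected one reports the slab that was actually used.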

Fixes: ed996a52c8 ("block: simplify and cleanup bvec pool handling")
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15 09:42:59 +02:00
Ruixuan Li
8842daf8de emmc: upgrade gpt or ept based on priority
PD#168362: P211: emmc: upgrade gpt or ept based on priority

Change-Id: I0e7ea483ac5fc0a59ee24ae21d2e40e1c5d32465
Signed-off-by: Ruixuan Li <ruixuan.li@amlogic.com>
2018-08-10 00:25:06 -07:00
Alan Jenkins
3118ceb456 block: do not use interruptible wait anywhere
commit 1dc3039bc8 upstream.

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25 11:24:03 +02:00
Greg Kroah-Hartman
320d53a9d0 Merge 4.9.96 into android-4.9
Changes in 4.9.96
	tty: make n_tty_read() always abort if hangup is in progress
	ubifs: Check ubifs_wbuf_sync() return code
	ubi: fastmap: Don't flush fastmap work on detach
	ubi: Fix error for write access
	ubi: Reject MLC NAND
	fs/reiserfs/journal.c: add missing resierfs_warning() arg
	resource: fix integer overflow at reallocation
	ipc/shm: fix use-after-free of shm file via remap_file_pages()
	mm, slab: reschedule cache_reap() on the same CPU
	usb: musb: gadget: misplaced out of bounds check
	usb: gadget: udc: core: update usb_ep_queue() documentation
	ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
	ARM: dts: exynos: Fix IOMMU support for GScaler devices on Exynos5250
	ARM: dts: at91: sama5d4: fix pinctrl compatible string
	spi: Fix scatterlist elements size in spi_map_buf
	xen-netfront: Fix hang on device removal
	regmap: Fix reversed bounds check in regmap_raw_write()
	ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
	ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
	USB: gadget: f_midi: fixing a possible double-free in f_midi
	USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
	usb: dwc3: pci: Properly cleanup resource
	smb3: Fix root directory when server returns inode number of zero
	HID: i2c-hid: fix size check and type usage
	powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
	powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
	powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
	powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
	HID: Fix hid_report_len usage
	HID: core: Fix size as type u32
	ASoC: ssm2602: Replace reg_default_raw with reg_default
	thunderbolt: Resume control channel after hibernation image is created
	irqchip/gic: Take lock when updating irq type
	random: use a tighter cap in credit_entropy_bits_safe()
	jbd2: if the journal is aborted then don't allow update of the log tail
	ext4: don't update checksum of new initialized bitmaps
	ext4: protect i_disksize update by i_data_sem in direct write path
	ext4: fail ext4_iget for root directory if unallocated
	RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
	RDMA/rxe: Fix an out-of-bounds read
	ALSA: pcm: Fix UAF at PCM release via PCM timer access
	IB/srp: Fix srp_abort()
	IB/srp: Fix completion vector assignment algorithm
	dmaengine: at_xdmac: fix rare residue corruption
	libnvdimm, namespace: use a safe lookup for dimm device name
	nfit, address-range-scrub: fix scrub in-progress reporting
	um: Compile with modern headers
	um: Use POSIX ucontext_t instead of struct ucontext
	iommu/vt-d: Fix a potential memory leak
	mmc: jz4740: Fix race condition in IRQ mask update
	clk: mvebu: armada-38x: add support for 1866MHz variants
	clk: mvebu: armada-38x: add support for missing clocks
	clk: fix false-positive Wmaybe-uninitialized warning
	clk: bcm2835: De-assert/assert PLL reset signal when appropriate
	pwm: rcar: Fix a condition to prevent mismatch value setting to duty
	thermal: imx: Fix race condition in imx_thermal_probe()
	dt-bindings: clock: mediatek: add binding for fixed-factor clock axisel_d4
	watchdog: f71808e_wdt: Fix WD_EN register read
	vfio/pci: Virtualize Maximum Read Request Size
	ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
	ALSA: pcm: Avoid potential races between OSS ioctls and read/write
	ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
	ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
	ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
	ext4: don't allow r/w mounts if metadata blocks overlap the superblock
	drm/amdgpu: Add an ATPX quirk for hybrid laptop
	drm/amdgpu: Fix always_valid bos multiple LRU insertions.
	drm/amdgpu: Fix PCIe lane width calculation
	drm/rockchip: Clear all interrupts before requesting the IRQ
	drm/radeon: Fix PCIe lane width calculation
	ALSA: line6: Use correct endpoint type for midi output
	ALSA: rawmidi: Fix missing input substream checks in compat ioctls
	ALSA: hda - New VIA controller suppor no-snoop path
	random: fix crng_ready() test
	random: crng_reseed() should lock the crng instance that it is modifying
	random: add new ioctl RNDRESEEDCRNG
	HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
	MIPS: uaccess: Add micromips clobbers to bzero invocation
	MIPS: memset.S: EVA & fault support for small_memset
	MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
	MIPS: memset.S: Fix clobber of v1 in last_fixup
	powerpc/eeh: Fix enabling bridge MMIO windows
	powerpc/lib: Fix off-by-one in alternate feature patching
	udf: Fix leak of UTF-16 surrogates into encoded strings
	jffs2_kill_sb(): deal with failed allocations
	hypfs_kill_super(): deal with failed allocations
	orangefs_kill_sb(): deal with allocation failures
	rpc_pipefs: fix double-dput()
	Don't leak MNT_INTERNAL away from internal mounts
	autofs: mount point create should honour passed in mode
	mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
	fanotify: fix logic of events on child
	writeback: safer lock nesting
	block/mq: fix potential deadlock during cpu hotplug
	Linux 4.9.96

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-04-24 11:26:46 +02:00
Wanpeng Li
8d7f1fde9d block/mq: fix potential deadlock during cpu hotplug
commit 51d638b1f5 upstream.

This can be triggered by hot-unplugging one CPU.

======================================================
 [ INFO: possible circular locking dependency detected ]
 4.11.0+ #17 Not tainted
 -------------------------------------------------------
 step_after_susp/2640 is trying to acquire lock:
  (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110

 but task is already holding lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        get_online_cpus+0x64/0x80
        blk_mq_init_allocated_queue+0x3a0/0x4e0
        blk_mq_init_queue+0x3a/0x60
        loop_add+0xe5/0x280
        loop_init+0x124/0x177
        do_one_initcall+0x53/0x1c0
        kernel_init_freeable+0x1e3/0x27f
        kernel_init+0xe/0x100
        ret_from_fork+0x31/0x40

 -> #0 (all_q_mutex){+.+...}:
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        do_syscall_64+0x8f/0x710
        return_from_SYSCALL_64+0x0/0x7a

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(cpu_hotplug.lock);
                                lock(all_q_mutex);
                                lock(cpu_hotplug.lock);
   lock(all_q_mutex);

  *** DEADLOCK ***

 8 locks held by step_after_susp/2640:
  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
  #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
  #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 stack backtrace:
 CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
 Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
 Call Trace:
  dump_stack+0x99/0xce
  print_circular_bug+0x1fa/0x270
  __lock_acquire+0x189a/0x18a0
  lock_acquire+0x11c/0x230
  ? lock_acquire+0x11c/0x230
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? blk_mq_queue_reinit_work+0x18/0x110
  __mutex_lock+0x92/0x990
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? kmem_cache_free+0x2cb/0x330
  ? anon_transport_class_unregister+0x20/0x20
  ? blk_mq_queue_reinit_work+0x110/0x110
  mutex_lock_nested+0x1b/0x20
  ? mutex_lock_nested+0x1b/0x20
  blk_mq_queue_reinit_work+0x18/0x110
  blk_mq_queue_reinit_dead+0x1c/0x20
  cpuhp_invoke_callback+0x1f2/0x810
  ? __flow_cache_shrink+0x160/0x160
  cpuhp_down_callbacks+0x42/0x80
  _cpu_down+0xb2/0xe0
  freeze_secondary_cpus+0xb6/0x390
  suspend_devices_and_enter+0x3b3/0xa40
  ? rcu_read_lock_sched_held+0x79/0x80
  pm_suspend+0x129/0x490
  state_store+0x82/0xf0
  kobj_attr_store+0xf/0x20
  sysfs_kf_write+0x45/0x60
  kernfs_fop_write+0x135/0x1c0
  __vfs_write+0x37/0x160
  ? rcu_read_lock_sched_held+0x79/0x80
  ? rcu_sync_lockdep_assert+0x2f/0x60
  ? __sb_start_write+0xd9/0x1c0
  ? vfs_write+0x1ad/0x1d0
  vfs_write+0xcd/0x1d0
  SyS_write+0x58/0xc0
  ? rcu_read_lock_sched_held+0x79/0x80
  do_syscall_64+0x8f/0x710
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

The CPU hotplug path holds cpu_hotplug.lock and then reinitializes all existing
blk-mq queues under all_q_mutex; blk_mq_init_allocated_queue(), however, takes
these two locks in the opposite order. This is due to commit eabe06595d
(blk/mq: Cure cpu hotplug lock inversion), which fixed a CPU hotplug lock inversion
caused by the hotplug rework. That rework is still work in progress, lives in a
-tip branch, and mainline cannot yet trigger that splat. The commit therefore
breaks Linus's tree in the merge window, so this patch reverts the lock order
and avoids the splat in Linus's tree.
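
The inversion described above can be illustrated with a small userspace model.
This is only a sketch, not lockdep and not kernel code: the tracker, its
functions, and the enum names are hypothetical, chosen to mirror the two locks
in the report. It records which lock was taken while another was held and flags
a later acquisition in the opposite order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes named after the two locks in the report. */
enum lock_class { CPU_HOTPLUG_LOCK, ALL_Q_MUTEX, NR_LOCK_CLASSES };

struct order_tracker {
	/* dep[a][b] is true once b has been taken while a was held. */
	bool dep[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
	bool held[NR_LOCK_CLASSES];
};

static void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/* Returns false if taking 'next' inverts an order recorded earlier. */
static bool tracker_acquire(struct order_tracker *t, enum lock_class next)
{
	bool ok = true;

	assert(next < NR_LOCK_CLASSES);
	for (int h = 0; h < NR_LOCK_CLASSES; h++) {
		if (!t->held[h])
			continue;
		if (t->dep[next][h])
			ok = false;	/* h was once taken while next was held */
		t->dep[h][next] = true;
	}
	t->held[next] = true;
	return ok;
}

static void tracker_release(struct order_tracker *t, enum lock_class c)
{
	t->held[c] = false;
}

/* One code path: take 'first', then 'second', then drop both. */
static bool run_path(struct order_tracker *t, enum lock_class first,
		     enum lock_class second)
{
	bool ok = tracker_acquire(t, first);

	ok = tracker_acquire(t, second) && ok;
	tracker_release(t, second);
	tracker_release(t, first);
	return ok;
}
```

In this model, running the hotplug-style path (CPU_HOTPLUG_LOCK, then
ALL_Q_MUTEX) followed by the init-style path in the opposite order makes the
second run_path() call return false, while two paths using the same order both
return true.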

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Cc: Thierry Escande <thierry.escande@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:34:18 +02:00
Greg Kroah-Hartman
8683408f8e Merge 4.9.94 into android-4.9
Changes in 4.9.94
	qed: Fix overriding of supported autoneg value.
	cfg80211: make RATE_INFO_BW_20 the default
	md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock
	rtc: snvs: fix an incorrect check of return value
	x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
	x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
	ovl: persistent inode numbers for upper hardlinks
	NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
	x86/boot: Declare error() as noreturn
	IB/srpt: Fix abort handling
	IB/srpt: Avoid that aborting a command triggers a kernel warning
	af_key: Fix slab-out-of-bounds in pfkey_compile_policy.
	mac80211: bail out from prep_connection() if a reconfig is ongoing
	bna: Avoid reading past end of buffer
	qlge: Avoid reading past end of buffer
	ubi: fastmap: Fix slab corruption
	ipmi_ssif: unlock on allocation failure
	net: cdc_ncm: Fix TX zero padding
	net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
	lockd: fix lockd shutdown race
	drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests
	pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid()
	s390: move _text symbol to address higher than zero
	net/mlx4_en: Avoid adding steering rules with invalid ring
	qed: Correct doorbell configuration for !4Kb pages
	NFSv4.1: Work around a Linux server bug...
	CIFS: silence lockdep splat in cifs_relock_file()
	perf/callchain: Force USER_DS when invoking perf_callchain_user()
	blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op
	net: qca_spi: Fix alignment issues in rx path
	netxen_nic: set rcode to the return status from the call to netxen_issue_cmd
	mdio: mux: Correct mdio_mux_init error path issues
	Input: elan_i2c - check if device is there before really probing
	Input: elantech - force relative mode on a certain module
	KVM: PPC: Book3S PR: Check copy_to/from_user return values
	irqchip/mbigen: Fix the clear register offset calculation
	vmxnet3: ensure that adapter is in proper state during force_close
	mm, vmstat: Remove spurious WARN() during zoneinfo print
	SMB2: Fix share type handling
	bus: brcmstb_gisb: Use register offsets with writes too
	bus: brcmstb_gisb: correct support for 64-bit address output
	PowerCap: Fix an error code in powercap_register_zone()
	iio: pressure: zpa2326: report interrupted case as failure
	ARM: dts: imx53-qsrb: Pulldown PMIC IRQ pin
	staging: wlan-ng: prism2mgmt.c: fixed a double endian conversion before calling hfa384x_drvr_setconfig16, also fixes relative sparse warning
	clk: renesas: rcar-gen2: Fix PLL0 on R-Car V2H and E2
	x86/tsc: Provide 'tsc=unstable' boot parameter
	powerpc/modules: If mprofile-kernel is enabled add it to vermagic
	ARM: dts: imx6qdl-wandboard: Fix audio channel swap
	i2c: mux: reg: put away the parent i2c adapter on probe failure
	arm64: perf: Ignore exclude_hv when kernel is running in HYP
	mdio: mux: fix device_node_continue.cocci warnings
	ipv6: avoid dad-failures for addresses with NODAD
	async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
	KVM: arm: Restore banked registers and physical timer access on hyp_panic()
	KVM: arm64: Restore host physical timer access on hyp_panic()
	usb: dwc3: keystone: check return value
	btrfs: fix incorrect error return ret being passed to mapping_set_error
	ata: libahci: properly propagate return value of platform_get_irq()
	ipmr: vrf: Find VIFs using the actual device
	uio: fix incorrect memory leak cleanup
	neighbour: update neigh timestamps iff update is effective
	arp: honour gratuitous ARP _replies_
	ARM: dts: rockchip: fix rk322x i2s1 pinctrl error
	usb: chipidea: properly handle host or gadget initialization failure
	pxa_camera: fix module remove codepath for v4l2 clock
	USB: ene_usb6250: fix first command execution
	net: x25: fix one potential use-after-free issue
	USB: ene_usb6250: fix SCSI residue overwriting
	serial: 8250: omap: Disable DMA for console UART
	serial: sh-sci: Fix race condition causing garbage during shutdown
	net/wan/fsl_ucc_hdlc: fix unitialized variable warnings
	net/wan/fsl_ucc_hdlc: fix incorrect memory allocation
	fsl/qe: add bit description for SYNL register for GUMR
	sh_eth: Use platform device for printing before register_netdev()
	mlxsw: spectrum: Avoid possible NULL pointer dereference
	scsi: csiostor: fix use after free in csio_hw_use_fwconfig()
	powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
	ath5k: fix memory leak on buf on failed eeprom read
	selftests/powerpc: Fix TM resched DSCR test with some compilers
	xfrm: fix state migration copy replay sequence numbers
	ASoC: simple-card: fix mic jack initialization
	iio: hi8435: avoid garbage event at first enable
	iio: hi8435: cleanup reset gpio
	iio: light: rpr0521 poweroff for probe fails
	ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors
	md-cluster: fix potential lock issue in add_new_disk
	ARM: davinci: da8xx: Create DSP device only when assigned memory
	ray_cs: Avoid reading past end of buffer
	net/wan/fsl_ucc_hdlc: fix muram allocation error
	leds: pca955x: Correct I2C Functionality
	perf/core: Fix error handling in perf_event_alloc()
	sched/numa: Use down_read_trylock() for the mmap_sem
	gpio: crystalcove: Do not write regular gpio registers for virtual GPIOs
	net/mlx5: Tolerate irq_set_affinity_hint() failures
	selinux: do not check open permission on sockets
	block: fix an error code in add_partition()
	mlx5: fix bug reading rss_hash_type from CQE
	net: ieee802154: fix net_device reference release too early
	libceph: NULL deref on crush_decode() error path
	perf report: Fix off-by-one for non-activation frames
	netfilter: ctnetlink: fix incorrect nf_ct_put during hash resize
	pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
	ASoC: rsnd: SSI PIO adjust to 24bit mode
	scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats()
	fix race in drivers/char/random.c:get_reg()
	ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()
	ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path
	tcp: better validation of received ack sequences
	net: move somaxconn init from sysctl code
	Input: elan_i2c - clear INT before resetting controller
	bonding: Don't update slave->link until ready to commit
	cpuhotplug: Link lock stacks for hotplug callbacks
	PCI/msi: fix the pci_alloc_irq_vectors_affinity stub
	KVM: X86: Fix preempt the preemption timer cancel
	KVM: nVMX: Fix handling of lmsw instruction
	net: llc: add lock_sock in llc_ui_bind to avoid a race condition
	drm/msm: Take the mutex before calling msm_gem_new_impl
	i40iw: Fix sequence number for the first partial FPDU
	i40iw: Correct Q1/XF object count equation
	ARM: dts: ls1021a: add "fsl,ls1021a-esdhc" compatible string to esdhc node
	thermal: power_allocator: fix one race condition issue for thermal_instances list
	perf probe: Add warning message if there is unexpected event name
	l2tp: fix missing print session offset info
	rds; Reset rs->rs_bound_addr in rds_add_bound() failure path
	ACPI / video: Default lcd_only to true on Win8-ready and newer machines
	net/mlx4_en: Change default QoS settings
	VFS: close race between getcwd() and d_move()
	PM / devfreq: Fix potential NULL pointer dereference in governor_store
	hwmon: (ina2xx) Make calibration register value fixed
	media: videobuf2-core: don't go out of the buffer range
	ASoC: Intel: Skylake: Disable clock gating during firmware and library download
	ASoC: Intel: cht_bsw_rt5645: Analog Mic support
	scsi: libiscsi: Allow sd_shutdown on bad transport
	scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag.
	irqchip/gic-v3: Fix the driver probe() fail due to disabled GICC entry
	ACPI: EC: Fix debugfs_create_*() usage
	mac80211: Fix setting TX power on monitor interfaces
	vfb: fix video mode and line_length being set when loaded
	gpio: label descriptors using the device name
	IB/rdmavt: Allocate CQ memory on the correct node
	blk-mq: fix race between updating nr_hw_queues and switching io sched
	backlight: tdo24m: Fix the SPI CS between transfers
	pinctrl: baytrail: Enable glitch filter for GPIOs used as interrupts
	ASoC: Intel: sst: Fix the return value of 'sst_send_byte_stream_mrfld()'
	rt2x00: do not pause queue unconditionally on error path
	wl1251: check return from call to wl1251_acx_arp_ip_filter
	hdlcdrv: Fix divide by zero in hdlcdrv_ioctl
	x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
	netfilter: conntrack: don't call iter for non-confirmed conntracks
	HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices
	ovl: filter trusted xattr for non-admin
	powerpc/[booke|4xx]: Don't clobber TCR[WP] when setting TCR[DIE]
	dmaengine: imx-sdma: Handle return value of clk_prepare_enable
	backlight: Report error on failure
	arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage
	net/mlx5: avoid build warning for uniprocessor
	cxgb4: FW upgrade fixes
	cxgb4: Fix netdev_features flag
	rtc: m41t80: fix SQW dividers override when setting a date
	i40evf: fix merge error in older patch
	rtc: opal: Handle disabled TPO in opal_get_tpo_time()
	rtc: interface: Validate alarm-time before handling rollover
	SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
	net: freescale: fix potential null pointer dereference
	clk: at91: fix clk-generated parenting
	drm/sun4i: Ignore the generic connectors for components
	dt-bindings: display: sun4i: Add allwinner,tcon-channel property
	mtd: nand: gpmi: Fix gpmi_nand_init() error path
	mtd: nand: check ecc->total sanity in nand_scan_tail
	KVM: SVM: do not zero out segment attributes if segment is unusable or not present
	clk: scpi: fix return type of __scpi_dvfs_round_rate
	clk: Fix __set_clk_rates error print-string
	powerpc/spufs: Fix coredump of SPU contexts
	drm/amdkfd: NULL dereference involving create_process()
	ath10k: add BMI parameters to fix calibration from DT/pre-cal
	perf trace: Add mmap alias for s390
	qlcnic: Fix a sleep-in-atomic bug in qlcnic_82xx_hw_write_wx_2M and qlcnic_82xx_hw_read_wx_2M
	arm64: kernel: restrict /dev/mem read() calls to linear region
	mISDN: Fix a sleep-in-atomic bug
	net: phy: micrel: Restore led_mode and clk_sel on resume
	RDMA/iw_cxgb4: Avoid touch after free error in ARP failure handlers
	RDMA/hfi1: fix array termination by appending NULL to attr array
	drm/omap: fix tiled buffer stride calculations
	powerpc/8xx: fix mpc8xx_get_irq() return on no irq
	cxgb4: fix incorrect cim_la output for T6
	Fix serial console on SNI RM400 machines
	bio-integrity: Do not allocate integrity context for bio w/o data
	ip6_tunnel: fix traffic class routing for tunnels
	skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
	macsec: check return value of skb_to_sgvec always
	sit: reload iphdr in ipip6_rcv
	net/mlx4: Fix the check in attaching steering rules
	net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport
	perf header: Set proper module name when build-id event found
	perf report: Ensure the perf DSO mapping matches what libdw sees
	iwlwifi: mvm: fix firmware debug restart recording
	watchdog: f71808e_wdt: Add F71868 support
	iwlwifi: mvm: Fix command queue number on d0i3 flow
	iwlwifi: tt: move ucode_loaded check under mutex
	iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
	iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
	tags: honor COMPILED_SOURCE with apart output directory
	ARM: dts: qcom: ipq4019: fix i2c_0 node
	e1000e: fix race condition around skb_tstamp_tx()
	igb: fix race condition with PTP_TX_IN_PROGRESS bits
	cxl: Unlock on error in probe
	cx25840: fix unchecked return values
	mceusb: sporadic RX truncation corruption fix
	net: phy: avoid genphy_aneg_done() for PHYs without clause 22 support
	ARM: imx: Add MXC_CPU_IMX6ULL and cpu_is_imx6ull
	nvme-pci: fix multiple ctrl removal scheduling
	nvme: fix hang in remove path
	KVM: nVMX: Update vmcs12->guest_linear_address on nested VM-exit
	e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails
	perf/core: Correct event creation with PERF_FORMAT_GROUP
	sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks
	MIPS: mm: fixed mappings: correct initialisation
	MIPS: mm: adjust PKMAP location
	MIPS: kprobes: flush_insn_slot should flush only if probe initialised
	ARM: dts: armadillo800eva: Split LCD mux and gpio
	Fix loop device flush before configure v3
	net: emac: fix reset timeout with AR8035 phy
	perf tools: Decompress kernel module when reading DSO data
	perf tests: Decompress kernel module before objdump
	skbuff: only inherit relevant tx_flags
	xen: avoid type warning in xchg_xen_ulong
	X.509: Fix error code in x509_cert_parse()
	pinctrl: meson-gxbb: remove non-existing pin GPIOX_22
	coresight: Fix reference count for software sources
	coresight: tmc: Configure DMA mask appropriately
	stmmac: fix ptp header for GMAC3 hw timestamp
	geneve: add missing rx stats accounting
	crypto: omap-sham - buffer handling fixes for hashing later
	crypto: omap-sham - fix closing of hash with separate finalize call
	bnx2x: Allow vfs to disable txvlan offload
	sctp: fix recursive locking warning in sctp_do_peeloff
	net: fec: Add a fec_enet_clear_ethtool_stats() stub for CONFIG_M5272
	sparc64: ldc abort during vds iso boot
	iio: magnetometer: st_magn_spi: fix spi_device_id table
	net: ena: fix rare uncompleted admin command false alarm
	net: ena: fix race condition between submit and completion admin command
	net: ena: add missing return when ena_com_get_io_handlers() fails
	net: ena: add missing unmap bars on device removal
	net: ena: disable admin msix while working in polling mode
	clk: meson: meson8b: add compatibles for Meson8 and Meson8m2
	Bluetooth: Send HCI Set Event Mask Page 2 command only when needed
	cpuidle: dt: Add missing 'of_node_put()'
	ACPICA: OSL: Add support to exclude stdarg.h
	ACPICA: Events: Add runtime stub support for event APIs
	ACPICA: Disassembler: Abort on an invalid/unknown AML opcode
	s390/dasd: fix hanging safe offline
	vxlan: dont migrate permanent fdb entries during learn
	hsr: fix incorrect warning
	selftests: kselftest_harness: Fix compile warning
	drm/vc4: Fix resource leak in 'vc4_get_hang_state_ioctl()' in error handling path
	bcache: stop writeback thread after detaching
	bcache: segregate flash only volume write streams
	scsi: libsas: fix memory leak in sas_smp_get_phy_events()
	scsi: libsas: fix error when getting phy events
	scsi: libsas: initialize sas_phy status according to response of DISCOVER
	blk-mq: fix kernel oops in blk_mq_tag_idle()
	tty: n_gsm: Allow ADM response in addition to UA for control dlci
	EDAC, mv64x60: Fix an error handling path
	cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages
	sdhci: Advertise 2.0v supply on SDIO host controller
	Input: goodix - disable IRQs while suspended
	mtd: mtd_oobtest: Handle bitflips during reads
	perf tools: Fix copyfile_offset update of output offset
	ipsec: check return value of skb_to_sgvec always
	rxrpc: check return value of skb_to_sgvec always
	virtio_net: check return value of skb_to_sgvec always
	virtio_net: check return value of skb_to_sgvec in one more location
	random: use lockless method of accessing and updating f->reg_idx
	clk: at91: fix clk-generated compilation
	arp: fix arp_filter on l3slave devices
	ipv6: the entire IPv6 header chain must fit the first fragment
	net: fix possible out-of-bound read in skb_network_protocol()
	net/ipv6: Fix route leaking between VRFs
	net/ipv6: Increment OUTxxx counters after netfilter hook
	netlink: make sure nladdr has correct size in netlink_connect()
	net/sched: fix NULL dereference in the error path of tcf_bpf_init()
	pptp: remove a buggy dst release in pptp_connect()
	r8169: fix setting driver_data after register_netdev
	sctp: do not leak kernel memory to user space
	sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
	sky2: Increase D3 delay to sky2 stops working after suspend
	vhost: correctly remove wait queue during poll failure
	vlan: also check phy_driver ts_info for vlan's real device
	bonding: fix the err path for dev hwaddr sync in bond_enslave
	bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave
	bonding: process the err returned by dev_set_allmulti properly in bond_enslave
	net: fool proof dev_valid_name()
	ip_tunnel: better validate user provided tunnel names
	ipv6: sit: better validate user provided tunnel names
	ip6_gre: better validate user provided tunnel names
	ip6_tunnel: better validate user provided tunnel names
	vti6: better validate user provided tunnel names
	net/mlx5e: Sync netdev vxlan ports at open
	net/sched: fix NULL dereference in the error path of tunnel_key_init()
	net/sched: fix NULL dereference on the error path of tcf_skbmod_init()
	net/mlx4_en: Fix mixed PFC and Global pause user control requests
	vhost: validate log when IOTLB is enabled
	route: check sysctl_fib_multipath_use_neigh earlier than hash
	team: move dev_mc_sync after master_upper_dev_link in team_port_add
	vhost_net: add missing lock nesting notation
	net/mlx4_core: Fix memory leak while delete slave's resources
	strparser: Fix sign of err codes
	net sched actions: fix dumping which requires several messages to user space
	vrf: Fix use after free and double free in vrf_finish_output
	Revert "xhci: plat: Register shutdown for xhci_plat"
	Linux 4.9.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-04-14 15:40:56 +02:00
Ming Lei
3f4e241969 blk-mq: fix kernel oops in blk_mq_tag_idle()
[ Upstream commit 8ab0b7dc73 ]

HW queues may be unmapped in some cases, such as in blk_mq_update_nr_hw_queues(),
so we need to check for that before calling blk_mq_tag_idle(); otherwise
the following kernel oops can be triggered. Fix it by checking whether
the hw queue is unmapped, since it makes no sense to idle the tags
after hw queues are unmapped.

[  440.771298] Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
[  440.779104] task: ffff894bae755ee0 ti: ffff893bf9bc8000 task.ti: ffff893bf9bc8000
[  440.788359] RIP: 0010:[<ffffffffb730e2b4>]  [<ffffffffb730e2b4>] __blk_mq_tag_idle+0x24/0x40
[  440.798697] RSP: 0018:ffff893bf9bcbd10  EFLAGS: 00010286
[  440.805538] RAX: 0000000000000000 RBX: ffff895bb131dc00 RCX: 000000000000011f
[  440.814426] RDX: 00000000ffffffff RSI: 0000000000000120 RDI: ffff895bb131dc00
[  440.823301] RBP: ffff893bf9bcbd10 R08: 000000000001b860 R09: 4a51d361c00c0000
[  440.832193] R10: b5907f32b4cc7003 R11: ffffd6cabfb57000 R12: ffff894bafd1e008
[  440.841091] R13: 0000000000000001 R14: ffff895baf770000 R15: 0000000000000080
[  440.849988] FS:  0000000000000000(0000) GS:ffff894bbdcc0000(0000) knlGS:0000000000000000
[  440.859955] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  440.867274] CR2: 0000000000000008 CR3: 000000103d098000 CR4: 00000000001407e0
[  440.876169] Call Trace:
[  440.879818]  [<ffffffffb7309d68>] blk_mq_exit_hctx+0xd8/0xe0
[  440.887051]  [<ffffffffb730dc40>] blk_mq_free_queue+0xf0/0x160
[  440.894465]  [<ffffffffb72ff679>] blk_cleanup_queue+0xd9/0x150
[  440.901881]  [<ffffffffc08a802b>] nvme_ns_remove+0x5b/0xb0 [nvme_core]
[  440.910068]  [<ffffffffc08a811b>] nvme_remove_namespaces+0x3b/0x60 [nvme_core]
[  440.919026]  [<ffffffffc08b817b>] __nvme_rdma_remove_ctrl+0x2b/0xb0 [nvme_rdma]
[  440.928079]  [<ffffffffc08b8237>] nvme_rdma_del_ctrl_work+0x17/0x20 [nvme_rdma]
[  440.937126]  [<ffffffffb70ab58a>] process_one_work+0x17a/0x440
[  440.944517]  [<ffffffffb70ac3a8>] worker_thread+0x278/0x3c0
[  440.951607]  [<ffffffffb70ac130>] ? manage_workers.isra.24+0x2a0/0x2a0
[  440.959760]  [<ffffffffb70b352f>] kthread+0xcf/0xe0
[  440.966055]  [<ffffffffb70b3460>] ? insert_kthread_work+0x40/0x40
[  440.973715]  [<ffffffffb76d8658>] ret_from_fork+0x58/0x90
[  440.980586]  [<ffffffffb70b3460>] ? insert_kthread_work+0x40/0x40
[  440.988229] Code: 5b 41 5c 5d c3 66 90 0f 1f 44 00 00 48 8b 87 20 01 00 00 f0 0f ba 77 40 01 19 d2 85 d2 75 08 c3 0f 1f 80 00 00 00 00 55 48 89 e5 <f0> ff 48 08 48 8d 78 10 e8 7f 0f 05 00 5d c3 0f 1f 00 66 2e 0f
[  441.011620] RIP  [<ffffffffb730e2b4>] __blk_mq_tag_idle+0x24/0x40
[  441.019301]  RSP <ffff893bf9bcbd10>
[  441.024052] CR2: 0000000000000008
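
The guard described in this message can be sketched in userspace as follows.
The structures and function names are simplified, hypothetical stand-ins, not
the kernel's definitions; the assumption is that an unmapped hw queue has no
software queues and no tags, so idling its tags would dereference NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct tags_model {
	int active_queues;
};

struct hctx_model {
	unsigned int nr_ctx;		/* software queues mapped to this hw queue */
	struct tags_model *tags;	/* assumed NULL once the hw queue is unmapped */
	bool tag_active;
};

static bool hw_queue_mapped(const struct hctx_model *h)
{
	return h->nr_ctx && h->tags;
}

/* Marks the queue idle; must only run on a mapped hw queue. */
static void tag_idle(struct hctx_model *h)
{
	assert(h->tags != NULL);
	if (h->tag_active) {
		h->tag_active = false;
		h->tags->active_queues--;
	}
}

/* Teardown path: skip the idle step when the queue is unmapped. */
static bool exit_hctx(struct hctx_model *h)
{
	if (!hw_queue_mapped(h))
		return false;
	tag_idle(h);
	return true;
}
```

With the check in place, exit_hctx() on a queue whose tags pointer is NULL
returns without touching the tags, whereas a mapped queue is idled as before.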

Reported-by: Zhang Yi <yizhan@redhat.com>
Tested-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:48:30 +02:00
Dmitry Monakhov
7f851311e4 bio-integrity: Do not allocate integrity context for bio w/o data
[ Upstream commit 3116a23bb3 ]

If bio has no data, such as ones from blkdev_issue_flush(),
then we have nothing to protect.

This patch prevents a BUG like the following:

kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
kernel BUG at mm/slab.c:2773!
invalid opcode: 0000 [#1] SMP
Modules linked in: bcache
CPU: 0 PID: 4428 Comm: xfs_io Tainted: G        W       4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
task: ffff880137786440 task.stack: ffffc90000ba8000
RIP: 0010:kfree_debugcheck+0x25/0x2a
RSP: 0018:ffffc90000babde0 EFLAGS: 00010082
RAX: 0000000000000034 RBX: ac1fa1d106742a5a RCX: 0000000000000007
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013f3ccb40
RBP: ffffc90000babde8 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000fcb76420 R11: 00000000725172ed R12: 0000000000000282
R13: ffffffff8150e766 R14: ffff88013a145e00 R15: 0000000000000001
FS:  00007fb09384bf40(0000) GS:ffff88013f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0172f9e40 CR3: 0000000137fa9000 CR4: 00000000000006f0
Call Trace:
 kfree+0xc8/0x1b3
 bio_integrity_free+0xc3/0x16b
 bio_free+0x25/0x66
 bio_put+0x14/0x26
 blkdev_issue_flush+0x7a/0x85
 blkdev_fsync+0x35/0x42
 vfs_fsync_range+0x8e/0x9f
 vfs_fsync+0x1c/0x1e
 do_fsync+0x31/0x4a
 SyS_fsync+0x10/0x14
 entry_SYSCALL_64_fastpath+0x1f/0xc2
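
The idea of skipping integrity setup for a data-less bio can be sketched as
below. This is a userspace model under stated assumptions: struct bio_model,
its fields, and the 8-bytes-per-sector payload size are all hypothetical and
only illustrate the check, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a bio: a sector count and an optional payload. */
struct bio_model {
	unsigned int nr_sectors;	/* 0 for a flush-style bio with no data */
	void *integrity;		/* integrity payload, NULL if none */
};

static bool integrity_wanted(const struct bio_model *bio)
{
	return bio->nr_sectors != 0;	/* nothing to protect without data */
}

/* Allocates a payload only when the bio carries data. Returns 0 on success. */
static int integrity_prep(struct bio_model *bio)
{
	assert(bio != NULL);
	if (!integrity_wanted(bio))
		return 0;
	bio->integrity = calloc(bio->nr_sectors, 8);	/* illustrative size */
	return bio->integrity ? 0 : -1;
}

static void bio_model_free(struct bio_model *bio)
{
	free(bio->integrity);
	bio->integrity = NULL;
}
```

A flush-style bio with nr_sectors == 0 leaves integrity as NULL, so the free
path never sees a payload that was not fully set up.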

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:48:18 +02:00
Ming Lei
3bab65f29e blk-mq: fix race between updating nr_hw_queues and switching io sched
[ Upstream commit fb350e0ad9 ]

Both elevator_switch_mq() and blk_mq_update_nr_hw_queues() can allocate sched
tags and use q->nr_hw_queues, so a race between them is inevitable. For
example, blk_mq_init_sched() may trigger a use-after-free on an hctx that is
freed in blk_mq_realloc_hw_ctxs() when nr_hw_queues is decreased.

This patch fixes the race by holding q->sysfs_lock.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:48:11 +02:00
Dan Carpenter
e668764524 block: fix an error code in add_partition()
[ Upstream commit 7bd897cfce ]

We don't set an error code on this path.  It means that we return NULL
instead of an error pointer and the caller does a NULL dereference.
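
The error-pointer convention behind this fix can be sketched in userspace as
follows. The helpers err_ptr(), ptr_err(), and is_err() are local imitations of
the kernel idiom, and add_part() with its parameters is a hypothetical model of
a constructor whose failure path must set an error code before jumping to
cleanup.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *ptr)
{
	assert(is_err(ptr));
	return (long)(intptr_t)ptr;
}

/* Hypothetical partition object with an optional info blob. */
struct part_model {
	int partno;
	char *info;
};

/* 'fail_info_alloc' simulates the info allocation failing. */
static struct part_model *add_part(int partno, int want_info, int fail_info_alloc)
{
	struct part_model *p = calloc(1, sizeof(*p));
	int err;

	if (!p)
		return err_ptr(-ENOMEM);
	if (want_info) {
		p->info = fail_info_alloc ? NULL : malloc(16);
		if (!p->info) {
			err = -ENOMEM;	/* without this, the caller gets no error code */
			goto out_free;
		}
	}
	p->partno = partno;
	return p;

out_free:
	free(p);
	return err_ptr(err);
}
```

A caller that tests the result with is_err() then sees -ENOMEM on the failing
path instead of dereferencing a NULL return.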

Fixes: 6d1d8050b4 ("block, partition: add partition_meta_info to hd_struct")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:48:04 +02:00
Wen Xiong
2005c4f301 blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op
[ Upstream commit f36ea50ca0 ]

When formatting NVMe to 512B/4K + T10 DIF/DIX, dd with a split op returns
"Input/output error". It looks like the block layer splits the bio after calling
bio_integrity_prep(bio). This patch fixes the issue.

Below is how we debugged this issue:
(1) Format the NVMe device to a 4K block size with type 2 DIF.
(2) Run dd with oflag=direct and a block size bigger than 1024k:
dd: error writing '/dev/nvme0n1': Input/output error

We added some debug code in nvme device driver. It showed us the first
op and the second op have the same bi and pi address. This is not
correct.

1st op: nvme0n1 Op:Wr slba 0x505 length 0x100, PI ctrl=0x1400,
	dsmgmt=0x0, AT=0x0 & RT=0x505
	Guard 0x00b1, AT 0x0000, RT physical 0x00000505 RT virtual 0x00002828

2nd op: nvme0n1 Op:Wr slba 0x605 length 0x1, PI ctrl=0x1400, dsmgmt=0x0,
	AT=0x0 & RT=0x605  ==> This op fails and so do the subsequent 5 retries.
	Guard 0x00b1, AT 0x0000, RT physical 0x00000605 RT virtual 0x00002828

With the fix, it showed us that both the first op and the second op have
the correct bi and pi address.

1st op: nvme2n1 Op:Wr slba 0x505 length 0x100, PI ctrl=0x1400,
	dsmgmt=0x0, AT=0x0 & RT=0x505
	Guard 0x5ccb, AT 0x0000, RT physical 0x00000505 RT virtual 0x00002828
2nd op: nvme2n1 Op:Wr slba 0x605 length 0x1, PI ctrl=0x1400, dsmgmt=0x0,
	AT=0x0 & RT=0x605
	Guard 0xab4c, AT 0x0000, RT physical 0x00000605 RT virtual 0x00003028

Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:47:54 +02:00
Greg Hackmann
05baf14727 Merge tag 'v4.9.93' into android-4.9
This is the 4.9.93 stable release

Change-Id: I4293d83f45982c6fd479bddbf9b0f811248ddc30
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2018-04-09 11:39:17 -07:00
Mikulas Patocka
6bae91221e Fix slab name "biovec-(1<<(21-12))"
commit bd5c4facf5 upstream.

I'm getting a slab named "biovec-(1<<(21-12))". It is caused by unintended
expansion of the macro BIO_MAX_PAGES. This patch renames it to biovec-max.
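
The unintended expansion can be reproduced with plain C preprocessor
stringification. This is a sketch: MODEL_MAX_PAGES, BV_NAME, and the helper
below are hypothetical names, and the macro value is copied from the slab name
quoted above rather than from any particular kernel header.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the argument is macro-expanded before '#'. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Hypothetical stand-in for a page-count macro defined as an expression. */
#define MODEL_MAX_PAGES (1<<(21-12))

/* Builds a name from its argument, as a slab-name table might. */
#define BV_NAME(x) "biovec-" STRINGIFY(x)

static const char *const literal_name = BV_NAME(64);
static const char *const expanded_name = BV_NAME(MODEL_MAX_PAGES);
static const char *const fixed_name = "biovec-max";

/* A name is considered clean here if it contains no parenthesis. */
static int slab_name_is_clean(const char *name)
{
	assert(name != NULL);
	return strchr(name, '(') == NULL;
}
```

Passing a plain number yields "biovec-64", but passing the macro yields the
expression text itself, which is why spelling the name out as "biovec-max"
sidesteps the problem.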

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# v4.14+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08 12:13:00 +02:00
Richard Narron
bd94a2c744 partitions/msdos: Unable to mount UFS 44bsd partitions
commit 5f15684bd5 upstream.

UFS partitions from newer versions of FreeBSD 10 and 11 use relative
addressing for their subpartitions. But older versions of FreeBSD still
use absolute addressing just like OpenBSD and NetBSD.

Instead of simply testing for a FreeBSD partition, the code needs to
also test if the starting offset of the C subpartition is zero.
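
The rule described above can be sketched as a small helper. This is a model
under assumptions: the function name, its parameters, and the "bsd" flavour
string used to mark FreeBSD are illustrative, and all offsets are plain sector
numbers already converted to host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns the absolute start sector of a BSD subpartition.
 *
 * flavour        label type; "bsd" is assumed here to mean FreeBSD
 * slice_offset   start of the enclosing MSDOS slice on the disk
 * c_part_offset  start offset recorded for the C (whole-slice) subpartition
 * sub_offset     start offset recorded for the subpartition of interest
 */
static uint64_t subpart_start(const char *flavour, uint64_t slice_offset,
			      uint32_t c_part_offset, uint32_t sub_offset)
{
	uint64_t start = sub_offset;

	assert(flavour != NULL);
	/* Relative addressing: FreeBSD label whose C subpartition starts at 0. */
	if (strcmp(flavour, "bsd") == 0 && c_part_offset == 0)
		start += slice_offset;
	return start;
}
```

For a slice starting at sector 2048, a newer FreeBSD label with a C offset of 0
and a subpartition offset of 16 resolves to sector 2064, while an older label
that already stores 2064 as an absolute value is left unchanged.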

https://bugzilla.kernel.org/show_bug.cgi?id=197733

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08 12:12:42 +02:00
Greg Kroah-Hartman
dd1e37e646 Merge 4.9.90 into android-4.9
Changes in 4.9.90
	tpm: fix potential buffer overruns caused by bit glitches on the bus
	ASoC: rsnd: check src mod pointer for rsnd_mod_id()
	SMB3: Validate negotiate request must always be signed
	CIFS: Enable encryption during session setup phase
	staging: android: ashmem: Fix possible deadlock in ashmem_ioctl
	Revert "led: core: Fix brightness setting when setting delay_off=0"
	led: core: Clear LED_BLINK_SW flag in led_blink_set()
	platform/x86: asus-nb-wmi: Add wapf4 quirk for the X302UA
	bonding: handle link transition from FAIL to UP correctly
	regulator: anatop: set default voltage selector for pcie
	power: supply: bq24190_charger: Limit over/under voltage fault logging
	x86: i8259: export legacy_pic symbol
	rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs
	Input: ar1021_i2c - fix too long name in driver's device table
	time: Change posix clocks ops interfaces to use timespec64
	ACPI/processor: Fix error handling in __acpi_processor_start()
	ACPI/processor: Replace racy task affinity logic
	cpufreq/sh: Replace racy task affinity logic
	genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs
	i2c: i2c-scmi: add a MS HID
	net: ipv6: send unsolicited NA on admin up
	media/dvb-core: Race condition when writing to CAM
	btrfs: fix a bogus warning when converting only data or metadata
	ASoC: Intel: Atom: update Thinkpad 10 quirk
	tools/testing/nvdimm: fix nfit_test shutdown crash
	spi: dw: Disable clock after unregistering the host
	powerpc/64s: Remove SAO feature from Power9 DD1
	ath: Fix updating radar flags for coutry code India
	clk: ns2: Correct SDIO bits
	iwlwifi: split the handler and the wake parts of the notification infra
	iwlwifi: a000: fix memory offsets and lengths
	scsi: virtio_scsi: Always try to read VPD pages
	KVM: PPC: Book3S PR: Exit KVM on failed mapping
	mwifiex: don't leak 'chan_stats' on reset
	x86/reboot: Turn off KVM when halting a CPU
	ARM: 8668/1: ftrace: Fix dynamic ftrace with DEBUG_RODATA and !FRAME_POINTER
	irqchip/mips-gic: Separate IPI reservation & usage tracking
	iommu/omap: Register driver before setting IOMMU ops
	md/raid10: wait up frozen array in handle_write_completed
	NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()
	tcp: remove poll() flakes with FastOpen
	e1000e: fix timing for 82579 Gigabit Ethernet controller
	ALSA: hda - Fix headset microphone detection for ASUS N551 and N751
	IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
	IB/ipoib: Update broadcast object if PKey value was changed in index 0
	HSI: ssi_protocol: double free in ssip_pn_xmit()
	IB/mlx4: Take write semaphore when changing the vma struct
	IB/mlx4: Change vma from shared to private
	IB/mlx5: Take write semaphore when changing the vma struct
	IB/mlx5: Change vma from shared to private
	IB/mlx5: Set correct SL in completion for RoCE
	ASoC: Intel: Skylake: Uninitialized variable in probe_codec()
	ibmvnic: Disable irq prior to close
	netvsc: Deal with rescinded channels correctly
	Fix driver usage of 128B WQEs when WQ_CREATE is V1.
	Fix Express lane queue creation.
	gpio: gpio-wcove: fix irq pending status bit width
	netfilter: xt_CT: fix refcnt leak on error path
	openvswitch: Delete conntrack entry clashing with an expectation.
	netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink
	mmc: host: omap_hsmmc: checking for NULL instead of IS_ERR()
	tipc: check return value of nlmsg_new
	wan: pc300too: abort path on failure
	qlcnic: fix unchecked return value
	netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded
	platform/x86: intel-vbtn: add volume up and down
	scsi: mac_esp: Replace bogus memory barrier with spinlock
	infiniband/uverbs: Fix integer overflows
	pNFS: Fix use after free issues in pnfs_do_read()
	xprtrdma: Cancel refresh worker during buffer shutdown
	NFS: don't try to cross a mountpount when there isn't one there.
	iio: st_pressure: st_accel: Initialise sensor platform data properly
	mt7601u: check return value of alloc_skb
	libertas: check return value of alloc_workqueue
	rndis_wlan: add return value validation
	Btrfs: fix incorrect space accounting after failure to insert inline extent
	Btrfs: send, fix file hole not being preserved due to inline extent
	Btrfs: fix extent map leak during fallocate error path
	orangefs: do not wait for timeout if umounting
	mac80211: don't parse encrypted management frames in ieee80211_frame_acked
	ACPICA: iasl: Fix IORT SMMU GSI disassembling
	iio: hid-sensor: fix return of -EINVAL on invalid values in ret or value
	dt-bindings: mfd: axp20x: Add "xpowers,master-mode" property for AXP806 PMICs
	mfd: palmas: Reset the POWERHOLD mux during power off
	mtip32xx: use runtime tag to initialize command header
	x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails
	gpio: gpio-wcove: fix GPIO IRQ status mask
	staging: unisys: visorhba: fix s-Par to boot with option CONFIG_VMAP_STACK set to y
	staging: wilc1000: fix unchecked return value
	ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
	mac80211: Fix possible sband related NULL pointer de-reference
	mmc: sdhci-of-esdhc: limit SD clock for ls1012a/ls1046a
	netfilter: x_tables: unlock on error in xt_find_table_lock()
	ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
	IB/rdmavt: restore IRQs on error path in rvt_create_ah()
	IB/hfi1: Fix softlockup issue
	platform/x86: asus-wmi: try to set als by default
	ipmi/watchdog: fix wdog hang on panic waiting for ipmi response
	ACPI / PMIC: xpower: Fix power_table addresses
	drm/amdgpu: fix gpu reset crash
	drm/nouveau/kms: Increase max retries in scanout position queries.
	jbd2: Fix lockdep splat with generic/270 test
	ixgbevf: fix size of queue stats length
	net: ethernet: ucc_geth: fix MEM_PART_MURAM mode
	soc/fsl/qe: round brg_freq to 1kHz granularity
	Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
	Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
	vxlan: correctly handle ipv6.disable module parameter
	qed: Unlock on error in qed_vf_pf_acquire()
	bnx2x: Align RX buffers
	power: supply: bq24190_charger: Add disable-reset device-property
	power: supply: isp1704: Fix unchecked return value of devm_kzalloc
	power: supply: pda_power: move from timer to delayed_work
	Input: twl4030-pwrbutton - use correct device for irq request
	IB/rxe: Don't clamp residual length to mtu
	md/raid10: skip spare disk as 'first' disk
	ACPI / power: Delay turning off unused power resources after suspend
	ia64: fix module loading for gcc-5.4
	tcm_fileio: Prevent information leak for short reads
	x86/xen: split xen_smp_prepare_boot_cpu()
	video: fbdev: udlfb: Fix buffer on stack
	sm501fb: don't return zero on failure path in sm501fb_start()
	pNFS: Fix a deadlock when coalescing writes and returning the layout
	net: hns: fix ethtool_get_strings overflow in hns driver
	cifs: small underflow in cnvrtDosUnixTm()
	mm: fix check for reclaimable pages in PF_MEMALLOC reclaim throttling
	mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfo
	mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
	rtc: ds1374: wdt: Fix issue with timeout scaling from secs to wdt ticks
	rtc: ds1374: wdt: Fix stop/start ioctl always returning -EINVAL
	ath10k: fix out of bounds access to local buffer
	perf tests kmod-path: Don't fail if compressed modules aren't supported
	block/mq: Cure cpu hotplug lock inversion
	Bluetooth: hci_qca: Avoid setup failure on missing rampatch
	Bluetooth: btqcomsmd: Fix skb double free corruption
	media: c8sectpfe: fix potential NULL pointer dereference in c8sectpfe_timer_interrupt
	drm/msm: fix leak in failed get_pages
	RDMA/iwpm: Fix uninitialized error code in iwpm_send_mapinfo()
	rtlwifi: rtl_pci: Fix the bug when inactiveps is enabled.
	media: bt8xx: Fix err 'bt878_probe()'
	ath10k: handling qos at STA side based on AP WMM enable/disable
	media: [RESEND] media: dvb-frontends: Add delay to Si2168 restart
	qmi_wwan: set FLAG_SEND_ZLP to avoid network initiated disconnect
	serial: 8250_dw: Disable clock on error
	cros_ec: fix nul-termination for firmware build info
	watchdog: Fix potential kref imbalance when opening watchdog
	platform/chrome: Use proper protocol transfer function
	dmaengine: zynqmp_dma: Fix race condition in the probe
	drm/tilcdc: ensure nonatomic iowrite64 is not used
	mmc: avoid removing non-removable hosts during suspend
	IB/ipoib: Avoid memory leak if the SA returns a different DGID
	RDMA/cma: Use correct size when writing netlink stats
	IB/umem: Fix use of npages/nmap fields
	iser-target: avoid reinitializing rdma contexts for isert commands
	vgacon: Set VGA struct resource types
	omapdrm: panel: fix compatible vendor string for td028ttec1
	drm/omap: DMM: Check for DMM readiness after successful transaction commit
	pty: cancel pty slave port buf's work in tty_release
	coresight: Fix disabling of CoreSight TPIU
	pinctrl: Really force states during suspend/resume
	pinctrl: rockchip: enable clock when reading pin direction register
	iommu/vt-d: clean up pr_irq if request_threaded_irq fails
	ip6_vti: adjust vti mtu according to mtu of lower device
	RDMA/ocrdma: Fix permissions for OCRDMA_RESET_STATS
	ARM: dts: aspeed-evb: Add unit name to memory node
	nfsd4: permit layoutget of executable-only files
	clk: Don't touch hardware when reparenting during registration
	clk: axi-clkgen: Correctly handle nocount bit in recalc_rate()
	clk: si5351: Rename internal plls to avoid name collisions
	dmaengine: ti-dma-crossbar: Fix event mapping for TPCC_EVT_MUX_60_63
	IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
	IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
	clk: migrate the count of orphaned clocks at init
	RDMA/ucma: Fix access to non-initialized CM_ID object
	RDMA/ucma: Don't allow join attempts for unsupported AF family
	usb: gadget: f_hid: fix: Move IN request allocation to set_alt()
	Linux 4.9.90

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-25 10:55:44 +02:00
Peter Zijlstra
18dd7b964c block/mq: Cure cpu hotplug lock inversion
[ Upstream commit eabe06595d ]

By poking at /debug/sched_features I triggered the following splat:

 [] ======================================================
 [] WARNING: possible circular locking dependency detected
 [] 4.11.0-00873-g964c8b7-dirty #694 Not tainted
 [] ------------------------------------------------------
 [] bash/2109 is trying to acquire lock:
 []  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff8120cb8b>] static_key_slow_dec+0x1b/0x50
 []
 [] but task is already holding lock:
 []  (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
 []
 [] which lock already depends on the new lock.
 []
 []
 [] the existing dependency chain (in reverse order) is:
 []
 [] -> #2 (&sb->s_type->i_mutex_key#4){+++++.}:
 []        lock_acquire+0x100/0x210
 []        down_write+0x28/0x60
 []        start_creating+0x5e/0xf0
 []        debugfs_create_dir+0x13/0x110
 []        blk_mq_debugfs_register+0x21/0x70
 []        blk_mq_register_dev+0x64/0xd0
 []        blk_register_queue+0x6a/0x170
 []        device_add_disk+0x22d/0x440
 []        loop_add+0x1f3/0x280
 []        loop_init+0x104/0x142
 []        do_one_initcall+0x43/0x180
 []        kernel_init_freeable+0x1de/0x266
 []        kernel_init+0xe/0x100
 []        ret_from_fork+0x31/0x40
 []
 [] -> #1 (all_q_mutex){+.+.+.}:
 []        lock_acquire+0x100/0x210
 []        __mutex_lock+0x6c/0x960
 []        mutex_lock_nested+0x1b/0x20
 []        blk_mq_init_allocated_queue+0x37c/0x4e0
 []        blk_mq_init_queue+0x3a/0x60
 []        loop_add+0xe5/0x280
 []        loop_init+0x104/0x142
 []        do_one_initcall+0x43/0x180
 []        kernel_init_freeable+0x1de/0x266
 []        kernel_init+0xe/0x100
 []        ret_from_fork+0x31/0x40

 []  *** DEADLOCK ***
 []
 [] 3 locks held by bash/2109:
 []  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff81292bcd>] vfs_write+0x17d/0x1a0
 []  #1:  (debugfs_srcu){......}, at: [<ffffffff8155a90d>] full_proxy_write+0x5d/0xd0
 []  #2:  (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
 []
 [] stack backtrace:
 [] CPU: 9 PID: 2109 Comm: bash Not tainted 4.11.0-00873-g964c8b7-dirty #694
 [] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
 [] Call Trace:

 []  lock_acquire+0x100/0x210
 []  get_online_cpus+0x2a/0x90
 []  static_key_slow_dec+0x1b/0x50
 []  static_key_disable+0x20/0x30
 []  sched_feat_write+0x131/0x170
 []  full_proxy_write+0x97/0xd0
 []  __vfs_write+0x28/0x120
 []  vfs_write+0xb5/0x1a0
 []  SyS_write+0x49/0xa0
 []  entry_SYSCALL_64_fastpath+0x23/0xc2

This is because of the cpu hotplug lock rework. Break the chain at #1
by reversing the lock acquisition order. This way i_mutex_key#4 no
longer depends on cpu_hotplug_lock and things are good.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24 11:00:22 +01:00