commit 233bde21aa upstream.
It happens often while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available for this driver? Do I have to introduce definitions of these
constants before I can use these constants? To avoid this confusion,
move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that these become available for all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional to avoid that including that header file after
<linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
redefinition.
Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.
Cc: David S. Miller <davem@davemloft.net>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5f6b95c72 upstream.
Since nobody else is going to restart our hw_queue for us, the
blk_mq_start_stopped_hw_queues() is in virtblk_done() is not sufficient
necessarily sufficient to ensure that the queue will get started again.
In case of global resource outage (-ENOMEM because mapping failure,
because of swiotlb full) our virtqueue may be empty and we can get
stuck with a stopped hw_queue.
Let us not stop the queue on arbitrary errors, but only on -EONSPC which
indicates a full virtqueue, where the hw_queue is guaranteed to get
started by virtblk_done() before when it makes sense to carry on
submitting requests. Let us also remove a stale comment.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: f7728002c1 ("virtio_ring: fix return code on DMA mapping fails")
Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e90ca68b0 upstream.
Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
wait_til_ready().
Which on the face of it can't happen, since as Willy Tarreau points out,
the function does no particular memory access. Except through the FDCS
macro, which just indexes a static allocation through teh current fdc,
which is always checked against N_FDC.
Except the checking happens after we've already assigned the value.
The floppy driver is a disgrace (a lot of it going back to my original
horrd "design"), and has no real maintainer. Nobody has the hardware,
and nobody really cares. But it still gets used in virtual environment
because it's one of those things that everybody supports.
The whole thing should be re-written, or at least parts of it should be
seriously cleaned up. The 'current fdc' index, which is used by the
FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
prime example of how not to write code.
But because nobody has the hardware or the motivation, let's just fix up
the immediate problem with a nasty band-aid: test the fdc index before
actually assigning it to the static 'fdc' variable.
Reported-by: Jordy Zomer <jordy@simplyhacker.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c8ab422553 ]
In brd_init func, rd_nr num of brd_device are firstly allocated
and add in brd_devices, then brd_devices are traversed to add each
brd_device by calling add_disk func. When allocating brd_device,
the disk->first_minor is set to i * max_part, if rd_nr * max_part
is larger than MINORMASK, two different brd_device may have the same
devt, then only one of them can be successfully added.
when rmmod brd.ko, it will cause oops when calling brd_exit.
Follow those steps:
# modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
# rmmod brd
then, the oops will appear.
Oops log:
[ 726.613722] Call trace:
[ 726.614175] kernfs_find_ns+0x24/0x130
[ 726.614852] kernfs_find_and_get_ns+0x44/0x68
[ 726.615749] sysfs_remove_group+0x38/0xb0
[ 726.616520] blk_trace_remove_sysfs+0x1c/0x28
[ 726.617320] blk_unregister_queue+0x98/0x100
[ 726.618105] del_gendisk+0x144/0x2b8
[ 726.618759] brd_exit+0x68/0x560 [brd]
[ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0
[ 726.620384] el0_svc_common+0x78/0x130
[ 726.621057] el0_svc_handler+0x38/0x78
[ 726.621738] el0_svc+0x8/0xc
[ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)
Here, we add brd_check_and_reset_par func to check and limit max_part par.
--
V5->V6:
- remove useless code
V4->V5:(suggested by Ming Lei)
- make sure max_part is not larger than DISK_MAX_PARTS
V3->V4:(suggested by Ming Lei)
- remove useless change
- add one limit of max_part
V2->V3: (suggested by Ming Lei)
- clear .minors when running out of consecutive minor space in brd_alloc
- remove limit of rd_nr
V1->V2:
- add more checks in brd_check_par_valid as suggested by Ming Lei.
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33da8e7c81 ]
My recent to change to only use force_sig for a synchronous events
wound up breaking signal reception cifs and drbd. I had overlooked
the fact that by default kthreads start out with all signals set to
SIG_IGN. So a change I thought was safe turned out to have made it
impossible for those kernel thread to catch their signals.
Reverting the work on force_sig is a bad idea because what the code
was doing was very much a misuse of force_sig. As the way force_sig
ultimately allowed the signal to happen was to change the signal
handler to SIG_DFL. Which after the first signal will allow userspace
to send signals to these kernel threads. At least for
wake_ack_receiver in drbd that does not appear actively wrong.
So correct this problem by adding allow_kernel_signal that will allow
signals whose siginfo reports they were sent by the kernel through,
but will not allow userspace generated signals, and update cifs and
drbd to call allow_kernel_signal in an appropriate place so that their
thread can receive this signal.
Fixing things this way ensures that userspace won't be able to send
signals and cause problems, that it is clear which signals the
threads are expecting to receive, and it guarantees that nothing
else in the system will be affected.
This change was partly inspired by similar cifs and drbd patches that
added allow_signal.
Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com>
Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: Steve French <smfrench@gmail.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Fixes: 247bc9470b ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
Fixes: 72abe3bcf0 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
Fixes: fee109901f ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 589b72894f upstream.
Clang warns:
../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
nr_parts = PARTS_PER_DISK;
^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
if (err)
^
This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.
While we are here, the previous line has some trailing whitespace; clean
that up as well.
Fixes: c80a420995 ("xen-blkfront: handle Xen major numbers other than XENVBD")
Link: https://github.com/ClangBuiltLinux/linux/issues/791
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f9bd84a8a8 ]
For each I/O request, blkback first maps the foreign pages for the
request to its local pages. If an allocation of a local page for the
mapping fails, it should unmap every mapping already made for the
request.
However, blkback's handling mechanism for the allocation failure does
not mark the remaining foreign pages as unmapped. Therefore, the unmap
function merely tries to unmap every valid grant page for the request,
including the pages not mapped due to the allocation failure. On a
system that fails the allocation frequently, this problem leads to
following kernel crash.
[ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
[ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
[ 372.012562] Oops: 0002 [#1] SMP
[ 372.012566] Modules linked in: act_police sch_ingress cls_u32
...
[ 372.012746] Call Trace:
[ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
[ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
...
[ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
...
Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Initializing cgroup subsys cpuset
This commit fixes this problem by marking the grant pages of the given
request that didn't mapped due to the allocation failure as invalid.
Fixes: c6cc142dac ("xen-blkback: use balloon pages for all mappings")
Reviewed-by: David Woodhouse <dwmw@amazon.de>
Reviewed-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa2ac657f9 ]
Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
cache. This cache is destoyed when xen-blkif is unloaded so it is
necessary to wait for the deferred free routine used for such objects to
complete. This necessity was missed in commit 14855954f6 "xen-blkback:
allow module to be cleanly unloaded". This patch fixes the problem by
taking/releasing extra module references in xen_blkif_alloc/free()
respectively.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcb77e4b27 ]
The driver misses calling destroy_workqueue in remove like what is done
when probe fails.
Add the missed calls to fix it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c38f03511 ]
print_st_err() is defined with its 4th argument taking an
'enum drbd_state_rv' but its prototype use an int for it.
Fix this by using 'enum drbd_state_rv' in the prototype too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f708bd08ec ]
"suspending" IO is overloaded.
It can mean "do not allow new requests" (obviously),
but it also may mean "must not complete pending IO",
for example while the fencing handlers do their arbitration.
When adjusting disk options, we suspend io (disallow new requests), then
wait for the activity-log to become unused (drain all IO completions),
and possibly replace it with a new activity log of different size.
If the other "suspend IO" aspect is active, pending IO completions won't
happen, and we would block forever (unkillable drbdsetup process).
Fix this by skipping the activity log adjustment if the "al-extents"
setting did not change. Also, in case it did change, fail early without
blocking if it looks like we would block forever.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe43ed97bb ]
Multiple failure scenario:
a) all good
Connected Primary/Secondary UpToDate/UpToDate
b) lose disk on Primary,
Connected Primary/Secondary Diskless/UpToDate
c) continue to write to the device,
changes only make it to the Secondary storage.
d) lose disk on Secondary,
Connected Primary/Secondary Diskless/Diskless
e) now try to re-attach on Primary
This would have succeeded before, even though that is clearly the
wrong data set to attach to (missing the modifications from c).
Because we only compared our "effective" and the "to-be-attached"
data generation uuid tags if (device->state.conn < C_CONNECTED).
Fix: change that constraint to (device->state.pdsk != D_UP_TO_DATE)
compare the uuids, and reject the attach.
This patch also tries to improve the reverse scenario:
first lose Secondary, then Primary disk,
then try to attach the disk on Secondary.
Before this patch, the attach on the Secondary succeeds, but since commit
drbd: disconnect, if the wrong UUIDs are attached on a connected peer
the Primary will notice unsuitable data, and drop the connection hard.
Though unfortunately at a point in time during the handshake where
we cannot easily abort the attach on the peer without more
refactoring of the handshake.
We now reject any attach to "unsuitable" uuids,
as long as we can see a Primary role,
unless we already have access to "good" data.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94c43a13b8 ]
During handshake, if we are diskless ourselves, we used to accept any size
presented by the peer.
Which could be zero if that peer was just brought up and connected
to us without having a disk attached first, in which case both
peers would just "flip" their volume sizes.
Now, even a diskless node will ignore "zero" sizes
presented by a diskless peer.
Also a currently Diskless Primary will refuse to shrink during handshake:
it may be frozen, and waiting for a "suitable" local disk or peer to
re-appear (on-no-data-accessible suspend-io). If the peer is smaller
than what we used to be, it is not suitable.
The logic for a diskless node during handshake is now supposed to be:
believe the peer, if
- I don't have a current size myself
- we agree on the size anyways
- I do have a current size, am Secondary, and he has the only disk
- I do have a current size, am Primary, and he has the only disk,
which is larger than my current size
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e9c523016 ]
There are two callers of this function and they both unlock the mutex so
this ends up being a double unlock.
Fixes: 44ed167da7 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53d0f8dbde ]
The error handling in fd_probe_drives() doesn't clean up at all. Fix it
up in preparation for converting to blk-mq. While we're here, get rid of
the commented out amiga_floppy_remove().
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdbe4eeeb1 ]
Enabling Direct I/O with loop devices helps reducing memory usage by
avoiding double caching. 32 bit applications running on 64 bits systems
are currently not able to request direct I/O because is missing from the
lo_compat_ioctl.
This patch fixes the compatibility issue mentioned above by exporting
LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry.
The input argument for this ioctl is a single long converted to a 1-bit
boolean, so compatibility is preserved.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae78ca3cf3 ]
In read_per_ring_refs(), after 'req' and related memory regions are
allocated, xen_blkif_map() is invoked to map the shared frame, irq, and
etc. However, if this mapping process fails, no cleanup is performed,
leading to memory leaks. To fix this issue, invoke the cleanup before
returning the error.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77ce56e2bf ]
Building with clang and KASAN, we get a warning about an overly large
stack frame on 32-bit architectures:
drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect'
[-Werror,-Wframe-larger-than=]
We already allocate other data dynamically in this function, so
just do the same for the shash descriptor, which makes up most of
this memory.
Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>