commit 1bc8d18c75 upstream.
I forgot to release the allocated object at the early error path in
line6_init_pcm(). For addressing it, slightly shuffle the code so
that the PCM destructor (pcm->private_free) is assigned properly
before all error paths.
Fixes: 3450121997 ("ALSA: line6: Fix write on zero-sized buffer")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ef8d8ccdc2 ]
As Jason Baron explained in commit 790ba4566c ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.
However, Jason patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by MSG_DONTWAIT flag passed at sendmsg() time :
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
value.
This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.
It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.
Fixes: 790ba4566c ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4651d1802f ]
Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout
for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE
being set when -EAGAIN is returned to ensure that EPOLLOUT is raised.
Expand the setting of SOCK_NOSPACE to non-blocking sockets as well that can
use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior
that Eric Dumazet introduced for tcp sockets.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit daac07156b upstream.
The `uac_mixer_unit_descriptor` shown as below is read from the
device side. In `parse_audio_mixer_unit`, `baSourceID` field is
accessed from index 0 to `bNrInPins` - 1, the current implementation
assumes that descriptor is always valid (the length of descriptor
is no shorter than 5 + `bNrInPins`). If a descriptor read from
the device side is invalid, it may trigger out-of-bound memory
access.
```
struct uac_mixer_unit_descriptor {
__u8 bLength;
__u8 bDescriptorType;
__u8 bDescriptorSubtype;
__u8 bUnitID;
__u8 bNrInPins;
__u8 baSourceID[];
}
```
This patch fixes the bug by add a sanity check on the length of
the descriptor.
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19bce474c4 upstream.
`check_input_term` recursively calls itself with input from
device side (e.g., uac_input_terminal_descriptor.bCSourceID)
as argument (id). In `check_input_term`, if `check_input_term`
is called with the same `id` argument as the caller, it triggers
endless recursive call, resulting kernel space stack overflow.
This patch fixes the bug by adding a bitmap to `struct mixer_build`
to keep track of the checked ids and stop the execution if some id
has been checked (similar to how parse_audio_unit handles unitid
argument).
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Hui Peng <benquike@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 8c3088f895 ("tcp: be more careful in tcp_fragment()")
triggers following stack trace:
[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
[25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
[25244.888167] Call Trace:
[25244.889182] <IRQ>
[25244.890001] tcp_fragment+0x9c/0x2cf
[25244.891295] tcp_write_xmit+0x68f/0x988
[25244.892732] __tcp_push_pending_frames+0x3b/0xa0
[25244.894347] tcp_data_snd_check+0x2a/0xc8
[25244.895775] tcp_rcv_established+0x2a8/0x30d
[25244.897282] tcp_v4_do_rcv+0xb2/0x158
[25244.898666] tcp_v4_rcv+0x692/0x956
[25244.899959] ip_local_deliver_finish+0xeb/0x169
[25244.901547] __netif_receive_skb_core+0x51c/0x582
[25244.903193] ? inet_gro_receive+0x239/0x247
[25244.904756] netif_receive_skb_internal+0xab/0xc6
[25244.906395] napi_gro_receive+0x8a/0xc0
[25244.907760] receive_buf+0x9a1/0x9cd
[25244.909160] ? load_balance+0x17a/0x7b7
[25244.910536] ? vring_unmap_one+0x18/0x61
[25244.911932] ? detach_buf+0x60/0xfa
[25244.913234] virtnet_poll+0x128/0x1e1
[25244.914607] net_rx_action+0x12a/0x2b1
[25244.915953] __do_softirq+0x11c/0x26b
[25244.917269] ? handle_irq_event+0x44/0x56
[25244.918695] irq_exit+0x61/0xa0
[25244.919947] do_IRQ+0x9d/0xbb
[25244.921065] common_interrupt+0x85/0x85
[25244.922479] </IRQ>
tcp_rtx_queue_tail() (called by tcp_fragment()) can call
tcp_write_queue_prev() on the first packet in the queue, which will trigger
the BUG in tcp_write_queue_prev(), because there is no previous packet.
This happens when the retransmit queue is empty, for example in case of a
zero window.
Commit 8c3088f895 ("tcp: be more careful in tcp_fragment()") was not a
simple cherry-pick of the original one from master (b617158dc0)
because there is a specific TCP rtx queue only since v4.15. For more
details, please see the commit message of b617158dc0 ("tcp: be more
careful in tcp_fragment()").
The BUG() is hit due to the specific code added to versions older than
v4.15. The comment in skb_queue_prev() (include/linux/skbuff.h:1406),
just before the BUG_ON() somehow suggests to add a check before using
it, what Tim did.
In master, this code path causing the issue will not be taken because
the implementation of tcp_rtx_queue_tail() is different:
tcp_fragment() → tcp_rtx_queue_tail() → tcp_write_queue_prev() →
skb_queue_prev() → BUG_ON()
Fixes: 8c3088f895 ("tcp: be more careful in tcp_fragment()")
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 432973fd3a ]
Register cpufreq notifier after we have initialized the crtc and
unregister it before we remove the ctrc. Receiving a cpufreq notify
without crtc causes a crash.
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ebcb8f8508 ]
Fix RX_TERMINATION_FORCE_ENABLE define value from 0x0089 to 0x00A9
according to MIPI Alliance MPHY specification.
Fixes: e785060ea3 ("ufs: definitions for phy interface")
Signed-off-by: Pedro Sousa <sousa@synopsys.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae78ca3cf3 ]
In read_per_ring_refs(), after 'req' and related memory regions are
allocated, xen_blkif_map() is invoked to map the shared frame, irq, and
etc. However, if this mapping process fails, no cleanup is performed,
leading to memory leaks. To fix this issue, invoke the cleanup before
returning the error.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a56a478a5 ]
If fsg_disable() and fsg_set_alt() are called too closely to each
other (for example due to a quick reset/reconnect), what can happen
is that fsg_set_alt sets common->new_fsg from an interrupt while
handle_exception is trying to process the config change caused by
fsg_disable():
fsg_disable()
...
handle_exception()
sets state back to FSG_STATE_NORMAL
hasn't yet called do_set_interface()
or is inside it.
---> interrupt
fsg_set_alt
sets common->new_fsg
queues a new FSG_STATE_CONFIG_CHANGE
<---
Now, the first handle_exception can "see" the updated
new_fsg, treats it as if it was a fsg_set_alt() response,
call usb_composite_setup_continue() etc...
But then, the thread sees the second FSG_STATE_CONFIG_CHANGE,
and goes back down the same path, wipes and reattaches a now
active fsg, and .. calls usb_composite_setup_continue() which
at this point is wrong.
Not only we get a backtrace, but I suspect the second set_interface
wrecks some state causing the host to get upset in my case.
This fixes it by replacing "new_fsg" by a "state argument" (same
principle) which is set in the same lock section as the state
update, and retrieved similarly.
That way, there is never any discrepancy between the dequeued
state and the observed value of it. We keep the ability to have
the latest reconfig operation take precedence, but we guarantee
that once "dequeued" the argument (new_fsg) will not be clobbered
by any new event.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 602fda17c7 ]
In some cases, one can get out of suspend with a reset or
a disconnect followed by a reconnect. Previously we would
leave a stale suspended flag set.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab2cbeb0ed ]
Since scatterlist dimensions are all unsigned ints, in the relatively
rare cases where a device's max_segment_size is set to UINT_MAX, then
the "cur_len + s_length <= max_len" check in __finalise_sg() will always
return true. As a result, the corner case of such a device mapping an
excessively large scatterlist which is mergeable to or beyond a total
length of 4GB can lead to overflow and a bogus truncated dma_length in
the resulting segment.
As we already assume that any single segment must be no longer than
max_len to begin with, this can easily be addressed by reshuffling the
comparison.
Fixes: 809eac54cd ("iommu/dma: Implement scatterlist segment merging")
Reported-by: Nicolin Chen <nicoleotsuka@gmail.com>
Tested-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d6fb56072 ]
clang-9 points out that there are two variables that depending on the
configuration may only be used in an ARRAY_SIZE() expression but not
referenced:
drivers/dma/ste_dma40.c:145:12: error: variable 'd40_backup_regs' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static u32 d40_backup_regs[] = {
^
drivers/dma/ste_dma40.c:214:12: error: variable 'd40_backup_regs_chan' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static u32 d40_backup_regs_chan[] = {
Mark these __maybe_unused to shut up the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20190712091357.744515-1-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 0e8529143a.
Which was upstream commit 53fe307dfd.
Ben Hutchings reports that this commit depends on new code added in
v4.18, and so is irrelevant on older kernels, and breaks the build.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The upstream commit:
22e9c88d48 ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
has a similar effect, but since it is a rewrite of the assembler to C, is
too invasive for stable. This patch is a minimal fix to address the issue in
assembler.
This patch applies cleanly to v5.2, v4.19 & v4.14.
When calling flush_(inval_)dcache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.
This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e0702d90b7 ]
This function is supposed to return error pointers so it matches the
dmz_get_rnd_zone_for_reclaim() function. The current code could lead to
a NULL dereference in dmz_do_reclaim()
Fixes: b234c6d7a7 ("dm zoned: improve error handling in reclaim")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1fb254aa98 upstream.
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
fails on account of being out of disk quota. I ran his reproducer
script:
# adduser dummy
# adduser dummy plugdev
# dd if=/dev/zero bs=1M count=100 of=test.img
# mkfs.xfs test.img
# mount -t xfs -o gquota test.img /mnt
# mkdir -p /mnt/dummy
# chown -c dummy /mnt/dummy
# xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
(and then as user dummy)
$ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
$ chgrp plugdev /mnt/dummy/foo
and saw:
================================================
WARNING: lock held when returning to user space!
5.3.0-rc5 #rc5 Tainted: G W
------------------------------------------------
chgrp/47006 is leaving the kernel with locks still held!
1 lock held by chgrp/47006:
#0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
...which is clearly caused by xfs_setattr_nonsize failing to unlock the
ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing
unlock.
Reported-by: benjamin.moody@gmail.com
Fixes: 253f4911f2 ("xfs: better xfs_trans_alloc interface")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 701d678599 upstream.
In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
have no guarantee that migration isn't happening in the background at
that time.
Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages. But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work(). Which would mean
pages still pointing at the inode when we free it.
Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages). This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding. Keeping that state under the class spinlock
keeps the logic straightforward.
In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page. This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).
Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0ff14fdc9 upstream.
If alloc_descs() fails before irq_sysfs_init() has run, free_desc() in the
cleanup path will call kobject_del() even though the kobject has not been
added with kobject_add().
Fix this by making the call to kobject_del() conditional on whether
irq_sysfs_init() has run.
This problem surfaced because commit aa30f47cf6 ("kobject: Add support
for default attribute groups to kobj_type") makes kobject_del() stricter
about pairing with kobject_add(). If the pairing is incorrrect, a WARNING
and backtrace occur in sysfs_remove_group() because there is no parent.
[ tglx: Add a comment to the code and make it work with CONFIG_SYSFS=n ]
Fixes: ecb3f394c5 ("genirq: Expose interrupt information through sysfs")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1564703564-4116-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75d66ffb48 upstream.
dm-zoned is observed to lock up or livelock in case of hardware
failure or some misconfiguration of the backing zoned device.
This patch adds a new dm-zoned target function that checks the status of
the backing device. If the request queue of the backing device is found
to be in dying state or the SCSI backing device enters offline state,
the health check code sets a dm-zoned target flag prompting all further
incoming I/O to be rejected. In order to detect backing device failures
timely, this new function is called in the request mapping path, at the
beginning of every reclaim run and before performing any metadata I/O.
The proper way out of this situation is to do
dmsetup remove <dm-zoned target>
and recreate the target when the problem with the backing device
is resolved.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7428c5011 upstream.
Some errors are ignored in the I/O path during queueing chunks
for processing by chunk works. Since at least these errors are
transient in nature, it should be possible to retry the failed
incoming commands.
The fix -
Errors that can happen while queueing chunks are carried upwards
to the main mapping function and it now returns DM_MAPIO_REQUEUE
for any incoming requests that can not be properly queued.
Error logging/debug messages are added where needed.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b234c6d7a7 upstream.
There are several places in reclaim code where errors are not
propagated to the main function, dmz_reclaim(). This function
is responsible for unlocking zones that might be still locked
at the end of any failed reclaim iterations. As the result,
some device zones may be left permanently locked for reclaim,
degrading target's capability to reclaim zones.
This patch fixes these issues as follows -
Make sure that dmz_reclaim_buf(), dmz_reclaim_seq_data() and
dmz_reclaim_rnd_data() return error codes to the caller.
dmz_reclaim() function is renamed to dmz_do_reclaim() to avoid
clashing with "struct dmz_reclaim" and is modified to return the
error to the caller.
dmz_get_zone_for_reclaim() now returns an error instead of NULL
pointer and reclaim code checks for that error.
Error logging/debug messages are added where necessary.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cfd5d3399 upstream.
If the sector number is too high, dm_table_find_target() should return a
pointer to a zeroed dm_target structure (the caller should test it with
dm_target_is_valid).
However, for some table sizes, the code in dm_table_find_target() that
performs btree lookup will access out of bound memory structures.
Fix this bug by testing the sector number at the beginning of
dm_table_find_target(). Also, add an "inline" keyword to the function
dm_table_get_size() because this is a hot path.
Fixes: 512875bd96 ("dm: table detect io beyond device")
Cc: stable@vger.kernel.org
Reported-by: Zhang Tao <kontais@zoho.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae148243d3 upstream.
In commit 6096d91af0 ("dm space map metadata: fix occasional leak
of a metadata block on resize"), we refactor the commit logic to a new
function 'apply_bops'. But when that logic was replaced in out() the
return value was not stored. This may lead out() returning a wrong
value to the caller.
Fixes: 6096d91af0 ("dm space map metadata: fix occasional leak of a metadata block on resize")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4f9d60138 upstream.
When btree_split_beneath() splits a node to two new children, it will
allocate two blocks: left and right. If right block's allocation
failed, the left block will be unlocked and marked dirty. If this
happened, the left block'ss content is zero, because it wasn't
initialized with the btree struct before the attempot to allocate the
right block. Upon return, when flushing the left block to disk, the
validator will fail when check this block. Then a BUG_ON is raised.
Fix this by completely initializing the left block before allocating and
initializing the right block.
Fixes: 4dcb8b57df ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1fef41465 upstream.
This patch fixes a problem in dm-kcopyd that may leave jobs in
complete queue indefinitely in the event of backing storage failure.
This behavior has been observed while running 100% write file fio
workload against an XFS volume created on top of a dm-zoned target
device. If the underlying storage of dm-zoned goes to offline state
under I/O, kcopyd sometimes never issues the end copy callback and
dm-zoned reclaim work hangs indefinitely waiting for that completion.
This behavior was traced down to the error handling code in
process_jobs() function that places the failed job to complete_jobs
queue, but doesn't wake up the job handler. In case of backing device
failure, all outstanding jobs may end up going to complete_jobs queue
via this code path and then stay there forever because there are no
more successful I/O jobs to wake up the job handler.
This patch adds a wake() call to always wake up kcopyd job wait queue
for all I/O jobs that fail before dm_io() gets called for that job.
The patch also sets the write error status in all sub jobs that are
failed because their master job has failed.
Fixes: b73c67c2cb ("dm kcopyd: add sequential write feature")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a90118c445 upstream.
Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds
memset, if the memset goes accross several fields of a struct. This
generated a couple of warnings on x86_64 builds in sanitize_boot_params().
Fix this by explicitly saving the fields in struct boot_params
that are intended to be preserved, and zeroing all the rest.
[ tglx: Tagged for stable as it breaks the warning free build there as well ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f897e60a12 upstream.
Some newer machines do not advertise legacy timers. The kernel can handle
that situation if the TSC and the CPU frequency are enumerated by CPUID or
MSRs and the CPU supports TSC deadline timer. If the CPU does not support
TSC deadline timer the local APIC timer frequency has to be known as well.
Some Ryzens machines do not advertize legacy timers, but there is no
reliable way to determine the bus frequency which feeds the local APIC
timer when the machine allows overclocking of that frequency.
As there is no legacy timer the local APIC timer calibration crashes due to
a NULL pointer dereference when accessing the not installed global clock
event device.
Switch the calibration loop to a non interrupt based one, which polls
either TSC (if frequency is known) or jiffies. The latter requires a global
clockevent. As the machines which do not have a global clockevent installed
have a known TSC frequency this is a non issue. For older machines where
TSC frequency is not known, there is no known case where the legacy timers
do not exist as that would have been reported long ago.
Reported-by: Daniel Drake <drake@endlessm.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63f20a778 upstream.
Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to
avoid clobbering flags.
KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of the CALL_NOSPEC is a small blob of code that performs
fast emulation by executing the target instruction with fixed operands.
adcb_al_dl:
0x000339f8 <+0>: adc %dl,%al
0x000339fa <+2>: ret
A major motiviation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of CALL_NOSPEC. Clobbering flags
results in all sorts of incorrect emulation, e.g. Jcc instructions often
take the wrong path. Sans the nops...
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
0x0003595a <+58>: mov 0xc0(%ebx),%eax
0x00035960 <+64>: mov 0x60(%ebx),%edx
0x00035963 <+67>: mov 0x90(%ebx),%ecx
0x00035969 <+73>: push %edi
0x0003596a <+74>: popf
0x0003596b <+75>: call *%esi
0x000359a0 <+128>: pushf
0x000359a1 <+129>: pop %edi
0x000359a2 <+130>: mov %eax,0xc0(%ebx)
0x000359b1 <+145>: mov %edx,0x60(%ebx)
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
0x000359a8 <+136>: mov -0x10(%ebp),%eax
0x000359ab <+139>: and $0x8d5,%edi
0x000359b4 <+148>: and $0xfffff72a,%eax
0x000359b9 <+153>: or %eax,%edi
0x000359bd <+157>: mov %edi,0x4(%ebx)
For the most part this has gone unnoticed as emulation of guest code
that can trigger fast emulation is effectively limited to MMIO when
running on modern hardware, and MMIO is rarely, if ever, accessed by
instructions that affect or consume flags.
Breakage is almost instantaneous when running with unrestricted guest
disabled, in which case KVM must emulate all instructions when the guest
has invalid state, e.g. when the guest is in Big Real Mode during early
BIOS.
Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
Fixes: 1a29b5b7f3 ("KVM: x86: Make indirect calls in emulator speculation safe")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c60e6b5c9 upstream.
If the driver doesn't support open-drain/source config options, we
emulate this behavior when setting the direction by calling
gpiod_direction_input() if the default value is 0 (open-source) or
1 (open-drain), thus not actively driving the line in those cases.
This however clears the FLAG_IS_OUT bit for the GPIO line descriptor
and makes the LINEINFO ioctl() incorrectly report this line's mode as
'input' to user-space.
This commit modifies the ioctl() to always set the GPIOLINE_FLAG_IS_OUT
bit in the lineinfo structure's flags field. Since it's impossible to
use the input mode and open-drain/source options at the same time, we
can be sure the reported information will be correct.
Fixes: 521a2ad6f8 ("gpio: add userspace ABI for GPIO line information")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20190806114151.17652-1-brgl@bgdev.pl
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c358ebf596 upstream.
While I had thought I had fixed this issue in:
commit 342406e4fb ("drm/nouveau/i2c: Disable i2c bus access after
->fini()")
It turns out that while I did fix the error messages I was seeing on my
P50 when trying to access i2c busses with the GPU in runtime suspend, I
accidentally had missed one important detail that was mentioned on the
bug report this commit was supposed to fix: that the CPU would only lock
up when trying to access i2c busses _on connected devices_ _while the
GPU is not in runtime suspend_. Whoops. That definitely explains why I
was not able to get my machine to hang with i2c bus interactions until
now, as plugging my P50 into it's dock with an HDMI monitor connected
allowed me to finally reproduce this locally.
Now that I have managed to reproduce this issue properly, it looks like
the problem is much simpler then it looks. It turns out that some
connected devices, such as MST laptop docks, will actually ACK i2c reads
even if no data was actually read:
[ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
[ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
[ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
[ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
Because we don't handle the situation of i2c ack without any data, we
end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
value of cnt always remains at 0. This finally properly explains how
this could result in a CPU hang like the ones observed in the
aforementioned commit.
So, fix this by retrying transactions if no data is written or received,
and give up and fail the transaction if we continue to not write or
receive any data after 32 retries.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a561372405 upstream.
We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not. ->peer_features is not valid unless the OSD session is open. If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.
In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap. However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).
Instead, let's drop this feature check. It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent. Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.
Cc: stable@vger.kernel.org
Fixes: 7de030d6b1 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf3591ef83 upstream.
Revert the commit bd293d071f. The proper
fix has been made available with commit d0a255e795 ("loop: set
PF_MEMALLOC_NOIO for the worker thread").
Note that the fix offered by commit bd293d071f doesn't really prevent
the deadlock from occuring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.
PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0"
#0 [ffff8813dedfb938] __schedule at ffffffff8173f405
#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: bd293d071f ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e795 ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>