[ Upstream commit d81c5054a5 ]
At least old Xen net backends seem to send frags with no real data
sometimes. In case such a fragment happens to occur with the frag limit
already reached the frontend will BUG currently even if this situation
is easily recoverable.
Modify the BUG_ON() condition accordingly.
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a915b982d8 ]
If a server side socket is bound to an address, but not in the listening
state yet, incoming connection requests should receive a reset control
packet in response. However, the function used to send the reset
silently drops the reset packet if the sending socket isn't bound
to a remote address (as is the case for a bound socket not yet in
the listening state). This change fixes this by using the src
of the incoming packet as destination for the reset packet in
this case.
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 841df92241 ]
We miss a write barrier that guarantees used idx is updated and seen
before log. This will let userspace sync and copy used ring before
used idx is update. Fix this by adding a barrier before log_write().
Fixes: 8dd014adfe ("vhost-net: mergeable buffers support")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4a2eb0c37b ]
syzbot reported a kernel-infoleak, which is caused by an uninitialized
field(sin6_flowinfo) of addr->a.v6 in sctp_inet6addr_event().
The call trace is as below:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x19a/0x230 lib/usercopy.c:33
CPU: 1 PID: 8164 Comm: syz-executor2 Not tainted 4.20.0-rc3+ #95
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x32d/0x480 lib/dump_stack.c:113
kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
_copy_to_user+0x19a/0x230 lib/usercopy.c:33
copy_to_user include/linux/uaccess.h:183 [inline]
sctp_getsockopt_local_addrs net/sctp/socket.c:5998 [inline]
sctp_getsockopt+0x15248/0x186f0 net/sctp/socket.c:7477
sock_common_getsockopt+0x13f/0x180 net/core/sock.c:2937
__sys_getsockopt+0x489/0x550 net/socket.c:1939
__do_sys_getsockopt net/socket.c:1950 [inline]
__se_sys_getsockopt+0xe1/0x100 net/socket.c:1947
__x64_sys_getsockopt+0x62/0x80 net/socket.c:1947
do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
sin6_flowinfo is not really used by SCTP, so it will be fixed by simply
setting it to 0.
The issue exists since very beginning.
Thanks Alexander for the reproducer provided.
Reported-by: syzbot+ad5d327e6936a2e284be@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1986af16e8 ]
Added support for the Telit LN940 series cellular modules QMI interface.
QMI_QUIRK_SET_DTR quirk requied for Qualcomm MDM9x40 chipset.
Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b8d95f179 ]
Validate packet socket address length if a length is given. Zero
length is equivalent to not setting an address.
Fixes: 99137b7888 ("packet: validate address length")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d5c7c745f2 ]
When x25_asy_open() fails, it already cleans up by itself,
so its caller doesn't need to free the memory again.
It seems we still have to call x25_asy_free() to clear the SLF_INUSE
bit, so just set these pointers to NULL after kfree().
Reported-and-tested-by: syzbot+5e5e969e525129229052@syzkaller.appspotmail.com
Fixes: 3b780bed31 ("x25_asy: Free x25_asy on x25_asy_open() failure.")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7314f5480f ]
nr_find_socket(), nr_find_peer() and nr_find_listener() lock the
sock after finding it in the global list. However, the call path
requires BH disabled for the sock lock consistently.
Actually the locking is unnecessary at this point, we can just hold
the sock refcnt to make sure it is not gone after we unlock the global
list, and lock it later only when needed.
Reported-and-tested-by: syzbot+f621cda8b7e598908efa@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8742beb50f ]
Even though the link is down before entering hibernation,
there is an issue that the network interface always links up after resuming
from hibernation.
If the link is still down before enabling the network interface,
and after resuming from hibernation, the phydev->state is forcibly set
to PHY_UP in mdio_bus_phy_restore(), and the link becomes up.
In suspend sequence, only if the PHY is attached, mdio_bus_phy_suspend()
calls phy_stop_machine(), and mdio_bus_phy_resume() calls
phy_start_machine().
In resume sequence, it's enough to do the same as mdio_bus_phy_resume()
because the state has been preserved.
This patch fixes the issue by calling phy_start_machine() in
mdio_bus_phy_restore() in the same way as mdio_bus_phy_resume().
Fixes: bc87922ff5 ("phy: Move PHY PM operations into phy_device")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4298388574 ]
On some platforms (currently detected only on SAMA5D4) TX might stuck
even the pachets are still present in DMA memories and TX start was
issued for them. This happens due to race condition between MACB driver
updating next TX buffer descriptor to be used and IP reading the same
descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
is asserted TX must be restarted. Restart TX if used bit is read and
packets are present in software TX queue. Packets are removed from software
TX queue if TX was successful for them (see macb_tx_interrupt()).
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ade446403b ]
Since commit 7969e5c40d ("ip: discard IPv4 datagrams with overlapping
segments.") IPv4 reassembly code drops the whole queue whenever an
overlapping fragment is received. However, the test is written in a way
which detects duplicate fragments as overlapping so that in environments
with many duplicate packets, fragmented packets may be undeliverable.
Add an extra test and for (potentially) duplicate fragment, only drop the
new fragment rather than the whole queue. Only starting offset and length
are checked, not the contents of the fragments as that would be too
expensive. For similar reason, linear list ("run") of a rbtree node is not
iterated, we only check if the new fragment is a subset of the interval
covered by existing consecutive fragments.
v2: instead of an exact check iterating through linear list of an rbtree
node, only check if the new fragment is subset of the "run" (suggested
by Eric Dumazet)
Fixes: 7969e5c40d ("ip: discard IPv4 datagrams with overlapping segments.")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fb24274546 ]
syzbot reported the use of uninitialized udp6_addr::sin6_scope_id.
We can just set ::sin6_scope_id to zero, as tunnels are unlikely
to use an IPv6 address that needs a scope id and there is no
interface to bind in this context.
For net-next, it looks different as we have cfg->bind_ifindex there
so we can probably call ipv6_iface_scope_id().
Same for ::sin6_flowinfo, tunnels don't use it.
Fixes: 8024e02879 ("udp: Add udp_sock_create for UDP tunnels to open listener socket")
Reported-by: syzbot+c56449ed3652e6720f30@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 40c3ff6d5e ]
Packet sockets may call dev_header_parse with NULL daddr. Make
lowpan_header_ops.create fail.
Fixes: 87a93e4ece ("ieee802154: change needed headroom/tailroom")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 756af9c642 ]
Commit 33a48ab105 ("ibmveth: Fix DMA unmap error") fixed an issue in the
normal code path of ibmveth_xmit_start() that was originally introduced by
Commit 6e8ab30ec6 ("ibmveth: Add scatter-gather support"). This original
fix missed the error path where dma_unmap_page is wrongly called on the
header portion in descs[0] which was mapped with dma_map_single. As a
result a failure to DMA map any of the frags results in a dmesg warning
when CONFIG_DMA_API_DEBUG is enabled.
------------[ cut here ]------------
DMA-API: ibmveth 30000002: device driver frees DMA memory with wrong function
[device address=0x000000000a430000] [size=172 bytes] [mapped as page] [unmapped as single]
WARNING: CPU: 1 PID: 8426 at kernel/dma/debug.c:1085 check_unmap+0x4fc/0xe10
...
<snip>
...
DMA-API: Mapped at:
ibmveth_start_xmit+0x30c/0xb60
dev_hard_start_xmit+0x100/0x450
sch_direct_xmit+0x224/0x490
__qdisc_run+0x20c/0x980
__dev_queue_xmit+0x1bc/0xf20
This fixes the API misuse by unampping descs[0] with dma_unmap_single.
Fixes: 6e8ab30ec6 ("ibmveth: Add scatter-gather support")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c433570458 ]
There are multiple issues here:
1. After freeing dev->ax25_ptr, we need to set it to NULL otherwise
we may use a dangling pointer.
2. There is a race between ax25_setsockopt() and device notifier as
reported by syzbot. Close it by holding RTNL lock.
3. We need to test if dev->ax25_ptr is NULL before using it.
Reported-and-tested-by: syzbot+ae6bb869cbed29b29040@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 69d2c86766 ]
vr.mifi is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
Fix this by sanitizing vr.mifi before using it to index mrt->vif_table'
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5648451e30 ]
vr.vifi is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/ipv4/ipmr.c:1616 ipmr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
net/ipv4/ipmr.c:1690 ipmr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
Fix this by sanitizing vr.vifi before using it to index mrt->vif_table'
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 505b524032 upstream.
nr is indirectly controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/gpu/drm/drm_ioctl.c:805 drm_ioctl() warn: potential spectre issue 'dev->driver->ioctls' [r]
drivers/gpu/drm/drm_ioctl.c:810 drm_ioctl() warn: potential spectre issue 'drm_ioctls' [r] (local cap)
drivers/gpu/drm/drm_ioctl.c:892 drm_ioctl_flags() warn: potential spectre issue 'drm_ioctls' [r] (local cap)
Fix this by sanitizing nr before using it to index dev->driver->ioctls
and drm_ioctls.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181220000015.GA18973@embeddedor
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea5751ccd6 upstream.
proc_sys_lookup can fail with ENOMEM instead of ENOENT when the
corresponding sysctl table is being unregistered. In our case we see
this upon opening /proc/sys/net/*/conf files while network interfaces
are being deleted, which confuses our configuration daemon.
The problem was successfully reproduced and this fix tested on v4.9.122
and v4.20-rc6.
v2: return ERR_PTRs in all cases when proc_sys_make_inode fails instead
of mixing them with NULL. Thanks Al Viro for the feedback.
Fixes: ace0c791e6 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Cc: stable@vger.kernel.org
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68600f623d upstream.
I've noticed, that dying memory cgroups are often pinned in memory by a
single pagecache page. Even under moderate memory pressure they sometimes
stayed in such state for a long time. That looked strange.
My investigation showed that the problem is caused by applying the LRU
pressure balancing math:
scan = div64_u64(scan * fraction[lru], denominator),
where
denominator = fraction[anon] + fraction[file] + 1.
Because fraction[lru] is always less than denominator, if the initial scan
size is 1, the result is always 0.
This means the last page is not scanned and has
no chances to be reclaimed.
Fix this by rounding up the result of the division.
In practice this change significantly improves the speed of dying cgroups
reclaim.
[guro@fb.com: prevent double calculation of DIV64_U64_ROUND_UP() arguments]
Link: http://lkml.kernel.org/r/20180829213311.GA13501@castle
Link: http://lkml.kernel.org/r/20180827162621.30187-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e58725d51f upstream.
UBIFS's recovery code strictly assumes that a deleted inode will never
come back, therefore it removes all data which belongs to that inode
as soon it faces an inode with link count 0 in the replay list.
Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
it can lead to data loss upon a power-cut.
Consider a journal with entries like:
0: inode X (nlink = 0) /* O_TMPFILE was created */
1: data for inode X /* Someone writes to the temp file */
2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
3: inode X (nlink = 1) /* inode was re-linked via linkat() */
Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
this will lead to an empty file after mounting.
As solution for this problem, scan the replay list for a re-link entry
before dropping data.
Fixes: 474b93704f ("ubifs: Implement O_TMPFILE")
Cc: stable@vger.kernel.org # 4.9-4.18
Cc: Russell Senior <russell@personaltelco.net>
Cc: Rafał Miłecki <zajec5@gmail.com>
Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Rafał Miłecki <zajec5@gmail.com>
Tested-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Richard Weinberger <richard@nod.at>
[rmilecki: update ubifs_assert() calls to compile with 4.18 and older]
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
(cherry picked from commit e58725d51f)
Signed-off-by: Sasha Levin <sashal@kernel.org>
The relevant difference between prepare_message and config is that the
former is run before the CS signal is asserted. So the polarity of the
CLK line must be configured in prepare_message as an edge generated by
config might already result in a latch of the MOSI line.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
[ukleinek: backport to v4.14.x]
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This is just preparatory work which allows to move some initialisation
that currently is done in the per transfer hook .config to an earlier
point in time in the next few patches. There is no change in behaviour
introduced by this patch.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
[ukleinek: backport to v4.14.x]
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c7c3f05e34 upstream.
From printk()/serial console point of view panic() is special, because
it may force CPU to re-enter printk() or/and serial console driver.
Therefore, some of serial consoles drivers are re-entrant. E.g. 8250:
serial8250_console_write()
{
if (port->sysrq)
locked = 0;
else if (oops_in_progress)
locked = spin_trylock_irqsave(&port->lock, flags);
else
spin_lock_irqsave(&port->lock, flags);
...
}
panic() does set oops_in_progress via bust_spinlocks(1), so in theory
we should be able to re-enter serial console driver from panic():
CPU0
<NMI>
uart_console_write()
serial8250_console_write() // if (oops_in_progress)
// spin_trylock_irqsave()
call_console_drivers()
console_unlock()
console_flush_on_panic()
bust_spinlocks(1) // oops_in_progress++
panic()
<NMI/>
spin_lock_irqsave(&port->lock, flags) // spin_lock_irqsave()
serial8250_console_write()
call_console_drivers()
console_unlock()
printk()
...
However, this does not happen and we deadlock in serial console on
port->lock spinlock. And the problem is that console_flush_on_panic()
called after bust_spinlocks(0):
void panic(const char *fmt, ...)
{
bust_spinlocks(1);
...
bust_spinlocks(0);
console_flush_on_panic();
...
}
bust_spinlocks(0) decrements oops_in_progress, so oops_in_progress
can go back to zero. Thus even re-entrant console drivers will simply
spin on port->lock spinlock. Given that port->lock may already be
locked either by a stopped CPU, or by the very same CPU we execute
panic() on (for instance, NMI panic() on printing CPU) the system
deadlocks and does not reboot.
Fix this by removing bust_spinlocks(0), so oops_in_progress is always
set in panic() now and, thus, re-entrant console drivers will trylock
the port->lock instead of spinning on it forever, when we call them
from console_flush_on_panic().
Link: http://lkml.kernel.org/r/20181025101036.6823-1-sergey.senozhatsky@gmail.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Wang <wonderfly@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: linux-serial@vger.kernel.org
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2dd5146e9 upstream.
nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It
caches the kmap()ed page object and pointer, however, it doesn't handle
errors correctly: it's possible to cache a valid pointer, then release
the page and later dereference the dangling pointer.
I was able to reproduce with the following steps:
1. Call vmlaunch with valid posted_intr_desc_addr but an invalid
MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed
pi_desc_page and pi_desc. Later the invalid EFER value fails
check_vmentry_postreqs() which fails the first vmlaunch.
2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr
(I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages
pi_desc_page is unmapped and released and pi_desc_page is set to NULL
(the "shouldn't happen" clause). Due to the invalid
posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and
nested_get_vmcs12_pages() returns. It doesn't return an error value so
vmlaunch proceeds. Note that at this time we have a dangling pointer in
vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs.
3. Issue an IPI in L2 guest code. This triggers a call to
vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which
dereferences the dangling pointer.
Vulnerable code requires nested and enable_apicv variables to be set to
true. The host CPU must also support posted interrupts.
Fixes: 5e2f30b756 "KVM: nVMX: get rid of nested_get_page()"
Cc: stable@vger.kernel.org
Reviewed-by: Andy Honig <ahonig@google.com>
Signed-off-by: Cfir Cohen <cfir@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e1b869fff upstream.
Some guests OSes (including Windows 10) write to MSR 0xc001102c
on some cases (possibly while trying to apply a CPU errata).
Make KVM ignore reads and writes to that MSR, so the guest won't
crash.
The MSR is documented as "Execution Unit Configuration (EX_CFG)",
at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family
15h Models 00h-0Fh Processors".
Cc: stable@vger.kernel.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e59f5e08ec upstream.
Commit 78d3a92edb ("gpiolib-acpi: Register GpioInt ACPI event handlers
from a late_initcall") deferred the entire acpi_gpiochip_request_interrupt
call for each event resource.
This means it also delays the gpiochip_request_own_desc(..., "ACPI:Event")
call. This is a problem if some AML code reads the GPIO pin before we
run the deferred acpi_gpiochip_request_interrupt, because in that case
acpi_gpio_adr_space_handler() will already have called
gpiochip_request_own_desc(..., "ACPI:OpRegion") causing the call from
acpi_gpiochip_request_interrupt to fail with -EBUSY and we will fail to
register an event handler.
acpi_gpio_adr_space_handler is prepared for acpi_gpiochip_request_interrupt
already having claimed the pin, but the other way around does not work.
One example of a problem this causes, is the event handler for the OTG
ID pin on a Prowise PT301 tablet not registering, keeping the port stuck
in whatever mode it was in during boot and e.g. only allowing charging
after a reboot.
This commit fixes this by only deferring the request_irq call and the
initial run of edge-triggered IRQs instead of deferring all of
acpi_gpiochip_request_interrupt.
Cc: stable@vger.kernel.org
Fixes: 78d3a92edb ("gpiolib-acpi: Register GpioInt ACPI event ...")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abf221d2f5 upstream.
spi_read() and spi_write() require DMA-safe memory. When
CONFIG_VMAP_STACK is selected, those functions cannot be used
with buffers on stack.
This patch replaces calls to spi_read() and spi_write() by
spi_write_then_read() which doesn't require DMA-safe buffers.
Fixes: 0c36ec3147 ("gpio: gpio driver for max7301 SPI GPIO expander")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b47979068 upstream.
While booting with rootfs on MMC, the following warning is encountered
on OMAP4430:
omap-dma-engine 4a056000.dma-controller: DMA-API: mapping sg segment longer than device claims to support [len=69632] [max=65536]
This is because the DMA engine has a default maximum segment size of 64K
but HSMMC sets:
mmc->max_blk_size = 512; /* Block Length at max can be 1024 */
mmc->max_blk_count = 0xFFFF; /* No. of Blocks is 16 bits */
mmc->max_req_size = mmc->max_blk_size * mmc->max_blk_count;
mmc->max_seg_size = mmc->max_req_size;
which ends up telling the block layer that we support a maximum segment
size of 65535*512, which exceeds the advertised DMA engine capabilities.
Fix this by clamping the maximum segment size to the lower of the
maximum request size and of the DMA engine device used for either DMA
channel.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3ae3401aa upstream.
Some eMMCs from Micron have been reported to need ~800 ms timeout, while
enabling the CACHE ctrl after running sudden power failure tests. The
needed timeout is greater than what the card specifies as its generic CMD6
timeout, through the EXT_CSD register, hence the problem.
Normally we would introduce a card quirk to extend the timeout for these
specific Micron cards. However, due to the rather complicated debug process
needed to find out the error, let's simply use a minimum timeout of 1600ms,
the double of what has been reported, for all cards when enabling CACHE
ctrl.
Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Reported-by: Andreas Dannenberg <dannenberg@ti.com>
Reported-by: Faiz Abbas <faiz_abbas@ti.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba9f39a785 upstream.
In commit 5320226a05 ("mmc: core: Disable HPI for certain Hynix eMMC
cards"), then intent was to prevent HPI from being used for some eMMC
cards, which didn't properly support it. However, that went too far, as
even BKOPS and CACHE ctrl became prevented. Let's restore those parts and
allow BKOPS and CACHE ctrl even if HPI isn't supported.
Fixes: 5320226a05 ("mmc: core: Disable HPI for certain Hynix eMMC cards")
Cc: Pratibhasagar V <pratibha@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0741ba40a upstream.
During a re-initialization of the eMMC card, we may fail to re-enable HPI.
In these cases, that isn't properly reflected in the card->ext_csd.hpi_en
bit, as it keeps being set. This may cause following attempts to use HPI,
even if's not enabled. Let's fix this!
Fixes: eb0d8f135b ("mmc: core: support HPI send command")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61cce6f6ee upstream.
When boxes are run near (or to) OOM, we have a problem with the discard
page allocation in sd. If we fail allocating the special page, we return
busy, and it'll get retried. But since ordering is honored for dispatch
requests, we can keep retrying this same IO and failing. Behind that IO
could be requests that want to free memory, but they never get the
chance. This means you get repeated spews of traces like this:
[1201401.625972] Call Trace:
[1201401.631748] dump_stack+0x4d/0x65
[1201401.639445] warn_alloc+0xec/0x190
[1201401.647335] __alloc_pages_slowpath+0xe84/0xf30
[1201401.657722] ? get_page_from_freelist+0x11b/0xb10
[1201401.668475] ? __alloc_pages_slowpath+0x2e/0xf30
[1201401.679054] __alloc_pages_nodemask+0x1f9/0x210
[1201401.689424] alloc_pages_current+0x8c/0x110
[1201401.699025] sd_setup_write_same16_cmnd+0x51/0x150
[1201401.709987] sd_init_command+0x49c/0xb70
[1201401.719029] scsi_setup_cmnd+0x9c/0x160
[1201401.727877] scsi_queue_rq+0x4d9/0x610
[1201401.736535] blk_mq_dispatch_rq_list+0x19a/0x360
[1201401.747113] blk_mq_sched_dispatch_requests+0xff/0x190
[1201401.758844] __blk_mq_run_hw_queue+0x95/0xa0
[1201401.768653] blk_mq_run_work_fn+0x2c/0x30
[1201401.777886] process_one_work+0x14b/0x400
[1201401.787119] worker_thread+0x4b/0x470
[1201401.795586] kthread+0x110/0x150
[1201401.803089] ? rescuer_thread+0x320/0x320
[1201401.812322] ? kthread_park+0x90/0x90
[1201401.820787] ? do_syscall_64+0x53/0x150
[1201401.829635] ret_from_fork+0x29/0x40
Ensure that the discard page allocation has a mempool backing, so we
know we can make progress.
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>