commit 5a9372f751 upstream.
While reading through the sysvipc implementation, I noticed that the n32
semctl/shmctl/msgctl system calls behave differently based on whether
o32 support is enabled or not: Without o32, the IPC_64 flag passed by
user space is rejected but calls without that flag get IPC_64 behavior.
As far as I can tell, this was inadvertently changed by a cleanup patch
but never noticed by anyone, possibly nobody has tried using sysvipc
on n32 after linux-3.19.
Change it back to the old behavior now.
Fixes: 78aaf956ba ("MIPS: Compat: Fix build error if CONFIG_MIPS32_COMPAT but no compat ABI.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44759979a4 upstream.
Changing of caching mode via /sys/devices/.../scsi_disk/.../cache_type may
fail if device responds to MODE SENSE command with DPOFUA flag set, and
then checks this flag to be not set on MODE SELECT command.
In this scenario, when trying to change cache_type, write always fails:
# echo "none" >cache_type
bash: echo: write error: Invalid argument
And following appears in dmesg:
[13007.865745] sd 1:0:1:0: [sda] Sense Key : Illegal Request [current]
[13007.865753] sd 1:0:1:0: [sda] Add. Sense: Invalid field in parameter list
From SBC-4 r15, 6.5.1 "Mode pages overview", description of DEVICE-SPECIFIC
PARAMETER field in the mode parameter header:
...
The write protect (WP) bit for mode data sent with a MODE SELECT
command shall be ignored by the device server.
...
The DPOFUA bit is reserved for mode data sent with a MODE SELECT
command.
...
The remaining bits in the DEVICE-SPECIFIC PARAMETER byte are also reserved
and shall be set to zero.
[mkp: shuffled commentary to commit description]
Cc: stable@vger.kernel.org
Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f7e62bba0 upstream.
The commit 356fd2663c ("scsi: Set request queue runtime PM status back to
active on resume") fixed up the inconsistent RPM status between request
queue and device. However changing request queue RPM status shall be done
only on successful resume, otherwise status may be still inconsistent as
below,
Request queue: RPM_ACTIVE
Device: RPM_SUSPENDED
This ends up soft lockup because requests can be submitted to underlying
devices but those devices and their required resource are not resumed.
For example,
After above inconsistent status happens, IO request can be submitted to UFS
device driver but required resource (like clock) is not resumed yet thus
lead to warning as below call stack,
WARN_ON(hba->clk_gating.state != CLKS_ON);
ufshcd_queuecommand
scsi_dispatch_cmd
scsi_request_fn
__blk_run_queue
cfq_insert_request
__elv_add_request
blk_flush_plug_list
blk_finish_plug
jbd2_journal_commit_transaction
kjournald2
We may see all behind IO requests hang because of no response from storage
host or device and then soft lockup happens in system. In the end, system
may crash in many ways.
Fixes: 356fd2663c (scsi: Set request queue runtime PM status back to active on resume)
Cc: stable@vger.kernel.org
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77b7aad195 upstream.
This reverts commit e73e81b6d0.
This patch causes a few problems:
- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
more work from btrfs_btree_balance_dirty_nodelay could end up in the
same workque, effectively deadlocking
12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Transaction commit will wait on the freespace cache:
838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
And then writepages ends up waiting on transaction commit:
9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.
The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.
Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien <ethanlien@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 867cefb4cb upstream.
Commit f94c8d1169 ("sched/clock, x86/tsc: Rework the x86 'unstable'
sched_clock() interface") broke Xen guest time handling across
migration:
[ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 187.251137] OOM killer disabled.
[ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 187.252299] suspending xenstore...
[ 187.266987] xen:grant_table: Grant tables using version 1 layout
[18446743811.706476] OOM killer enabled.
[18446743811.706478] Restarting tasks ... done.
[18446743811.720505] Setting capacity to 16777216
Fix that by setting xen_sched_clock_offset at resume time to ensure a
monotonic clock value.
[boris: replaced pr_info() with pr_info_once() in xen_callback_vector()
to avoid printing with incorrect timestamp during resume (as we
haven't re-adjusted the clock yet)]
Fixes: f94c8d1169 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")
Cc: <stable@vger.kernel.org> # 4.11
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f9c469348 upstream.
Keys for "authenc" AEADs are formatted as an rtattr containing a 4-byte
'enckeylen', followed by an authentication key and an encryption key.
crypto_authenc_extractkeys() parses the key to find the inner keys.
However, it fails to consider the case where the rtattr's payload is
longer than 4 bytes but not 4-byte aligned, and where the key ends
before the next 4-byte aligned boundary. In this case, 'keylen -=
RTA_ALIGN(rta->rta_len);' underflows to a value near UINT_MAX. This
causes a buffer overread and crash during crypto_ahash_setkey().
Fix it by restricting the rtattr payload to the expected size.
Reproducer using AF_ALG:
#include <linux/if_alg.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>
int main()
{
int fd;
struct sockaddr_alg addr = {
.salg_type = "aead",
.salg_name = "authenc(hmac(sha256),cbc(aes))",
};
struct {
struct rtattr attr;
__be32 enckeylen;
char keys[1];
} __attribute__((packed)) key = {
.attr.rta_len = sizeof(key),
.attr.rta_type = 1 /* CRYPTO_AUTHENC_KEYA_PARAM */,
};
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, &key, sizeof(key));
}
It caused:
BUG: unable to handle kernel paging request at ffff88007ffdc000
PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 2e05067 PTE 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 883 Comm: authenc Not tainted 4.20.0-rc1-00108-g00c9fe37a7f27 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
RIP: 0010:sha256_ni_transform+0xb3/0x330 arch/x86/crypto/sha256_ni_asm.S:155
[...]
Call Trace:
sha256_ni_finup+0x10/0x20 arch/x86/crypto/sha256_ssse3_glue.c:321
crypto_shash_finup+0x1a/0x30 crypto/shash.c:178
shash_digest_unaligned+0x45/0x60 crypto/shash.c:186
crypto_shash_digest+0x24/0x40 crypto/shash.c:202
hmac_setkey+0x135/0x1e0 crypto/hmac.c:66
crypto_shash_setkey+0x2b/0xb0 crypto/shash.c:66
shash_async_setkey+0x10/0x20 crypto/shash.c:223
crypto_ahash_setkey+0x2d/0xa0 crypto/ahash.c:202
crypto_authenc_setkey+0x68/0x100 crypto/authenc.c:96
crypto_aead_setkey+0x2a/0xc0 crypto/aead.c:62
aead_setkey+0xc/0x10 crypto/algif_aead.c:526
alg_setkey crypto/af_alg.c:223 [inline]
alg_setsockopt+0xfe/0x130 crypto/af_alg.c:256
__sys_setsockopt+0x6d/0xd0 net/socket.c:1902
__do_sys_setsockopt net/socket.c:1913 [inline]
__se_sys_setsockopt net/socket.c:1910 [inline]
__x64_sys_setsockopt+0x1f/0x30 net/socket.c:1910
do_syscall_64+0x4a/0x180 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: e236d4a89a ("[CRYPTO] authenc: Move enckeylen into key itself")
Cc: <stable@vger.kernel.org> # v2.6.25+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab57b33525 upstream.
Convert the bcm crypto driver to use crypto_authenc_extractkeys() so
that it picks up the fix for broken validation of rtattr::rta_len.
This also fixes the DES weak key check to actually be done on the right
key. (It was checking the authentication key, not the encryption key...)
Fixes: 9d12ba86f8 ("crypto: brcm - Add Broadcom SPU driver")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 11287b693d ]
This soft dependency works around an issue where sometimes the genphy
driver is used instead of the dedicated PHY driver. The root cause of
the issue isn't clear yet. People reported the unloading/re-loading
module r8169 helps, and also configuring this soft dependency in
the modprobe config files. Important just seems to be that the
realtek module is loaded before r8169.
Once this has been applied preliminary fix 38af4b9032 ("net: phy:
add workaround for issue where PHY driver doesn't bind to the device")
will be removed.
Fixes: f1e911d5d0 ("r8169: add basic phylib support")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4a06fa67c4 ]
Commit 2efd4fca70 ("ip: in cmsg IP(V6)_ORIGDSTADDR call
pskb_may_pull") avoided a read beyond the end of the skb linear
segment by calling pskb_may_pull.
That function can trigger a BUG_ON in pskb_expand_head if the skb is
shared, which it is when when peeking. It can also return ENOMEM.
Avoid both by switching to safer skb_header_pointer.
Fixes: 2efd4fca70 ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 001e465f09 ]
A network device stack with multiple layers of bonding devices can
trigger a false positive lockdep warning. Adding lockdep nest levels
fixes this. Update the level on both enslave and unlink, to avoid the
following series of events ..
ip netns add test
ip netns exec test bash
ip link set dev lo addr 00:11:22:33:44:55
ip link set dev lo down
ip link add dev bond1 type bond
ip link add dev bond2 type bond
ip link set dev lo master bond1
ip link set dev bond1 master bond2
ip link set dev bond1 nomaster
ip link set dev bond2 master bond1
.. from still generating a splat:
[ 193.652127] ======================================================
[ 193.658231] WARNING: possible circular locking dependency detected
[ 193.664350] 4.20.0 #8 Not tainted
[ 193.668310] ------------------------------------------------------
[ 193.674417] ip/15577 is trying to acquire lock:
[ 193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290
[ 193.687851]
but task is already holding lock:
[ 193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290
[..]
[ 193.851092] lock_acquire+0xa7/0x190
[ 193.855138] _raw_spin_lock_nested+0x2d/0x40
[ 193.859878] bond_get_stats+0x58/0x290
[ 193.864093] dev_get_stats+0x5a/0xc0
[ 193.868140] bond_get_stats+0x105/0x290
[ 193.872444] dev_get_stats+0x5a/0xc0
[ 193.876493] rtnl_fill_stats+0x40/0x130
[ 193.880797] rtnl_fill_ifinfo+0x6c5/0xdc0
[ 193.885271] rtmsg_ifinfo_build_skb+0x86/0xe0
[ 193.890091] rtnetlink_event+0x5b/0xa0
[ 193.894320] raw_notifier_call_chain+0x43/0x60
[ 193.899225] netdev_change_features+0x50/0xa0
[ 193.904044] bond_compute_features.isra.46+0x1ab/0x270
[ 193.909640] bond_enslave+0x141d/0x15b0
[ 193.913946] do_set_master+0x89/0xa0
[ 193.918016] do_setlink+0x37c/0xda0
[ 193.921980] __rtnl_newlink+0x499/0x890
[ 193.926281] rtnl_newlink+0x48/0x70
[ 193.930238] rtnetlink_rcv_msg+0x171/0x4b0
[ 193.934801] netlink_rcv_skb+0xd1/0x110
[ 193.939103] rtnetlink_rcv+0x15/0x20
[ 193.943151] netlink_unicast+0x3b5/0x520
[ 193.947544] netlink_sendmsg+0x2fd/0x3f0
[ 193.951942] sock_sendmsg+0x38/0x50
[ 193.955899] ___sys_sendmsg+0x2ba/0x2d0
[ 193.960205] __x64_sys_sendmsg+0xad/0x100
[ 193.964687] do_syscall_64+0x5a/0x460
[ 193.968823] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 7e2556e400 ("bonding: avoid lockdep confusion in bond_get_stats()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a0071840d2 ]
It has been noticed that some phys do not have the registers
required by the previous implementation.
To fix this, instead of using phy_read, the required information
is extracted from the phy_device structure.
fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0b7959b625 ]
BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
Call Trace:
? napi_gro_frags+0xa7/0x2c0
tun_get_user+0xb50/0xf20
tun_chr_write_iter+0x53/0x70
new_sync_write+0xff/0x160
vfs_write+0x191/0x1e0
__x64_sys_write+0x5e/0xd0
do_syscall_64+0x47/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
I think there is a subtle race between sending a packet via tap and
attaching it:
CPU0: CPU1:
tun_chr_ioctl(TUNSETIFF)
tun_set_iff
tun_attach
rcu_assign_pointer(tfile->tun, tun);
tun_fops->write_iter()
tun_chr_write_iter
tun_napi_alloc_frags
napi_get_frags
napi->skb = napi_alloc_skb
tun_napi_init
netif_napi_add
napi->skb = NULL
napi->skb is NULL here
napi_gro_frags
napi_frags_skb
skb = napi->skb
skb_reset_mac_header(skb)
panic()
Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
be the last thing we do in tun_attach(); this should guarantee that when we
call tun_get() we always get an initialized object.
v2 changes:
* remove extra napi_mutex locks/unlocks for napi operations
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 90e33d4594 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c5715b8fab ]
Previously upon SYN timeouts the sender recomputes the txhash to
try a different path. However this does not apply on the initial
timeout of SYN-data (active Fast Open). Therefore an active IPv6
Fast Open connection may incur one second RTO penalty to take on
a new path after the second SYN retransmission uses a new flow label.
This patch removes this undesirable behavior so Fast Open changes
the flow label just like the regular connections. This also helps
avoid falsely disabling Fast Open on the sender which triggers
after two consecutive SYN timeouts on Fast Open.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d972f3dce8 ]
'dev' is non NULL when the addr_len check triggers so it must goto a label
that does the dev_put otherwise dev will have a leaked refcount.
This bug causes the ib_ipoib module to become unloadable when using
systemd-network as it triggers this check on InfiniBand links.
Fixes: 99137b7888 ("packet: validate address length")
Reported-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c84edc11b ]
When handling DNAT'ed packets on a bridge device, the neighbour cache entry
from lookup was used without checking its state. It means that a cache entry
in the NUD_STALE state will be used directly instead of entering the NUD_DELAY
state to confirm the reachability of the neighbor.
This problem becomes worse after commit 2724680bce ("neigh: Keep neighbour
cache entries if number of them is small enough."), since all neighbour cache
entries in the NUD_STALE state will be kept in the neighbour table as long as
the number of cache entries does not exceed the value specified in gc_thresh1.
This commit validates the state of a neighbour cache entry before using
the entry.
Signed-off-by: JianJhen Chen <kchen@synology.com>
Reviewed-by: JinLin Chen <jlchen@synology.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b3669b1e1c ]
To allow EL0 (and/or EL1) to use pointer authentication functionality,
we must ensure that pointer authentication instructions and accesses to
pointer authentication keys are not trapped to EL2.
This patch ensures that HCR_EL2 is configured appropriately when the
kernel is booted at EL2. For non-VHE kernels we set HCR_EL2.{API,APK},
ensuring that EL1 can access keys and permit EL0 use of instructions.
For VHE kernels host EL0 (TGE && E2H) is unaffected by these settings,
and it doesn't matter how we configure HCR_EL2.{API,APK}, so we don't
bother setting them.
This does not enable support for KVM guests, since KVM manages HCR_EL2
itself when running VMs.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4eaed6aa2c ]
In KVM we define the configuration of HCR_EL2 for a VHE HOST in
HCR_HOST_VHE_FLAGS, but we don't have a similar definition for the
non-VHE host flags, and open-code HCR_RW. Further, in head.S we
open-code the flags for VHE and non-VHE configurations.
In future, we're going to want to configure more flags for the host, so
lets add a HCR_HOST_NVHE_FLAGS defintion, and consistently use both
HCR_HOST_VHE_FLAGS and HCR_HOST_NVHE_FLAGS in the kvm code and head.S.
We now use mov_q to generate the HCR_EL2 value, as we use when
configuring other registers in head.S.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 801df68d61 ]
csk leak can happen if a new TCP connection gets established after
cxgbit_accept_np() returns, to fix this leak free remaining csk in
cxgbit_free_np().
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit c9cef2c71a.
A wrong commit message was used for the stable commit because of a human
error (and duplicate commit subject lines).
This patch reverts this error, and the following patches add the two
upstream commits.
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a89e7bcb18 upstream.
The Clock Data Recovery (CDR) circuit allows to automatically adjust
the RX sampling-point/phase for high frequency cards (SDR104, HS200...).
CDR is automatically enabled during DLL configuration.
However, according to the APQ8016 reference manual, this function
must be disabled during TX and tuning phase in order to prevent any
interferences during tuning challenges and unexpected phase alteration
during TX transfers.
This patch enables/disables CDR according to the current transfer mode.
This fixes sporadic write transfer issues observed with some SDR104 and
HS200 cards.
Inspired by sdhci-msm downstream patch:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/432516/
Reported-by: Leonid Segal <leonid.s@variscite.com>
Reported-by: Manabu Igusa <migusa@arrowjapan.com>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[georgi: backport to v4.19+]
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a007232066 upstream.
Size and 'next bit' were swapped, this bug could cause worker to
reschedule itself even if system was idle.
Fixes: 5c789e131c ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c80f10bc97 upstream.
Instead of removing a empty list node that might be reintroduced soon
thereafter, tentatively place the empty list node on the list passed to
tree_nodes_free(), then re-check if the list is empty again before erasing
it from the tree.
[ Florian: rebase on top of pending nf_conncount fixes ]
Fixes: 5c789e131c ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f971a8f42 upstream.
Two CPUs may race to remove a connection from the list, the existing
conn->dead will result in a use-after-free. Use the per-list spinlock to
protect list iterations.
As all accesses to the list now happen while holding the per-list lock,
we no longer need to delay free operations with rcu.
Joint work with Florian.
Fixes: 5c789e131c ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df4a902509 upstream.
'lookup' is always followed by 'add'.
Merge both and make the list-walk part of nf_conncount_add().
This also avoids one unneeded unlock/re-lock pair.
Extra care needs to be taken in count_tree, as we only hold rcu
read lock, i.e. we can only insert to an existing tree node after
acquiring its lock and making sure it has a nonzero count.
As a zero count should be rare, just fall back to insert_tree()
(which acquires tree lock).
This issue and its solution were pointed out by Shawn Bohrer
during patch review.
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8cfb372b3 upstream.
Shawn Bohrer reported a following crash:
|RIP: 0010:rb_erase+0xae/0x360
[..]
Call Trace:
nf_conncount_destroy+0x59/0xc0 [nf_conncount]
cleanup_match+0x45/0x70 [ip_tables]
...
Shawn tracked this down to bogus 'parent' pointer:
Problem is that when we insert a new node, then there is a chance that
the 'parent' that we found was also passed to tree_nodes_free() (because
that node was empty) for erase+free.
Instead of trying to be clever and detect when this happens, restart
the search if we have evicted one or more nodes. To prevent frequent
restarts, do not perform gc on the second round.
Also, unconditionally schedule the gc worker.
The condition
gc_count > ARRAY_SIZE(gc_nodes))
cannot be true unless tree grows very large, as the height of the tree
will be low even with hundreds of nodes present.
Fixes: 5c789e131c ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Reported-by: Shawn Bohrer <sbohrer@cloudflare.com>
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7fcc98dfc upstream.
The lockless workqueue garbage collector can race with packet path
garbage collector to delete list nodes, as it calls tree_nodes_free()
with the addresses of nodes that might have been free'd already from
another cpu.
To fix this, split gc into two phases.
One phase to perform gc on the connections: From a locking perspective,
this is the same as count_tree(): we hold rcu lock, but we do not
change the tree, we only change the nodes' contents.
The second phase acquires the tree lock and reaps empty nodes.
This avoids a race condition of the garbage collection vs. packet path:
If a node has been free'd already, the second phase won't find it anymore.
This second phase is, from locking perspective, same as insert_tree().
The former only modifies nodes (list content, count), latter modifies
the tree itself (rb_erase or rb_insert).
Fixes: 5c789e131c ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4cd273bb91 upstream.
age is signed integer, so result can be negative when the timestamps
have a large delta. In this case we want to discard the entry.
Instead of using age >= 2 || age < 0, just make it unsigned.
Fixes: b36e4523d4 ("netfilter: nf_conncount: fix garbage collection confirm race")
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c78e7818f1 upstream.
Most of the time these were the same value anyway, but when
CONFIG_LOCKDEP was enabled we would use a smaller number of locks to
reduce overhead. Unfortunately having two values is confusing and not
worth the complexity.
This fixes a bug where tree_gc_worker() would only GC up to
CONNCOUNT_LOCK_SLOTS trees which meant when CONFIG_LOCKDEP was enabled
not all trees would be GCed by tree_gc_worker().
Fixes: 5c789e131c ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0aaa81377c upstream.
Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN
frame modification rule that makes the data length code a higher value than
the available CAN frame data size. In combination with a configured checksum
calculation where the result is stored relatively to the end of the data
(e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in
skb_shared_info) can be rewritten which finally can cause a system crash.
Michael Kubecek suggested to drop frames that have a DLC exceeding the
available space after the modification process and provided a patch that can
handle CAN FD frames too. Within this patch we also limit the length for the
checksum calculations to the maximum of Classic CAN data length (8).
CAN frames that are dropped by these additional checks are counted with the
CGW_DELETED counter which indicates misconfigurations in can-gw rules.
This fixes CVE-2019-3701.
Reported-by: Muyu Yu <ieatmuttonchuan@gmail.com>
Reported-by: Marcus Meissner <meissner@suse.de>
Suggested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Muyu Yu <ieatmuttonchuan@gmail.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.2
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3736d82e8 upstream.
Try to get reference for ldisc during tty_reopen().
If ldisc present, we don't need to do tty_ldisc_reinit() and lock the
write side for line discipline semaphore.
Effectively, it optimizes fast-path for tty_reopen(), but more
importantly it won't interrupt ongoing IO on the tty as no ldisc change
is needed.
Fixes user-visible issue when tty_reopen() interrupted login process for
user with a long password, observed and reported by Lukas.
Fixes: c96cf923a9 ("tty: Don't block on IO when ldisc change is pending")
Fixes: 83d817f410 ("tty: Hold tty_ldisc_lock() during tty_reopen()")
Cc: Jiri Slaby <jslaby@suse.com>
Reported-by: Lukas F. Hartmann <lukas@mntmn.com>
Tested-by: Lukas F. Hartmann <lukas@mntmn.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 83d817f410 upstream.
tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
But it races with anyone who expects line discipline to be the same
after hoding read semaphore in tty_ldisc_ref().
We've seen the following crash on v4.9.108 stable:
BUG: unable to handle kernel paging request at 0000000000002260
IP: [..] n_tty_receive_buf_common+0x5f/0x86d
Workqueue: events_unbound flush_to_ldisc
Call Trace:
[..] n_tty_receive_buf2
[..] tty_ldisc_receive_buf
[..] flush_to_ldisc
[..] process_one_work
[..] worker_thread
[..] kthread
[..] ret_from_fork
tty_ldisc_reinit() should be called with ldisc_sem hold for writing,
which will protect any reader against line discipline changes.
Cc: Jiri Slaby <jslaby@suse.com>
Cc: stable@vger.kernel.org # b027e2298b ("tty: fix data race between tty_init_dev and flush of buf")
Reviewed-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: syzbot+3aa9784721dfb90e984d@syzkaller.appspotmail.com
Tested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Tested-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 231f8fd0cc upstream.
ldsem_down_read() will sleep if there is pending writer in the queue.
If the writer times out, readers in the queue should be woken up,
otherwise they may miss a chance to acquire the semaphore until the last
active reader will do ldsem_up_read().
There was a couple of reports where there was one active reader and
other readers soft locked up:
Showing all locks held in the system:
2 locks held by khungtaskd/17:
#0: (rcu_read_lock){......}, at: watchdog+0x124/0x6d1
#1: (tasklist_lock){.+.+..}, at: debug_show_all_locks+0x72/0x2d3
2 locks held by askfirst/123:
#0: (&tty->ldisc_sem){.+.+.+}, at: ldsem_down_read+0x46/0x58
#1: (&ldata->atomic_read_lock){+.+...}, at: n_tty_read+0x115/0xbe4
Prevent readers wait for active readers to release ldisc semaphore.
Link: lkml.kernel.org/r/20171121132855.ajdv4k6swzhvktl6@wfg-t540p.sh.intel.com
Link: lkml.kernel.org/r/20180907045041.GF1110@shao2-debian
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 827aa18e7b upstream.
When initializing the security xattrs, we are holding a transaction handle
therefore we need to use a GFP_NOFS context in order to avoid a deadlock
with reclaim in case it's triggered.
Fixes: 39a27ec100 ("btrfs: use GFP_KERNEL for xattr and acl allocations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a6f209e36 upstream.
If the quota enable and snapshot creation ioctls are called concurrently
we can get into a deadlock where the task enabling quotas will deadlock
on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
twice, or the task creating a snapshot tries to commit the transaction
while the task enabling quota waits for the former task to commit the
transaction while holding the mutex. The following time diagrams show how
both cases happen.
First scenario:
CPU 0 CPU 1
btrfs_ioctl()
btrfs_ioctl_quota_ctl()
btrfs_quota_enable()
mutex_lock(fs_info->qgroup_ioctl_lock)
btrfs_start_transaction()
btrfs_ioctl()
btrfs_ioctl_snap_create_v2
create_snapshot()
--> adds snapshot to the
list pending_snapshots
of the current
transaction
btrfs_commit_transaction()
create_pending_snapshots()
create_pending_snapshot()
qgroup_account_snapshot()
btrfs_qgroup_inherit()
mutex_lock(fs_info->qgroup_ioctl_lock)
--> deadlock, mutex already locked
by this task at
btrfs_quota_enable()
Second scenario:
CPU 0 CPU 1
btrfs_ioctl()
btrfs_ioctl_quota_ctl()
btrfs_quota_enable()
mutex_lock(fs_info->qgroup_ioctl_lock)
btrfs_start_transaction()
btrfs_ioctl()
btrfs_ioctl_snap_create_v2
create_snapshot()
--> adds snapshot to the
list pending_snapshots
of the current
transaction
btrfs_commit_transaction()
--> waits for task at
CPU 0 to release
its transaction
handle
btrfs_commit_transaction()
--> sees another task started
the transaction commit first
--> releases its transaction
handle
--> waits for the transaction
commit to be completed by
the task at CPU 1
create_pending_snapshot()
qgroup_account_snapshot()
btrfs_qgroup_inherit()
mutex_lock(fs_info->qgroup_ioctl_lock)
--> deadlock, task at CPU 0
has the mutex locked but
it is waiting for us to
finish the transaction
commit
So fix this by setting the quota enabled flag in fs_info after committing
the transaction at btrfs_quota_enable(). This ends up serializing quota
enable and snapshot creation as if the snapshot creation happened just
before the quota enable request. The quota rescan task, scheduled after
committing the transaction in btrfs_quote_enable(), will do the accounting.
Fixes: 6426c7ad69 ("btrfs: qgroup: Fix qgroup accounting when creating snapshot")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a8067c0d1 upstream.
The available allocation bits members from struct btrfs_fs_info are
protected by a sequence lock, and when starting balance we access them
incorrectly in two different ways:
1) In the read sequence lock loop at btrfs_balance() we use the values we
read from fs_info->avail_*_alloc_bits and we can immediately do actions
that have side effects and can not be undone (printing a message and
jumping to a label). This is wrong because a retry might be needed, so
our actions must not have side effects and must be repeatable as long
as read_seqretry() returns a non-zero value. In other words, we were
essentially ignoring the sequence lock;
2) Right below the read sequence lock loop, we were reading the values
from avail_metadata_alloc_bits and avail_data_alloc_bits without any
protection from concurrent writers, that is, reading them outside of
the read sequence lock critical section.
So fix this by making sure we only read the available allocation bits
while in a read sequence lock critical section and that what we do in the
critical section is repeatable (has nothing that can not be undone) so
that any eventual retry that is needed is handled properly.
Fixes: de98ced9e7 ("Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits")
Fixes: 1450612797 ("btrfs: fix a bogus warning when converting only data or metadata")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>