(Upstream commit 3e91ec89f5).
Require that arg{3,4,5} of the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl and
arg2 of the PR_GET_TAGGED_ADDR_CTRL prctl() are zero rather than ignored
for future extensions.
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I8bb5c3eb4728440880c971d77904f7e45b571ddc
(Upstream commit 9c1cac424c).
untagged_addr() can be called with a '__user' pointer parameter and must
therefore use '__force' casts both when passing this parameter through
to sign_extend64() as a 'u64', but also when casting the 's64' return
value back to the '__user' pointer type.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I8806aa6911cda3ab0489dff59b75ae214b02d48e
(Upstream commit 9ce1263033).
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
This patch adds a simple test, that calls the uname syscall with a
tagged user pointer as an argument. Without the kernel accepting tagged
user pointers the test fails with EFAULT.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I150694bf72706b5dd39939e38435c3aebf6ab2b3
(Upstream commit 63f0c60379).
It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve(). A Kconfig
option allows the overall disabling of the relaxed ABI.
The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Change-Id: I2d52c5589b05415faab315c116245f1058d64750
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
(Upstream commit 2b835e24b5).
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
copy_from_user (and a few other similar functions) are used to copy data
from user memory into the kernel memory or vice versa. Since a user can
provided a tagged pointer to one of the syscalls that use copy_from_user,
we need to correctly handle such pointers.
Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr,
before performing access validity checks.
Note, that this patch only temporarily untags the pointers to perform the
checks, but then passes them as is into the kernel internals.
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
[will: Add __force to casting in untagged_addr() to kill sparse warning]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: Ia97d1d311c2e2bf8e4584005b5085204db6d8955
(Upstream commit d93445225c).
Architectures that support memory tagging have a need to perform untagging
(stripping the tag) in various parts of the kernel. This patch adds an
untagged_addr() macro, which is defined as noop for architectures that do
not support memory tagging. The oncoming patch series will define it at
least for sparc64 and arm64.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: If3932582c82e302fe8d602fabac9a8f4cde6388d
psi tracks the time tasks wait for refaulting pages to become
uptodate, but it does not track the time spent submitting the IO. The
submission part can be significant if backing storage is contended or
when cgroup throttling (io.latency) is in effect - a lot of time is
spent in submit_bio(). In that case, we underreport memory pressure.
Annotate submit_bio() to account submission time as memory stall when
the bio is reading userspace workingset pages.
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit b8e24a9300)
Conflicts:
include/linux/blk_types.h
(1. Manually resolved BIO_WORKINGSET being definition instead of enum.)
Bug: 141131229
Test: boot and run act test suite
Change-Id: I99cef039844e219f1dc8196feead54b6f5fb26bb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Changes in 4.19.78
tpm: use tpm_try_get_ops() in tpm-sysfs.c.
tpm: Fix TPM 1.2 Shutdown sequence to prevent future TPM operations
drm/bridge: tc358767: Increase AUX transfer length limit
drm/panel: simple: fix AUO g185han01 horizontal blanking
video: ssd1307fb: Start page range at page_offset
drm/stm: attach gem fence to atomic state
drm/panel: check failure cases in the probe func
drm/rockchip: Check for fast link training before enabling psr
drm/radeon: Fix EEH during kexec
gpu: drm: radeon: Fix a possible null-pointer dereference in radeon_connector_set_property()
PCI: rpaphp: Avoid a sometimes-uninitialized warning
ipmi_si: Only schedule continuously in the thread in maintenance mode
clk: qoriq: Fix -Wunused-const-variable
clk: sunxi-ng: v3s: add missing clock slices for MMC2 module clocks
drm/amd/display: fix issue where 252-255 values are clipped
drm/amd/display: reprogram VM config when system resume
powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window
clk: actions: Don't reference clk_init_data after registration
clk: sirf: Don't reference clk_init_data after registration
clk: sprd: Don't reference clk_init_data after registration
clk: zx296718: Don't reference clk_init_data after registration
powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL
powerpc/rtas: use device model APIs and serialization during LPM
powerpc/futex: Fix warning: 'oldval' may be used uninitialized in this function
powerpc/pseries/mobility: use cond_resched when updating device tree
pinctrl: tegra: Fix write barrier placement in pmx_writel
powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
vfio_pci: Restore original state on release
drm/nouveau/volt: Fix for some cards having 0 maximum voltage
pinctrl: amd: disable spurious-firing GPIO IRQs
clk: renesas: mstp: Set GENPD_FLAG_ALWAYS_ON for clock domain
clk: renesas: cpg-mssr: Set GENPD_FLAG_ALWAYS_ON for clock domain
drm/amd/display: support spdif
drm/amdgpu/si: fix ASIC tests
powerpc/64s/exception: machine check use correct cfar for late handler
pstore: fs superblock limits
clk: qcom: gcc-sdm845: Use floor ops for sdcc clks
powerpc/pseries: correctly track irq state in default idle
pinctrl: meson-gxbb: Fix wrong pinning definition for uart_c
arm64: fix unreachable code issue with cmpxchg
clk: at91: select parent if main oscillator or bypass is enabled
powerpc: dump kernel log before carrying out fadump or kdump
mbox: qcom: add APCS child device for QCS404
clk: sprd: add missing kfree
scsi: core: Reduce memory required for SCSI logging
dma-buf/sw_sync: Synchronize signal vs syncpt free
ext4: fix potential use after free after remounting with noblock_validity
MIPS: Ingenic: Disable broken BTB lookup optimization.
MIPS: tlbex: Explicitly cast _PAGE_NO_EXEC to a boolean
i2c-cht-wc: Fix lockdep warning
mfd: intel-lpss: Remove D3cold delay
PCI: tegra: Fix OF node reference leak
HID: wacom: Fix several minor compiler warnings
livepatch: Nullify obj->mod in klp_module_coming()'s error path
ARM: 8898/1: mm: Don't treat faults reported from cache maintenance as writes
soundwire: intel: fix channel number reported by hardware
ARM: 8875/1: Kconfig: default to AEABI w/ Clang
rtc: snvs: fix possible race condition
rtc: pcf85363/pcf85263: fix regmap error in set_time
HID: apple: Fix stuck function keys when using FN
PCI: rockchip: Propagate errors for optional regulators
PCI: histb: Propagate errors for optional regulators
PCI: imx6: Propagate errors for optional regulators
PCI: exynos: Propagate errors for optional PHYs
security: smack: Fix possible null-pointer dereferences in smack_socket_sock_rcv_skb()
ARM: 8903/1: ensure that usable memory in bank 0 starts from a PMD-aligned address
fat: work around race with userspace's read via blockdev while mounting
pktcdvd: remove warning on attempting to register non-passthrough dev
hypfs: Fix error number left in struct pointer member
crypto: hisilicon - Fix double free in sec_free_hw_sgl()
kbuild: clean compressed initramfs image
ocfs2: wait for recovering done after direct unlock request
kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K
arm64: consider stack randomization for mmap base only when necessary
mips: properly account for stack randomization and stack guard gap
arm: properly account for stack randomization and stack guard gap
arm: use STACK_TOP when computing mmap base address
block: mq-deadline: Fix queue restart handling
bpf: fix use after free in prog symbol exposure
cxgb4:Fix out-of-bounds MSI-X info array access
erspan: remove the incorrect mtu limit for erspan
hso: fix NULL-deref on tty open
ipv6: drop incoming packets having a v4mapped source address
ipv6: Handle missing host route in __ipv6_ifa_notify
net: ipv4: avoid mixed n_redirects and rate_tokens usage
net: qlogic: Fix memory leak in ql_alloc_large_buffers
net: Unpublish sk from sk_reuseport_cb before call_rcu
nfc: fix memory leak in llcp_sock_bind()
qmi_wwan: add support for Cinterion CLS8 devices
rxrpc: Fix rxrpc_recvmsg tracepoint
sch_dsmark: fix potential NULL deref in dsmark_init()
udp: fix gso_segs calculations
vsock: Fix a lockdep warning in __vsock_release()
net: dsa: rtl8366: Check VLAN ID and not ports
udp: only do GSO if # of segs > 1
net/rds: Fix error handling in rds_ib_add_one()
xen-netfront: do not use ~0U as error return value for xennet_fill_frags()
tipc: fix unlimited bundling of small messages
sch_cbq: validate TCA_CBQ_WRROPT to avoid crash
soundwire: Kconfig: fix help format
soundwire: fix regmap dependencies and align with other serial links
Smack: Don't ignore other bprm->unsafe flags if LSM_UNSAFE_PTRACE is set
smack: use GFP_NOFS while holding inode_smack::smk_lock
NFC: fix attrs checks in netlink interface
kexec: bail out upon SIGKILL when allocating memory.
9p/cache.c: Fix memory leak in v9fs_cache_session_get_cookie
Linux 4.19.78
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I02db9c4a0cc1f784e6ac1523599fcbed747a6375
commit 18917d5147 upstream.
nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
attribute being present, but doesn't check whether it is actually
provided by the user. Same goes for nfc_genl_fw_download() and
NFC_ATTR_FIRMWARE_NAME.
This patch adds appropriate checks.
Found with syzkaller.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3675f052b4 upstream.
There is a logic bug in the current smack_bprm_set_creds():
If LSM_UNSAFE_PTRACE is set, but the ptrace state is deemed to be
acceptable (e.g. because the ptracer detached in the meantime), the other
->unsafe flags aren't checked. As far as I can tell, this means that
something like the following could work (but I haven't tested it):
- task A: create task B with fork()
- task B: set NO_NEW_PRIVS
- task B: install a seccomp filter that makes open() return 0 under some
conditions
- task B: replace fd 0 with a malicious library
- task A: attach to task B with PTRACE_ATTACH
- task B: execve() a file with an SMACK64EXEC extended attribute
- task A: while task B is still in the middle of execve(), exit (which
destroys the ptrace relationship)
Make sure that if any flags other than LSM_UNSAFE_PTRACE are set in
bprm->unsafe, we reject the execve().
Cc: stable@vger.kernel.org
Fixes: 5663884caa ("Smack: unify all ptrace accesses in the smack")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e95584a889 ]
We have identified a problem with the "oversubscription" policy in the
link transmission code.
When small messages are transmitted, and the sending link has reached
the transmit window limit, those messages will be bundled and put into
the link backlog queue. However, bundles of data messages are counted
at the 'CRITICAL' level, so that the counter for that level, instead of
the counter for the real, bundled message's level is the one being
increased.
Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
to be tested against the unchanged counter for their own level, while
contributing to an unrestrained increase at the CRITICAL backlog level.
This leaves a gap in congestion control algorithm for small messages
that can result in starvation for other users or a "real" CRITICAL
user. Even that eventually can lead to buffer exhaustion & link reset.
We fix this by keeping a 'target_bskb' buffer pointer at each levels,
then when bundling, we only bundle messages at the same importance
level only. This way, we know exactly how many slots a certain level
have occupied in the queue, so can manage level congestion accurately.
By bundling messages at the same level, we even have more benefits. Let
consider this:
- One socket sends 64-byte messages at the 'CRITICAL' level;
- Another sends 4096-byte messages at the 'LOW' level;
When a 64-byte message comes and is bundled the first time, we put the
overhead of message bundle to it (+ 40-byte header, data copy, etc.)
for later use, but the next message can be a 4096-byte one that cannot
be bundled to the previous one. This means the last bundle carries only
one payload message which is totally inefficient, as for the receiver
also! Later on, another 64-byte message comes, now we make a new bundle
and the same story repeats...
With the new bundling algorithm, this will not happen, the 64-byte
messages will be bundled together even when the 4096-byte message(s)
comes in between. However, if the 4096-byte messages are sent at the
same level i.e. 'CRITICAL', the bundling algorithm will again cause the
same overhead.
Also, the same will happen even with only one socket sending small
messages at a rate close to the link transmit's one, so that, when one
message is bundled, it's transmitted shortly. Then, another message
comes, a new bundle is created and so on...
We will solve this issue radically by another patch.
Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a761129e36 ]
xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
to cache extra fragments. This is incorrect because the return type of
xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
ring buffer index.
In the situation when the rsp_cons is approaching 0xffffffff, the return
value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
caller) would regard as error. As a result, queue->rx.rsp_cons is set
incorrectly because it is updated only when there is error. If there is no
error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
This leads to NULL pointer access in the next iteration to process rx ring
buffer entries.
The symptom is similar to the one fixed in
commit 00b368502d ("xen-netfront: do not assume sk_buff_head list is
empty in error handling").
This patch changes the return type of xennet_fill_frags() to indicate
whether it is successful or failed. The queue->rx.rsp_cons will be
always updated inside this function.
Fixes: ad4f15dc2c ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d64bf89a75 ]
rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
after allocation some resources such as protection domain.
If allocation of such resources fail, then these uninitialized
variables are accessed in rds_ib_dev_free() in failure path. This
can potentially crash the system. The code has been updated to
initialize these variables very early in the function.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4094871db1 ]
Prior to this change an application sending <= 1MSS worth of data and
enabling UDP GSO would fail if the system had SW GSO enabled, but the
same send would succeed if HW GSO offload is enabled. In addition to this
inconsistency the error in the SW GSO case does not get back to the
application if sending out of a real device so the user is unaware of this
failure.
With this change we only perform GSO if the # of segments is > 1 even
if the application has enabled segmentation. I've also updated the
relevant udpgso selftests.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8521e53cc ]
There has been some confusion between the port number and
the VLAN ID in this driver. What we need to check for
validity is the VLAN ID, nothing else.
The current confusion came from assigning a few default
VLANs for default routing and we need to rewrite that
properly.
Instead of checking if the port number is a valid VLAN
ID, check the actual VLAN IDs passed in to the callback
one by one as expected.
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d9138ffac ]
Lockdep is unhappy if two locks from the same class are held.
Fix the below warning for hyperv and virtio sockets (vmci socket code
doesn't have the issue) by using lock_sock_nested() when __vsock_release()
is called recursively:
============================================
WARNING: possible recursive locking detected
5.3.0+ #1 Not tainted
--------------------------------------------
server/1795 is trying to acquire lock:
ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
but task is already holding lock:
ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(sk_lock-AF_VSOCK);
lock(sk_lock-AF_VSOCK);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by server/1795:
#0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
#1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
stack backtrace:
CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
Call Trace:
dump_stack+0x67/0x90
__lock_acquire.cold.67+0xd2/0x20b
lock_acquire+0xb5/0x1c0
lock_sock_nested+0x6d/0x90
hvs_release+0x10/0x120 [hv_sock]
__vsock_release+0x24/0xf0 [vsock]
__vsock_release+0xa0/0xf0 [vsock]
vsock_release+0x12/0x30 [vsock]
__sock_release+0x37/0xa0
sock_close+0x14/0x20
__fput+0xc1/0x250
task_work_run+0x98/0xc0
do_exit+0x344/0xc60
do_group_exit+0x47/0xb0
get_signal+0x15c/0xc50
do_signal+0x30/0x720
exit_to_usermode_loop+0x50/0xa0
do_syscall_64+0x24e/0x270
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f4184e85f31
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 44b321e502 ]
Commit dfec0ee22c ("udp: Record gso_segs when supporting UDP segmentation offload")
added gso_segs calculation, but incorrectly got sizeof() the pointer and
not the underlying data type. In addition let's fix the v6 case.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Fixes: dfec0ee22c ("udp: Record gso_segs when supporting UDP segmentation offload")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit db9b2e0af6 ]
Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call
parameter.
Fixes: a25e21f0bc ("rxrpc, afs: Use debug_ids rather than pointers in traces")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c7138b33e ]
The "reuse->sock[]" array is shared by multiple sockets. The going away
sk must unpublish itself from "reuse->sock[]" before making call_rcu()
call. However, this unpublish-action is currently done after a grace
period and it may cause use-after-free.
The fix is to move reuseport_detach_sock() to sk_destruct().
Due to the above reason, any socket with sk_reuseport_cb has
to go through the rcu grace period before freeing it.
It is a rather old bug (~3 yrs). The Fixes tag is not necessary
the right commit but it is the one that introduced the SOCK_RCU_FREE
logic and this fix is depending on it.
Fixes: a4298e4522 ("net: add SOCK_RCU_FREE socket flag")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1acb8f2a7a ]
In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
This skb should be released if pci_dma_mapping_error fails.
Fixes: 0f8ab89e82 ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b406472b5a ]
Since commit c09551c6ff ("net: ipv4: use a dedicated counter
for icmp_v4 redirect packets") we use 'n_redirects' to account
for redirect packets, but we still use 'rate_tokens' to compute
the redirect packets exponential backoff.
If the device sent to the relevant peer any ICMP error packet
after sending a redirect, it will also update 'rate_token' according
to the leaking bucket schema; typically 'rate_token' will raise
above BITS_PER_LONG and the redirect packets backoff algorithm
will produce undefined behavior.
Fix the issue using 'n_redirects' to compute the exponential backoff
in ip_rt_send_redirect().
Note that we still clear rate_tokens after a redirect silence period,
to avoid changing an established behaviour.
The root cause predates git history; before the mentioned commit in
the critical scenario, the kernel stopped sending redirects, after
the mentioned commit the behavior more randomic.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: c09551c6ff ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d819d250a ]
Rajendra reported a kernel panic when a link was taken down:
[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
<snip>
[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in __ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.
Since the DAD sequence can not be aborted, add a check for the missing
host route in __ipv6_ifa_notify. The only way this should happen is due
to the previously mentioned race. The host route is created when the
address is added to an interface; it is only removed on a down event
where the address is kept. Add a warning if the host route is missing
AND the device is up; this is a situation that should never happen.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6af1799aaf ]
This began with a syzbot report. syzkaller was injecting
IPv6 TCP SYN packets having a v4mapped source address.
After an unsuccessful 4-tuple lookup, TCP creates a request
socket (SYN_RECV) and calls reqsk_queue_hash_req()
reqsk_queue_hash_req() calls sk_ehashfn(sk)
At this point we have AF_INET6 sockets, and the heuristic
used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
For the particular spoofed packet, we end up hashing V4 addresses
which were not initialized by the TCP IPv6 stack, so KMSAN fired
a warning.
I first fixed sk_ehashfn() to test both source and destination addresses,
but then faced various problems, including user-space programs
like packetdrill that had similar assumptions.
Instead of trying to fix the whole ecosystem, it is better
to admit that we have a dual stack behavior, and that we
can not build linux kernels without V4 stack anyway.
The dual stack API automatically forces the traffic to be IPv4
if v4mapped addresses are used at bind() or connect(), so it makes
no sense to allow IPv6 traffic to use the same v4mapped class.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8353da9fa6 ]
Fix NULL-pointer dereference on tty open due to a failure to handle a
missing interrupt-in endpoint when probing modem ports:
BUG: kernel NULL pointer dereference, address: 0000000000000006
...
RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
...
Call Trace:
hso_start_serial_device+0xdc/0x140 [hso]
hso_serial_open+0x118/0x1b0 [hso]
tty_open+0xf1/0x490
Fixes: 542f548236 ("tty: Modem functions for the HSO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e141f757b ]
erspan driver calls ether_setup(), after commit 61e84623ac
("net: centralize net_device min/max MTU checking"), the range
of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
It causes the dev mtu of the erspan device to not be greater
than 1500, this limit value is not correct for ipgre tap device.
Tested:
Before patch:
# ip link set erspan0 mtu 1600
Error: mtu greater than device maximum.
After patch:
# ip link set erspan0 mtu 1600
# ip -d link show erspan0
21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
Fixes: 61e84623ac ("net: centralize net_device min/max MTU checking")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b517374f4 ]
When fetching free MSI-X vectors for ULDs, check for the error code
before accessing MSI-X info array. Otherwise, an out-of-bounds access is
attempted, which results in kernel panic.
Fixes: 94cdb8bb99 ("cxgb4: Add support for dynamic allocation of resources for ULD")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c751798aa2 upstream.
syzkaller managed to trigger the warning in bpf_jit_free() which checks via
bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
[...]
8021q: adding VLAN 0 to HW filter on device batadv0
8021q: adding VLAN 0 to HW filter on device batadv0
WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events bpf_prog_free_deferred
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x113/0x167 lib/dump_stack.c:113
panic+0x212/0x40b kernel/panic.c:214
__warn.cold.8+0x1b/0x38 kernel/panic.c:571
report_bug+0x1a4/0x200 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:178 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:bpf_jit_free+0x1e8/0x2a0
Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
BUG: unable to handle kernel paging request at fffffbfff400d000
#PF error: [normal kernel read fault]
PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events bpf_prog_free_deferred
RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
[...]
Upon further debugging, it turns out that whenever we trigger this
issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
but yet bpf_jit_free() reported that the entry is /in use/.
Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
fd is exposed to the public, a parallel close request came in right
before we attempted to do the bpf_prog_kallsyms_add().
Given at this time the prog reference count is one, we start to rip
everything underneath us via bpf_prog_release() -> bpf_prog_put().
The memory is eventually released via deferred free, so we're seeing
that bpf_jit_free() has a kallsym entry because we added it from
bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
Therefore, move both notifications /before/ we install the fd. The
issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
because upon bpf_prog_get_fd_by_id() we'll take another reference to
the BPF prog, so we're still holding the original reference from the
bpf_prog_load().
Fixes: 6ee52e2a3f ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
Fixes: 74451e66d5 ("bpf: make jited programs visible in traces")
Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb8acabbe3 ]
Commit 7211aef86f ("block: mq-deadline: Fix write completion
handling") added a call to blk_mq_sched_mark_restart_hctx() in
dd_dispatch_request() to make sure that write request dispatching does
not stall when all target zones are locked. This fix left a subtle race
when a write completion happens during a dispatch execution on another
CPU:
CPU 0: Dispatch CPU1: write completion
dd_dispatch_request()
lock(&dd->lock);
...
lock(&dd->zone_lock); dd_finish_request()
rq = find request lock(&dd->zone_lock);
unlock(&dd->zone_lock);
zone write unlock
unlock(&dd->zone_lock);
...
__blk_mq_free_request
check restart flag (not set)
-> queue not run
...
if (!rq && have writes)
blk_mq_sched_mark_restart_hctx()
unlock(&dd->lock)
Since the dispatch context finishes after the write request completion
handling, marking the queue as needing a restart is not seen from
__blk_mq_free_request() and blk_mq_sched_restart() not executed leading
to the dispatch stall under 100% write workloads.
Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
dd_dispatch_request() into dd_finish_request() under the zone lock to
ensure full mutual exclusion between write request dispatch selection
and zone unlock on write request completion.
Fixes: 7211aef86f ("block: mq-deadline: Fix write completion handling")
Cc: stable@vger.kernel.org
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6279eb3dd7 ]
Since 9e3596b0c6 ("kbuild: initramfs cleanup, set target from Kconfig")
"make clean" leaves behind compressed initramfs images. Example:
$ make defconfig
$ sed -i 's|CONFIG_INITRAMFS_SOURCE=""|CONFIG_INITRAMFS_SOURCE="/tmp/ir.cpio"|' .config
$ make olddefconfig
$ make -s
$ make -s clean
$ git clean -ndxf | grep initramfs
Would remove usr/initramfs_data.cpio.gz
clean rules do not have CONFIG_* context so they do not know which
compression format was used. Thus they don't know which files to delete.
Tell clean to delete all possible compression formats.
Once patched usr/initramfs_data.cpio.gz and friends are deleted by
"make clean".
Link: http://lkml.kernel.org/r/20190722063251.55541-1-gthelen@google.com
Fixes: 9e3596b0c6 ("kbuild: initramfs cleanup, set target from Kconfig")
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24fbf7bad8 ]
There are two problems in sec_free_hw_sgl():
First, when sgl_current->next is valid, @hw_sgl will be freed in the
first loop, but it free again after the loop.
Second, sgl_current and sgl_current->next_sgl is not match when
dma_pool_free() is invoked, the third parameter should be the dma
address of sgl_current, but sgl_current->next_sgl is the dma address
of next chain, so use sgl_current->next_sgl is wrong.
Fix this by deleting the last dma_pool_free() in sec_free_hw_sgl(),
modifying the condition for while loop, and matching the address for
dma_pool_free().
Fixes: 915e4e8413 ("crypto: hisilicon - SEC security accelerator driver")
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b54c64f7ad ]
In hypfs_fill_super(), if hypfs_create_update_file() fails,
sbi->update_file is left holding an error number. This is passed to
hypfs_kill_super() which doesn't check for this.
Fix this by not setting sbi->update_value until after we've checked for
error.
Fixes: 24bbb1faf3 ("[PATCH] s390_hypfs filesystem")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
cc: Heiko Carstens <heiko.carstens@de.ibm.com>
cc: linux-s390@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07bfa4415a ]
If userspace reads the buffer via blockdev while mounting,
sb_getblk()+modify can race with buffer read via blockdev.
For example,
FS userspace
bh = sb_getblk()
modify bh->b_data
read
ll_rw_block(bh)
fill bh->b_data by on-disk data
/* lost modified data by FS */
set_buffer_uptodate(bh)
set_buffer_uptodate(bh)
Userspace should not use the blockdev while mounting though, the udev
seems to be already doing this. Although I think the udev should try to
avoid this, workaround the race by small overhead.
Link: http://lkml.kernel.org/r/87pnk7l3sw.fsf_-_@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>