Since commit 1b6b26ae70 ("pipe: fix and clarify pipe write wakeup
logic") we have sanitized the pipe write logic, and would only try to
wake up readers if they needed it.
In particular, if the pipe already had data in it before the write,
there was no point in trying to wake up a reader, since any existing
readers must have been aware of the pre-existing data already. Doing
extraneous wakeups will only cause potential thundering herd problems.
However, it turns out that some Android libraries have misused the EPOLL
interface, and expected "edge triggered" be to "any new write will
trigger it". Even if there was no edge in sight.
Quoting Sandeep Patil:
"The commit 1b6b26ae70 ('pipe: fix and clarify pipe write wakeup
logic') changed pipe write logic to wakeup readers only if the pipe
was empty at the time of write. However, there are libraries that
relied upon the older behavior for notification scheme similar to
what's described in [1]
One such library 'realm-core'[2] is used by numerous Android
applications. The library uses a similar notification mechanism as GNU
Make but it never drains the pipe until it is full. When Android moved
to v5.10 kernel, all applications using this library stopped working.
The library has since been fixed[3] but it will be a while before all
applications incorporate the updated library"
Our regression rule for the kernel is that if applications break from
new behavior, it's a regression, even if it was because the application
did something patently wrong. Also note the original report [4] by
Michal Kerrisk about a test for this epoll behavior - but at that point
we didn't know of any actual broken use case.
So add the extraneous wakeup, to approximate the old behavior.
[ I say "approximate", because the exact old behavior was to do a wakeup
not for each write(), but for each pipe buffer chunk that was filled
in. The behavior introduced by this change is not that - this is just
"every write will cause a wakeup, whether necessary or not", which
seems to be sufficient for the broken library use. ]
It's worth noting that this adds the extraneous wakeup only for the
write side, while the read side still considers the "edge" to be purely
about reading enough from the pipe to allow further writes.
See commit f467a6a664 ("pipe: fix and clarify pipe read wakeup logic")
for the pipe read case, which remains that "only wake up if the pipe was
full, and we read something from it".
Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
Link: https://github.com/realm/realm-core [2]
Link: https://github.com/realm/realm-core/issues/4666 [3]
Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/
Bug: 193851993
Reported-by: Sandeep Patil <sspatil@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3a34b13a88)
Signed-off-by: Sandeep Patil <sspatil@android.com>
Change-Id: Idcf3e8faa31bff47ada4b815237a355e0757b964
Changes in 5.10.55
tools: Allow proper CC/CXX/... override with LLVM=1 in Makefile.include
io_uring: fix link timeout refs
KVM: x86: determine if an exception has an error code only when injecting it.
af_unix: fix garbage collect vs MSG_PEEK
workqueue: fix UAF in pwq_unbound_release_workfn()
cgroup1: fix leaked context root causing sporadic NULL deref in LTP
net/802/mrp: fix memleak in mrp_request_join()
net/802/garp: fix memleak in garp_request_join()
net: annotate data race around sk_ll_usec
sctp: move 198 addresses from unusable to private scope
rcu-tasks: Don't delete holdouts within trc_inspect_reader()
rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()
ipv6: allocate enough headroom in ip6_finish_output2()
drm/ttm: add a check against null pointer dereference
hfs: add missing clean-up in hfs_fill_super
hfs: fix high memory mapping in hfs_bnode_read
hfs: add lock nesting notation to hfs_find_init
firmware: arm_scmi: Fix possible scmi_linux_errmap buffer overflow
firmware: arm_scmi: Fix range check for the maximum number of pending messages
cifs: fix the out of range assignment to bit fields in parse_server_interfaces
iomap: remove the length variable in iomap_seek_data
iomap: remove the length variable in iomap_seek_hole
ARM: dts: versatile: Fix up interrupt controller node names
ipv6: ip6_finish_output2: set sk into newly allocated nskb
Linux 5.10.55
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I197fe56bf17800bbd0e8dd9ae7fc4ddac4203a60
[ Upstream commit 82a1c67554 ]
Once the new schema interrupt-controller/arm,vic.yaml is added, we get
the below warnings:
arch/arm/boot/dts/versatile-ab.dt.yaml:
intc@10140000: $nodename:0: 'intc@10140000' does not match
'^interrupt-controller(@[0-9a-f,]+)*$'
arch/arm/boot/dts/versatile-ab.dt.yaml:
intc@10140000: 'clear-mask' does not match any of the regexes
Fix the node names for the interrupt controller to conform
to the standard node name interrupt-controller@.. Also drop invalid
clear-mask property.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20210701132118.759454-1-sudeep.holla@arm.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49694d14ff ]
The length variable is rather pointless given that it can be trivially
deduced from offset and size. Also the initial calculation can lead
to KASAN warnings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ac1d42651 ]
The length variable is rather pointless given that it can be trivially
deduced from offset and size. Also the initial calculation can lead
to KASAN warnings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9c9c6815f ]
Because the out of range assignment to bit fields
are compiler-dependant, the fields could have wrong
value.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bdb8742dc6 ]
SCMI message headers carry a sequence number and such field is sized to
allow for MSG_TOKEN_MAX distinct numbers; moreover zero is not really an
acceptable maximum number of pending in-flight messages.
Fix accordingly the checks performed on the value exported by transports
in scmi_desc.max_msg
Link: https://lore.kernel.org/r/20210712141833.6628-3-cristian.marussi@arm.com
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
[sudeep.holla: updated the patch title and error message]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a691f16cc ]
The scmi_linux_errmap buffer access index is supposed to depend on the
array size to prevent element out of bounds access. It uses SCMI_ERR_MAX
to check bounds but that can mismatch with the array size. It also
changes the success into -EIO though scmi_linux_errmap is never used in
case of success, it is expected to work for success case too.
It is slightly confusing code as the negative of the error code
is used as index to the buffer. Fix it by negating it at the start and
make it more readable.
Link: https://lore.kernel.org/r/20210707135028.1869642-1-sudeep.holla@arm.com
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9ab9cce93 ]
Invoking trc_del_holdout() from within trc_wait_for_one_reader() is
only a performance optimization because the RCU Tasks Trace grace-period
kthread will eventually do this within check_all_holdout_tasks_trace().
But it is not a particularly important performance optimization because
it only applies to the grace-period kthread, of which there is but one.
This commit therefore removes this invocation of trc_del_holdout() in
favor of the one in check_all_holdout_tasks_trace() in the grace-period
kthread.
Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d10bf55d8 ]
As Yanfei pointed out, although invoking trc_del_holdout() is safe
from the viewpoint of the integrity of the holdout list itself,
the put_task_struct() invoked by trc_del_holdout() can result in
use-after-free errors due to later accesses to this task_struct structure
by the RCU Tasks Trace grace-period kthread.
This commit therefore removes this call to trc_del_holdout() from
trc_inspect_reader() in favor of the grace-period thread's existing call
to trc_del_holdout(), thus eliminating that particular class of
use-after-free errors.
Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d11fa231c ]
The doc draft-stewart-tsvwg-sctp-ipv4-00 that restricts 198 addresses
was never published. These addresses as private addresses should be
allowed to use in SCTP.
As Michael Tuexen suggested, this patch is to move 198 addresses from
unusable to private scope.
Reported-by: Sérgio <surkamp@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dbffbb533 ]
sk_ll_usec is read locklessly from sk_can_busy_loop()
while another thread can change its value in sock_setsockopt()
This is correct but needs annotations.
BUG: KCSAN: data-race in __skb_try_recv_datagram / sock_setsockopt
write to 0xffff88814eb5f904 of 4 bytes by task 14011 on cpu 0:
sock_setsockopt+0x1287/0x2090 net/core/sock.c:1175
__sys_setsockopt+0x14f/0x200 net/socket.c:2100
__do_sys_setsockopt net/socket.c:2115 [inline]
__se_sys_setsockopt net/socket.c:2112 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2112
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88814eb5f904 of 4 bytes by task 14001 on cpu 1:
sk_can_busy_loop include/net/busy_poll.h:41 [inline]
__skb_try_recv_datagram+0x14f/0x320 net/core/datagram.c:273
unix_dgram_recvmsg+0x14c/0x870 net/unix/af_unix.c:2101
unix_seqpacket_recvmsg+0x5a/0x70 net/unix/af_unix.c:2067
____sys_recvmsg+0x15d/0x310 include/linux/uio.h:244
___sys_recvmsg net/socket.c:2598 [inline]
do_recvmmsg+0x35c/0x9f0 net/socket.c:2692
__sys_recvmmsg net/socket.c:2771 [inline]
__do_sys_recvmmsg net/socket.c:2794 [inline]
__se_sys_recvmmsg net/socket.c:2787 [inline]
__x64_sys_recvmmsg+0xcf/0x150 net/socket.c:2787
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0x00000101
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 14001 Comm: syz-executor.3 Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1e7107c5ef upstream.
Richard reported sporadic (roughly one in 10 or so) null dereferences and
other strange behaviour for a set of automated LTP tests. Things like:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
RIP: 0010:kernfs_sop_show_path+0x1b/0x60
...or these others:
RIP: 0010:do_mkdirat+0x6a/0xf0
RIP: 0010:d_alloc_parallel+0x98/0x510
RIP: 0010:do_readlinkat+0x86/0x120
There were other less common instances of some kind of a general scribble
but the common theme was mount and cgroup and a dubious dentry triggering
the NULL dereference. I was only able to reproduce it under qemu by
replicating Richard's setup as closely as possible - I never did get it
to happen on bare metal, even while keeping everything else the same.
In commit 71d883c37e ("cgroup_do_mount(): massage calling conventions")
we see this as a part of the overall change:
--------------
struct cgroup_subsys *ss;
- struct dentry *dentry;
[...]
- dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root,
- CGROUP_SUPER_MAGIC, ns);
[...]
- if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
- struct super_block *sb = dentry->d_sb;
- dput(dentry);
+ ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
+ if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
+ struct super_block *sb = fc->root->d_sb;
+ dput(fc->root);
deactivate_locked_super(sb);
msleep(10);
return restart_syscall();
}
--------------
In changing from the local "*dentry" variable to using fc->root, we now
export/leave that dentry pointer in the file context after doing the dput()
in the unlikely "is_dying" case. With LTP doing a crazy amount of back to
back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely
becomes slightly likely and then bad things happen.
A fix would be to not leave the stale reference in fc->root as follows:
--------------
dput(fc->root);
+ fc->root = NULL;
deactivate_locked_super(sb);
--------------
...but then we are just open-coding a duplicate of fc_drop_locked() so we
simply use that instead.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org # v5.1+
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Fixes: 71d883c37e ("cgroup_do_mount(): massage calling conventions")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbcf01128d upstream.
unix_gc() assumes that candidate sockets can never gain an external
reference (i.e. be installed into an fd) while the unix_gc_lock is
held. Except for MSG_PEEK this is guaranteed by modifying inflight
count under the unix_gc_lock.
MSG_PEEK does not touch any variable protected by unix_gc_lock (file
count is not), yet it needs to be serialized with garbage collection.
Do this by locking/unlocking unix_gc_lock:
1) increment file count
2) lock/unlock barrier to make sure incremented file count is visible
to garbage collection
3) install file into fd
This is a lock barrier (unlike smp_mb()) that ensures that garbage
collection is run completely before or completely after the barrier.
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f62700ce63 upstream.
selftests/bpf/Makefile includes tools/scripts/Makefile.include.
With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixed the case if CC/AR/LD/CXX/STRIP is allowed to be
overridden, it will be written to clang/llvm-ar/..., instead of
gcc binaries. The definition of CC_NO_CLANG is also relocated
to the place after the above CC is defined.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153419.3028165-1-yhs@fb.com
Cc: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When merging the KVM MTE support, the blob that was interposed between
the chair and the keyboard experienced a neuronal accident (also known
as a brain fart), turning a check for VM_SHARED into VM_PFNMAP as it
was reshuffling some of the code.
The blob having now come back to its senses, let's restore the
initial check that the original author got right the first place.
Fixes: ea7fc1bb1c ("KVM: arm64: Introduce MTE VM feature")
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210713114804.594993-1-maz@kernel.org
(cherry picked from commit 80d9ac9bd7
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git fixes)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 192636784
Change-Id: I78c4ba6ff258812a6bce6d782ed84856035ec61d
Since commit 903cd0f315 ("swiotlb: Use is_swiotlb_force_bounce for
swiotlb data bouncing") if code sets swiotlb_force it needs to do so
before the swiotlb is initialised. Otherwise
io_tlb_default_mem->force_bounce will not get set to true, and devices
that use (the default) swiotlb will not bounce despite switolb_force
having the value of SWIOTLB_FORCE.
Let us restore swiotlb functionality for PV by fulfilling this new
requirement.
This change addresses what turned out to be a fragility in
commit 64e1f0c531 ("s390/mm: force swiotlb for protected
virtualization"), which ain't exactly broken in its original context,
but could give us some more headache if people backport the broken
change and forget this fix.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 903cd0f315 ("swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing")
Fixes: 64e1f0c531 ("s390/mm: force swiotlb for protected virtualization")
Cc: stable@vger.kernel.org #5.3+
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
(cherry picked from commit 93ebb68287 swiotlb/devel/for-linus-5.15)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190591509
Change-Id: I0cb7cec859f8c6d7221ccd5bf652d4def4c87092
Changes in 5.10.54
igc: Fix use-after-free error during reset
igb: Fix use-after-free error during reset
igc: change default return of igc_read_phy_reg()
ixgbe: Fix an error handling path in 'ixgbe_probe()'
igc: Fix an error handling path in 'igc_probe()'
igb: Fix an error handling path in 'igb_probe()'
fm10k: Fix an error handling path in 'fm10k_probe()'
e1000e: Fix an error handling path in 'e1000_probe()'
iavf: Fix an error handling path in 'iavf_probe()'
igb: Check if num of q_vectors is smaller than max before array access
igb: Fix position of assignment to *ring
gve: Fix an error handling path in 'gve_probe()'
net: add kcov handle to skb extensions
bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
bonding: fix null dereference in bond_ipsec_add_sa()
ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
bonding: disallow setting nested bonding + ipsec offload
bonding: Add struct bond_ipesc to manage SA
bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
bonding: fix incorrect return value of bond_ipsec_offload_ok()
ipv6: fix 'disable_policy' for fwd packets
stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
selftests: icmp_redirect: remove from checking for IPv6 route get
selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
cxgb4: fix IRQ free race during driver unload
mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
perf inject: Fix dso->nsinfo refcounting
perf map: Fix dso->nsinfo refcounting
perf probe: Fix dso->nsinfo refcounting
perf env: Fix sibling_dies memory leak
perf test session_topology: Delete session->evlist
perf test event_update: Fix memory leak of evlist
perf dso: Fix memory leak in dso__new_map()
perf test maps__merge_in: Fix memory leak of maps
perf env: Fix memory leak of cpu_pmu_caps
perf report: Free generated help strings for sort option
perf script: Fix memory 'threads' and 'cpus' leaks on exit
perf lzma: Close lzma stream on exit
perf probe-file: Delete namelist in del_events() on the error path
perf data: Close all files in close_dir()
perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
ASoC: wm_adsp: Correct wm_coeff_tlv_get handling
spi: imx: add a check for speed_hz before calculating the clock
spi: stm32: fixes pm_runtime calls in probe/remove
regulator: hi6421: Use correct variable type for regmap api val argument
regulator: hi6421: Fix getting wrong drvdata
spi: mediatek: fix fifo rx mode
ASoC: rt5631: Fix regcache sync errors on resume
bpf, test: fix NULL pointer dereference on invalid expected_attach_type
bpf: Fix tail_call_reachable rejection for interpreter when jit failed
xdp, net: Fix use-after-free in bpf_xdp_link_release
timers: Fix get_next_timer_interrupt() with no timers pending
liquidio: Fix unintentional sign extension issue on left shift of u16
s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
bpf, sockmap: Fix potential memory leak on unlikely error case
bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats
bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats
bpftool: Check malloc return value in mount_bpffs_for_pin
net: fix uninit-value in caif_seqpkt_sendmsg
usb: hso: fix error handling code of hso_create_net_device
dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
efi/tpm: Differentiate missing and invalid final event log table.
net: decnet: Fix sleeping inside in af_decnet
KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
net: sched: fix memory leak in tcindex_partial_destroy_work
sctp: trim optlen when it's a huge value in sctp_setsockopt
netrom: Decrease sock refcount when sock timers expire
scsi: iscsi: Fix iface sysfs attr detection
scsi: target: Fix protect handling in WRITE SAME(32)
spi: cadence: Correct initialisation of runtime PM again
ACPI: Kconfig: Fix table override from built-in initrd
bnxt_en: don't disable an already disabled PCI device
bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe()
bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task()
bnxt_en: Validate vlan protocol ID on RX packets
bnxt_en: Check abort error state in bnxt_half_open_nic()
net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition
net/tcp_fastopen: fix data races around tfo_active_disable_stamp
ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
net: hns3: fix possible mismatches resp of mailbox
net: hns3: fix rx VLAN offload state inconsistent issue
spi: spi-bcm2835: Fix deadlock
net/sched: act_skbmod: Skip non-Ethernet packets
ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
ceph: don't WARN if we're still opening a session to an MDS
nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
afs: Fix tracepoint string placement with built-in AFS
r8169: Avoid duplicate sysfs entry creation error
nvme: set the PRACT bit when using Write Zeroes with T10 PI
sctp: update active_key for asoc when old key is being replaced
tcp: disable TFO blackhole logic by default
net: dsa: sja1105: make VID 4095 a bridge VLAN too
net: sched: cls_api: Fix the the wrong parameter
drm/panel: raspberrypi-touchscreen: Prevent double-free
cifs: only write 64kb at a time when fallocating a small region of a file
cifs: fix fallocate when trying to allocate a hole.
proc: Avoid mixing integer types in mem_rw()
mmc: core: Don't allocate IDA for OF aliases
s390/ftrace: fix ftrace_update_ftrace_func implementation
s390/boot: fix use of expolines in the DMA code
ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
ALSA: sb: Fix potential ABBA deadlock in CSP driver
ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
ALSA: hdmi: Expose all pins on MSI MS-7C94 board
ALSA: pcm: Call substream ack() method upon compat mmap commit
ALSA: pcm: Fix mmap capability check
Revert "usb: renesas-xhci: Fix handling of unknown ROM state"
usb: xhci: avoid renesas_usb_fw.mem when it's unusable
xhci: Fix lost USB 2 remote wake
KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
usb: hub: Fix link power management max exit latency (MEL) calculations
USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
usb: max-3421: Prevent corruption of freed memory
usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
USB: serial: option: add support for u-blox LARA-R6 family
USB: serial: cp210x: fix comments for GE CS1000
USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
usb: typec: stusb160x: register role switch before interrupt registration
firmware/efi: Tell memblock about EFI iomem reservations
tracepoints: Update static_call before tp_funcs when adding a tracepoint
tracing/histogram: Rename "cpu" to "common_cpu"
tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
tracing: Synthetic event field_pos is an index not a boolean
btrfs: check for missing device in btrfs_trim_fs
media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
ixgbe: Fix packet corruption due to missing DMA sync
bus: mhi: core: Validate channel ID when processing command completions
posix-cpu-timers: Fix rearm racing against process tick
selftest: use mmap instead of posix_memalign to allocate memory
io_uring: explicitly count entries for poll reqs
io_uring: remove double poll entry on arm failure
userfaultfd: do not untag user pointers
memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
hugetlbfs: fix mount mode command line processing
rbd: don't hold lock_rwsem while running_list is being drained
rbd: always kick acquire on "acquired" and "released" notifications
misc: eeprom: at24: Always append device id even if label property is set.
nds32: fix up stack guard gap
driver core: Prevent warning when removing a device link from unregistered consumer
drm: Return -ENOTTY for non-drm ioctls
drm/amdgpu: update golden setting for sienna_cichlid
net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
PCI: Mark AMD Navi14 GPU ATS as broken
bonding: fix build issue
skbuff: Release nfct refcount on napi stolen or re-used skbs
Documentation: Fix intiramfs script name
perf inject: Close inject.output on exit
usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
drm/i915/gvt: Clear d3_entered on elsp cmd submission.
sfc: ensure correct number of XDP queues
xhci: add xhci_get_virt_ep() helper
skbuff: Fix build with SKB extensions disabled
Linux 5.10.54
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2a4c1f623f8560268e9b41c3dae6690d641d9f6c
commit 9615fe36b3 upstream.
We will fail to build with CONFIG_SKB_EXTENSIONS disabled after
8550ff8d8c ("skbuff: Release nfct refcount on napi stolen or re-used
skbs") since there is an unconditionally use of skb_ext_find() without
an appropriate stub. Simply build the code conditionally and properly
guard against both COFNIG_SKB_EXTENSIONS as well as
CONFIG_NET_TC_SKB_EXT being disabled.
Fixes: Fixes: 8550ff8d8c ("skbuff: Release nfct refcount on napi stolen or re-used skbs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 788bc000d4 ]
Commit 99ba0ea616 ("sfc: adjust efx->xdp_tx_queue_count with the real
number of initialized queues") intended to fix a problem caused by a
round up when calculating the number of XDP channels and queues.
However, this was not the real problem. The real problem was that the
number of XDP TX queues had been reduced to half in
commit e26ca4b535 ("sfc: reduce the number of requested xdp ev queues"),
but the variable xdp_tx_queue_count had remained the same.
Once the correct number of XDP TX queues is created again in the
previous patch of this series, this also can be reverted since the error
doesn't actually exist.
Only in the case that there is a bug in the code we can have different
values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this,
and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it
happens again in the future.
Note that the number of allocated queues can be higher than the number
of used ones due to the round up, as explained in the existing comment
in the code. That's why we also have to stop increasing xdp_queue_number
beyond efx->xdp_tx_queue_count.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c90b4503cc upstream.
d3_entered flag is used to mark for vgpu_reset a previous power
transition from D3->D0, typically for VM resume from S3, so that gvt
could skip PPGTT invalidation in current vgpu_reset during resuming.
In case S0ix exit, although there is D3->D0, guest driver continue to
use vgpu as normal, with d3_entered set, until next shutdown/reboot or
power transition.
If a reboot follows a S0ix exit, device power state transite as:
D0->D3->D0->D0(reboot), while system power state transites as:
S0->S0 (reboot). There is no vgpu_reset until D0(reboot), thus
d3_entered won't be cleared, the vgpu_reset will skip PPGTT invalidation
however those PPGTT entries are no longer valid. Err appears like:
gvt: vgpu 2: vfio_pin_pages failed for gfn 0xxxxx, ret -22
gvt: vgpu 2: fail: spt xxxx guest entry 0xxxxx type 2
gvt: vgpu 2: fail: shadow page xxxx guest entry 0xxxxx type 2.
Give gvt a chance to clear d3_entered on elsp cmd submission so that the
states before & after S0ix enter/exit are consistent.
Fixes: ba25d97757 ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20210707004531.4873-1-colin.xu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b60557230 upstream.
When MSI is used by the ehci-hcd driver, it can cause lost interrupts which
results in EHCI only continuing to work due to a polling fallback. But the
reliance of polling drastically reduces performance of any I/O through EHCI.
Interrupts are lost as the EHCI interrupt handler does not safely handle
edge-triggered interrupts. It fails to ensure all interrupt status bits are
cleared, which works with level-triggered interrupts but not the
edge-triggered interrupts typical from using MSI.
To fix this problem, check if the driver may have raced with the hardware
setting additional interrupt status bits and clear status until it is in a
stable state.
Fixes: 306c54d0ed ("usb: hcd: Try MSI interrupts on PCI devices")
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Link: https://lore.kernel.org/r/20210715213744.GA44506@redhat
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8550ff8d8c upstream.
When multiple SKBs are merged to a new skb under napi GRO,
or SKB is re-used by napi, if nfct was set for them in the
driver, it will not be released while freeing their stolen
head state or on re-use.
Release nfct on napi's stolen or re-used SKBs, and
in gro_list_prepare, check conntrack metadata diff.
Fixes: 5c6b946047 ("net/mlx5e: CT: Handle misses after executing CT action")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b69874f74 upstream.
The commit 9a5605505d (" bonding: Add struct bond_ipesc to manage SA") is causing
following build error when XFRM is not selected in kernel config.
lld: error: undefined symbol: xfrm_dev_state_flush
>>> referenced by bond_main.c:3453 (drivers/net/bonding/bond_main.c:3453)
>>> net/bonding/bond_main.o:(bond_netdev_event) in archive drivers/built-in.a
Fixes: 9a5605505d (" bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Taehee Yoo <ap420073@gmail.com>
CC: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 953b0dcbe2 upstream.
Commit bf3504cea7 ("net: dsa: mv88e6xxx: Add 6390 family PCS
registers to ethtool -d") added support for dumping SerDes PCS registers
via ethtool -d for Peridot.
The same implementation is also valid for Topaz, but was not
enabled at the time.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: bf3504cea7 ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a03b98d683 upstream.
Commit 0df9528736 ("mv88e6xxx: Add serdes Rx statistics") added
support for RX statistics on SerDes ports for Peridot.
This same implementation is also valid for Topaz, but was not enabled
at the time.
We need to use the generic .serdes_get_lane() method instead of the
Peridot specific one in the stats methods so that on Topaz the proper
one is used.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 0df9528736 ("mv88e6xxx: Add serdes Rx statistics")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3abab27c32 upstream.
drm: Return -ENOTTY for non-drm ioctls
Return -ENOTTY from drm_ioctl() when userspace passes in a cmd number
which doesn't relate to the drm subsystem.
Glibc uses the TCGETS ioctl to implement isatty(), and without this
change isatty() returns it incorrectly returns true for drm devices.
To test run this command:
$ if [ -t 0 ]; then echo is a tty; fi < /dev/dri/card0
which shows "is a tty" without this patch.
This may also modify memory which the userspace application is not
expecting.
Signed-off-by: Charles Baylis <cb-kernel@fishzet.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/YPG3IBlzaMhfPqCr@stando.fishzet.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c36748ac54 upstream.
We need to append device id even if eeprom have a label property set as some
platform can have multiple eeproms with same label and we can not register
each of those with same label. Failing to register those eeproms trigger
cascade failures on such platform (system is no longer working).
This fix regression on such platform introduced with 4e302c3b56
Reported-by: Alexander Fomichev <fomichev.ru@gmail.com>
Fixes: 4e302c3b56 ("misc: eeprom: at24: fix NVMEM name with custom AT24 device name")
Cc: stable@vger.kernel.org
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8798d070d4 upstream.
Skipping the "lock has been released" notification if the lock owner
is not what we expect based on owner_cid can lead to I/O hangs.
One example is our own notifications: because owner_cid is cleared
in rbd_unlock(), when we get our own notification it is processed as
unexpected/duplicate and maybe_kick_acquire() isn't called. If a peer
that requested the lock then doesn't go through with acquiring it,
I/O requests that came in while the lock was being quiesced would
be stalled until another I/O request is submitted and kicks acquire
from rbd_img_exclusive_lock().
This makes the comment in rbd_release_lock() actually true: prior to
this change the canceled work was being requeued in response to the
"lock has been acquired" notification from rbd_handle_acquired_lock().
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed9eb71085 upstream.
Currently rbd_quiesce_lock() holds lock_rwsem for read while blocking
on releasing_wait completion. On the I/O completion side, each image
request also needs to take lock_rwsem for read. Because rw_semaphore
implementation doesn't allow new readers after a writer has indicated
interest in the lock, this can result in a deadlock if something that
needs to take lock_rwsem for write gets involved. For example:
1. watch error occurs
2. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and
releases lock_rwsem
3. after reestablishing the watch, rbd_reregister_watch() takes
lock_rwsem for write and calls rbd_reacquire_lock()
4. rbd_quiesce_lock() downgrades lock_rwsem to for read and blocks on
releasing_wait until running_list becomes empty
5. another watch error occurs
6. rbd_watch_errcb() blocks trying to take lock_rwsem for write
7. no in-flight image request can complete and delete itself from
running_list because lock_rwsem won't be granted anymore
A similar scenario can occur with "lock has been acquired" and "lock
has been released" notification handers which also take lock_rwsem for
write to update owner_cid.
We don't actually get anything useful from sitting on lock_rwsem in
rbd_quiesce_lock() -- owner_cid updates certainly don't need to be
synchronized with. In fact the whole owner_cid tracking logic could
probably be removed from the kernel client because we don't support
proxied maintenance operations.
Cc: stable@vger.kernel.org # 5.3+
URL: https://tracker.ceph.com/issues/42757
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>