commit a1078e821b upstream.
Instead of trying to allocate pages with GFP_USER in
add_ballooned_pages() check the available free memory via
si_mem_available(). GFP_USER is far less limiting memory exhaustion
than the test via si_mem_available().
This will avoid dom0 running out of memory due to excessive foreign
page mappings especially on ARM and on x86 in PVH mode, as those don't
have a pre-ballooned area which can be used for foreign mappings.
As the normal ballooning suffers from the same problem don't balloon
down more than si_mem_available() pages in one iteration. At the same
time limit the default maximum number of retries.
This is part of XSA-300.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit da99466ac2 ]
This fixes a global out-of-bounds read access in the copy_buffer
function of the floppy driver.
The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect
and head fields (unsigned int) of the floppy_drive structure are used to
compute the max_sector (int) in the make_raw_rw_request function. It is
possible to overflow the max_sector. Next, max_sector is passed to the
copy_buffer function and used in one of the memcpy calls.
An unprivileged user could trigger the bug if the device is accessible,
but requires a floppy disk to be inserted.
The patch adds the check for the .sect * .head multiplication for not
overflowing in the set_geometry function.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b04609b78 ]
This fixes the invalid pointer dereference in the drive_name function of
the floppy driver.
The native_format field of the struct floppy_drive_params is used as
floppy_type array index in the drive_name function. Thus, the field
should be checked the same way as the autodetect field.
To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should
be used to call the drive_name. A floppy disk is not required to be
inserted.
CAP_SYS_ADMIN is required to call FDSETDRVPRM.
The patch adds the check for a value of the native_format field to be in
the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array
indices.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5635f897ed ]
This fixes a global out-of-bounds read access in the next_valid_format
function of the floppy driver.
The values from autodetect field of the struct floppy_drive_params are
used as indices for the floppy_type array in the next_valid_format
function 'floppy_type[DP->autodetect[probed_format]].sect'.
To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to
be inserted.
CAP_SYS_ADMIN is required to call FDSETDRVPRM.
The patch adds the check for values of the autodetect field to be in the
'0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3554aeb99 ]
This fixes a divide by zero error in the setup_format_params function of
the floppy driver.
Two consecutive ioctls can trigger the bug: The first one should set the
drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK
to become zero. Next, the floppy format operation should be called.
A floppy disk is not required to be inserted. An unprivileged user
could trigger the bug if the device is accessible.
The patch checks F_SECT_PER_TRACK for a non-zero value in the
set_geometry function. The proper check should involve a reasonable
upper limit for the .sect and .rate fields, but it could change the
UAPI.
The patch also checks F_SECT_PER_TRACK in the setup_format_params, and
cancels the formatting operation in case of zero.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fe06a5128 ]
A recent commit efa14c3985 ("iavf: allow null RX descriptors") added
a null pointer sanity check on rx_buffer, however, rx_buffer is being
dereferenced before that check, which implies a null pointer dereference
bug can potentially occur. Fix this by only dereferencing rx_buffer
until after the null pointer check.
Addresses-Coverity: ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 433a06d7d7 ]
Defer probing of the orion-mdio interface when getting a clock returns
EPROBE_DEFER. This avoids locking up the Armada 8k SoC when mdio is used
before all clocks have been enabled.
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1788b8569f ]
gtp_encap_destroy() is called twice.
1. When interface is deleted.
2. When udp socket is destroyed.
either gtp->sk0 or gtp->sk1u could be freed by sock_put() in
gtp_encap_destroy(). so, when gtp_encap_destroy() is called again,
it would uses freed sk pointer.
patch makes gtp_encap_destroy() to set either gtp->sk0 or gtp->sk1u to
null. in addition, both gtp->sk0 and gtp->sk1u pointer are protected
by rtnl_lock. so, rtnl_lock() is added.
Test command:
gtp-link add gtp1 &
killall gtp-link
ip link del gtp1
Splat looks like:
[ 83.182767] BUG: KASAN: use-after-free in __lock_acquire+0x3a20/0x46a0
[ 83.184128] Read of size 8 at addr ffff8880cc7d5360 by task ip/1008
[ 83.185567] CPU: 1 PID: 1008 Comm: ip Not tainted 5.2.0-rc6+ #50
[ 83.188469] Call Trace:
[ ... ]
[ 83.200126] lock_acquire+0x141/0x380
[ 83.200575] ? lock_sock_nested+0x3a/0xf0
[ 83.201069] _raw_spin_lock_bh+0x38/0x70
[ 83.201551] ? lock_sock_nested+0x3a/0xf0
[ 83.202044] lock_sock_nested+0x3a/0xf0
[ 83.202520] gtp_encap_destroy+0x18/0xe0 [gtp]
[ 83.203065] gtp_encap_disable.isra.14+0x13/0x50 [gtp]
[ 83.203687] gtp_dellink+0x56/0x170 [gtp]
[ 83.204190] rtnl_delete_link+0xb4/0x100
[ ... ]
[ 83.236513] Allocated by task 976:
[ 83.236925] save_stack+0x19/0x80
[ 83.237332] __kasan_kmalloc.constprop.3+0xa0/0xd0
[ 83.237894] kmem_cache_alloc+0xd8/0x280
[ 83.238360] sk_prot_alloc.isra.42+0x50/0x200
[ 83.238874] sk_alloc+0x32/0x940
[ 83.239264] inet_create+0x283/0xc20
[ 83.239684] __sock_create+0x2dd/0x540
[ 83.240136] __sys_socket+0xca/0x1a0
[ 83.240550] __x64_sys_socket+0x6f/0xb0
[ 83.240998] do_syscall_64+0x9c/0x450
[ 83.241466] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 83.242061]
[ 83.242249] Freed by task 0:
[ 83.242616] save_stack+0x19/0x80
[ 83.243013] __kasan_slab_free+0x111/0x150
[ 83.243498] kmem_cache_free+0x89/0x250
[ 83.244444] __sk_destruct+0x38f/0x5a0
[ 83.245366] rcu_core+0x7e9/0x1c20
[ 83.245766] __do_softirq+0x213/0x8fa
Fixes: 1e3a3abd8b ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c49a8682fc ]
Problem: The Linux Bluetooth stack yields complete control over the BLE
connection interval to the remote device.
The Linux Bluetooth stack provides access to the BLE connection interval
min and max values through /sys/kernel/debug/bluetooth/hci0/
conn_min_interval and /sys/kernel/debug/bluetooth/hci0/conn_max_interval.
These values are used for initial BLE connections, but the remote device
has the ability to request a connection parameter update. In the event
that the remote side requests to change the connection interval, the Linux
kernel currently only validates that the desired value is within the
acceptable range in the Bluetooth specification (6 - 3200, corresponding to
7.5ms - 4000ms). There is currently no validation that the desired value
requested by the remote device is within the min/max limits specified in
the conn_min_interval/conn_max_interval configurations. This essentially
leads to Linux yielding complete control over the connection interval to
the remote device.
The proposed patch adds a verification step to the connection parameter
update mechanism, ensuring that the desired value is within the min/max
bounds of the current connection. If the desired value is outside of the
current connection min/max values, then the connection parameter update
request is rejected and the negative response is returned to the remote
device. Recall that the initial connection is established using the local
conn_min_interval/conn_max_interval values, so this allows the Linux
administrator to retain control over the BLE connection interval.
The one downside that I see is that the current default Linux values for
conn_min_interval and conn_max_interval typically correspond to 30ms and
50ms respectively. If this change were accepted, then it is feasible that
some devices would no longer be able to negotiate to their desired
connection interval values. This might be remedied by setting the default
Linux conn_min_interval and conn_max_interval values to the widest
supported range (6 - 3200 / 7.5ms - 4000ms). This could lead to the same
behavior as the current implementation, where the remote device could
request to change the connection interval value to any value that is
permitted by the Bluetooth specification, and Linux would accept the
desired value.
Signed-off-by: Carey Sonsino <csonsino@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e30155fd23 ]
If an invalid role is sent from user space, gtp_encap_enable() will fail.
Then, it should call gtp_encap_disable_sock() but current code doesn't.
It makes memory leak.
Fixes: 91ed81f9ab ("gtp: support SGSN-side tunnels")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bff5a556c1 ]
'probe libc's inet_pton & backtrace it with ping' testcase sometimes
fails on powerpc because distro ping binary does not have symbol
information and thus it prints "[unknown]" function name in the
backtrace.
Accept "[unknown]" as valid function name for powerpc as well.
# perf test -v "probe libc's inet_pton & backtrace it with ping"
Before:
59: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 79695
ping 79718 [077] 96483.787025: probe_libc:inet_pton: (7fff83a754c8)
7fff83a754c8 __GI___inet_pton+0x8 (/usr/lib64/power9/libc-2.28.so)
7fff83a2b7a0 gaih_inet.constprop.7+0x1020
(/usr/lib64/power9/libc-2.28.so)
7fff83a2c170 getaddrinfo+0x160 (/usr/lib64/power9/libc-2.28.so)
1171830f4 [unknown] (/usr/bin/ping)
FAIL: expected backtrace entry
".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
got "1171830f4 [unknown] (/usr/bin/ping)"
test child finished with -1
---- end ----
probe libc's inet_pton & backtrace it with ping: FAILED!
After:
59: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 79085
ping 79108 [045] 96400.214177: probe_libc:inet_pton: (7fffbb9654c8)
7fffbb9654c8 __GI___inet_pton+0x8 (/usr/lib64/power9/libc-2.28.so)
7fffbb91b7a0 gaih_inet.constprop.7+0x1020
(/usr/lib64/power9/libc-2.28.so)
7fffbb91c170 getaddrinfo+0x160 (/usr/lib64/power9/libc-2.28.so)
132e830f4 [unknown] (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
Signed-off-by: Seeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
Reviewed-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Fixes: 1632936480 ("perf tests: Fix record+probe_libc_inet_pton.sh without ping's debuginfo")
Link: http://lkml.kernel.org/r/1561630614-3216-1-git-send-email-s1seetee@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b188b03270 ]
Handle overlooked case where the target address is assigned to a peer
and neither route nor gateway exist.
For one peer, no checks are performed to see if it is meant to receive
packets for a given address.
As soon as there is a second peer however, checks are performed
to deal with routes and gateways for handling complex setups with
multiple hops to a target address.
This logic assumed that no route and no gateway imply that the
destination address can not be reached, which is false in case of a
direct peer.
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Tested-by: Michael Scott <mike@foundries.io>
Signed-off-by: Josua Mayer <josua.mayer@jm0.eu>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa52bcbe0e ]
Michael reported crash with by bpf program in json mode on powerpc:
# bpftool prog -p dump jited id 14
[{
"name": "0xd00000000a9aa760",
"insns": [{
"pc": "0x0",
"operation": "nop",
"operands": [null
]
},{
"pc": "0x4",
"operation": "nop",
"operands": [null
]
},{
"pc": "0x8",
"operation": "mflr",
Segmentation fault (core dumped)
The code is assuming char pointers in format, which is not always
true at least for powerpc. Fixing this by dumping the whole string
into buffer based on its format.
Please note that libopcodes code does not check return values from
fprintf callback, but as per Jakub suggestion returning -1 on allocation
failure so we do the best effort to propagate the error.
Fixes: 107f041212 ("tools: bpftool: add JSON output for `bpftool prog dump jited *` command")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3285170f28 ]
Commit 372e722ea4 ("gpiolib: use descriptors internally") renamed
the functions to use a "gpiod" prefix, and commit 79a9becda8
("gpiolib: export descriptor-based GPIO interface") introduced the "raw"
variants, but both changes forgot to update the comments.
Readd a similar reference to gpiod_set_value(), which was accidentally
removed by commit 1e77fc8211 ("gpio: Add missing open drain/source
handling to gpiod_set_value_cansleep()").
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20190701142738.25219-1-geert+renesas@glider.be
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11aca65ec4 ]
Selftests are reporting this failure in test_lwt_seg6local.sh:
+ ip netns exec ns2 ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o sec encap_srh dev veth2
Error fetching program/map!
Failed to parse eBPF program: Operation not permitted
The problem is __attribute__((always_inline)) alone is not enough to prevent
clang from inserting those functions in .text. In that case, .text is not
marked as relocateable.
See the output of objdump -h test_lwt_seg6local.o:
Idx Name Size VMA LMA File off Algn
0 .text 00003530 0000000000000000 0000000000000000 00000040 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
This causes the iproute bpf loader to fail in bpf_fetch_prog_sec:
bpf_has_call_data returns true but bpf_fetch_prog_relo fails as there's no
relocateable .text section in the file.
To fix this, convert to 'static __always_inline'.
v2: Use 'static __always_inline' instead of 'static inline
__attribute__((always_inline))'
Fixes: c99a84eac0 ("selftests/bpf: test for seg6local End.BPF action")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33bae185f7 ]
Based on the following report from Smatch, fix the potential NULL
pointer dereference check:
tools/lib/bpf/libbpf.c:3493
bpf_prog_load_xattr() warn: variable dereferenced before check 'attr'
(see line 3483)
3479 int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
3480 struct bpf_object **pobj, int *prog_fd)
3481 {
3482 struct bpf_object_open_attr open_attr = {
3483 .file = attr->file,
3484 .prog_type = attr->prog_type,
^^^^^^
3485 };
At the head of function, it directly access 'attr' without checking
if it's NULL pointer. This patch moves the values assignment after
validating 'attr' and 'attr->file'.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99f0eae653 ]
If the rxrpc_eproto tracepoint is enabled, an oops will be cause by the
trace line that rxrpc_extract_header() tries to emit when a protocol error
occurs (typically because the packet is short) because the call argument is
NULL.
Fix this by using ?: to assume 0 as the debug_id if call is NULL.
This can then be induced by:
echo -e '\0\0\0\0\0\0\0\0' | ncat -4u --send-only <addr> 20001
where addr has the following program running on it:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rxrpc.h>
int main(void)
{
struct sockaddr_rxrpc srx;
int fd;
memset(&srx, 0, sizeof(srx));
srx.srx_family = AF_RXRPC;
srx.srx_service = 0;
srx.transport_type = AF_INET;
srx.transport_len = sizeof(srx.transport.sin);
srx.transport.sin.sin_family = AF_INET;
srx.transport.sin.sin_port = htons(0x4e21);
fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET6);
bind(fd, (struct sockaddr *)&srx, sizeof(srx));
sleep(20);
return 0;
}
It results in the following oops.
BUG: kernel NULL pointer dereference, address: 0000000000000340
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
...
RIP: 0010:trace_event_raw_event_rxrpc_rx_eproto+0x47/0xac
...
Call Trace:
<IRQ>
rxrpc_extract_header+0x86/0x171
? rcu_read_lock_sched_held+0x5d/0x63
? rxrpc_new_skb+0xd4/0x109
rxrpc_input_packet+0xef/0x14fc
? rxrpc_input_data+0x986/0x986
udp_queue_rcv_one_skb+0xbf/0x3d0
udp_unicast_rcv_skb.isra.8+0x64/0x71
ip_protocol_deliver_rcu+0xe4/0x1b4
ip_local_deliver+0xf0/0x154
__netif_receive_skb_one_core+0x50/0x6c
netif_receive_skb_internal+0x26b/0x2e9
napi_gro_receive+0xf8/0x1da
rtl8169_poll+0x303/0x4c4
net_rx_action+0x10e/0x333
__do_softirq+0x1a5/0x38f
irq_exit+0x54/0xc4
do_IRQ+0xda/0xf8
common_interrupt+0xf/0xf
</IRQ>
...
? cpuidle_enter_state+0x23c/0x34d
cpuidle_enter+0x2a/0x36
do_idle+0x163/0x1ea
cpu_startup_entry+0x1d/0x1f
start_secondary+0x157/0x172
secondary_startup_64+0xa4/0xb0
Fixes: a25e21f0bc ("rxrpc, afs: Use debug_ids rather than pointers in traces")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c91f25c2f ]
Currently bnx2x ptp worker tries to read a register with timestamp
information in case of TX packet timestamping and in case it fails,
the routine reschedules itself indefinitely. This was reported as a
kworker always at 100% of CPU usage, which was narrowed down to be
bnx2x ptp_task.
By following the ioctl handler, we could narrow down the problem to
an NTP tool (chrony) requesting HW timestamping from bnx2x NIC with
RX filter zeroed; this isn't reproducible for example with ptp4l
(from linuxptp) since this tool requests a supported RX filter.
It seems NIC FW timestamp mechanism cannot work well with
RX_FILTER_NONE - driver's PTP filter init routine skips a register
write to the adapter if there's not a supported filter request.
This patch addresses the problem of bnx2x ptp thread's everlasting
reschedule by retrying the register read 10 times; between the read
attempts the thread sleeps for an increasing amount of time starting
in 1ms to give FW some time to perform the timestamping. If it still
fails after all retries, we bail out in order to prevent an unbound
resource consumption from bnx2x.
The patch also adds an ethtool statistic for accounting the skipped
TX timestamp packets and it reduces the priority of timestamping
error messages to prevent log flooding. The code was tested using
both linuxptp and chrony.
Reported-and-tested-by: Przemyslaw Hausman <przemyslaw.hausman@canonical.com>
Suggested-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f87f33f42 ]
The metric group code tries to find a group it added earlier in the
evlist. Fix the lookup to handle groups with partially overlaps
correctly. When a sub string match fails and we reset the match, we have
to compare the first element again.
I also renamed the find_evsel function to find_evsel_group to make its
purpose clearer.
With the earlier changes this fixes:
Before:
% perf stat -M UPI,IPC sleep 1
...
1,032,922 uops_retired.retire_slots # 1.1 UPI
1,896,096 inst_retired.any
1,896,096 inst_retired.any
1,177,254 cpu_clk_unhalted.thread
After:
% perf stat -M UPI,IPC sleep 1
...
1,013,193 uops_retired.retire_slots # 1.1 UPI
932,033 inst_retired.any
932,033 inst_retired.any # 0.9 IPC
1,091,245 cpu_clk_unhalted.thread
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Fixes: b18f3e3650 ("perf stat: Support JSON metrics in perf stat")
Link: http://lkml.kernel.org/r/20190624193711.35241-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 145c407c80 ]
After setting up metric groups through the event parser, the metricgroup
code looks them up again in the event list.
Make sure we only look up events that haven't been used by some other
metric. The data structures currently cannot handle more than one metric
per event. This avoids problems with multiple events partially
overlapping.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: http://lkml.kernel.org/r/20190624193711.35241-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0472301a28 ]
Merge commit 1c8c5a9d38 ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next") undid the
fix from commit 36f9814a49 ("bpf: fix uapi hole for 32 bit compat
applications") by taking the gpl_compatible 1-bit field definition from
commit b85fab0e67 ("bpf: Add gpl_compatible flag to struct
bpf_prog_info") as is. That breaks architectures with 16-bit alignment
like m68k. Add 31-bit pad after gpl_compatible to restore alignment of
following fields.
Thanks to Dmitry V. Levin his analysis of this bug history.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac70499ee9 ]
In some buggy scenarios we could possible attempt to transmit frames larger
than maximum MSDU size. Since our devices don't know how to handle this,
it may result in asserts, hangs etc.
This can happen, for example, when we receive a large multicast frame
and try to transmit it back to the air in AP mode.
Since in a legal scenario this should never happen, drop such frames and
warn about it.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e08511d5d ]
If a packet which is utilizing the launchtime feature (via SO_TXTIME socket
option) also requests the hardware transmit timestamp, the hardware
timestamp is not delivered to the userspace. This is because the value in
skb->tstamp is mistaken as the software timestamp.
Applications, like ptp4l, request a hardware timestamp by setting the
SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp is
detected by the driver (this work is done in igb_ptp_tx_work() which calls
igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp in the
ERR_QUEUE for the userspace to read. When the userspace is ready, it will
issue a recvmsg() call to collect this timestamp. The problem is in this
recvmsg() call. If the skb->tstamp is not cleared out, it will be
interpreted as a software timestamp and the hardware tx timestamp will not
be successfully sent to the userspace. Look at skb_is_swtx_tstamp() and the
callee function __sock_recv_timestamp() in net/socket.c for more details.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ec3ede559 ]
The Header Parser allows identifying various fields in the packet
headers, used for various kind of filtering and classification
steps.
This is a re-entrant process, where the offset in the packet header
depends on the previous lookup results. This offset is represented in
the SRAM results of the TCAM, as a shift to be operated.
This shift can be negative in some cases, such as in IPv6 parsing.
This commit prevents overriding the sign bit when setting the shift
value, which could cause instabilities when parsing IPv6 flows.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Suggested-by: Alan Winkowski <walan@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ed39f8e74 ]
The workqueue need to flush and destory while remove sdio module,
otherwise it will have thread which is not destory after remove
sdio modules.
Tested with QCA6174 SDIO with firmware
WLAN.RMH.4.4.1-00007-QCARMSWP-1.
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 04f25edb48 ]
When hdev->tx_sch_mode is HCLGE_FLAG_VNET_BASE_SCH_MODE, the
hclge_tm_schd_mode_vnet_base_cfg calls hclge_tm_pri_schd_mode_cfg
with vport->vport_id as pri_id, which is used as index for
hdev->tm_info.tc_info, it will cause out of bound access issue
if vport_id is equal to or larger than HNAE3_MAX_TC.
Also hardware only support maximum speed of HCLGE_ETHER_MAX_RATE.
So this patch adds two checks for above cases.
Fixes: 848440544b ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18d219b783 ]
When setting -Wformat=2, there is a compiler warning like this:
hclge_main.c:xxx:x: warning: format not a string literal and no
format arguments [-Wformat-nonliteral]
strs[i].desc);
^~~~
This patch adds missing format parameter "%s" to snprintf() to
fix it.
Fixes: 46a3df9f97 ("Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e865eba00 ]
When enable lockdep and reboot system with a writeback mode bcache
device, the following potential deadlock warning is reported by lockdep
engine.
[ 101.536569][ T401] kworker/2:2/401 is trying to acquire lock:
[ 101.538575][ T401] 00000000bbf6e6c7 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
[ 101.542054][ T401]
[ 101.542054][ T401] but task is already holding lock:
[ 101.544587][ T401] 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[ 101.548386][ T401]
[ 101.548386][ T401] which lock already depends on the new lock.
[ 101.548386][ T401]
[ 101.551874][ T401]
[ 101.551874][ T401] the existing dependency chain (in reverse order) is:
[ 101.555000][ T401]
[ 101.555000][ T401] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
[ 101.557860][ T401] process_one_work+0x277/0x640
[ 101.559661][ T401] worker_thread+0x39/0x3f0
[ 101.561340][ T401] kthread+0x125/0x140
[ 101.562963][ T401] ret_from_fork+0x3a/0x50
[ 101.564718][ T401]
[ 101.564718][ T401] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
[ 101.567701][ T401] lock_acquire+0xb4/0x1c0
[ 101.569651][ T401] flush_workqueue+0xae/0x4c0
[ 101.571494][ T401] drain_workqueue+0xa9/0x180
[ 101.573234][ T401] destroy_workqueue+0x17/0x250
[ 101.575109][ T401] cached_dev_free+0x44/0x120 [bcache]
[ 101.577304][ T401] process_one_work+0x2a4/0x640
[ 101.579357][ T401] worker_thread+0x39/0x3f0
[ 101.581055][ T401] kthread+0x125/0x140
[ 101.582709][ T401] ret_from_fork+0x3a/0x50
[ 101.584592][ T401]
[ 101.584592][ T401] other info that might help us debug this:
[ 101.584592][ T401]
[ 101.588355][ T401] Possible unsafe locking scenario:
[ 101.588355][ T401]
[ 101.590974][ T401] CPU0 CPU1
[ 101.592889][ T401] ---- ----
[ 101.594743][ T401] lock((work_completion)(&cl->work)#2);
[ 101.596785][ T401] lock((wq_completion)bcache_writeback_wq);
[ 101.600072][ T401] lock((work_completion)(&cl->work)#2);
[ 101.602971][ T401] lock((wq_completion)bcache_writeback_wq);
[ 101.605255][ T401]
[ 101.605255][ T401] *** DEADLOCK ***
[ 101.605255][ T401]
[ 101.608310][ T401] 2 locks held by kworker/2:2/401:
[ 101.610208][ T401] #0: 00000000cf2c7d17 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
[ 101.613709][ T401] #1: 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[ 101.617480][ T401]
[ 101.617480][ T401] stack backtrace:
[ 101.619539][ T401] CPU: 2 PID: 401 Comm: kworker/2:2 Tainted: G W 5.2.0-rc4-lp151.20-default+ #1
[ 101.623225][ T401] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 101.627210][ T401] Workqueue: events cached_dev_free [bcache]
[ 101.629239][ T401] Call Trace:
[ 101.630360][ T401] dump_stack+0x85/0xcb
[ 101.631777][ T401] print_circular_bug+0x19a/0x1f0
[ 101.633485][ T401] __lock_acquire+0x16cd/0x1850
[ 101.635184][ T401] ? __lock_acquire+0x6a8/0x1850
[ 101.636863][ T401] ? lock_acquire+0xb4/0x1c0
[ 101.638421][ T401] ? find_held_lock+0x34/0xa0
[ 101.640015][ T401] lock_acquire+0xb4/0x1c0
[ 101.641513][ T401] ? flush_workqueue+0x87/0x4c0
[ 101.643248][ T401] flush_workqueue+0xae/0x4c0
[ 101.644832][ T401] ? flush_workqueue+0x87/0x4c0
[ 101.646476][ T401] ? drain_workqueue+0xa9/0x180
[ 101.648303][ T401] drain_workqueue+0xa9/0x180
[ 101.649867][ T401] destroy_workqueue+0x17/0x250
[ 101.651503][ T401] cached_dev_free+0x44/0x120 [bcache]
[ 101.653328][ T401] process_one_work+0x2a4/0x640
[ 101.655029][ T401] worker_thread+0x39/0x3f0
[ 101.656693][ T401] ? process_one_work+0x640/0x640
[ 101.658501][ T401] kthread+0x125/0x140
[ 101.660012][ T401] ? kthread_create_worker_on_cpu+0x70/0x70
[ 101.661985][ T401] ret_from_fork+0x3a/0x50
[ 101.691318][ T401] bcache: bcache_device_free() bcache0 stopped
Here is how the above potential deadlock may happen in reboot/shutdown
code path,
1) bcache_reboot() is called firstly in the reboot/shutdown code path,
then in bcache_reboot(), bcache_device_stop() is called.
2) bcache_device_stop() sets BCACHE_DEV_CLOSING on d->falgs, then call
closure_queue(&d->cl) to invoke cached_dev_flush(). And in turn
cached_dev_flush() calls cached_dev_free() via closure_at()
3) In cached_dev_free(), after stopped writebach kthread
dc->writeback_thread, the kwork dc->writeback_write_wq is stopping by
destroy_workqueue().
4) Inside destroy_workqueue(), drain_workqueue() is called. Inside
drain_workqueue(), flush_workqueue() is called. Then wq->lockdep_map
is acquired by lock_map_acquire() in flush_workqueue(). After the
lock acquired the rest part of flush_workqueue() just wait for the
workqueue to complete.
5) Now we look back at writeback thread routine bch_writeback_thread(),
in the main while-loop, write_dirty() is called via continue_at() in
read_dirty_submit(), which is called via continue_at() in while-loop
level called function read_dirty(). Inside write_dirty() it may be
re-called on workqueeu dc->writeback_write_wq via continue_at().
It means when the writeback kthread is stopped in cached_dev_free()
there might be still one kworker queued on dc->writeback_write_wq
to execute write_dirty() again.
6) Now this kworker is scheduled on dc->writeback_write_wq to run by
process_one_work() (which is called by worker_thread()). Before
calling the kwork routine, wq->lockdep_map is acquired.
7) But wq->lockdep_map is acquired already in step 4), so a A-A lock
(lockdep terminology) scenario happens.
Indeed on multiple cores syatem, the above deadlock is very rare to
happen, just as the code comments in process_one_work() says,
2263 * AFAICT there is no possible deadlock scenario between the
2264 * flush_work() and complete() primitives (except for
single-threaded
2265 * workqueues), so hiding them isn't a problem.
But it is still good to fix such lockdep warning, even no one running
bcache on single core system.
The fix is simple. This patch solves the above potential deadlock by,
- Do not destroy workqueue dc->writeback_write_wq in cached_dev_free().
- Flush and destroy dc->writeback_write_wq in writebach kthread routine
bch_writeback_thread(), where after quit the thread main while-loop
and before cached_dev_put() is called.
By this fix, dc->writeback_write_wq will be stopped and destroy before
the writeback kthread stopped, so the chance for a A-A locking on
wq->lockdep_map is disappeared, such A-A deadlock won't happen
any more.
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80265d8dfd ]
When enable lockdep engine, a lockdep warning can be observed when
reboot or shutdown system,
[ 3142.764557][ T1] bcache: bcache_reboot() Stopping all devices:
[ 3142.776265][ T2649]
[ 3142.777159][ T2649] ======================================================
[ 3142.780039][ T2649] WARNING: possible circular locking dependency detected
[ 3142.782869][ T2649] 5.2.0-rc4-lp151.20-default+ #1 Tainted: G W
[ 3142.785684][ T2649] ------------------------------------------------------
[ 3142.788479][ T2649] kworker/3:67/2649 is trying to acquire lock:
[ 3142.790738][ T2649] 00000000aaf02291 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
[ 3142.794678][ T2649]
[ 3142.794678][ T2649] but task is already holding lock:
[ 3142.797402][ T2649] 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
[ 3142.801462][ T2649]
[ 3142.801462][ T2649] which lock already depends on the new lock.
[ 3142.801462][ T2649]
[ 3142.805277][ T2649]
[ 3142.805277][ T2649] the existing dependency chain (in reverse order) is:
[ 3142.808902][ T2649]
[ 3142.808902][ T2649] -> #2 (&bch_register_lock){+.+.}:
[ 3142.812396][ T2649] __mutex_lock+0x7a/0x9d0
[ 3142.814184][ T2649] cached_dev_free+0x17/0x120 [bcache]
[ 3142.816415][ T2649] process_one_work+0x2a4/0x640
[ 3142.818413][ T2649] worker_thread+0x39/0x3f0
[ 3142.820276][ T2649] kthread+0x125/0x140
[ 3142.822061][ T2649] ret_from_fork+0x3a/0x50
[ 3142.823965][ T2649]
[ 3142.823965][ T2649] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
[ 3142.827244][ T2649] process_one_work+0x277/0x640
[ 3142.829160][ T2649] worker_thread+0x39/0x3f0
[ 3142.830958][ T2649] kthread+0x125/0x140
[ 3142.832674][ T2649] ret_from_fork+0x3a/0x50
[ 3142.834915][ T2649]
[ 3142.834915][ T2649] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
[ 3142.838121][ T2649] lock_acquire+0xb4/0x1c0
[ 3142.840025][ T2649] flush_workqueue+0xae/0x4c0
[ 3142.842035][ T2649] drain_workqueue+0xa9/0x180
[ 3142.844042][ T2649] destroy_workqueue+0x17/0x250
[ 3142.846142][ T2649] cached_dev_free+0x52/0x120 [bcache]
[ 3142.848530][ T2649] process_one_work+0x2a4/0x640
[ 3142.850663][ T2649] worker_thread+0x39/0x3f0
[ 3142.852464][ T2649] kthread+0x125/0x140
[ 3142.854106][ T2649] ret_from_fork+0x3a/0x50
[ 3142.855880][ T2649]
[ 3142.855880][ T2649] other info that might help us debug this:
[ 3142.855880][ T2649]
[ 3142.859663][ T2649] Chain exists of:
[ 3142.859663][ T2649] (wq_completion)bcache_writeback_wq --> (work_completion)(&cl->work)#2 --> &bch_register_lock
[ 3142.859663][ T2649]
[ 3142.865424][ T2649] Possible unsafe locking scenario:
[ 3142.865424][ T2649]
[ 3142.868022][ T2649] CPU0 CPU1
[ 3142.869885][ T2649] ---- ----
[ 3142.871751][ T2649] lock(&bch_register_lock);
[ 3142.873379][ T2649] lock((work_completion)(&cl->work)#2);
[ 3142.876399][ T2649] lock(&bch_register_lock);
[ 3142.879727][ T2649] lock((wq_completion)bcache_writeback_wq);
[ 3142.882064][ T2649]
[ 3142.882064][ T2649] *** DEADLOCK ***
[ 3142.882064][ T2649]
[ 3142.885060][ T2649] 3 locks held by kworker/3:67/2649:
[ 3142.887245][ T2649] #0: 00000000e774cdd0 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
[ 3142.890815][ T2649] #1: 00000000f7df89da ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[ 3142.894884][ T2649] #2: 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
[ 3142.898797][ T2649]
[ 3142.898797][ T2649] stack backtrace:
[ 3142.900961][ T2649] CPU: 3 PID: 2649 Comm: kworker/3:67 Tainted: G W 5.2.0-rc4-lp151.20-default+ #1
[ 3142.904789][ T2649] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 3142.909168][ T2649] Workqueue: events cached_dev_free [bcache]
[ 3142.911422][ T2649] Call Trace:
[ 3142.912656][ T2649] dump_stack+0x85/0xcb
[ 3142.914181][ T2649] print_circular_bug+0x19a/0x1f0
[ 3142.916193][ T2649] __lock_acquire+0x16cd/0x1850
[ 3142.917936][ T2649] ? __lock_acquire+0x6a8/0x1850
[ 3142.919704][ T2649] ? lock_acquire+0xb4/0x1c0
[ 3142.921335][ T2649] ? find_held_lock+0x34/0xa0
[ 3142.923052][ T2649] lock_acquire+0xb4/0x1c0
[ 3142.924635][ T2649] ? flush_workqueue+0x87/0x4c0
[ 3142.926375][ T2649] flush_workqueue+0xae/0x4c0
[ 3142.928047][ T2649] ? flush_workqueue+0x87/0x4c0
[ 3142.929824][ T2649] ? drain_workqueue+0xa9/0x180
[ 3142.931686][ T2649] drain_workqueue+0xa9/0x180
[ 3142.933534][ T2649] destroy_workqueue+0x17/0x250
[ 3142.935787][ T2649] cached_dev_free+0x52/0x120 [bcache]
[ 3142.937795][ T2649] process_one_work+0x2a4/0x640
[ 3142.939803][ T2649] worker_thread+0x39/0x3f0
[ 3142.941487][ T2649] ? process_one_work+0x640/0x640
[ 3142.943389][ T2649] kthread+0x125/0x140
[ 3142.944894][ T2649] ? kthread_create_worker_on_cpu+0x70/0x70
[ 3142.947744][ T2649] ret_from_fork+0x3a/0x50
[ 3142.970358][ T2649] bcache: bcache_device_free() bcache0 stopped
Here is how the deadlock happens.
1) bcache_reboot() calls bcache_device_stop(), then inside
bcache_device_stop() BCACHE_DEV_CLOSING bit is set on d->flags.
Then closure_queue(&d->cl) is called to invoke cached_dev_flush().
2) In cached_dev_flush(), cached_dev_free() is called by continu_at().
3) In cached_dev_free(), when stopping the writeback kthread of the
cached device by kthread_stop(), dc->writeback_thread will be waken
up to quite the kthread while-loop, then cached_dev_put() is called
in bch_writeback_thread().
4) Calling cached_dev_put() in writeback kthread may drop dc->count to
0, then dc->detach kworker is scheduled, which is initialized as
cached_dev_detach_finish().
5) Inside cached_dev_detach_finish(), the last line of code is to call
closure_put(&dc->disk.cl), which drops the last reference counter of
closrure dc->disk.cl, then the callback cached_dev_flush() gets
called.
Now cached_dev_flush() is called for second time in the code path, the
first time is in step 2). And again bch_register_lock will be acquired
again, and a A-A lock (lockdep terminology) is happening.
The root cause of the above A-A lock is in cached_dev_free(), mutex
bch_register_lock is held before stopping writeback kthread and other
kworkers. Fortunately now we have variable 'bcache_is_reboot', which may
prevent device registration or unregistration during reboot/shutdown
time, so it is unncessary to hold bch_register_lock such early now.
This is how this patch fixes the reboot/shutdown time A-A lock issue:
After moving mutex_lock(&bch_register_lock) to a later location where
before atomic_read(&dc->running) in cached_dev_free(), such A-A lock
problem can be solved without any reboot time registration race.
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 383ff2183a ]
When too many I/O errors happen on cache set and CACHE_SET_IO_DISABLE
bit is set, bch_journal() may continue to work because the journaling
bkey might be still in write set yet. The caller of bch_journal() may
believe the journal still work but the truth is in-memory journal write
set won't be written into cache device any more. This behavior may
introduce potential inconsistent metadata status.
This patch checks CACHE_SET_IO_DISABLE bit at the head of bch_journal(),
if the bit is set, bch_journal() returns NULL immediately to notice
caller to know journal does not work.
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e775339e1a ]
If CACHE_SET_IO_DISABLE of a cache set flag is set by too many I/O
errors, currently allocator routines can still continue allocate
space which may introduce inconsistent metadata state.
This patch checkes CACHE_SET_IO_DISABLE bit in following allocator
routines,
- bch_bucket_alloc()
- __bch_bucket_alloc_set()
Once CACHE_SET_IO_DISABLE is set on cache set, the allocator routines
may reject allocation request earlier to avoid potential inconsistent
metadata.
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8655e7630 ]
Commit 9da21b1509 ("EDAC: Poll timeout cannot be zero, p2") assumes
edac_mc_poll_msec to be unsigned long, but the type of the variable still
remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
write.
Reproducer:
# echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec
KASAN report:
BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
Write of size 8 at addr ffffffffb91b2d00 by task bash/1996
CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
Call Trace:
dump_stack+0xca/0x13e
print_address_description.cold+0x5/0x246
__kasan_report.cold+0x75/0x9a
? edac_set_poll_msec+0x140/0x150
kasan_report+0xe/0x20
edac_set_poll_msec+0x140/0x150
? dimmdev_location_show+0x30/0x30
? vfs_lock_file+0xe0/0xe0
? _raw_spin_lock+0x87/0xe0
param_attr_store+0x1b5/0x310
? param_array_set+0x4f0/0x4f0
module_attr_store+0x58/0x80
? module_attr_show+0x80/0x80
sysfs_kf_write+0x13d/0x1a0
kernfs_fop_write+0x2bc/0x460
? sysfs_kf_bin_read+0x270/0x270
? kernfs_notify+0x1f0/0x1f0
__vfs_write+0x81/0x100
vfs_write+0x1e1/0x560
ksys_write+0x126/0x250
? __ia32_sys_read+0xb0/0xb0
? do_syscall_64+0x1f/0x390
do_syscall_64+0xc1/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fa7caa5e970
Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005
The buggy address belongs to the variable:
edac_mc_poll_msec+0x0/0x40
Memory state around the buggy address:
ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
>ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
^
ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fix it by changing the type of edac_mc_poll_msec to unsigned int.
The reason why this patch adopts unsigned int rather than unsigned long
is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
integer conversion bugs and unsigned int will be large enough for
edac_mc_poll_msec.
Reviewed-by: James Morse <james.morse@arm.com>
Fixes: 9da21b1509 ("EDAC: Poll timeout cannot be zero, p2")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a27600311 ]
This change fixes a rare race condition of handling WMI events after
wmi_call expires.
wmi_recv_cmd immediately handles an event when reply_buf is defined and
a wmi_call is waiting for the event.
However, in case the wmi_call has already timed-out, there will be no
waiting/running wmi_call and the event will be queued in WMI queue and
will be handled later in wmi_event_handle.
Meanwhile, a new similar wmi_call for the same command and event may
be issued. In this case, when handling the queued event we got WARN_ON
printed.
Fixing this case as a valid timeout and drop the unexpected event.
Signed-off-by: Ahmad Masri <amasri@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90acc0653d ]
Build testing with some core crypto options disabled revealed
a few modules that are missing CRYPTO_HASH:
crypto/asymmetric_keys/x509_public_key.o: In function `x509_get_sig_params':
x509_public_key.c:(.text+0x4c7): undefined reference to `crypto_alloc_shash'
x509_public_key.c:(.text+0x5e5): undefined reference to `crypto_shash_digest'
crypto/asymmetric_keys/pkcs7_verify.o: In function `pkcs7_digest.isra.0':
pkcs7_verify.c:(.text+0xab): undefined reference to `crypto_alloc_shash'
pkcs7_verify.c:(.text+0x1b2): undefined reference to `crypto_shash_digest'
pkcs7_verify.c:(.text+0x3c1): undefined reference to `crypto_shash_update'
pkcs7_verify.c:(.text+0x411): undefined reference to `crypto_shash_finup'
This normally doesn't show up in randconfig tests because there is
a large number of other options that select CRYPTO_HASH.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 473971187d ]
The same bug that gcc hit in the past is apparently now showing
up with clang, which decides to inline __serpent_setkey_sbox:
crypto/serpent_generic.c:268:5: error: stack frame size of 2112 bytes in function '__serpent_setkey' [-Werror,-Wframe-larger-than=]
Marking it 'noinline' reduces the stack usage from 2112 bytes to
192 and 96 bytes, respectively, and seems to generate more
useful object code.
Fixes: c871c10e4e ("crypto: serpent - improve __serpent_setkey with UBSAN")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 655c914145 ]
Some transceivers may comply with SFF-8472 but not implement the Digital
Diagnostic Monitoring (DDM) interface described in it. The existence of
such area is specified by bit 6 of byte 92, set to 1 if implemented.
Currently, due to not checking this bit ixgbe fails trying to read SFP
module's eeprom with the follow message:
ethtool -m enP51p1s0f0
Cannot get Module EEPROM data: Input/output error
Because it fails to read the additional 256 bytes in which it was assumed
to exist the DDM data.
This issue was noticed using a Mellanox Passive DAC PN 01FT738. The eeprom
data was confirmed by Mellanox as correct and present in other Passive
DACs in from other manufacturers.
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2034a42d17 ]
The decoding of shortenend codes is broken. It only works as expected if
there are no erasures.
When decoding with erasures, Lambda (the error and erasure locator
polynomial) is initialized from the given erasure positions. The pad
parameter is not accounted for by the initialisation code, and hence
Lambda is initialized from incorrect erasure positions.
The fix is to adjust the erasure positions by the supplied pad.
Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190620141039.9874-3-ferdinand.blomqvist@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7019b7b0a ]
Clang warns:
In file included from net/xdp/xsk_queue.c:10:
net/xdp/xsk_queue.h:292:2: warning: expression result unused
[-Wunused-value]
WRITE_ONCE(q->ring->producer, q->prod_tail);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler.h:284:6: note: expanded from macro 'WRITE_ONCE'
__u.__val; \
~~~ ^~~~~
1 warning generated.
The q->prod_tail assignment has a comma at the end, not a semi-colon.
Fix that so clang no longer warns and everything works as expected.
Fixes: c497176cb2 ("xsk: add Rx receive functions and poll support")
Link: https://github.com/ClangBuiltLinux/linux/issues/544
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>