[ Upstream commit 4068664e3c ]
Extents are cached in read_extent_tree_block(); as a result, extents
are not cached for inodes with depth == 0 when we try to find the
extent using ext4_find_extent(). The result of the lookup is cached
in ext4_map_blocks() but is only a subset of the extent on disk. As a
result, the contents of extents status cache can get very badly
fragmented for certain workloads, such as a random 4k read workload.
File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 8191: 40960.. 49151: 8192: last,eof
$ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
$ perf script | grep ext4_es_insert_extent | head -n 10
fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
fio 131 [000] 13.975939: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
fio 131 [000] 13.976937: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
fio 131 [000] 13.977440: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
fio 131 [000] 13.977931: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
fio 131 [000] 13.978376: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
fio 131 [000] 13.978957: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
fio 131 [000] 13.979474: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W
Fix this by caching the extents for inodes with depth == 0 in
ext4_find_extent().
[ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this
newly added function is not in extents_cache.c, and to avoid
potential visual confusion with ext4_es_cache_extent(). -TYT ]
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Link: https://lore.kernel.org/r/20191106122502.19986-1-dmonakhov@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ no upstream commit ]
Switch the comparison, so that is_branch_taken() will recognize that below
branch is never taken:
[...]
17: [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
17: (67) r8 <<= 32
18: [...] R8_w=inv(id=0,smax_value=-4294967296,umin_value=9223372036854775808,umax_value=18446744069414584320,var_off=(0x8000000000000000; 0x7fffffff00000000)) [...]
18: (c7) r8 s>>= 32
19: [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
19: (6d) if r1 s> r8 goto pc+16
[...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
[...]
Currently we check for is_branch_taken() only if either K is source, or source
is a scalar value that is const. For upstream it would be good to extend this
properly to check whether dst is const and src not.
For the sake of the test_verifier, it is probably not needed here:
# ./test_verifier 101
#101/p bpf_get_stack return R0 within range OK
Summary: 1 PASSED, 0 SKIPPED, 0 FAILED
I haven't seen this issue in test_progs* though, they are passing fine:
# ./test_progs-no_alu32 -t get_stack
Switching to flavor 'no_alu32' subdirectory...
#20 get_stack_raw_tp:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
# ./test_progs -t get_stack
#20 get_stack_raw_tp:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ac26e9973 upstream.
With current ALU32 subreg handling and retval refine fix from last
patches we see an expected failure in test_verifier. With verbose
verifier state being printed at each step for clarity we have the
following relavent lines [I omit register states that are not
necessarily useful to see failure cause],
#101/p bpf_get_stack return R0 within range FAIL
Failed to load prog 'Success'!
[..]
14: (85) call bpf_get_stack#67
R0_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
R3_w=inv48
15:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
15: (b7) r1 = 0
16:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
16: (bf) r8 = r0
17:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
17: (67) r8 <<= 32
18:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smax_value=9223372032559808512,
umax_value=18446744069414584320,
var_off=(0x0; 0xffffffff00000000),
s32_min_value=0,
s32_max_value=0,
u32_max_value=0,
var32_off=(0x0; 0x0))
18: (c7) r8 s>>= 32
19
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smin_value=-2147483648,
smax_value=2147483647,
var32_off=(0x0; 0xffffffff))
19: (cd) if r1 s< r8 goto pc+16
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smin_value=-2147483648,
smax_value=0,
var32_off=(0x0; 0xffffffff))
20:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smin_value=-2147483648,
smax_value=0,
R9=inv48
20: (1f) r9 -= r8
21: (bf) r2 = r7
22:
R2_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
22: (0f) r2 += r8
value -2147483648 makes map_value pointer be out of bounds
After call bpf_get_stack() on line 14 and some moves we have at line 16
an r8 bound with max_value 48 but an unknown min value. This is to be
expected bpf_get_stack call can only return a max of the input size but
is free to return any negative error in the 32-bit register space. The
C helper is returning an int so will use lower 32-bits.
Lines 17 and 18 clear the top 32 bits with a left/right shift but use
ARSH so we still have worst case min bound before line 19 of -2147483648.
At this point the signed check 'r1 s< r8' meant to protect the addition
on line 22 where dst reg is a map_value pointer may very well return
true with a large negative number. Then the final line 22 will detect
this as an invalid operation and fail the program. What we want to do
is proceed only if r8 is positive non-error. So change 'r1 s< r8' to
'r1 s> r8' so that we jump if r8 is negative.
Next we will throw an error because we access past the end of the map
value. The map value size is 48 and sizeof(struct test_val) is 48 so
we walk off the end of the map value on the second call to
get bpf_get_stack(). Fix this by changing sizeof(struct test_val) to
24 by using 'sizeof(struct test_val) / 2'. After this everything passes
as expected.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/158560426019.10843.3285429543232025187.stgit@john-Precision-5820-Tower
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ no upstream commit ]
See the glory details in 100605035e ("bpf: Verifier, do_refine_retval_range
may clamp umin to 0 incorrectly") for why 849fa50662 ("bpf/verifier: refine
retval R0 state for bpf_get_stack helper") is buggy. The whole series however
is not suitable for stable since it adds significant amount [0] of verifier
complexity in order to add 32bit subreg tracking. Something simpler is needed.
Unfortunately, reverting 849fa50662 ("bpf/verifier: refine retval R0 state
for bpf_get_stack helper") or just cherry-picking 100605035e ("bpf: Verifier,
do_refine_retval_range may clamp umin to 0 incorrectly") is not an option since
it will break existing tracing programs badly (at least those that are using
bpf_get_stack() and bpf_probe_read_str() helpers). Not fixing it in stable is
also not an option since on 4.19 kernels an error will cause a soft-lockup due
to hitting dead-code sanitized branch since we don't hard-wire such branches
in old kernels yet. But even then for 5.x 849fa50662 ("bpf/verifier: refine
retval R0 state for bpf_get_stack helper") would cause wrong bounds on the
verifier simluation when an error is hit.
In one of the earlier iterations of mentioned patch series for upstream there
was the concern that just using smax_value in do_refine_retval_range() would
nuke bounds by subsequent <<32 >>32 shifts before the comparison against 0 [1]
which eventually led to the 32bit subreg tracking in the first place. While I
initially went for implementing the idea [1] to pattern match the two shift
operations, it turned out to be more complex than actually needed, meaning, we
could simply treat do_refine_retval_range() similarly to how we branch off
verification for conditionals or under speculation, that is, pushing a new
reg state to the stack for later verification. This means, instead of verifying
the current path with the ret_reg in [S32MIN, msize_max_value] interval where
later bounds would get nuked, we split this into two: i) for the success case
where ret_reg can be in [0, msize_max_value], and ii) for the error case with
ret_reg known to be in interval [S32MIN, -1]. Latter will preserve the bounds
during these shift patterns and can match reg < 0 test. test_progs also succeed
with this approach.
[0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/
[1] https://lore.kernel.org/bpf/158015334199.28573.4940395881683556537.stgit@john-XPS-13-9370/T/#m2e0ad1d5949131014748b6daa48a3495e7f0456d
Fixes: 849fa50662 ("bpf/verifier: refine retval R0 state for bpf_get_stack helper")
Reported-by: Lorenzo Fontana <fontanalorenz@gmail.com>
Reported-by: Leonardo Di Donato <leodidonato@gmail.com>
Reported-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80c503e0e6 upstream.
The __torture_print_stats() function in locktorture.c carefully
initializes local variable "min" to statp[0].n_lock_acquired, but
then compares it to statp[i].n_lock_fail. Given that the .n_lock_fail
field should normally be zero, and given the initialization, it seems
reasonable to display the maximum and minimum number acquisitions
instead of miscomputing the maximum and minimum number of failures.
This commit therefore switches from failures to acquisitions.
And this turns out to be not only a day-zero bug, but entirely my
own fault. I hate it when that happens!
Fixes: 0af3fe1efa ("locktorture: Add a lock-torture kernel module")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3670664b5d upstream.
ev_byte_channel_send() assumes that its third argument is a 16 byte
array. Some places where it is called it may not be (or we can't
easily tell if it is). Newer compilers have started producing warnings
about this, so make sure we actually pass a 16 byte array.
There may be more elegant solutions to this, but the driver is quite
old and hasn't been updated in many years.
The warnings (from a powerpc allyesconfig build) are:
In file included from include/linux/byteorder/big_endian.h:5,
from arch/powerpc/include/uapi/asm/byteorder.h:14,
from include/asm-generic/bitops/le.h:6,
from arch/powerpc/include/asm/bitops.h:250,
from include/linux/bitops.h:29,
from include/linux/kernel.h:12,
from include/asm-generic/bug.h:19,
from arch/powerpc/include/asm/bug.h:109,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/gfp.h:5,
from include/linux/slab.h:15,
from drivers/tty/ehv_bytechan.c:24:
drivers/tty/ehv_bytechan.c: In function ‘ehv_bc_udbg_putc’:
arch/powerpc/include/asm/epapr_hcalls.h:298:20: warning: array subscript 1 is outside array bounds of ‘const char[1]’ [-Warray-bounds]
298 | r6 = be32_to_cpu(p[1]);
include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of macro ‘__be32_to_cpu’
40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
| ^
arch/powerpc/include/asm/epapr_hcalls.h:298:7: note: in expansion of macro ‘be32_to_cpu’
298 | r6 = be32_to_cpu(p[1]);
| ^~~~~~~~~~~
drivers/tty/ehv_bytechan.c:166:13: note: while referencing ‘data’
166 | static void ehv_bc_udbg_putc(char c)
| ^~~~~~~~~~~~~~~~
Fixes: dcd83aaff1 ("tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
[mpe: Trim warnings from change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200109183912.5fcb52aa@canb.auug.org.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93166f5f2e upstream.
Clang warns:
../drivers/video/fbdev/core/fbmem.c:665:3: warning: misleading
indentation; statement is not part of the previous 'else'
[-Wmisleading-indentation]
if (fb_logo.depth > 4 && depth > 4) {
^
../drivers/video/fbdev/core/fbmem.c:661:2: note: previous statement is
here
else
^
../drivers/video/fbdev/core/fbmem.c:1075:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
return ret;
^
../drivers/video/fbdev/core/fbmem.c:1072:2: note: previous statement is
here
if (!ret)
^
2 warnings generated.
This warning occurs because there are spaces before the tabs on these
lines. Normalize the indentation in these functions so that it is
consistent with the Linux kernel coding style and clang no longer warns.
Fixes: 1692b37c99 ("fbdev: Fix logo if logo depth is less than framebuffer depth")
Link: https://github.com/ClangBuiltLinux/linux/issues/825
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218030025.10064-1-natechancellor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 328b50e9a0 upstream.
The chip is configured in 24 bit mode. The values read from
it must always be treated as is. This fixes the issue by
replacing the previous 16 bits value by a 24 bits buffer.
This changes affects the value output by previous version of
the driver, since the least significant byte was missing.
The upper half of 16 bit values previously output are now
the upper half of a 24 bit value.
Fixes: e01e7eaf37 ("iio: light: introduce si1133")
Reported-by: Simon Goyette <simon.goyette@gmail.com>
Co-authored-by: Guillaume Champagne <champagne.guillaume.c@gmail.com>
Signed-off-by: Maxime Roussin-Bélanger <maxime.roussinbelanger@gmail.com>
Signed-off-by: Guillaume Champagne <champagne.guillaume.c@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0802dc411 upstream.
Commit f949a12fd6 ("net: dsa: bcm_sf2: fix buffer overflow doing
set_rxnfc") tried to fix the some user controlled buffer overflows in
bcm_sf2_cfp_rule_set() and bcm_sf2_cfp_rule_del() but the fix was using
CFP_NUM_RULES, which while it is correct not to overflow the bitmaps, is
not representative of what the device actually supports. Correct that by
using bcm_sf2_cfp_rule_size() instead.
The latter subtracts the number of rules by 1, so change the checks from
greater than or equal to greater than accordingly.
Fixes: f949a12fd6 ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 028a12f5aa ]
Certain boards with GP107/GP108 chipsets hang (often, but randomly) for
unknown reasons during GR initialisation.
The first tell-tale symptom of this issue is:
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]
appearing in dmesg, likely followed by many other failures being logged.
Karol found this WAR for the issue a while back, but efforts to isolate
the root cause and proper fix have not yielded success so far. I've
modified the original patch to include a few more details, limit it to
GP107/GP108 by default, and added a config option to override this choice.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc5a941223 ]
There is a race condition that we may miss to wait for all node pages
writeback, fix it.
- fsync() - shrink
- f2fs_do_sync_file
- __write_node_page
- set_page_writeback(page#0)
: remove DIRTY/TOWRITE flag
- f2fs_fsync_node_pages
: won't find page #0 as TOWRITE flag was removeD
- f2fs_wait_on_node_pages_writeback
: wont' wait page #0 writeback as it was not in fsync_node_list list.
- f2fs_add_fsync_node_entry
Fixes: 50fa53eccf ("f2fs: fix to avoid broken of dnode block list")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c20f365346 ]
The SPA of the GCR3 table root pointer[51:31] masks 20 bits. However,
this requires 21 bits (Please see the AMD IOMMU specification).
This leads to the potential failure when the bit 51 of SPA of
the GCR3 table root pointer is 1'.
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Fixes: 52815b7568 ("iommu/amd: Add support for IOMMUv2 domain mode")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e42fe5b29a ]
The Intel Compute Stick `STK1A32SC` can have a system vendor of
"Intel(R) Client Systems".
Broaden the Intel Compute Stick DMI checks so that they match "Intel
Corporation" as well as "Intel(R) Client Systems".
This fixes an issue where the STK1A32SC compute sticks were still
exposing a battery with the existing blacklist entry.
Signed-off-by: Jeffery Miller <jmiller@neverware.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12879bda3c ]
WARNING: vmlinux.o(.text+0x2366): Section mismatch in reference from the
function csky_start_secondary() to the function .init.text:init_fpu()
The function csky_start_secondary() references
the function __init init_fpu().
This is often because csky_start_secondary lacks a __init
annotation or the annotation of init_fpu is wrong.
Reported-by: Lu Chongzhi <chongzhi.lcz@alibaba-inc.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4047aa909c ]
xdr_buf_read_mic() tries to find unused contiguous space in a
received xdr_buf in order to linearize the checksum for the call
to gss_verify_mic. However, the corner cases in this code are
numerous and we seem to keep missing them. I've just hit yet
another buffer overrun related to it.
This overrun is at the end of xdr_buf_read_mic():
1284 if (buf->tail[0].iov_len != 0)
1285 mic->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
1286 else
1287 mic->data = buf->head[0].iov_base + buf->head[0].iov_len;
1288 __read_bytes_from_xdr_buf(&subbuf, mic->data, mic->len);
1289 return 0;
This logic assumes the transport has set the length of the tail
based on the size of the received message. base + len is then
supposed to be off the end of the message but still within the
actual buffer.
In fact, the length of the tail is set by the upper layer when the
Call is encoded so that the end of the tail is actually the end of
the allocated buffer itself. This causes the logic above to set
mic->data to point past the end of the receive buffer.
The "mic->data = head" arm of this if statement is no less fragile.
As near as I can tell, this has been a problem forever. I'm not sure
that minimizing au_rslack recently changed this pathology much.
So instead, let's use a more straightforward approach: kmalloc a
separate buffer to linearize the checksum. This is similar to
how gss_validate() currently works.
Coming back to this code, I had some trouble understanding what
was going on. So I've cleaned up the variable naming and added
a few comments that point back to the XDR definition in RFC 2203
to help guide future spelunkers, including myself.
As an added clean up, the functionality that was in
xdr_buf_read_mic() is folded directly into gss_unwrap_resp_integ(),
as that is its only caller.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32302085a8 ]
Fix a debug-only build error in ext2/xattr.c:
When building without extra debugging, (and with another patch that uses
no_printk() instead of <empty> for the ext2-xattr debug-print macros,
this build error happens:
../fs/ext2/xattr.c: In function ‘ext2_xattr_cache_insert’:
../fs/ext2/xattr.c:869:18: error: ‘ext2_xattr_cache’ undeclared (first use in
this function); did you mean ‘ext2_xattr_list’?
atomic_read(&ext2_xattr_cache->c_entry_count));
Fix the problem by removing cached entry count from the debug message
since otherwise we'd have to export the mbcache structure just for that.
Fixes: be0726d33c ("ext2: convert to mbcache2")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52355fb191 ]
Intel VT-d might support PRS (Page Reqest Support) when it's
running in the scalable mode. Each page request descriptor
occupies 32 bytes and is 32-bytes aligned. The page request
descriptor offset mask should be 32-bytes aligned.
Fixes: 5b438f4ba3 ("iommu/vt-d: Support page request in scalable mode")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44a52022e7 ]
When EXT2_ATTR_DEBUG is not defined, modify the 2 debug macros
to use the no_printk() macro instead of <nothing>.
This fixes gcc warnings when -Wextra is used:
../fs/ext2/xattr.c:252:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:258:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:330:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:872:45: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]
I have verified that the only object code change (with gcc 7.5.0) is
the reversal of some instructions from 'cmp a,b' to 'cmp b,a'.
Link: https://lore.kernel.org/r/e18a7395-61fb-2093-18e8-ed4f8cf56248@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df513a7711 ]
Ever since commit 2c94b8eca1 ("SUNRPC: Use au_rslack when computing
reply buffer size"). It changed how "req->rq_rcvsize" is calculated. It
used to use au_cslack value which was nice and large and changed it to
au_rslack value which turns out to be too small.
Since 5.1, v3 mount with sec=krb5p fails against an Ontap server
because client's receive buffer it too small.
For gss krb5p, we need to account for the mic token in the verifier,
and the wrap token in the wrap token.
RFC 4121 defines:
mic token
Octet no Name Description
--------------------------------------------------------------
0..1 TOK_ID Identification field. Tokens emitted by
GSS_GetMIC() contain the hex value 04 04
expressed in big-endian order in this
field.
2 Flags Attributes field, as described in section
4.2.2.
3..7 Filler Contains five octets of hex value FF.
8..15 SND_SEQ Sequence number field in clear text,
expressed in big-endian order.
16..last SGN_CKSUM Checksum of the "to-be-signed" data and
octet 0..15, as described in section 4.2.4.
that's 16bytes (GSS_KRB5_TOK_HDR_LEN) + chksum
wrap token
Octet no Name Description
--------------------------------------------------------------
0..1 TOK_ID Identification field. Tokens emitted by
GSS_Wrap() contain the hex value 05 04
expressed in big-endian order in this
field.
2 Flags Attributes field, as described in section
4.2.2.
3 Filler Contains the hex value FF.
4..5 EC Contains the "extra count" field, in big-
endian order as described in section 4.2.3.
6..7 RRC Contains the "right rotation count" in big-
endian order, as described in section
4.2.5.
8..15 SND_SEQ Sequence number field in clear text,
expressed in big-endian order.
16..last Data Encrypted data for Wrap tokens with
confidentiality, or plaintext data followed
by the checksum for Wrap tokens without
confidentiality, as described in section
4.2.4.
Also 16bytes of header (GSS_KRB5_TOK_HDR_LEN), encrypted data, and cksum
(other things like padding)
RFC 3961 defines known cksum sizes:
Checksum type sumtype checksum section or
value size reference
---------------------------------------------------------------------
CRC32 1 4 6.1.3
rsa-md4 2 16 6.1.2
rsa-md4-des 3 24 6.2.5
des-mac 4 16 6.2.7
des-mac-k 5 8 6.2.8
rsa-md4-des-k 6 16 6.2.6
rsa-md5 7 16 6.1.1
rsa-md5-des 8 24 6.2.4
rsa-md5-des3 9 24 ??
sha1 (unkeyed) 10 20 ??
hmac-sha1-des3-kd 12 20 6.3
hmac-sha1-des3 13 20 ??
sha1 (unkeyed) 14 20 ??
hmac-sha1-96-aes128 15 20 [KRB5-AES]
hmac-sha1-96-aes256 16 20 [KRB5-AES]
[reserved] 0x8003 ? [GSS-KRB5]
Linux kernel now mainly supports type 15,16 so max cksum size is 20bytes.
(GSS_KRB5_MAX_CKSUM_LEN)
Re-use already existing define of GSS_KRB5_MAX_SLACK_NEEDED that's used
for encoding the gss_wrap tokens (same tokens are used in reply).
Fixes: 2c94b8eca1 ("SUNRPC: Use au_rslack when computing reply buffer size")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7062af3ed2 ]
Calling viommu_domain_free() on a domain that hasn't been finalised (not
attached to any device, for example) can currently cause an Oops,
because we attempt to call ida_free() on ID 0, which may either be
unallocated or used by another domain.
Only initialise the vdomain->viommu pointer, which denotes a finalised
domain, at the end of a successful viommu_domain_finalise().
Fixes: edcd69ab9a ("iommu: Add virtio-iommu driver")
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20200326093558.2641019-3-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 35f3401317 ]
When building UML with glibc 2.17 installed, compilation
of arch/um/os-Linux/file.c fails due to failure to find
FALLOC_FL_PUNCH_HOLE and FALLOC_FL_KEEP_SIZE definitions.
It appears that /usr/include/bits/fcntl-linux.h (indirectly
included by /usr/include/fcntl.h) does not include falloc.h
with an older glibc, whereas a more up-to-date version
does.
Adding the direct include to file.c resolves the issue
and does not cause problems for more recent glibc.
Fixes: 50109b5a03 ("um: Add support for DISCARD in the UBD Driver")
Cc: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c0e343d76 ]
We should get psr value from regs->psr in stack, not directly get
it from phyiscal register then save the vector number in
tsk->trap_no.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 862f35c947 ]
If we just set the mirror count to 1 without first clearing out
the mirrors, we can leak queued up requests.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aefd9461d3 ]
For the memory size ( > 512MB, < 1GB), the MSA setting is:
- SSEG0: PHY_START , PHY_START + 512MB
- SSEG1: PHY_START + 512MB, PHY_START + 1GB
But the real memory is no more than 1GB, there is a gap between the
end size of memory and border of 1GB. CPU could speculatively
execute to that gap and if the gap of the bus couldn't respond to
the CPU request, then the crash will happen.
Now make the setting with:
- SSEG0: PHY_START , PHY_START + 512MB (no change)
- SSEG1: Disabled (We use highmem to use the memory of 512MB~1GB)
We also deprecated zhole_szie[] settings, it's only used by arm
style CPUs. All memory gap should use Reserved setting of dts in
csky system.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 696ac2e3bf ]
Similar to commit 0266d81e9b ("acpi/processor: Prevent cpu hotplug
deadlock") except this is for acpi_processor_ffh_cstate_probe():
"The problem is that the work is scheduled on the current CPU from the
hotplug thread associated with that CPU.
It's not required to invoke these functions via the workqueue because
the hotplug thread runs on the target CPU already.
Check whether current is a per cpu thread pinned on the target CPU and
invoke the function directly to avoid the workqueue."
WARNING: possible circular locking dependency detected
------------------------------------------------------
cpuhp/1/15 is trying to acquire lock:
ffffc90003447a28 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: __flush_work+0x4c6/0x630
but task is already holding lock:
ffffffffafa1c0e8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x17/0x20
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (cpu_hotplug_lock){++++}-{0:0}:
cpus_read_lock+0x3e/0xc0
irq_calc_affinity_vectors+0x5f/0x91
__pci_enable_msix_range+0x10f/0x9a0
pci_alloc_irq_vectors_affinity+0x13e/0x1f0
pci_alloc_irq_vectors_affinity at drivers/pci/msi.c:1208
pqi_ctrl_init+0x72f/0x1618 [smartpqi]
pqi_pci_probe.cold.63+0x882/0x892 [smartpqi]
local_pci_probe+0x7a/0xc0
work_for_cpu_fn+0x2e/0x50
process_one_work+0x57e/0xb90
worker_thread+0x363/0x5b0
kthread+0x1f4/0x220
ret_from_fork+0x27/0x50
-> #0 ((work_completion)(&wfc.work)){+.+.}-{0:0}:
__lock_acquire+0x2244/0x32a0
lock_acquire+0x1a2/0x680
__flush_work+0x4e6/0x630
work_on_cpu+0x114/0x160
acpi_processor_ffh_cstate_probe+0x129/0x250
acpi_processor_evaluate_cst+0x4c8/0x580
acpi_processor_get_power_info+0x86/0x740
acpi_processor_hotplug+0xc3/0x140
acpi_soft_cpu_online+0x102/0x1d0
cpuhp_invoke_callback+0x197/0x1120
cpuhp_thread_fun+0x252/0x2f0
smpboot_thread_fn+0x255/0x440
kthread+0x1f4/0x220
ret_from_fork+0x27/0x50
other info that might help us debug this:
Chain exists of:
(work_completion)(&wfc.work) --> cpuhp_state-up --> cpuidle_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpuidle_lock);
lock(cpuhp_state-up);
lock(cpuidle_lock);
lock((work_completion)(&wfc.work));
*** DEADLOCK ***
3 locks held by cpuhp/1/15:
#0: ffffffffaf51ab10 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x69/0x2f0
#1: ffffffffaf51ad40 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x69/0x2f0
#2: ffffffffafa1c0e8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x17/0x20
Call Trace:
dump_stack+0xa0/0xea
print_circular_bug.cold.52+0x147/0x14c
check_noncircular+0x295/0x2d0
__lock_acquire+0x2244/0x32a0
lock_acquire+0x1a2/0x680
__flush_work+0x4e6/0x630
work_on_cpu+0x114/0x160
acpi_processor_ffh_cstate_probe+0x129/0x250
acpi_processor_evaluate_cst+0x4c8/0x580
acpi_processor_get_power_info+0x86/0x740
acpi_processor_hotplug+0xc3/0x140
acpi_soft_cpu_online+0x102/0x1d0
cpuhp_invoke_callback+0x197/0x1120
cpuhp_thread_fun+0x252/0x2f0
smpboot_thread_fn+0x255/0x440
kthread+0x1f4/0x220
ret_from_fork+0x27/0x50
Signed-off-by: Qian Cai <cai@lca.pw>
Tested-by: Borislav Petkov <bp@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64ed6588c2 ]
The warning message when a led is renamed due to name collition can fail
to show proper original name if init_data is used. Eg:
[ 9.073996] leds-gpio a0040000.leds_0: Led (null) renamed to red_led_1 due to name collision
Fixes: bb4e9af034 ("leds: core: Add support for composing LED class device names")
Signed-off-by: Ricardo Ribalda Delgado <ribalda@kernel.org>
Acked-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1493e0f944 ]
We have to properly retry again by returning -EINVAL immediately in case
somebody else instantiated the table concurrently. We missed to add the
goto in this function only. The code now matches the other, similar
shadowing functions.
We are overwriting an existing region 2 table entry. All allocated pages
are added to the crst_list to be freed later, so they are not lost
forever. However, when unshadowing the region 2 table, we wouldn't trigger
unshadowing of the original shadowed region 3 table that we replaced. It
would get unshadowed when the original region 3 table is modified. As it's
not connected to the page table hierarchy anymore, it's not going to get
used anymore. However, for a limited time, this page table will stick
around, so it's in some sense a temporary memory leak.
Identified by manual code inspection. I don't think this classifies as
stable material.
Fixes: 998f637cc4 ("s390/mm: avoid races on region/segment/page table shadowing")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200403153050.20569-4-david@redhat.com
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af9c5d2e3b ]
compiletime_assert() uses __LINE__ to create a unique function name. This
means that if you have more than one BUILD_BUG_ON() in the same source
line (which can happen if they appear e.g. in a macro), then the error
message from the compiler might output the wrong condition.
For this source file:
#include <linux/build_bug.h>
#define macro() \
BUILD_BUG_ON(1); \
BUILD_BUG_ON(0);
void foo()
{
macro();
}
gcc would output:
./include/linux/compiler.h:350:38: error: call to `__compiletime_assert_9' declared with attribute error: BUILD_BUG_ON failed: 0
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
However, it was not the BUILD_BUG_ON(0) that failed, so it should say 1
instead of 0. With this patch, we use __COUNTER__ instead of __LINE__, so
each BUILD_BUG_ON() gets a different function name and the correct
condition is printed:
./include/linux/compiler.h:350:38: error: call to `__compiletime_assert_0' declared with attribute error: BUILD_BUG_ON failed: 1
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Daniel Santos <daniel.santos@pobox.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Ian Abbott <abbotti@mev.co.uk>
Cc: Joe Perches <joe@perches.com>
Link: http://lkml.kernel.org/r/20200331112637.25047-1-vegard.nossum@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e23452002 ]
"vm_committed_as.count" could be accessed concurrently as reported by
KCSAN,
BUG: KCSAN: data-race in __vm_enough_memory / percpu_counter_add_batch
write to 0xffffffff9451c538 of 8 bytes by task 65879 on cpu 35:
percpu_counter_add_batch+0x83/0xd0
percpu_counter_add_batch at lib/percpu_counter.c:91
__vm_enough_memory+0xb9/0x260
dup_mm+0x3a4/0x8f0
copy_process+0x2458/0x3240
_do_fork+0xaa/0x9f0
__do_sys_clone+0x125/0x160
__x64_sys_clone+0x70/0x90
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe
read to 0xffffffff9451c538 of 8 bytes by task 66773 on cpu 19:
__vm_enough_memory+0x199/0x260
percpu_counter_read_positive at include/linux/percpu_counter.h:81
(inlined by) __vm_enough_memory at mm/util.c:839
mmap_region+0x1b2/0xa10
do_mmap+0x45c/0x700
vm_mmap_pgoff+0xc0/0x130
ksys_mmap_pgoff+0x6e/0x300
__x64_sys_mmap+0x33/0x40
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The read is outside percpu_counter::lock critical section which results in
a data race. Fix it by adding a READ_ONCE() in
percpu_counter_read_positive() which could also service as the existing
compiler memory barrier.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/1582302724-2804-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f3673d7d3 ]
If CONFIG_DEVICE_PRIVATE is defined, but neither CONFIG_MEMORY_FAILURE nor
CONFIG_MIGRATION, then non_swap_entry() will return 0, meaning that the
condition (non_swap_entry(entry) && is_device_private_entry(entry)) in
zap_pte_range() will never be true even if the entry is a device private
one.
Equally any other code depending on non_swap_entry() will not function as
expected.
I originally spotted this just by looking at the code, I haven't actually
observed any problems.
Looking a bit more closely it appears that actually this situation
(currently at least) cannot occur:
DEVICE_PRIVATE depends on ZONE_DEVICE
ZONE_DEVICE depends on MEMORY_HOTREMOVE
MEMORY_HOTREMOVE depends on MIGRATION
Fixes: 5042db43cc ("mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory")
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200305130550.22693-1-steven.price@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b92103b559 ]
find_vma_intersection(mm, start, end) only guarantees that end is greater
than or equal to vma->vm_start but doesn't guarantee that start is
greater than or equal to vma->vm_start. The calculation for the
intersecting range in nouveau_svmm_bind() isn't accounting for this and
can call migrate_vma_setup() with a starting address less than
vma->vm_start. This results in migrate_vma_setup() returning -EINVAL for
the range instead of nouveau skipping that part of the range and migrating
the rest.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>