commit eba04b20e4 upstream.
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
clearly associated with a single task/VM.
Note, there are a several SEV allocations that aren't accounted, but
those can (hopefully) be fixed by using the local stack for memory.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331023025.2485960-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cdea19a34 upstream.
Since 5bfa685e62 ("KVM: arm64: vgic: Read HW interrupt pending state
from the HW"), we're able to source the pending bit for an interrupt
that is stored either on the physical distributor or on a device.
However, this state is only available when the vcpu is loaded,
and is not intended to be accessed from userspace. Unfortunately,
the GICv2 emulation doesn't provide specific userspace accessors,
and we fallback with the ones that are intended for the guest,
with fatal consequences.
Add a new vgic_uaccess_read_pending() accessor for userspace
to use, build on top of the existing vgic_mmio_read_pending().
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 5bfa685e62 ("KVM: arm64: vgic: Read HW interrupt pending state from the HW")
Link: https://lore.kernel.org/r/20220607131427.1164881-2-maz@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b55c3cd102 upstream.
We capture a NULL pointer issue when resizing a corrupt ext4 image which
is freshly clear resize_inode feature (not run e2fsck). It could be
simply reproduced by following steps. The problem is because of the
resize_inode feature was cleared, and it will convert the filesystem to
meta_bg mode in ext4_resize_fs(), but the es->s_reserved_gdt_blocks was
not reduced to zero, so could we mistakenly call reserve_backup_gdb()
and passing an uninitialized resize_inode to it when adding new group
descriptors.
mkfs.ext4 /dev/sda 3G
tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
mount /dev/sda /mnt
resize2fs /dev/sda 8G
========
BUG: kernel NULL pointer dereference, address: 0000000000000028
CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748
...
RIP: 0010:ext4_flex_group_add+0xe08/0x2570
...
Call Trace:
<TASK>
ext4_resize_fs+0xbec/0x1660
__ext4_ioctl+0x1749/0x24e0
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xa6/0x110
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2dd739617b
========
The fix is simple, add a check in ext4_resize_begin() to make sure that
the es->s_reserved_gdt_blocks is zero when the resize_inode feature is
disabled.
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a08f789d2a upstream.
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
ext4_mb_new_blocks+0x9df/0x5d30
ext4_ext_map_blocks+0x1803/0x4d80
ext4_map_blocks+0x3a4/0x1a10
ext4_writepages+0x126d/0x2c30
do_writepages+0x7f/0x1b0
__filemap_fdatawrite_range+0x285/0x3b0
file_write_and_wait_range+0xb1/0x140
ext4_sync_file+0x1aa/0xca0
vfs_fsync_range+0xfb/0x260
do_fsync+0x48/0xa0
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
do_fsync
vfs_fsync_range
ext4_sync_file
file_write_and_wait_range
__filemap_fdatawrite_range
do_writepages
ext4_writepages
mpage_map_and_submit_extent
mpage_map_one_extent
ext4_map_blocks
ext4_mb_new_blocks
ext4_mb_normalize_request
>>> start + size <= ac->ac_o_ex.fe_logical
ext4_mb_regular_allocator
ext4_mb_simple_scan_group
ext4_mb_use_best_found
ext4_mb_new_preallocation
ext4_mb_new_inode_pa
ext4_mb_use_inode_pa
>>> set ac->ac_b_ex.fe_len <= 0
ext4_mb_mark_diskspace_used
>>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
we can easily reproduce this problem with the following commands:
`fallocate -l100M disk`
`mkfs.ext4 -b 1024 -g 256 disk`
`mount disk /mnt`
`fsstress -d /mnt -l 0 -n 1000 -p 1`
The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.
Cc: stable@kernel.org
Fixes: cd648b8a8f ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fd17f2ac0 upstream.
[Why]
For OLED eDP the Display Manager uses max_cll value as a limit
for brightness control.
max_cll defines the content light luminance for individual pixel.
Whereas max_fall defines frame-average level luminance.
The user may not observe the difference in brightness in between
max_fall and max_cll.
That negatively impacts the user experience.
[How]
Use max_fall value instead of max_cll as a limit for brightness control.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85e123c27d upstream.
The code in dm-log rounds up bitset_size to 32 bits. It then uses
find_next_zero_bit_le on the allocated region. find_next_zero_bit_le
accesses the bitmap using unsigned long pointers. So, on 64-bit
architectures, it may access 4 bytes beyond the allocated size.
Fix this bug by rounding up bitset_size to BITS_PER_LONG.
This bug was found by running the lvm2 testsuite with kasan.
Fixes: 29121bd0b0 ("[PATCH] dm mirror log: bitset_size fix")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abfed87e2a upstream.
This is used by code that doesn't need CONFIG_CRYPTO, so move this into
lib/ with a Kconfig option so that it can be selected by whatever needs
it.
This fixes a linker error Zheng pointed out when
CRYPTO_MANAGER_DISABLE_TESTS!=y and CRYPTO=m:
lib/crypto/curve25519-selftest.o: In function `curve25519_selftest':
curve25519-selftest.c:(.init.text+0x60): undefined reference to `__crypto_memneq'
curve25519-selftest.c:(.init.text+0xec): undefined reference to `__crypto_memneq'
curve25519-selftest.c:(.init.text+0x114): undefined reference to `__crypto_memneq'
curve25519-selftest.c:(.init.text+0x154): undefined reference to `__crypto_memneq'
Reported-by: Zheng Bin <zhengbin13@huawei.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: stable@vger.kernel.org
Fixes: aa127963f1 ("crypto: lib/curve25519 - re-add selftests")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 27071b5cbc ]
Even though the DW I2C controller reference clock source is requested by
the method devm_clk_get() with non-optional clock requirement the way the
clock handler is used afterwards has a pure optional clock semantic
(though in some circumstances we can get a warning about the clock missing
printed in the system console). There is no point in reimplementing that
functionality seeing the kernel clock framework already supports the
optional interface from scratch. Thus let's convert the platform driver to
using it.
Note by providing this commit we get to fix two problems. The first one
was introduced in commit c62ebb3d5f ("i2c: designware: Add support for
an interface clock"). It causes not having the interface clock (pclk)
enabled/disabled in case if the reference clock isn't provided. The second
problem was first introduced in commit b33af11de2 ("i2c: designware: Do
not require clock when SSCN and FFCN are provided"). Since that
modification the deferred probe procedure has been unsupported in case if
the interface clock isn't ready.
Fixes: c62ebb3d5f ("i2c: designware: Add support for an interface clock")
Fixes: b33af11de2 ("i2c: designware: Do not require clock when SSCN and FFCN are provided")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec8401a429 ]
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
When kcalloc fails, it missing of_node_put() and results in refcount
leak. Fix this by goto out_put_node label.
Fixes: 52085d3f20 ("irqchip/gic-v3: Dynamically allocate PPI partition descriptors")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220601080930.31005-5-linmq006@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ba12b56b9 ]
As platform_driver_register() could fail, it should be better
to deal with the return value in order to maintain the code
consisitency.
Fixes: 56a1485b10 ("i2c: npcm7xx: Add Nuvoton NPCM I2C controller driver")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Tali Perry <tali.perry1@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcea997bee ]
If a function lives in a section other than .text, but .text also exists
in the object, faddr2line may wrongly assume .text. This can result in
comically wrong output. For example:
$ scripts/faddr2line vmlinux.o enter_from_user_mode+0x1c
enter_from_user_mode+0x1c/0x30:
find_next_bit at /home/jpoimboe/git/linux/./include/linux/find.h:40
(inlined by) perf_clear_dirty_counters at /home/jpoimboe/git/linux/arch/x86/events/core.c:2504
Fix it by passing the section name to addr2line, unless the object file
is vmlinux, in which case the symbol table uses absolute addresses.
Fixes: 1d1a0e7c51 ("scripts/faddr2line: Fix overlapping text section failures")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/7d25bc1408bd3a750ac26e60d2f2815a5f4a8363.1654130536.git.jpoimboe@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14dc7a18ab ]
This patch prevents that test nvme/004 triggers the following:
UBSAN: array-index-out-of-bounds in block/blk-mq.h:135:9
index 512 is out of range for type 'long unsigned int [512]'
Call Trace:
show_stack+0x52/0x58
dump_stack_lvl+0x49/0x5e
dump_stack+0x10/0x12
ubsan_epilogue+0x9/0x3b
__ubsan_handle_out_of_bounds.cold+0x44/0x49
blk_mq_alloc_request_hctx+0x304/0x310
__nvme_submit_sync_cmd+0x70/0x200 [nvme_core]
nvmf_connect_io_queue+0x23e/0x2a0 [nvme_fabrics]
nvme_loop_connect_io_queues+0x8d/0xb0 [nvme_loop]
nvme_loop_create_ctrl+0x58e/0x7d0 [nvme_loop]
nvmf_create_ctrl+0x1d7/0x4d0 [nvme_fabrics]
nvmf_dev_write+0xae/0x111 [nvme_fabrics]
vfs_write+0x144/0x560
ksys_write+0xb7/0x140
__x64_sys_write+0x42/0x50
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Fixes: 20e4d81393 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220615210004.1031820-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a625357997 ]
Sometimes it is necessary to use a PLT entry to call an ftrace
trampoline. This is handled by ftrace_make_call() and ftrace_make_nop(),
with each having *almost* identical logic, but this is not handled by
ftrace_modify_call() since its introduction in commit:
3b23e4991f ("arm64: implement ftrace with regs")
Due to this, if we ever were to call ftrace_modify_call() for a callsite
which requires a PLT entry for a trampoline, then either:
a) If the old addr requires a trampoline, ftrace_modify_call() will use
an out-of-range address to generate the 'old' branch instruction.
This will result in warnings from aarch64_insn_gen_branch_imm() and
ftrace_modify_code(), and no instructions will be modified. As
ftrace_modify_call() will return an error, this will result in
subsequent internal ftrace errors.
b) If the old addr does not require a trampoline, but the new addr does,
ftrace_modify_call() will use an out-of-range address to generate the
'new' branch instruction. This will result in warnings from
aarch64_insn_gen_branch_imm(), and ftrace_modify_code() will replace
the 'old' branch with a BRK. This will result in a kernel panic when
this BRK is later executed.
Practically speaking, case (a) is vastly more likely than case (b), and
typically this will result in internal ftrace errors that don't
necessarily affect the rest of the system. This can be demonstrated with
an out-of-tree test module which triggers ftrace_modify_call(), e.g.
| # insmod test_ftrace.ko
| test_ftrace: Function test_function raw=0xffffb3749399201c, callsite=0xffffb37493992024
| branch_imm_common: offset out of range
| branch_imm_common: offset out of range
| ------------[ ftrace bug ]------------
| ftrace failed to modify
| [<ffffb37493992024>] test_function+0x8/0x38 [test_ftrace]
| actual: 1d:00:00:94
| Updating ftrace call site to call a different ftrace function
| ftrace record flags: e0000002
| (2) R
| expected tramp: ffffb374ae42ed54
| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 165 at kernel/trace/ftrace.c:2085 ftrace_bug+0x280/0x2b0
| Modules linked in: test_ftrace(+)
| CPU: 0 PID: 165 Comm: insmod Not tainted 5.19.0-rc2-00002-g4d9ead8b45ce #13
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : ftrace_bug+0x280/0x2b0
| lr : ftrace_bug+0x280/0x2b0
| sp : ffff80000839ba00
| x29: ffff80000839ba00 x28: 0000000000000000 x27: ffff80000839bcf0
| x26: ffffb37493994180 x25: ffffb374b0991c28 x24: ffffb374b0d70000
| x23: 00000000ffffffea x22: ffffb374afcc33b0 x21: ffffb374b08f9cc8
| x20: ffff572b8462c000 x19: ffffb374b08f9000 x18: ffffffffffffffff
| x17: 6c6c6163202c6331 x16: ffffb374ae5ad110 x15: ffffb374b0d51ee4
| x14: 0000000000000000 x13: 3435646532346561 x12: 3437336266666666
| x11: 203a706d61727420 x10: 6465746365707865 x9 : ffffb374ae5149e8
| x8 : 336266666666203a x7 : 706d617274206465 x6 : 00000000fffff167
| x5 : ffff572bffbc4a08 x4 : 00000000fffff167 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : ffff572b84461e00 x0 : 0000000000000022
| Call trace:
| ftrace_bug+0x280/0x2b0
| ftrace_replace_code+0x98/0xa0
| ftrace_modify_all_code+0xe0/0x144
| arch_ftrace_update_code+0x14/0x20
| ftrace_startup+0xf8/0x1b0
| register_ftrace_function+0x38/0x90
| test_ftrace_init+0xd0/0x1000 [test_ftrace]
| do_one_initcall+0x50/0x2b0
| do_init_module+0x50/0x1f0
| load_module+0x17c8/0x1d64
| __do_sys_finit_module+0xa8/0x100
| __arm64_sys_finit_module+0x2c/0x3c
| invoke_syscall+0x50/0x120
| el0_svc_common.constprop.0+0xdc/0x100
| do_el0_svc+0x3c/0xd0
| el0_svc+0x34/0xb0
| el0t_64_sync_handler+0xbc/0x140
| el0t_64_sync+0x18c/0x190
| ---[ end trace 0000000000000000 ]---
We can solve this by consistently determining whether to use a PLT entry
for an address.
Note that since (the earlier) commit:
f1a54ae9af ("arm64: module/ftrace: intialize PLT at load time")
... we can consistently determine the PLT address that a given callsite
will use, and therefore ftrace_make_nop() does not need to skip
validation when a PLT is in use.
This patch factors the existing logic out of ftrace_make_call() and
ftrace_make_nop() into a common ftrace_find_callable_addr() helper
function, which is used by ftrace_make_call(), ftrace_make_nop(), and
ftrace_modify_call(). In ftrace_make_nop() the patching is consistently
validated by ftrace_modify_code() as we can always determine what the
old instruction should have been.
Fixes: 3b23e4991f ("arm64: implement ftrace with regs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: "Ivan T. Ivanov" <iivanov@suse.de>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614080944.1349146-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3eefdf9d1e ]
The branch range checks in ftrace_make_call() and ftrace_make_nop() are
incorrect, erroneously permitting a forwards branch of 128M and
erroneously rejecting a backwards branch of 128M.
This is because both functions calculate the offset backwards,
calculating the offset *from* the target *to* the branch, rather than
the other way around as the later comparisons expect.
If an out-of-range branch were erroeously permitted, this would later be
rejected by aarch64_insn_gen_branch_imm() as branch_imm_common() checks
the bounds correctly, resulting in warnings and the placement of a BRK
instruction. Note that this can only happen for a forwards branch of
exactly 128M, and so the caller would need to be exactly 128M bytes
below the relevant ftrace trampoline.
If an in-range branch were erroeously rejected, then:
* For modules when CONFIG_ARM64_MODULE_PLTS=y, this would result in the
use of a PLT entry, which is benign.
Note that this is the common case, as this is selected by
CONFIG_RANDOMIZE_BASE (and therefore RANDOMIZE_MODULE_REGION_FULL),
which distributions typically seelct. This is also selected by
CONFIG_ARM64_ERRATUM_843419.
* For modules when CONFIG_ARM64_MODULE_PLTS=n, this would result in
internal ftrace failures.
* For core kernel text, this would result in internal ftrace failues.
Note that for this to happen, the kernel text would need to be at
least 128M bytes in size, and typical configurations are smaller tha
this.
Fix this by calculating the offset *from* the branch *to* the target in
both functions.
Fixes: f8af0b364e ("arm64: ftrace: don't validate branch via PLT in ftrace_make_nop()")
Fixes: e71a4e1beb ("arm64: ftrace: add support for far branches to dynamic ftrace")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: "Ivan T. Ivanov" <iivanov@suse.de>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614080944.1349146-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 219b51a6f0 ]
The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
and block until it receives a packet from the remote. If the client
doesn`t connect to server and calls read() directly, it will not
receive any packets forever. As a result, the deadlock will happen.
The fail log caused by deadlock is shown below:
[ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
[ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.613058] Call Trace:
[ 369.613315] <TASK>
[ 369.614072] __schedule+0x2f9/0xb20
[ 369.615029] schedule+0x49/0xb0
[ 369.615734] __lock_sock+0x92/0x100
[ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20
[ 369.617941] lock_sock_nested+0x6e/0x70
[ 369.618809] ax25_bind+0xaa/0x210
[ 369.619736] __sys_bind+0xca/0xf0
[ 369.620039] ? do_futex+0xae/0x1b0
[ 369.620387] ? __x64_sys_futex+0x7c/0x1c0
[ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40
[ 369.620613] __x64_sys_bind+0x11/0x20
[ 369.621791] do_syscall_64+0x3b/0x90
[ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 369.623319] RIP: 0033:0x7f43c8aa8af7
[ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
[ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
[ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
[ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
[ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700
This patch replaces skb_recv_datagram() with an open-coded variant of it
releasing the socket lock before the __skb_wait_for_more_packets() call
and re-acquiring it after such call in order that other functions that
need socket lock could be executed.
what's more, the socket lock will be released only when recvmsg() will
block and that should produce nicer overall behavior.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Suggested-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reported-by: Thomas Habets <thomas@@habets.se>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b7a632ac4 ]
Both RIF and ACL flow counters use a 24-bit SW-managed counter address to
communicate which counter they want to bind.
In a number of Spectrum FW releases, binding a RIF counter is broken and
slices the counter index to 16 bits. As a result, on Spectrum-2 and above,
no more than about 410 RIF counters can be effectively used. This
translates to 205 netdevices for which L3 HW stats can be enabled. (This
does not happen on Spectrum-1, because there are fewer counters available
overall and the counter index never exceeds 16 bits.)
Binding counters to ACLs does not have this issue. Therefore reorder the
counter allocation scheme so that RIF counters come first and therefore get
lower indices that are below the 16-bit barrier.
Fixes: 98e60dce4d ("Merge branch 'mlxsw-Introduce-initial-Spectrum-2-support'")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220613125017.2018162-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1fc766b5c0 ]
This provides more context to users.
Old message:
[ 00.000000] No UUID available providing old NGUID
New message:
[ 00.000000] block nvme0n1: No UUID available providing old NGUID
Fixes: d934f9848a ("nvme: provide UUID value to userspace")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bff4bcf3cf ]
sysfs_emit is the recommended API to use for formatting strings to be
returned to user space. It is equivalent to scnprintf and aware of the
PAGE_SIZE buffer size.
Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd5855e6b1 ]
After PF reset and ethtool -t there was call trace in dmesg
sometimes leading to panic. When there was some time, around 5
seconds, between reset and test there were no errors.
Problem was that pf reset calls i40e_vsi_close in prep_for_reset
and ethtool -t calls i40e_vsi_close in diag_test. If there was not
enough time between those commands the second i40e_vsi_close starts
before previous i40e_vsi_close was done which leads to crash.
Add check to diag_test if pf is in reset and don't start offline
tests if it is true.
Add netif_info("testing failed") into unhappy path of i40e_diag_test()
Fixes: e17bc411ae ("i40e: Disable offline diagnostics if VFs are enabled")
Fixes: 510efb2682 ("i40e: Fix ethtool offline diagnostic with netqueues")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0bb050670a ]
If ADQ is enabled for a VF, then actual number of queue pair
is a number of currently available traffic classes for this VF.
Without this change the configuration of the Rx/Tx queues
fails with error.
Fixes: d29e0d233e ("i40e: missing input validation on VF message handling by the PF")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c3238d36c3 ]
Procedure of configure tc flower filters erroneously allows to create
filters on TC0 where unfiltered packets are also directed by default.
Issue was caused by insufficient checks of hw_tc parameter specifying
the hardware traffic class to pass matching packets to.
Fix checking hw_tc parameter which blocks creation of filters on TC0.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 245b993d8f ]
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the only in-tree call-site,
arch/x86/kernel/cpu/mshyperv.c is never compiled as modular.
(CONFIG_HYPERVISOR_GUEST is boolean)
Fixes: dd2cb34861 ("clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220606050238.4162200-1-masahiroy@kernel.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 880265c77a ]
If we're about to send the first layoutget for an empty layout, we want
to make sure that we drain out the existing pending layoutget calls
first. The reason is that these layouts may have been already implicitly
returned to the server by a recall to which the client gave a
NFS4ERR_NOMATCHING_LAYOUT response.
The problem is that wait_var_event_killable() could in principle see the
plh_outstanding count go back to '1' when the first process to wake up
starts sending a new layoutget. If it fails to get a layout, then this
loop can continue ad infinitum...
Fixes: 0b77f97a7e ("NFSv4/pnfs: Fix layoutget behaviour after invalidation")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe44fb23d6 ]
If the server tells us that a pNFS layout is not available for a
specific file, then we should not keep pounding it with further
layoutget requests.
Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 846bb97e13 ]
This commit changes the default Kconfig values of RANDOM_TRUST_CPU and
RANDOM_TRUST_BOOTLOADER to be Y by default. It does not change any
existing configs or change any kernel behavior. The reason for this is
several fold.
As background, I recently had an email thread with the kernel
maintainers of Fedora/RHEL, Debian, Ubuntu, Gentoo, Arch, NixOS, Alpine,
SUSE, and Void as recipients. I noted that some distros trust RDRAND,
some trust EFI, and some trust both, and I asked why or why not. There
wasn't really much of a "debate" but rather an interesting discussion of
what the historical reasons have been for this, and it came up that some
distros just missed the introduction of the bootloader Kconfig knob,
while another didn't want to enable it until there was a boot time
switch to turn it off for more concerned users (which has since been
added). The result of the rather uneventful discussion is that every
major Linux distro enables these two options by default.
While I didn't have really too strong of an opinion going into this
thread -- and I mostly wanted to learn what the distros' thinking was
one way or another -- ultimately I think their choice was a decent
enough one for a default option (which can be disabled at boot time).
I'll try to summarize the pros and cons:
Pros:
- The RNG machinery gets initialized super quickly, and there's no
messing around with subsequent blocking behavior.
- The bootloader mechanism is used by kexec in order for the prior
kernel to initialize the RNG of the next kernel, which increases
the entropy available to early boot daemons of the next kernel.
- Previous objections related to backdoors centered around
Dual_EC_DRBG-like kleptographic systems, in which observing some
amount of the output stream enables an adversary holding the right key
to determine the entire output stream.
This used to be a partially justified concern, because RDRAND output
was mixed into the output stream in varying ways, some of which may
have lacked pre-image resistance (e.g. XOR or an LFSR).
But this is no longer the case. Now, all usage of RDRAND and
bootloader seeds go through a cryptographic hash function. This means
that the CPU would have to compute a hash pre-image, which is not
considered to be feasible (otherwise the hash function would be
terribly broken).
- More generally, if the CPU is backdoored, the RNG is probably not the
realistic vector of choice for an attacker.
- These CPU or bootloader seeds are far from being the only source of
entropy. Rather, there is generally a pretty huge amount of entropy,
not all of which is credited, especially on CPUs that support
instructions like RDRAND. In other words, assuming RDRAND outputs all
zeros, an attacker would *still* have to accurately model every single
other entropy source also in use.
- The RNG now reseeds itself quite rapidly during boot, starting at 2
seconds, then 4, then 8, then 16, and so forth, so that other sources
of entropy get used without much delay.
- Paranoid users can set random.trust_{cpu,bootloader}=no in the kernel
command line, and paranoid system builders can set the Kconfig options
to N, so there's no reduction or restriction of optionality.
- It's a practical default.
- All the distros have it set this way. Microsoft and Apple trust it
too. Bandwagon.
Cons:
- RDRAND *could* still be backdoored with something like a fixed key or
limited space serial number seed or another indexable scheme like
that. (However, it's hard to imagine threat models where the CPU is
backdoored like this, yet people are still okay making *any*
computations with it or connecting it to networks, etc.)
- RDRAND *could* be defective, rather than backdoored, and produce
garbage that is in one way or another insufficient for crypto.
- Suggesting a *reduction* in paranoia, as this commit effectively does,
may cause some to question my personal integrity as a "security
person".
- Bootloader seeds and RDRAND are generally very difficult if not all
together impossible to audit.
Keep in mind that this doesn't actually change any behavior. This
is just a change in the default Kconfig value. The distros already are
shipping kernels that set things this way.
Ard made an additional argument in [1]:
We're at the mercy of firmware and micro-architecture anyway, given
that we are also relying on it to ensure that every instruction in
the kernel's executable image has been faithfully copied to memory,
and that the CPU implements those instructions as documented. So I
don't think firmware or ISA bugs related to RNGs deserve special
treatment - if they are broken, we should quirk around them like we
usually do. So enabling these by default is a step in the right
direction IMHO.
In [2], Phil pointed out that having this disabled masked a bug that CI
otherwise would have caught:
A clean 5.15.45 boots cleanly, whereas a downstream kernel shows the
static key warning (but it does go on to boot). The significant
difference is that our defconfigs set CONFIG_RANDOM_TRUST_BOOTLOADER=y
defining that on top of multi_v7_defconfig demonstrates the issue on
a clean 5.15.45. Conversely, not setting that option in a
downstream kernel build avoids the warning
[1] https://lore.kernel.org/lkml/CAMj1kXGi+ieviFjXv9zQBSaGyyzeGW_VpMpTLJK8PJb2QHEQ-w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/c47c42e3-1d56-5859-a6ad-976a1a3381c6@raspberrypi.com/
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77006f6edc ]
Currently if the APB or Debounce clocks aren't yet ready to be requested
the DW GPIO driver will correctly handle that by deferring the probe
procedure, but the error is still printed to the system log. It needlessly
pollutes the log since there was no real error but a request to postpone
the clock request procedure since the clocks subsystem hasn't been fully
initialized yet. Let's fix that by using the dev_err_probe method to print
the APB/clock request error status. It will correctly handle the deferred
probe situation and print the error if it actually happens.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 842c3b3ddc ]
gcc-12 started warning about 'tracker' being used uninitialized:
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c: In function ‘mlx5_do_bond’:
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c:786:28: warning: ‘tracker’ is used uninitialized [-Wuninitialized]
786 | struct lag_tracker tracker;
| ^~~~~~~
which seems to be because it doesn't track how the use (and
initialization) is bound by the 'do_bond' flag.
But admittedly that 'do_bond' usage is fairly complicated, and involves
passing it around as an argument to helper functions, so it's somewhat
understandable that gcc doesn't see how that all works.
This function could be rewritten to make the use of that tracker
variable more obviously safe, but for now I'm just adding the forced
initialization of it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a4d480702 ]
Similar to the handling of play_deferred in commit 19cfe912c3
("Bluetooth: btusb: Fix memory leak in play_deferred"), we thought
a patch might be needed here as well.
Currently usb_submit_urb is called directly to submit deferred tx
urbs after unanchor them.
So the usb_giveback_urb_bh would failed to unref it in usb_unanchor_urb
and cause memory leak.
Put those urbs in tx_anchor to avoid the leak, and also fix the error
handling.
Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220607083230.6182-1-xiaohuizhang@ruc.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>