[ Upstream commit 7c68fa2bdd ]
sk->sk_pacing_shift can be read and written without lock
synchronization. This patch adds annotations to
document this fact and avoid future syzbot complains.
This might also avoid unexpected false sharing
in sk_pacing_shift_update(), as the compiler
could remove the conditional check and always
write over sk->sk_pacing_shift :
if (sk->sk_pacing_shift != val)
sk->sk_pacing_shift = val;
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff61541cc6 ]
Commit
8062382c8d ("perf/x86/intel/bts: Add BTS PMU driver")
brought in a warning with the BTS buffer initialization
that is easily tripped with (assuming KPTI is disabled):
instantly throwing:
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 326 at arch/x86/events/intel/bts.c:86 bts_buffer_setup_aux+0x117/0x3d0
> Modules linked in:
> CPU: 2 PID: 326 Comm: perf Not tainted 5.4.0-rc8-00291-gceb9e77324fa #904
> RIP: 0010:bts_buffer_setup_aux+0x117/0x3d0
> Call Trace:
> rb_alloc_aux+0x339/0x550
> perf_mmap+0x607/0xc70
> mmap_region+0x76b/0xbd0
...
It appears to assume (for lost raisins) that PagePrivate() is set,
while later it actually tests for PagePrivate() before using
page_private().
Make it consistent and always check PagePrivate() before using
page_private().
Fixes: 8062382c8d ("perf/x86/intel/bts: Add BTS PMU driver")
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lkml.kernel.org/r/20191205142853.28894-2-alexander.shishkin@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a470552ee8 ]
Commit:
1c5fecb612 ("efi: Export Runtime Configuration Interface table to sysfs")
... added support for a Dell specific UEFI configuration table, but
failed to take into account that mapping the table should not be
attempted unless the table actually exists. If it doesn't exist,
the code usually fails silently unless pr_debug() prints are
enabled. However, on 32-bit PAE x86, the splat below is produced due
to the attempt to map the placeholder value EFI_INVALID_TABLE_ADDR
which we use for non-existing UEFI configuration tables, and which
equals ULONG_MAX.
memremap attempted on mixed range 0x00000000ffffffff size: 0x1e
WARNING: CPU: 1 PID: 1 at kernel/iomem.c:81 memremap+0x1a3/0x1c0
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.2-smp-mine #1
Hardware name: Hewlett-Packard HP Z400 Workstation/0B4Ch, BIOS 786G3 v03.61 03/05/2018
EIP: memremap+0x1a3/0x1c0
...
Call Trace:
? map_properties+0x473/0x473
? efi_rci2_sysfs_init+0x2c/0x154
? map_properties+0x473/0x473
? do_one_initcall+0x49/0x1d4
? parse_args+0x1e8/0x2a0
? do_early_param+0x7a/0x7a
? kernel_init_freeable+0x139/0x1c2
? rest_init+0x8e/0x8e
? kernel_init+0xd/0xf2
? ret_from_fork+0x2e/0x38
Fix this by checking whether the table exists before attempting to map it.
Reported-by: Richard Narron <comet.berkeley@gmail.com>
Tested-by: Richard Narron <comet.berkeley@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Fixes: 1c5fecb612 ("efi: Export Runtime Configuration Interface table to sysfs")
Link: https://lkml.kernel.org/r/20191210090945.11501-2-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce5c31db36 ]
At the moment, UBSAN report will be serialized using a spin_lock(). On
RT-systems, spinlocks are turned to rt_spin_lock and may sleep. This
will result to the following splat if the undefined behavior is in a
context that can sleep:
BUG: sleeping function called from invalid context at /src/linux/kernel/locking/rtmutex.c:968
in_atomic(): 1, irqs_disabled(): 128, pid: 3447, name: make
1 lock held by make/3447:
#0: 000000009a966332 (&mm->mmap_sem){++++}, at: do_page_fault+0x140/0x4f8
irq event stamp: 6284
hardirqs last enabled at (6283): [<ffff000011326520>] _raw_spin_unlock_irqrestore+0x90/0xa0
hardirqs last disabled at (6284): [<ffff0000113262b0>] _raw_spin_lock_irqsave+0x30/0x78
softirqs last enabled at (2430): [<ffff000010088ef8>] fpsimd_restore_current_state+0x60/0xe8
softirqs last disabled at (2427): [<ffff000010088ec0>] fpsimd_restore_current_state+0x28/0xe8
Preemption disabled at:
[<ffff000011324a4c>] rt_mutex_futex_unlock+0x4c/0xb0
CPU: 3 PID: 3447 Comm: make Tainted: G W 5.2.14-rt7-01890-ge6e057589653 #911
Call trace:
dump_backtrace+0x0/0x148
show_stack+0x14/0x20
dump_stack+0xbc/0x104
___might_sleep+0x154/0x210
rt_spin_lock+0x68/0xa0
ubsan_prologue+0x30/0x68
handle_overflow+0x64/0xe0
__ubsan_handle_add_overflow+0x10/0x18
__lock_acquire+0x1c28/0x2a28
lock_acquire+0xf0/0x370
_raw_spin_lock_irqsave+0x58/0x78
rt_mutex_futex_unlock+0x4c/0xb0
rt_spin_unlock+0x28/0x70
get_page_from_freelist+0x428/0x2b60
__alloc_pages_nodemask+0x174/0x1708
alloc_pages_vma+0x1ac/0x238
__handle_mm_fault+0x4ac/0x10b0
handle_mm_fault+0x1d8/0x3b0
do_page_fault+0x1c8/0x4f8
do_translation_fault+0xb8/0xe0
do_mem_abort+0x3c/0x98
el0_da+0x20/0x24
The spin_lock() will protect against multiple CPUs to output a report
together, I guess to prevent them from being interleaved. However, they
can still interleave with other messages (and even splat from
__might_sleep).
So the lock usefulness seems pretty limited. Rather than trying to
accomodate RT-system by switching to a raw_spin_lock(), the lock is now
completely dropped.
Link: http://lkml.kernel.org/r/20190920100835.14999-1-julien.grall@arm.com
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reported-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9bd84a8a8 ]
For each I/O request, blkback first maps the foreign pages for the
request to its local pages. If an allocation of a local page for the
mapping fails, it should unmap every mapping already made for the
request.
However, blkback's handling mechanism for the allocation failure does
not mark the remaining foreign pages as unmapped. Therefore, the unmap
function merely tries to unmap every valid grant page for the request,
including the pages not mapped due to the allocation failure. On a
system that fails the allocation frequently, this problem leads to
following kernel crash.
[ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
[ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
[ 372.012562] Oops: 0002 [#1] SMP
[ 372.012566] Modules linked in: act_police sch_ingress cls_u32
...
[ 372.012746] Call Trace:
[ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
[ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
...
[ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
...
Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Initializing cgroup subsys cpuset
This commit fixes this problem by marking the grant pages of the given
request that didn't mapped due to the allocation failure as invalid.
Fixes: c6cc142dac ("xen-blkback: use balloon pages for all mappings")
Reviewed-by: David Woodhouse <dwmw@amazon.de>
Reviewed-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 030eab4f9f ]
Building the kernel on s390 with -Og produces the following warning:
WARNING: vmlinux.o(.text+0x28dabe): Section mismatch in reference from the function populate_section_memmap() to the function .meminit.text:__populate_section_memmap()
The function populate_section_memmap() references
the function __meminit __populate_section_memmap().
This is often because populate_section_memmap lacks a __meminit
annotation or the annotation of __populate_section_memmap is wrong.
While -Og is not supported, in theory this might still happen with
another compiler or on another architecture. So fix this by using the
correct section annotations.
[iii@linux.ibm.com: v2]
Link: http://lkml.kernel.org/r/20191030151639.41486-1-iii@linux.ibm.com
Link: http://lkml.kernel.org/r/20191028165549.14478-1-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Oscar Salvador <OSalvador@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72a81ad9d6 ]
If an SMT capable system is not IPL'ed from the first CPU the setup of
the physical to logical CPU mapping is broken: the IPL core gets CPU
number 0, but then the next core gets CPU number 1. Correct would be
that all SMT threads of CPU 0 get the subsequent logical CPU numbers.
This is important since a lot of code (like e.g. the CPU topology
code) assumes that CPU maps are setup like this. If the mapping is
broken the system will not IPL due to broken topology masks:
[ 1.716341] BUG: arch topology broken
[ 1.716342] the SMT domain not a subset of the MC domain
[ 1.716343] BUG: arch topology broken
[ 1.716344] the MC domain not a subset of the BOOK domain
This scenario can usually not happen since LPARs are always IPL'ed
from CPU 0 and also re-IPL is intiated from CPU 0. However older
kernels did initiate re-IPL on an arbitrary CPU. If therefore a re-IPL
from an old kernel into a new kernel is initiated this may lead to
crash.
Fix this by setting up the physical to logical CPU mapping correctly.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d53c9e672 ]
The btrfs writepages function collects a large range of pages flagged
for delayed allocation, and then sends them down through the COW code
for processing. When compression is on, we allocate one async_chunk
structure for every 512K, and then run those pages through the
compression code for IO submission.
writepages starts all of this off with a single page, locked by the
original call to extent_write_cache_pages(), and it's important to keep
track of this page because it has already been through
clear_page_dirty_for_io().
The btrfs async_chunk struct has a pointer to the locked_page, and when
we're redirtying the page because compression had to fallback to
uncompressed IO, we use page->index to decide if a given async_chunk
struct really owns that page.
But, this is racey. If a given delalloc range is broken up into two
async_chunks (chunkA and chunkB), we can end up with something like
this:
compress_file_range(chunkA)
submit_compress_extents(chunkA)
submit compressed bios(chunkA)
put_page(locked_page)
compress_file_range(chunkB)
...
Or:
async_cow_submit
submit_compressed_extents <--- falls back to buffered writeout
cow_file_range
extent_clear_unlock_delalloc
__process_pages_contig
put_page(locked_pages)
async_cow_submit
The end result is that chunkA is completed and cleaned up before chunkB
even starts processing. This means we can free locked_page() and reuse
it elsewhere. If we get really lucky, it'll have the same page->index
in its new home as it did before.
While we're processing chunkB, we might decide we need to fall back to
uncompressed IO, and so compress_file_range() will call
__set_page_dirty_nobufers() on chunkB->locked_page.
Without cgroups in use, this creates as a phantom dirty page, which
isn't great but isn't the end of the world. What can happen, it can go
through the fixup worker and the whole COW machinery again:
in submit_compressed_extents():
while (async extents) {
...
cow_file_range
if (!page_started ...)
extent_write_locked_range
else if (...)
unlock_page
continue;
This hasn't been observed in practice but is still possible.
With cgroups in use, we might crash in the accounting code because
page->mapping->i_wb isn't set.
BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
IP: percpu_counter_add_batch+0x11/0x70
PGD 66534e067 P4D 66534e067 PUD 66534f067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 16 PID: 2172 Comm: rm Not tainted
RIP: 0010:percpu_counter_add_batch+0x11/0x70
RSP: 0018:ffffc9000a97bbe0 EFLAGS: 00010286
RAX: 0000000000000005 RBX: 0000000000000090 RCX: 0000000000026115
RDX: 0000000000000030 RSI: ffffffffffffffff RDI: 0000000000000090
RBP: 0000000000000000 R08: fffffffffffffff5 R09: 0000000000000000
R10: 00000000000260c0 R11: ffff881037fc26c0 R12: ffffffffffffffff
R13: ffff880fe4111548 R14: ffffc9000a97bc90 R15: 0000000000000001
FS: 00007f5503ced480(0000) GS:ffff880ff7200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000d0 CR3: 00000001e0459005 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
account_page_cleaned+0x15b/0x1f0
__cancel_dirty_page+0x146/0x200
truncate_cleanup_page+0x92/0xb0
truncate_inode_pages_range+0x202/0x7d0
btrfs_evict_inode+0x92/0x5a0
evict+0xc1/0x190
do_unlinkat+0x176/0x280
do_syscall_64+0x63/0x1a0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
The fix here is to make asyc_chunk->locked_page NULL everywhere but the
one async_chunk struct that's allowed to do things to the locked page.
Link: https://lore.kernel.org/linux-btrfs/c2419d01-5c84-3fb4-189e-4db519d08796@suse.com/
Fixes: 771ed689d2 ("Btrfs: Optimize compressed writeback and reads")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Chris Mason <clm@fb.com>
[ update changelog from mail thread discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0cac0ec96 ]
Commit 9e0af23764 ("Btrfs: fix task hang under heavy compressed
write") worked around the issue that a recycled work item could get a
false dependency on the original work item due to how the workqueue code
guarantees non-reentrancy. It did so by giving different work functions
to different types of work.
However, the fixes in the previous few patches are more complete, as
they prevent a work item from being recycled at all (except for a tiny
window that the kernel workqueue code handles for us). This obsoletes
the previous fix, so we don't need the unique helpers for correctness.
The only other reason to keep them would be so they show up in stack
traces, but they always seem to be optimized to a tail call, so they
don't show up anyways. So, let's just get rid of the extra indirection.
While we're here, rename normal_work_helper() to the more informative
btrfs_work_helper().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6abf572621 ]
Running stress-test test_2 in mtd-utils on ubi device, sometimes we can
get following oops message:
BUG: unable to handle page fault for address: ffffffff00000140
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 280a067 P4D 280a067 PUD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 60 Comm: kworker/u16:1 Kdump: loaded Not tainted 5.2.0 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0
-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-ubifs_0_0)
RIP: 0010:rb_next_postorder+0x2e/0xb0
Code: 80 db 03 01 48 85 ff 0f 84 97 00 00 00 48 8b 17 48 83 05 bc 80 db
03 01 48 83 e2 fc 0f 84 82 00 00 00 48 83 05 b2 80 db 03 01 <48> 3b 7a
10 48 89 d0 74 02 f3 c3 48 8b 52 08 48 83 05 a3 80 db 03
RSP: 0018:ffffc90000887758 EFLAGS: 00010202
RAX: ffff888129ae4700 RBX: ffff888138b08400 RCX: 0000000080800001
RDX: ffffffff00000130 RSI: 0000000080800024 RDI: ffff888138b08400
RBP: ffff888138b08400 R08: ffffea0004a6b920 R09: 0000000000000000
R10: ffffc90000887740 R11: 0000000000000001 R12: ffff888128d48000
R13: 0000000000000800 R14: 000000000000011e R15: 00000000000007c8
FS: 0000000000000000(0000) GS:ffff88813ba00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff00000140 CR3: 000000013789d000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
destroy_old_idx+0x5d/0xa0 [ubifs]
ubifs_tnc_start_commit+0x4fe/0x1380 [ubifs]
do_commit+0x3eb/0x830 [ubifs]
ubifs_run_commit+0xdc/0x1c0 [ubifs]
Above Oops are due to the slab-out-of-bounds happened in do-while of
function layout_in_gaps indirectly called by ubifs_tnc_start_commit. In
function layout_in_gaps, there is a do-while loop placing index nodes
into the gaps created by obsolete index nodes in non-empty index LEBs
until rest index nodes can totally be placed into pre-allocated empty
LEBs. @c->gap_lebs points to a memory area(integer array) which records
LEB numbers used by 'in-the-gaps' method. Whenever a fitable index LEB
is found, corresponding lnum will be incrementally written into the
memory area pointed by @c->gap_lebs. The size
((@c->lst.idx_lebs + 1) * sizeof(int)) of memory area is allocated before
do-while loop and can not be changed in the loop. But @c->lst.idx_lebs
could be increased by function ubifs_change_lp (called by
layout_leb_in_gaps->ubifs_find_dirty_idx_leb->get_idx_gc_leb) during the
loop. So, sometimes oob happens when number of cycles in do-while loop
exceeds the original value of @c->lst.idx_lebs. See detail in
https://bugzilla.kernel.org/show_bug.cgi?id=204229.
This patch fixes oob in layout_in_gaps.
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d1116d4c6 ]
Christoph Hellwig complained about the following soft lockup warning
when running scrub after generic/175 when preemption is disabled and
slub debugging is enabled:
watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [xfs_scrub:161]
Modules linked in:
irq event stamp: 41692326
hardirqs last enabled at (41692325): [<ffffffff8232c3b7>] _raw_0
hardirqs last disabled at (41692326): [<ffffffff81001c5a>] trace0
softirqs last enabled at (41684994): [<ffffffff8260031f>] __do_e
softirqs last disabled at (41684987): [<ffffffff81127d8c>] irq_e0
CPU: 3 PID: 16189 Comm: xfs_scrub Not tainted 5.4.0-rc3+ #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.124
RIP: 0010:_raw_spin_unlock_irqrestore+0x39/0x40
Code: 89 f3 be 01 00 00 00 e8 d5 3a e5 fe 48 89 ef e8 ed 87 e5 f2
RSP: 0018:ffffc9000233f970 EFLAGS: 00000286 ORIG_RAX: ffffffffff3
RAX: ffff88813b398040 RBX: 0000000000000286 RCX: 0000000000000006
RDX: 0000000000000006 RSI: ffff88813b3988c0 RDI: ffff88813b398040
RBP: ffff888137958640 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00042b0c00
R13: 0000000000000001 R14: ffff88810ac32308 R15: ffff8881376fc040
FS: 00007f6113dea700(0000) GS:ffff88813bb80000(0000) knlGS:00000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6113de8ff8 CR3: 000000012f290000 CR4: 00000000000006e0
Call Trace:
free_debug_processing+0x1dd/0x240
__slab_free+0x231/0x410
kmem_cache_free+0x30e/0x360
xchk_ag_btcur_free+0x76/0xb0
xchk_ag_free+0x10/0x80
xchk_bmap_iextent_xref.isra.14+0xd9/0x120
xchk_bmap_iextent+0x187/0x210
xchk_bmap+0x2e0/0x3b0
xfs_scrub_metadata+0x2e7/0x500
xfs_ioc_scrub_metadata+0x4a/0xa0
xfs_file_ioctl+0x58a/0xcd0
do_vfs_ioctl+0xa0/0x6f0
ksys_ioctl+0x5b/0x90
__x64_sys_ioctl+0x11/0x20
do_syscall_64+0x4b/0x1a0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
If preemption is disabled, all metadata buffers needed to perform the
scrub are already in memory, and there are a lot of records to check,
it's possible that the scrub thread will run for an extended period of
time without sleeping for IO or any other reason. Then the watchdog
timer or the RCU stall timeout can trigger, producing the backtrace
above.
To fix this problem, call cond_resched() from the scrub thread so that
we back out to the scheduler whenever necessary.
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 6b2daec190 upstream.
Unlike FICLONE, all of those take a pointer argument; they do need
compat_ptr() applied to arg.
Fixes: d79bdd52d8 ("vfs: wire up compat ioctl for CLONE/CLONE_RANGE")
Fixes: 54dbc15172 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
Fixes: ceac204e1d ("fs: make fiemap work from compat_ioctl")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df3df923b3 upstream.
We don't care about module aliasing validation in
cifs_compose_mount_options(..., is_smb3) when finding the root SMB
session of an DFS namespace in order to refresh DFS referral cache.
The following issue has been observed when mounting with '-t smb3' and
then specifying 'vers=2.0':
...
Nov 08 15:27:08 tw kernel: address conversion returned 0 for FS0.WIN.LOCAL
Nov 08 15:27:08 tw kernel: [kworke] ==> dns_query((null),FS0.WIN.LOCAL,13,(null))
Nov 08 15:27:08 tw kernel: [kworke] call request_key(,FS0.WIN.LOCAL,)
Nov 08 15:27:08 tw kernel: [kworke] ==> dns_resolver_cmp(FS0.WIN.LOCAL,FS0.WIN.LOCAL)
Nov 08 15:27:08 tw kernel: [kworke] <== dns_resolver_cmp() = 1
Nov 08 15:27:08 tw kernel: [kworke] <== dns_query() = 13
Nov 08 15:27:08 tw kernel: fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: FS0.WIN.LOCAL to 192.168.30.26
===> Nov 08 15:27:08 tw kernel: CIFS VFS: vers=2.0 not permitted when mounting with smb3
Nov 08 15:27:08 tw kernel: fs/cifs/dfs_cache.c: CIFS VFS: leaving refresh_tcon (xid = 26) rc = -22
...
Fixes: 5072010ccf ("cifs: Fix DFS cache refresher for DFS links")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e4f7f920a upstream.
As the commit 677fe555cb ("serial: imx: Fix recursive locking bug")
has mentioned the uart driver might cause recursive locking between
normal printing and the kernel debugging facilities (e.g. sysrq and
oops). In the commit it gave out suggestion for fixing recursive
locking issue: "The solution is to avoid locking in the sysrq case
and trylock in the oops_in_progress case."
This patch follows the suggestion (also used the exactly same code with
other serial drivers, e.g. amba-pl011.c) to fix the recursive locking
issue, this can avoid stuck caused by deadlock and print out log for
sysrq and oops.
Fixes: 04896a77a9 ("msm_serial: serial driver for MSM7K onboard serial peripheral.")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Link: https://lore.kernel.org/r/20191127141544.4277-2-leo.yan@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b049e03ca5 upstream.
As platform_get_irq() now prints an error when the interrupt does not
exist, a scary warning may be printed for an optional interrupt:
phy_rcar_gen3_usb2 ee0a0200.usb-phy: IRQ index 0 not found
Fix this by calling platform_get_irq_optional() instead.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2289adbfa5 upstream.
In af9005_identify_state when returning -EIO the allocated buffer should
be released. Replace the "return -EIO" with assignment into ret and move
deb_info() under a check.
Fixes: af4e067e1d ("V4L/DVB (5625): Add support for the AF9005 demodulator from Afatech")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99c4f70df3 upstream.
The USB regulator was removed for AB8500 in
commit 41a06aa738 ("regulator: ab8500: Remove USB regulator").
It was then added for AB8505 in
commit 547f384f33 ("regulator: ab8500: add support for ab8505").
However, there was never an entry added for it in
ab8505_regulator_match. This causes all regulators after it
to be initialized with the wrong device tree data, eventually
leading to an out-of-bounds array read.
Given that it is not used anywhere in the kernel, it seems
likely that similar arguments against supporting it exist for
AB8505 (it is controlled by hardware).
Therefore, simply remove it like for AB8500 instead of adding
an entry in ab8505_regulator_match.
Fixes: 547f384f33 ("regulator: ab8500: add support for ab8505")
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20191106173125.14496-1-stephan@gerhold.net
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74a96b51a3 upstream.
An earlier commit hard coded a return 0 to function flexcop_usb_i2c_req
even though the an -EIO was intended to be returned in the case where
ret != buflen. Fix this by replacing the return 0 with the return of
ret to return the error return code.
Addresses-Coverity: ("Unused value")
Fixes: b430eaba0b ("[media] flexcop-usb: don't use stack for DMA")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d088337c38 upstream.
In the implementation of hci_connect_le_scan() when conn is added via
hci_conn_add(), if hci_explicit_conn_params_set() fails the allocated
memory for conn is leaked. Use hci_conn_del() to release it.
Fixes: f75113a260 ("Bluetooth: add hci_connect_le_scan")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df66499a1f upstream.
We used to take a lock in amp_physical_cfm() but then we moved it to
the caller function. Unfortunately the unlock on this error path was
overlooked so it leads to a double unlock.
Fixes: a514b17fab ("Bluetooth: Refactor locking in amp_physical_cfm")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d44a6fd07 upstream.
If setup() fails a reference for runtime PM has already
been taken. Proper use of the error handling in btusb_open()is needed.
You cannot just return.
Fixes: ace3198258 ("Bluetooth: btusb: Add setup callback for chip init on USB")
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91a063c956 upstream.
These slice routines are called from the SLB miss handler, which can
lead to warnings from the IRQ code, because we have not reconciled the
IRQ state properly:
WARNING: CPU: 72 PID: 30150 at arch/powerpc/kernel/irq.c:258 arch_local_irq_restore.part.0+0xcc/0x100
Modules linked in:
CPU: 72 PID: 30150 Comm: ftracetest Not tainted 5.5.0-rc2-gcc9x-g7e0165b2f1a9 #1
NIP: c00000000001d83c LR: c00000000029ab90 CTR: c00000000026cf90
REGS: c0000007eee3b960 TRAP: 0700 Not tainted (5.5.0-rc2-gcc9x-g7e0165b2f1a9)
MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 22242844 XER: 20000000
CFAR: c00000000001d780 IRQMASK: 0
...
NIP arch_local_irq_restore.part.0+0xcc/0x100
LR trace_graph_entry+0x270/0x340
Call Trace:
trace_graph_entry+0x254/0x340 (unreliable)
function_graph_enter+0xe4/0x1a0
prepare_ftrace_return+0xa0/0x130
ftrace_graph_caller+0x44/0x94 # (get_slice_psize())
slb_allocate_user+0x7c/0x100
do_slb_fault+0xf8/0x300
instruction_access_slb_common+0x140/0x180
Fixes: 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191221121337.4894-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f40ddaa059 upstream.
A copy-paste error was introduced when bitmasks were converted to
macros, incorrectly setting the enable bitmask for ELDO2 to the one
for ELDO1 for the AXP22x units.
Fix it by using the correct macro.
On affected boards, ELDO1 and/or ELDO2 are used to power the camera,
which is currently unsupported.
Fixes: db4a555f7c ("regulator: axp20x: use defines for masks")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Link: https://lore.kernel.org/r/20191218044720.21990-1-wens@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cd34efbd3 upstream.
Rx threshold means the value to inform the receiver when the number of words
in Rx FIFO is equal to or more than the value. Similarly, Tx threshold means
the value to inform the sender when the number of words in Tx FIFO is equal
to or less than the value. The controller triggers the driver to start
the transfer.
In case of Rx, the driver wants to detect that the specified number of words
N are in Rx FIFO, so the value of Rx threshold should be N. In case of Tx,
the driver wants to detect that the same number of spaces as Rx are in
Tx FIFO, so the value of Tx threshold should be (FIFO size - N).
For example, in order for the driver to receive at least 3 words from
Rx FIFO, set 3 to Rx threshold.
+-+-+-+-+-+-+-+-+
| | | | | |*|*|*|
+-+-+-+-+-+-+-+-+
In order for the driver to send at least 3 words to Tx FIFO, because
it needs at least 3 spaces, set 8(FIFO size) - 3 = 5 to Tx threshold.
+-+-+-+-+-+-+-+-+
|*|*|*|*|*| | | |
+-+-+-+-+-+-+-+-+
This adds new function uniphier_spi_set_fifo_threshold() to set
threshold value to the register.
And more, FIFO counts by 'words', so this renames 'fill_bytes' with
'fill_words', and fixes the calculation using bytes_per_words.
Fixes: 37ffab8170 ("spi: uniphier: introduce polling mode")
Cc: Keiji Hayashibara <hayashibara.keiji@socionext.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/1577149107-30670-2-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1148f9adbe upstream.
proc_dointvec_minmax_bpf_restricted() has been firstly introduced
in commit 2e4a30983b ("bpf: restrict access to core bpf sysctls")
under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in
ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv
allocations"), because a new sysctl, bpf_jit_limit, made use of it.
Finally, this parameter has become long instead of integer with
fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
and thus, a new proc_dolongvec_minmax_bpf_restricted() has been
added.
With this last change, we got back to that
proc_dointvec_minmax_bpf_restricted() is used only under
CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been
brought back.
So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n
since v4.20 we have:
CC net/core/sysctl_net_core.o
net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.
Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b9173f468 upstream.
Bridge packets that are forwarded have skb->dst == NULL and get
dropped by the check introduced by
b60a77386b (net: make skb_dst_force
return true when dst is refcounted).
To fix this we check skb_dst() before skb_dst_force(), so we don't
drop skb packet with dst == NULL. This holds also for skb at the
PRE_ROUTING hook so we remove the second check.
Fixes: b60a77386b ("net: make skb_dst_force return true when dst is refcounted")
Signed-off-by: Marco Oliverio <marco.oliverio@tanaza.com>
Signed-off-by: Rocco Folino <rocco.folino@tanaza.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8796c6c69 upstream.
The CONNECT X300 uses the PMC clock for on-board components and gets
stuck during boot if the clock is disabled. Therefore, add this
device to the critical systems list.
Tested on CONNECT X300.
Fixes: 648e921888 ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Signed-off-by: Michael Haener <michael.haener@siemens.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69ffe5960d upstream.
Commit 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
in the wrong order. However, this check isn't applicable for realtime
files. In most cases, it just makes us do unnecessary commits. However,
without the fix from the previous commit ("xfs: fix realtime file data
space leak"), if the last and second-to-last extents also happen to have
different "AG numbers", then the break actually causes __xfs_bunmapi()
to return without making any progress, which sends
xfs_itruncate_extents_flags() into an infinite loop.
Fixes: 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8305e90a89 upstream.
If device_register() fails, both put_device() and kfree() are called,
ending with a double free of the scmi_dev.
Calling kfree() is needed only when a failure happens between the
allocation of the scmi_dev and its registration, so move it to there
and remove it from the error flow.
Fixes: 46edb8d132 ("firmware: arm_scmi: provide the mandatory device release callback")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84a1f5b1cc upstream.
We used to skip reconnects on all SMB2_IOCTL commands due to SMB3+
FSCTL_VALIDATE_NEGOTIATE_INFO - which made sense since we're still
establishing a SMB session.
However, when refresh_cache_worker() calls smb2_get_dfs_refer() and
we're under reconnect, SMB2_ioctl() will not be able to get a proper
status error (e.g. -EHOSTDOWN in case we failed to reconnect) but an
-EAGAIN from cifs_send_recv() thus looping forever in
refresh_cache_worker().
Fixes: e99c63e4d8 ("SMB3: Fix deadlock in validate negotiate hits reconnect")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Suggested-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 637392a850 upstream.
No changeset entries are created for #address-cells and #size-cells
properties, but the duplicated properties are never freed. This
results in a memory leak which is detected by kmemleak:
unreferenced object 0x85887180 (size 64):
backtrace:
kmem_cache_alloc_trace+0x1fb/0x1fc
__of_prop_dup+0x25/0x7c
add_changeset_property+0x17f/0x370
build_changeset_next_level+0x29/0x20c
of_overlay_fdt_apply+0x32b/0x6b4
...
Fixes: 6f75118800 ("of: overlay: validate overlay properties #address-cells and #size-cells")
Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Tested-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f81b846dcd upstream.
The PSI (Page Selective Invalidation) bit in the capability register
is only valid for second-level translation. Intel IOMMU supporting
scalable mode must support page/address selective IOTLB invalidation
for first-level translation. Remove the PSI capability check in SVA
cache invalidation code.
Fixes: 8744daf4b0 ("iommu/vt-d: Remove global page flush support")
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7583e72a5 upstream.
The commit 0f27cff859 ("ACPI: sysfs: Make ACPI GPE mask kernel
parameter cover all GPEs") says:
"Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256
GPEs can be masked"
But the masking of GPE 0xFF it not supported and the check condition
"gpe > ACPI_MASKABLE_GPE_MAX" is not valid because the type of gpe is
u8.
So modify the macro ACPI_MASKABLE_GPE_MAX to 0x100, and drop the "gpe >
ACPI_MASKABLE_GPE_MAX" check. In addition, update the docs "Format" for
acpi_mask_gpe parameter.
Fixes: 0f27cff859 ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs")
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
[ rjw: Use u16 as gpe data type in acpi_gpe_apply_masked_gpes() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d60331161 upstream.
Fix the race between load and unload a kernel module.
sys_delete_module()
try_stop_module()
mod->state = _GOING
add_unformed_module()
old = find_module_all()
(old->state == _GOING =>
wait_event_interruptible())
During pre-condition
finished_loading() rets 0
schedule()
(never gets waken up later)
free_module()
mod->state = _UNFORMED
list_del_rcu(&mod->list)
(dels mod from "modules" list)
return
The race above leads to modprobe hanging forever on loading
a module.
Error paths on loading module call wake_up_all(&module_wq) after
freeing module, so let's do the same on straight module unload.
Fixes: 6e6de3dee5 ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd70466d37 upstream.
Commit 52cf93e63e ("HID: i2c-hid: Don't reset device upon system
resume") fixes many touchpads and touchscreens, however ALPS touchpads
start to trigger IRQ storm after system resume.
Since it's total silence from ALPS, let's bring the old behavior back
to ALPS touchpads.
Fixes: 52cf93e63e ("HID: i2c-hid: Don't reset device upon system resume")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>