[ Upstream commit 243bce09c9 ]
Chris Murphy reports that a slightly overcommitted load, testing swap
and zram along with i915, splats and keeps on splatting, when it had
better fail less noisily:
gnome-shell: page allocation failure: order:0,
mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
nodemask=(null),cpuset=/,mems_allowed=0
CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
Call Trace:
dump_stack+0x64/0x88
warn_alloc.cold+0x75/0xd9
__alloc_pages_slowpath.constprop.0+0xcfa/0xd30
__alloc_pages_nodemask+0x2df/0x320
alloc_slab_page+0x195/0x310
allocate_slab+0x3c5/0x440
___slab_alloc+0x40c/0x5f0
__slab_alloc+0x1c/0x30
kmem_cache_alloc+0x20e/0x220
xas_nomem+0x28/0x70
add_to_swap_cache+0x321/0x400
__read_swap_cache_async+0x105/0x240
swap_cluster_readahead+0x22c/0x2e0
shmem_swapin+0x8e/0xc0
shmem_swapin_page+0x196/0x740
shmem_getpage_gfp+0x3a2/0xa60
shmem_read_mapping_page_gfp+0x32/0x60
shmem_get_pages+0x155/0x5e0 [i915]
__i915_gem_object_get_pages+0x68/0xa0 [i915]
i915_vma_pin+0x3fe/0x6c0 [i915]
eb_add_vma+0x10b/0x2c0 [i915]
i915_gem_do_execbuffer+0x704/0x3430 [i915]
i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
drm_ioctl_kernel+0x86/0xd0 [drm]
drm_ioctl+0x206/0x390 [drm]
ksys_ioctl+0x82/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5b/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported on 5.7, but it goes back really to 3.1: when
shmem_read_mapping_page_gfp() was implemented for use by i915, and
allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
__read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
Fixes: 68da9f0557 ("tmpfs: pass gfp to shmem_getpage_gfp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Chris Murphy <lists@colorremedies.com>
Analyzed-by: Vlastimil Babka <vbabka@suse.cz>
Analyzed-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 432cd2a10f ]
When running relocation of a data block group while scrub is running in
parallel, it is possible that the relocation will fail and abort the
current transaction with an -EINVAL error:
[134243.988595] BTRFS info (device sdc): found 14 extents, stage: move data extents
[134243.999871] ------------[ cut here ]------------
[134244.000741] BTRFS: Transaction aborted (error -22)
[134244.001692] WARNING: CPU: 0 PID: 26954 at fs/btrfs/ctree.c:1071 __btrfs_cow_block+0x6a7/0x790 [btrfs]
[134244.003380] Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
[134244.012577] CPU: 0 PID: 26954 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #5
[134244.014162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[134244.016184] RIP: 0010:__btrfs_cow_block+0x6a7/0x790 [btrfs]
[134244.017151] Code: 48 c7 c7 (...)
[134244.020549] RSP: 0018:ffffa41607863888 EFLAGS: 00010286
[134244.021515] RAX: 0000000000000000 RBX: ffff9614bdfe09c8 RCX: 0000000000000000
[134244.022822] RDX: 0000000000000001 RSI: ffffffffb3d63980 RDI: 0000000000000001
[134244.024124] RBP: ffff961589e8c000 R08: 0000000000000000 R09: 0000000000000001
[134244.025424] R10: ffffffffc0ae5955 R11: 0000000000000000 R12: ffff9614bd530d08
[134244.026725] R13: ffff9614ced41b88 R14: ffff9614bdfe2a48 R15: 0000000000000000
[134244.028024] FS: 00007f29b63c08c0(0000) GS:ffff9615ba600000(0000) knlGS:0000000000000000
[134244.029491] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[134244.030560] CR2: 00007f4eb339b000 CR3: 0000000130d6e006 CR4: 00000000003606f0
[134244.031997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[134244.033153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[134244.034484] Call Trace:
[134244.034984] btrfs_cow_block+0x12b/0x2b0 [btrfs]
[134244.035859] do_relocation+0x30b/0x790 [btrfs]
[134244.036681] ? do_raw_spin_unlock+0x49/0xc0
[134244.037460] ? _raw_spin_unlock+0x29/0x40
[134244.038235] relocate_tree_blocks+0x37b/0x730 [btrfs]
[134244.039245] relocate_block_group+0x388/0x770 [btrfs]
[134244.040228] btrfs_relocate_block_group+0x161/0x2e0 [btrfs]
[134244.041323] btrfs_relocate_chunk+0x36/0x110 [btrfs]
[134244.041345] btrfs_balance+0xc06/0x1860 [btrfs]
[134244.043382] ? btrfs_ioctl_balance+0x27c/0x310 [btrfs]
[134244.045586] btrfs_ioctl_balance+0x1ed/0x310 [btrfs]
[134244.045611] btrfs_ioctl+0x1880/0x3760 [btrfs]
[134244.049043] ? do_raw_spin_unlock+0x49/0xc0
[134244.049838] ? _raw_spin_unlock+0x29/0x40
[134244.050587] ? __handle_mm_fault+0x11b3/0x14b0
[134244.051417] ? ksys_ioctl+0x92/0xb0
[134244.052070] ksys_ioctl+0x92/0xb0
[134244.052701] ? trace_hardirqs_off_thunk+0x1a/0x1c
[134244.053511] __x64_sys_ioctl+0x16/0x20
[134244.054206] do_syscall_64+0x5c/0x280
[134244.054891] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[134244.055819] RIP: 0033:0x7f29b51c9dd7
[134244.056491] Code: 00 00 00 (...)
[134244.059767] RSP: 002b:00007ffcccc1dd08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[134244.061168] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f29b51c9dd7
[134244.062474] RDX: 00007ffcccc1dda0 RSI: 00000000c4009420 RDI: 0000000000000003
[134244.063771] RBP: 0000000000000003 R08: 00005565cea4b000 R09: 0000000000000000
[134244.065032] R10: 0000000000000541 R11: 0000000000000202 R12: 00007ffcccc2060a
[134244.066327] R13: 00007ffcccc1dda0 R14: 0000000000000002 R15: 00007ffcccc1dec0
[134244.067626] irq event stamp: 0
[134244.068202] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[134244.069351] hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
[134244.070909] softirqs last enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
[134244.072392] softirqs last disabled at (0): [<0000000000000000>] 0x0
[134244.073432] ---[ end trace bd7c03622e0b0a99 ]---
The -EINVAL error comes from the following chain of function calls:
__btrfs_cow_block() <-- aborts the transaction
btrfs_reloc_cow_block()
replace_file_extents()
get_new_location() <-- returns -EINVAL
When relocating a data block group, for each allocated extent of the block
group, we preallocate another extent (at prealloc_file_extent_cluster()),
associated with the data relocation inode, and then dirty all its pages.
These preallocated extents have, and must have, the same size that extents
from the data block group being relocated have.
Later before we start the relocation stage that updates pointers (bytenr
field of file extent items) to point to the the new extents, we trigger
writeback for the data relocation inode. The expectation is that writeback
will write the pages to the previously preallocated extents, that it
follows the NOCOW path. That is generally the case, however, if a scrub
is running it may have turned the block group that contains those extents
into RO mode, in which case writeback falls back to the COW path.
However in the COW path instead of allocating exactly one extent with the
expected size, the allocator may end up allocating several smaller extents
due to free space fragmentation - because we tell it at cow_file_range()
that the minimum allocation size can match the filesystem's sector size.
This later breaks the relocation's expectation that an extent associated
to a file extent item in the data relocation inode has the same size as
the respective extent pointed by a file extent item in another tree - in
this case the extent to which the relocation inode poins to is smaller,
causing relocation.c:get_new_location() to return -EINVAL.
For example, if we are relocating a data block group X that has a logical
address of X and the block group has an extent allocated at the logical
address X + 128KiB with a size of 64KiB:
1) At prealloc_file_extent_cluster() we allocate an extent for the data
relocation inode with a size of 64KiB and associate it to the file
offset 128KiB (X + 128KiB - X) of the data relocation inode. This
preallocated extent was allocated at block group Z;
2) A scrub running in parallel turns block group Z into RO mode and
starts scrubing its extents;
3) Relocation triggers writeback for the data relocation inode;
4) When running delalloc (btrfs_run_delalloc_range()), we try first the
NOCOW path because the data relocation inode has BTRFS_INODE_PREALLOC
set in its flags. However, because block group Z is in RO mode, the
NOCOW path (run_delalloc_nocow()) falls back into the COW path, by
calling cow_file_range();
5) At cow_file_range(), in the first iteration of the while loop we call
btrfs_reserve_extent() to allocate a 64KiB extent and pass it a minimum
allocation size of 4KiB (fs_info->sectorsize). Due to free space
fragmentation, btrfs_reserve_extent() ends up allocating two extents
of 32KiB each, each one on a different iteration of that while loop;
6) Writeback of the data relocation inode completes;
7) Relocation proceeds and ends up at relocation.c:replace_file_extents(),
with a leaf which has a file extent item that points to the data extent
from block group X, that has a logical address (bytenr) of X + 128KiB
and a size of 64KiB. Then it calls get_new_location(), which does a
lookup in the data relocation tree for a file extent item starting at
offset 128KiB (X + 128KiB - X) and belonging to the data relocation
inode. It finds a corresponding file extent item, however that item
points to an extent that has a size of 32KiB, which doesn't match the
expected size of 64KiB, resuling in -EINVAL being returned from this
function and propagated up to __btrfs_cow_block(), which aborts the
current transaction.
To fix this make sure that at cow_file_range() when we call the allocator
we pass it a minimum allocation size corresponding the desired extent size
if the inode belongs to the data relocation tree, otherwise pass it the
filesystem's sector size as the minimum allocation size.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fecd13202 ]
When removing a block group, if we fail to delete the block group's item
from the extent tree, we jump to the 'out' label and end up decrementing
the block group's reference count once only (by 1), resulting in a counter
leak because the block group at that point was already removed from the
block group cache rbtree - so we have to decrement the reference count
twice, once for the rbtree and once for our lookup at the start of the
function.
There is a second bug where if removing the free space tree entries (the
call to remove_block_group_free_space()) fails we end up jumping to the
'out_put_group' label but end up decrementing the reference count only
once, when we should have done it twice, since we have already removed
the block group from the block group cache rbtree. This happens because
the reference count decrement for the rbtree reference happens after
attempting to remove the free space tree entries, which is far away from
the place where we remove the block group from the rbtree.
To make things less error prone, decrement the reference count for the
rbtree immediately after removing the block group from it. This also
eleminates the need for two different exit labels on error, renaming
'out_put_label' to just 'out' and removing the old 'out'.
Fixes: f6033c5e33 ("btrfs: fix block group leak when removing fails")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0c7feaf87 ]
We recently used fuzz(hydra) to test XFS and automatically generate
tmp.img(XFS v5 format, but some metadata is wrong)
xfs_repair information(just one AG):
agf_freeblks 0, counted 3224 in ag 0
agf_longest 536874136, counted 3224 in ag 0
sb_fdblocks 613, counted 3228
Test as follows:
mount tmp.img tmpdir
cp file1M tmpdir
sync
In 4.19-stable, sync will stuck, the reason is:
xfs_mountfs
xfs_check_summary_counts
if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
!xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
return 0; -->just return, incore sb_fdblocks still be 613
xfs_initialize_perag_data
cp file1M tmpdir -->ok(write file to pagecache)
sync -->stuck(write pagecache to disk)
xfs_map_blocks
xfs_iomap_write_allocate
while (count_fsb != 0) {
nimaps = 0;
while (nimaps == 0) { --> endless loop
nimaps = 1;
xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again
xfs_bmapi_write
xfs_bmap_alloc
xfs_bmap_btalloc
xfs_alloc_vextent
xfs_alloc_fix_freelist
xfs_alloc_space_available -->fail(agf_freeblks is 0)
In linux-next, sync not stuck, cause commit c2b3164320 ("xfs:
use the latest extent at writeback delalloc conversion time") remove
the above while, dmesg is as follows:
[ 55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0.
Users do not know why this page is discard, the better soultion is:
1. Like xfs_repair, make sure sb_fdblocks is equal to counted
(xfs_initialize_perag_data did this, who is not called at this mount)
2. Add agf verify, if fail, will tell users to repair
This patch use the second soultion.
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Ren Xudong <renxudong1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d03727b248 upstream.
Figuring out the root case for the REMOVE/CLOSE race and
suggesting the solution was done by Neil Brown.
Currently what happens is that direct IO calls hold a reference
on the open context which is decremented as an asynchronous task
in the nfs_direct_complete(). Before reference is decremented,
control is returned to the application which is free to close the
file. When close is being processed, it decrements its reference
on the open_context but since directIO still holds one, it doesn't
sent a close on the wire. It returns control to the application
which is free to do other operations. For instance, it can delete a
file. Direct IO is finally releasing its reference and triggering
an asynchronous close. Which races with the REMOVE. On the server,
REMOVE can be processed before the CLOSE, failing the REMOVE with
EACCES as the file is still opened.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Suggested-by: Neil Brown <neilb@suse.com>
CC: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89a3c9f5b9 upstream.
@subbuf is an output parameter of xdr_buf_subsegment(). A survey of
call sites shows that @subbuf is always uninitialized before
xdr_buf_segment() is invoked by callers.
There are some execution paths through xdr_buf_subsegment() that do
not set all of the fields in @subbuf, leaving some pointer fields
containing garbage addresses. Subsequent processing of that buffer
then results in a page fault.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6784beada6 upstream.
Fix the event trigger to accept redundant spaces in
the trigger input.
For example, these return -EINVAL
echo " traceon" > events/ftrace/print/trigger
echo "traceon if common_pid == 0" > events/ftrace/print/trigger
echo "disable_event:kmem:kmalloc " > events/ftrace/print/trigger
But these are hard to find what is wrong.
To fix this issue, use skip_spaces() to remove spaces
in front of actual tokens, and set NULL if there is no
token.
Link: http://lkml.kernel.org/r/159262476352.185015.5261566783045364186.stgit@devnote2
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 85f2b08268 ("tracing: Add basic event trigger framework")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b0b283648 ]
We use one blktrace per request_queue, that means one per the entire
disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
or just two calls on /dev/vda.
We check for concurrent setup only at the very end of the blktrace setup though.
If we try to run two concurrent blktraces on the same block device the
second one will fail, and the first one seems to go on. However when
one tries to kill the first one one will see things like this:
The kernel will show these:
```
debugfs: File 'dropped' in directory 'nvme1n1' already present!
debugfs: File 'msg' in directory 'nvme1n1' already present!
debugfs: File 'trace0' in directory 'nvme1n1' already present!
``
And userspace just sees this error message for the second call:
```
blktrace /dev/nvme1n1
BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
```
The first userspace process #1 will also claim that the files
were taken underneath their nose as well. The files are taken
away form the first process given that when the second blktrace
fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN.
This means that even if go-happy process #1 is waiting for blktrace
data, we *have* been asked to take teardown the blktrace.
This can easily be reproduced with break-blktrace [0] run_0005.sh test.
Just break out early if we know we're already going to fail, this will
prevent trying to create the files all over again, which we know still
exist.
[0] https://github.com/mcgrof/break-blktrace
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2f02ebd8f ]
When cc-option and friends evaluate compiler flags, the temporary file
$$TMP is created as an output object, and automatically cleaned up.
The actual file path of $$TMP is .<pid>.tmp, here <pid> is the process
ID of $(shell ...) invoked from cc-option. (Please note $$$$ is the
escape sequence of $$).
Such garbage files are cleaned up in most cases, but some compiler flags
create additional output files.
For example, -gsplit-dwarf creates a .dwo file.
When CONFIG_DEBUG_INFO_SPLIT=y, you will see a bunch of .<pid>.dwo files
left in the top of build directories. You may not notice them unless you
do 'ls -a', but the garbage files will increase every time you run 'make'.
This commit changes the temporary object path to .tmp_<pid>/tmp, and
removes .tmp_<pid> directory when exiting. Separate build artifacts such
as *.dwo will be cleaned up all together because their file paths are
usually determined based on the base name of the object.
Another example is -ftest-coverage, which outputs the coverage data into
<base-name-of-object>.gcno
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 873e5a763d ]
When strace wants to update the syscall number, it sets GPR2
to the desired number and updates the GPR via PTRACE_SETREGSET.
It doesn't update regs->int_code which would cause the old syscall
executed on syscall restart. As we cannot change the ptrace ABI and
don't have a field for the interruption code, check whether the tracee
is in a syscall and the last instruction was svc. In that case assume
that the tracer wants to update the syscall number and copy the GPR2
value to regs->int_code.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e89df5c432 ]
There is a race condition exist during termination. The path is
alx_stop and then alx_remove. An alx_schedule_link_check could be called
before alx_stop by interrupt handler and invoke alx_link_check later.
Alx_stop frees the napis, and alx_remove cancels any pending works.
If any of the work is scheduled before termination and invoked before
alx_remove, a null-ptr-deref occurs because both expect alx->napis[i].
This patch fix the race condition by moving cancel_work_sync functions
before alx_free_napis inside alx_stop. Because interrupt handler can call
alx_schedule_link_check again, alx_free_irq is moved before
cancel_work_sync calls too.
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 740797ce3a ]
syzbot reported the following warning:
WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
At deadline.c:628 we have:
623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
624 {
625 struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
626 struct rq *rq = rq_of_dl_rq(dl_rq);
627
628 WARN_ON(dl_se->dl_boosted);
629 WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
[...]
}
Which means that setup_new_dl_entity() has been called on a task
currently boosted. This shouldn't happen though, as setup_new_dl_entity()
is only called when the 'dynamic' deadline of the new entity
is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
condition.
Digging through the PI code I noticed that what above might in fact happen
if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
first branch of boosting conditions we check only if a pi_task 'dynamic'
deadline is earlier than mutex holder's and in this case we set mutex
holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
initialized if such tasks get boosted at some point (or if they become
DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
to 0 and this verifies the aforementioned condition.
Fix it by checking that the potential donor task is actually (even if
temporary because in turn boosted) running at DEADLINE priority before
using its 'dynamic' deadline value.
Fixes: 2d3d891d33 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7150284600 ]
When using ip_set with counters and comment, traffic causes the kernel
to panic on 32-bit ARM:
Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>]
Unhandled fault: alignment exception (0x221) at 0xea08133c
PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
The problem occurs when we try to update the 64-bit counters - the
faulting address above is not 64-bit aligned. The problem occurs
due to the way elements are allocated, for example:
set->dsize = ip_set_elem_len(set, tb, 0, 0);
map = ip_set_alloc(sizeof(*map) + elements * set->dsize);
If the element has a requirement for a member to be 64-bit aligned,
and set->dsize is not a multiple of 8, but is a multiple of four,
then every odd numbered elements will be misaligned - and hitting
an atomic64_add() on that element will cause the kernel to panic.
ip_set_elem_len() must return a size that is rounded to the maximum
alignment of any extension field stored in the element. This change
ensures that is the case.
Fixes: 95ad1f4a93 ("netfilter: ipset: Fix extension alignment")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e55f3c37cb ]
If this is in "transceiver" mode the the ->qwork isn't required and is
a NULL pointer. This can lead to a NULL dereference when we call
destroy_workqueue(udc->qwork).
Fixes: 3517c31a8e ("usb: gadget: mv_udc: use devm_xxx for probe")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 586745f159 ]
if of_find_device_by_node() succeed, imx_suspend_alloc_ocram() doesn't
have a corresponding put_device(). Thus add a jump target to fix the
exception handling for this function implementation.
Fixes: 1579c7b9fe ("ARM: imx53: Set DDR pins to high impedance when in suspend to RAM.")
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d434d02f7e ]
This is likely a copy'n'paste mistake. The amount of ILT lines to
reserve for a single VF was being multiplied by the total VFs count.
This led to a huge redundancy in reservation and potential lines
drainouts.
Fixes: 1408cc1fa4 ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccd7c7ce16 ]
25ms sleep cycles in waiting for PF response are excessive and may lead
to different timeout failures.
Start to wait with short udelays, and in most cases polling will end
here. If the time was not sufficient, switch to msleeps.
usleep_range() may go far beyond 100us depending on platform and tick
configuration, hence atomic udelays for consistency.
Also add explicit DMA barriers since 'done' always comes from a shared
request-response DMA pool, and note that in the comment nearby.
Fixes: 1408cc1fa4 ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97dd1abd02 ]
qed_chain_get_element_left{,_u32} returned 0 when the difference
between producer and consumer page count was equal to the total
page count.
Fix this by conditional expanding of producer value (vs
unconditional). This allowed to eliminate normalizaton against
total page count, which was the cause of this bug.
Misc: replace open-coded constants with common defines.
Fixes: a91eb52abb ("qed: Revisit chain implementation")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b69040247 ]
CIFS implements the fallocate(FALLOC_FL_ZERO_RANGE) with send SMB
ioctl(FSCTL_SET_ZERO_DATA) to server. It just set the range of the
remote file to zero, but local page cache not update, then the data
inconsistent with server, which leads the xfstest generic/008 failed.
So we need to remove the local page caches before send SMB
ioctl(FSCTL_SET_ZERO_DATA) to server. After next read, it will
re-cache it.
Fixes: 30175628bf ("[SMB3] Enable fallocate -z support for SMB3 mounts")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: stable@vger.kernel.org # v3.17
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit acc91c2d8d ]
When punch hole success, we also can read old data from file:
# strace -e trace=pread64,fallocate xfs_io -f -c "pread 20 40" \
-c "fpunch 20 40" -c"pread 20 40" file
pread64(3, " version 5.8.0-rc1+"..., 40, 20) = 40
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 20, 40) = 0
pread64(3, " version 5.8.0-rc1+"..., 40, 20) = 40
CIFS implements the fallocate(FALLOCATE_FL_PUNCH_HOLE) with send SMB
ioctl(FSCTL_SET_ZERO_DATA) to server. It just set the range of the
remote file to zero, but local page caches not updated, then the
local page caches inconsistent with server.
Also can be found by xfstests generic/316.
So, we need to remove the page caches before send the SMB
ioctl(FSCTL_SET_ZERO_DATA) to server.
Fixes: 31742c5a33 ("enable fallocate punch hole ("fallocate -p") for SMB3")
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org # v3.17
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3d71abd13 ]
USB2 devices with LPM enabled may interrupt the system suspend:
[ 932.510475] usb 1-7: usb suspend, wakeup 0
[ 932.510549] hub 1-0:1.0: hub_suspend
[ 932.510581] usb usb1: bus suspend, wakeup 0
[ 932.510590] xhci_hcd 0000:00:14.0: port 9 not suspended
[ 932.510593] xhci_hcd 0000:00:14.0: port 8 not suspended
..
[ 932.520323] xhci_hcd 0000:00:14.0: Port change event, 1-7, id 7, portsc: 0x400e03
..
[ 932.591405] PM: pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -16
[ 932.591414] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -16
[ 932.591418] PM: Device 0000:00:14.0 failed to suspend async: error -16
During system suspend, USB core will let HC suspends the device if it
doesn't have remote wakeup enabled and doesn't have any children.
However, from the log above we can see that the usb 1-7 doesn't get bus
suspended due to not in U0. After a while the port finished U2 -> U0
transition, interrupts the suspend process.
The observation is that after disabling LPM, port doesn't transit to U0
immediately and can linger in U2. xHCI spec 4.23.5.2 states that the
maximum exit latency for USB2 LPM should be BESL + 10us. The BESL for
the affected device is advertised as 400us, which is still not enough
based on my testing result.
So let's use the maximum permitted latency, 10000, to poll for U0
status to solve the issue.
Cc: stable@vger.kernel.org
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20200624135949.22611-6-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 220345e98f ]
The USB-audio mixer code holds a linked list of usb_mixer_elem_list,
and several operations are performed for each mixer element. A few of
them (snd_usb_mixer_notify_id() and snd_usb_mixer_interrupt_v2())
assume each mixer element being a usb_mixer_elem_info object that is a
subclass of usb_mixer_elem_list, cast via container_of() and access it
members. This may result in an out-of-bound access when a
non-standard list element has been added, as spotted by syzkaller
recently.
This patch adds a new field, is_std_info, in usb_mixer_elem_list to
indicate that the element is the usb_mixer_elem_info type or not, and
skip the access to such an element if needed.
Reported-by: syzbot+fb14314433463ad51625@syzkaller.appspotmail.com
Reported-by: syzbot+2405ca3401e943c538b5@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200624122340.9615-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c558076c7 ]
Introduce a new macro for iterating over mixer element list for
avoiding the open codes in many places. Also the open-coded
container_of() and the forced cast to struct usb_mixer_elem_info are
replaced with another simple macro, too.
No functional changes but just readability improvement.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2500b584c ]
When an interrupt occurs, the value of at least one of the belonging
controls should have changed. To make sure they get re-read from device
on the next read, invalidate the cache. This was correctly implemented
for uac2 already, but missing for uac1.
Signed-off-by: Julian Scheel <julian@jusst.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a73d9d9cfc upstream.
Unable to complete the enumeration of a USB TV Tuner device.
Per XHCI spec (4.6.5), the EP state field of the input context shall
be cleared for a set address command. In the special case of an FS
device that has "MaxPacketSize0 = 8", the Linux XHCI driver does
not do this before evaluating the context. With an XHCI controller
that checks the EP state field for parameter context error this
causes a problem in cases such as the device getting reset again
after enumeration.
When that field is cleared, the problem does not occur.
This was found and fixed by Sasi Kumar.
Cc: stable@vger.kernel.org
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20200624135949.22611-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ddcb71a3e upstream.
A Synopsys USB2.0 core used in Huawei Kunpeng920 SoC has a bug which
might cause the host controller not issuing ping.
Bug description:
After indicating an Interrupt on Async Advance, the software uses the
doorbell mechanism to delete the Next Link queue head of the last
executed queue head. At this time, the host controller still references
the removed queue head(the queue head is NULL). NULL reference causes
the host controller to lose the USB device.
Solution:
After deleting the Next Link queue head, when has_synopsys_hc_bug set
to 1,the software can write one of the valid queue head addresses to
the ASYNCLISTADDR register to allow the host controller to get
the valid queue head. in order to solve that problem, this patch set
the flag for Huawei Kunpeng920
There are detailed instructions and solutions in this patch:
commit 2f7ac6c199 ("USB: ehci: add workaround for Synopsys HC bug")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/1591588019-44284-1-git-send-email-liulongfang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>