commit 62c9827cbb upstream.
Fix a data race in commit 779750d20b ("shmem: split huge pages beyond
i_size under memory pressure").
Here are call traces causing race:
Call Trace 1:
shmem_unused_huge_shrink+0x3ae/0x410
? __list_lru_walk_one.isra.5+0x33/0x160
super_cache_scan+0x17c/0x190
shrink_slab.part.55+0x1ef/0x3f0
shrink_node+0x10e/0x330
kswapd+0x380/0x740
kthread+0xfc/0x130
? mem_cgroup_shrink_node+0x170/0x170
? kthread_create_on_node+0x70/0x70
ret_from_fork+0x1f/0x30
Call Trace 2:
shmem_evict_inode+0xd8/0x190
evict+0xbe/0x1c0
do_unlinkat+0x137/0x330
do_syscall_64+0x76/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
A simple explanation:
Image there are 3 items in the local list (@list). In the first
traversal, A is not deleted from @list.
1) A->B->C
^
|
pos (leave)
In the second traversal, B is deleted from @list. Concurrently, A is
deleted from @list through shmem_evict_inode() since last reference
counter of inode is dropped by other thread. Then the @list is corrupted.
2) A->B->C
^ ^
| |
evict pos (drop)
We should make sure the inode is either on the global list or deleted from
any local list before iput().
Fixed by moving inodes back to global list before we put them.
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20211125064502.99983-1-ligang.bdlg@bytedance.com
Fixes: 779750d20b ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4dc63f003 upstream.
In kdump kernel of x86_64, page allocation failure is observed:
kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
Workqueue: events_unbound async_run_entry_fn
Call Trace:
<TASK>
dump_stack_lvl+0x48/0x5e
warn_alloc.cold+0x72/0xd6
__alloc_pages_slowpath.constprop.0+0xc69/0xcd0
__alloc_pages+0x1df/0x210
new_slab+0x389/0x4d0
___slab_alloc+0x58f/0x770
__slab_alloc.constprop.0+0x4a/0x80
kmem_cache_alloc_trace+0x24b/0x2c0
sr_probe+0x1db/0x620
......
device_add+0x405/0x920
......
__scsi_add_device+0xe5/0x100
ata_scsi_scan_host+0x97/0x1d0
async_run_entry_fn+0x30/0x130
process_one_work+0x1e8/0x3c0
worker_thread+0x50/0x3b0
? rescuer_thread+0x350/0x350
kthread+0x16b/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
</TASK>
Mem-Info:
......
The above failure happened when calling kmalloc() to allocate buffer with
GFP_DMA. It requests to allocate slab page from DMA zone while no managed
pages at all in there.
sr_probe()
--> get_capabilities()
--> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
Because in the current kernel, dma-kmalloc will be created as long as
CONFIG_ZONE_DMA is enabled. However, kdump kernel of x86_64 doesn't have
managed pages on DMA zone since commit 6f599d8423 ("x86/kdump: Always
reserve the low 1M when the crashkernel option is specified"). The
failure can be always reproduced.
For now, let's mute the warning of allocation failure if requesting pages
from DMA zone while no managed pages.
[akpm@linux-foundation.org: fix warning]
Link: https://lkml.kernel.org/r/20211223094435.248523-4-bhe@redhat.com
Fixes: 6f599d8423 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a674e48c54 upstream.
Currently three dma atomic pools are initialized as long as the relevant
kernel codes are built in. While in kdump kernel of x86_64, this is not
right when trying to create atomic_pool_dma, because there's no managed
pages in DMA zone. In the case, DMA zone only has low 1M memory
presented and locked down by memblock allocator. So no pages are added
into buddy of DMA zone. Please check commit f1d4d47c58 ("x86/setup:
Always reserve the first 1M of RAM").
Then in kdump kernel of x86_64, it always prints below failure message:
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
Call Trace:
dump_stack+0x7f/0xa1
warn_alloc.cold+0x72/0xd6
__alloc_pages_slowpath.constprop.0+0xf29/0xf50
__alloc_pages+0x24d/0x2c0
alloc_page_interleave+0x13/0xb0
atomic_pool_expand+0x118/0x210
__dma_atomic_pool_init+0x45/0x93
dma_atomic_pool_init+0xdb/0x176
do_one_initcall+0x67/0x320
kernel_init_freeable+0x290/0x2dc
kernel_init+0xa/0x111
ret_from_fork+0x22/0x30
Mem-Info:
......
DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
Here, let's check if DMA zone has managed pages, then create
atomic_pool_dma if yes. Otherwise just skip it.
Link: https://lkml.kernel.org/r/20211223094435.248523-3-bhe@redhat.com
Fixes: 6f599d8423 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62b3107073 upstream.
Patch series "Handle warning of allocation failure on DMA zone w/o
managed pages", v4.
**Problem observed:
On x86_64, when crash is triggered and entering into kdump kernel, page
allocation failure can always be seen.
---------------------------------
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 1 Comm: swapper/0
Call Trace:
dump_stack+0x7f/0xa1
warn_alloc.cold+0x72/0xd6
......
__alloc_pages+0x24d/0x2c0
......
dma_atomic_pool_init+0xdb/0x176
do_one_initcall+0x67/0x320
? rcu_read_lock_sched_held+0x3f/0x80
kernel_init_freeable+0x290/0x2dc
? rest_init+0x24f/0x24f
kernel_init+0xa/0x111
ret_from_fork+0x22/0x30
Mem-Info:
------------------------------------
***Root cause:
In the current kernel, it assumes that DMA zone must have managed pages
and try to request pages if CONFIG_ZONE_DMA is enabled. While this is not
always true. E.g in kdump kernel of x86_64, only low 1M is presented and
locked down at very early stage of boot, so that this low 1M won't be
added into buddy allocator to become managed pages of DMA zone. This
exception will always cause page allocation failure if page is requested
from DMA zone.
***Investigation:
This failure happens since below commit merged into linus's tree.
1a6a9044b9 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
23721c8e92 x86/crash: Remove crash_reserve_low_1M()
f1d4d47c58 x86/setup: Always reserve the first 1M of RAM
7c321eb2b8 x86/kdump: Remove the backup region handling
6f599d8423 x86/kdump: Always reserve the low 1M when the crashkernel option is specified
Before them, on x86_64, the low 640K area will be reused by kdump kernel.
So in kdump kernel, the content of low 640K area is copied into a backup
region for dumping before jumping into kdump. Then except of those firmware
reserved region in [0, 640K], the left area will be added into buddy
allocator to become available managed pages of DMA zone.
However, after above commits applied, in kdump kernel of x86_64, the low
1M is reserved by memblock, but not released to buddy allocator. So any
later page allocation requested from DMA zone will fail.
At the beginning, if crashkernel is reserved, the low 1M need be locked
down because AMD SME encrypts memory making the old backup region
mechanims impossible when switching into kdump kernel.
Later, it was also observed that there are BIOSes corrupting memory
under 1M. To solve this, in commit f1d4d47c58, the entire region of
low 1M is always reserved after the real mode trampoline is allocated.
Besides, recently, Intel engineer mentioned their TDX (Trusted domain
extensions) which is under development in kernel also needs to lock down
the low 1M. So we can't simply revert above commits to fix the page allocation
failure from DMA zone as someone suggested.
***Solution:
Currently, only DMA atomic pool and dma-kmalloc will initialize and
request page allocation with GFP_DMA during bootup.
So only initializ DMA atomic pool when DMA zone has available managed
pages, otherwise just skip the initialization.
For dma-kmalloc(), for the time being, let's mute the warning of
allocation failure if requesting pages from DMA zone while no manged
pages. Meanwhile, change code to use dma_alloc_xx/dma_map_xx API to
replace kmalloc(GFP_DMA), or do not use GFP_DMA when calling kmalloc() if
not necessary. Christoph is posting patches to fix those under
drivers/scsi/. Finally, we can remove the need of dma-kmalloc() as people
suggested.
This patch (of 3):
In some places of the current kernel, it assumes that dma zone must have
managed pages if CONFIG_ZONE_DMA is enabled. While this is not always
true. E.g in kdump kernel of x86_64, only low 1M is presented and locked
down at very early stage of boot, so that there's no managed pages at all
in DMA zone. This exception will always cause page allocation failure if
page is requested from DMA zone.
Here add function has_managed_dma() and the relevant helper functions to
check if there's DMA zone with managed pages. It will be used in later
patches.
Link: https://lkml.kernel.org/r/20211223094435.248523-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20211223094435.248523-2-bhe@redhat.com
Fixes: 6f599d8423 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e445375882 upstream.
Like other SATA controller chips in the Marvell 88SE91xx series, the
Marvell 88SE9125 has the same DMA requester ID hardware bug that prevents
it from working under IOMMU. Add it to the list of devices that need the
quirk.
Without this patch, device initialization fails with DMA errors:
ata8: softreset failed (1st FIS failed)
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Write NO_PASID] Request device [03:00.1] fault addr 0xfffc0000 [fault reason 0x02] Present bit in context entry is clear
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [03:00.1] fault addr 0xfffc0000 [fault reason 0x02] Present bit in context entry is clear
After applying the patch, the controller can be successfully initialized:
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 330)
ata8.00: ATAPI: PIONEER BD-RW BDR-207M, 1.21, max UDMA/100
ata8.00: configured for UDMA/100
scsi 7:0:0:0: CD-ROM PIONEER BD-RW BDR-207M 1.21 PQ: 0 ANSI: 5
Link: https://lore.kernel.org/r/YahpKVR+McJVDdkD@work
Reported-by: Sam Bingner <sam@bingner.com>
Tested-by: Sam Bingner <sam@bingner.com>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5185965c3 upstream.
Host1x DMA buffer isn't mapped properly when CONFIG_ARM_DMA_USE_IOMMU=y.
The memory management code of Host1x driver has a longstanding overhaul
overdue and it's not obvious where the problem is in this case. Hence
let's add back the old workaround which we already had sometime before.
It explicitly detaches Host1x device from the offending implicit IOMMU
domain. This fixes a completely broken Host1x DMA in case of ARM32
multiplatform kernel config.
Cc: stable@vger.kernel.org
Fixes: af1cbfb9bf ("gpu: host1x: Support DMA mapping of buffers")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a556cfe4ca upstream.
In __arm_v7s_alloc_table function:
iommu call kmem_cache_alloc to allocate page table, this function
allocate memory may fail, when kmem_cache_alloc fails to allocate
table, call virt_to_phys will be abnomal and return unexpected phys
and goto out_free, then call kmem_cache_free to release table will
trigger KE, __get_free_pages and free_pages have similar problem,
so add error handle for page table allocation failure.
Fixes: 29859aeb8a ("iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows the PTE")
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Cc: <stable@vger.kernel.org> # 5.10.*
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20211207113315.29109-1-yf.wang@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 713bdfa10b upstream.
The en/disable_irq() functions keep track of the 'depth': i.e. if
interrupts are disabled twice, then it needs to enable_irq() calls to
enable them again. The cec-pin framework didn't take this into accound
and could disable irqs multiple times, and it expected that a single
enable_irq() would enable them again.
Move all calls to en/disable_irq() to the kthread where it is easy
to keep track of the current irq state and ensure that multiple
en/disable_irq calls never happen.
If interrupts where disabled twice, then they would never turn on
again, leaving the CEC adapter in a dead state.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 865463fc03 (media: cec-pin: add error injection support)
Cc: <stable@vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7b77ebe6d upstream.
This fixes a problem where closing the tuner would leave it in a state
where it would not tune to any channel when reopened. This problem was
discovered as part of https://github.com/hselasky/webcamd/issues/16.
Since adap->id is 0 or 1, this bit-shift overflows, which is undefined
behavior. The driver still worked in practice as the overflow would in
most environments result in 0, which rendered the line a no-op. When
running the driver as part of webcamd however, the overflow could lead
to 0xff due to optimizations by the compiler, which would, in the end,
improperly shut down the tuner.
The bug is a regression introduced in the commit referenced below. The
present patch causes identical behavior to before that commit for
adap->id equal to 0 or 1. The driver does not contain support for
dib0700 devices with more adapters, assuming such even exist.
Tests have been performed with the Xbox One Digital TV Tuner on amd64.
Not all dib0700 devices are expected to be affected by the regression;
this code path is only taken by those with incorrect endpoint numbers.
Link: https://lore.kernel.org/linux-media/1d2fc36d94ced6f67c7cc21dcc469d5e5bdd8201.1632689033.git.mchehab+huawei@kernel.org
Cc: stable@vger.kernel.org
Fixes: 7757ddda6f ("[media] DiB0700: add function to change I2C-speed")
Signed-off-by: Michael Kuron <michael.kuron@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f71d272ad4 upstream.
USB control-message timeouts are specified in milliseconds and should
specifically not vary with CONFIG_HZ.
Use the common control-message timeout define for the five-second
timeouts.
Fixes: 38f993ad8b ("V4L/DVB (8125): This driver adds support for the Sensoray 2255 devices.")
Cc: stable@vger.kernel.org # 2.6.27
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd1798a387 upstream.
USB control-message timeouts are specified in milliseconds and should
specifically not vary with CONFIG_HZ.
Note that the driver was multiplying some of the timeout values with HZ
twice resulting in 3000-second timeouts with HZ=1000.
Also note that two of the timeout defines are currently unused.
Fixes: 2154be651b ("[media] redrat3: new rc-core IR transceiver device driver")
Cc: stable@vger.kernel.org # 3.0
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd9d9377ed upstream.
If V4L2_CAP_READWRITE is not set, then readbuffers must be set to 0,
otherwise v4l2-compliance will complain.
A note on the Fixes tag below: this patch does not really fix that commit,
but it can be applied from that commit onwards. For older code there is no
guarantee that device_caps is set, so even though this patch would apply,
it will not work reliably.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 049e684f2d (media: v4l2-dev: fix WARN_ON(!vdev->device_caps))
Cc: <stable@vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de0244ae40 upstream.
Ammar Faizi reported that our exit code handling is wrong. We truncate
it to the lowest 8 bits but the syscall itself is expected to take a
regular 32-bit signed integer, not an unsigned char. It's the kernel
that later truncates it to the lowest 8 bits. The difference is visible
in strace, where the program below used to show exit(255) instead of
exit(-1):
int main(void)
{
return -1;
}
This patch applies the fix to all archs. x86_64, i386, arm64, armv7 and
mips were all tested and confirmed to work fine now. Risc-v was not
tested but the change is trivial and exactly the same as for other archs.
Reported-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebbe0d8a44 upstream.
After re-checking in the spec and comparing stack offsets with glibc,
The last pushed argument must be 16-byte aligned (i.e. aligned before the
call) so that in the callee esp+4 is multiple of 16, so the principle is
the 32-bit equivalent to what Ammar fixed for x86_64. It's possible that
32-bit code using SSE2 or MMX could have been affected. In addition the
frame pointer ought to be zero at the deepest level.
Link: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
Cc: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 937ed91c71 upstream.
Before this patch, the `_start` function looks like this:
```
0000000000001170 <_start>:
1170: pop %rdi
1171: mov %rsp,%rsi
1174: lea 0x8(%rsi,%rdi,8),%rdx
1179: and $0xfffffffffffffff0,%rsp
117d: sub $0x8,%rsp
1181: call 1000 <main>
1186: movzbq %al,%rdi
118a: mov $0x3c,%rax
1191: syscall
1193: hlt
1194: data16 cs nopw 0x0(%rax,%rax,1)
119f: nop
```
Note the "and" to %rsp with $-16, it makes the %rsp be 16-byte aligned,
but then there is a "sub" with $0x8 which makes the %rsp no longer
16-byte aligned, then it calls main. That's the bug!
What actually the x86-64 System V ABI mandates is that right before the
"call", the %rsp must be 16-byte aligned, not after the "call". So the
"sub" with $0x8 here breaks the alignment. Remove it.
An example where this rule matters is when the callee needs to align
its stack at 16-byte for aligned move instruction, like `movdqa` and
`movaps`. If the callee can't align its stack properly, it will result
in segmentation fault.
x86-64 System V ABI also mandates the deepest stack frame should be
zero. Just to be safe, let's zero the %rbp on startup as the content
of %rbp may be unspecified when the program starts. Now it looks like
this:
```
0000000000001170 <_start>:
1170: pop %rdi
1171: mov %rsp,%rsi
1174: lea 0x8(%rsi,%rdi,8),%rdx
1179: xor %ebp,%ebp # zero the %rbp
117b: and $0xfffffffffffffff0,%rsp # align the %rsp
117f: call 1000 <main>
1184: movzbq %al,%rdi
1188: mov $0x3c,%rax
118f: syscall
1191: hlt
1192: data16 cs nopw 0x0(%rax,%rax,1)
119d: nopl (%rax)
```
Cc: Bedirhan KURT <windowz414@gnuweeb.org>
Cc: Louvian Lyndal <louvianlyndal@gmail.com>
Reported-by: Peter Cordes <peter@cordes.ca>
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
[wt: I did this on purpose due to a misunderstanding of the spec, other
archs will thus have to be rechecked, particularly i386]
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c494ca4d3 upstream.
"Stolen memory" is memory set aside for use by an Intel integrated GPU.
The intel_graphics_quirks() early quirk reserves this memory when it is
called for a GPU that appears in the intel_early_ids[] table of integrated
GPUs.
Previously intel_graphics_quirks() was marked as QFLAG_APPLY_ONCE, so it
was called only for the first Intel GPU found. If a discrete GPU happened
to be enumerated first, intel_graphics_quirks() was called for it but not
for any integrated GPU found later. Therefore, stolen memory for such an
integrated GPU was never reserved.
For example, this problem occurs in this Alderlake-P (integrated) + DG2
(discrete) topology where the DG2 is found first, but stolen memory is
associated with the integrated GPU:
- 00:01.0 Bridge
`- 03:00.0 DG2 discrete GPU
- 00:02.0 Integrated GPU (with stolen memory)
Remove the QFLAG_APPLY_ONCE flag and call intel_graphics_quirks() for every
Intel GPU. Reserve stolen memory for the first GPU that appears in
intel_early_ids[].
[bhelgaas: commit log, add code comment, squash in
https://lore.kernel.org/r/20220118190558.2ququ4vdfjuahicm@ldmartin-desk2]
Link: https://lore.kernel.org/r/20220114002843.2083382-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c9d709965 upstream.
The function nand_davinci_read_page_hwecc_oob_first() first reads the
OOB data, extracts the ECC information, programs the ECC hardware before
reading the actual data in a loop.
Right after the OOB data was read, it called nand_read_page_op() to
reset the read cursor to the beginning of the page. This caused the
first page to be read twice: in that call, and later in the loop.
Address that issue by changing the call to nand_read_page_op() to
nand_change_read_column_op(), which will only reset the read cursor.
Cc: <stable@vger.kernel.org> # v5.2
Fixes: a0ac778eb8 ("mtd: rawnand: ingenic: Add support for the JZ4740")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20211016132228.40254-2-paul@crapouillou.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f53d4c109a upstream.
gpmi_io clock needs to be gated off when changing the parent/dividers of
enfc_clk_root (i.MX6Q/i.MX6UL) respectively qspi2_clk_root (i.MX6SX).
Otherwise this rate change can lead to an unresponsive GPMI core which
results in DMA timeouts and failed driver probe:
[ 4.072318] gpmi-nand 112000.gpmi-nand: DMA timeout, last DMA
...
[ 4.370355] gpmi-nand 112000.gpmi-nand: Chip: 0, Error -110
...
[ 4.375988] gpmi-nand 112000.gpmi-nand: Chip: 0, Error -22
[ 4.381524] gpmi-nand 112000.gpmi-nand: Error in ECC-based read: -22
[ 4.387988] gpmi-nand 112000.gpmi-nand: Chip: 0, Error -22
[ 4.393535] gpmi-nand 112000.gpmi-nand: Chip: 0, Error -22
...
Other than stated in i.MX 6 erratum ERR007117, it should be sufficient
to gate only gpmi_io because all other bch/nand clocks are derived from
different clock roots.
The i.MX6 reference manuals state that changing clock muxers can cause
glitches but are silent about changing dividers. But tests showed that
these glitches can definitely happen on i.MX6ULL. For i.MX7D/8MM in turn,
the manual guarantees that no glitches can happen when changing
dividers.
Co-developed-by: Stefan Riedmueller <s.riedmueller@phytec.de>
Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
Signed-off-by: Christian Eggers <ceggers@arri.de>
Cc: stable@vger.kernel.org
Acked-by: Han Xu <han.xu@nxp.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20211102202022.15551-2-ceggers@arri.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dded08927c upstream.
Syzbot detected a NULL pointer dereference of nfc_llcp_sock->dev pointer
(which is a 'struct nfc_dev *') with calls to llcp_sock_sendmsg() after
a failed llcp_sock_bind(). The message being sent is a SOCK_DGRAM.
KASAN report:
BUG: KASAN: null-ptr-deref in nfc_alloc_send_skb+0x2d/0xc0
Read of size 4 at addr 00000000000005c8 by task llcp_sock_nfc_a/899
CPU: 5 PID: 899 Comm: llcp_sock_nfc_a Not tainted 5.16.0-rc6-next-20211224-00001-gc6437fbf18b0 #125
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
? nfc_alloc_send_skb+0x2d/0xc0
__kasan_report.cold+0x117/0x11c
? mark_lock+0x480/0x4f0
? nfc_alloc_send_skb+0x2d/0xc0
kasan_report+0x38/0x50
nfc_alloc_send_skb+0x2d/0xc0
nfc_llcp_send_ui_frame+0x18c/0x2a0
? nfc_llcp_send_i_frame+0x230/0x230
? __local_bh_enable_ip+0x86/0xe0
? llcp_sock_connect+0x470/0x470
? llcp_sock_connect+0x470/0x470
sock_sendmsg+0x8e/0xa0
____sys_sendmsg+0x253/0x3f0
...
The issue was visible only with multiple simultaneous calls to bind() and
sendmsg(), which resulted in most of the bind() calls to fail. The
bind() was failing on checking if there is available WKS/SDP/SAP
(respective bit in 'struct nfc_llcp_local' fields). When there was no
available WKS/SDP/SAP, the bind returned error but the sendmsg() to such
socket was able to trigger mentioned NULL pointer dereference of
nfc_llcp_sock->dev.
The code looks simply racy and currently it protects several paths
against race with checks for (!nfc_llcp_sock->local) which is NULL-ified
in error paths of bind(). The llcp_sock_sendmsg() did not have such
check but called function nfc_llcp_send_ui_frame() had, although not
protected with lock_sock().
Therefore the race could look like (same socket is used all the time):
CPU0 CPU1
==== ====
llcp_sock_bind()
- lock_sock()
- success
- release_sock()
- return 0
llcp_sock_sendmsg()
- lock_sock()
- release_sock()
llcp_sock_bind(), same socket
- lock_sock()
- error
- nfc_llcp_send_ui_frame()
- if (!llcp_sock->local)
- llcp_sock->local = NULL
- nfc_put_device(dev)
- dereference llcp_sock->dev
- release_sock()
- return -ERRNO
The nfc_llcp_send_ui_frame() checked llcp_sock->local outside of the
lock, which is racy and ineffective check. Instead, its caller
llcp_sock_sendmsg(), should perform the check inside lock_sock().
Reported-and-tested-by: syzbot+7f23bcddf626e0593a39@syzkaller.appspotmail.com
Fixes: b874dec21d ("NFC: Implement LLCP connection less Tx path")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77900c45ee upstream.
In fuzzed image, SSA table may indicate that a data block belongs to
invalid node, which node ID is out-of-range (0, 1, 2 or max_nid), in
order to avoid migrating inconsistent data in such corrupted image,
let's do sanity check anyway before data block migration.
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20f3cf5f86 upstream.
If we ever see a touch report with contact count data we initialize
several variables used to read the contact count in the pre-report
phase. These variables are never reset if we process a report which
doesn't contain a contact count, however. This can cause the pre-
report function to trigger a read of arbitrary memory (e.g. NULL
if we're lucky) and potentially crash the driver.
This commit restores resetting of the variables back to default
"none" values that were used prior to the commit mentioned
below.
Link: https://github.com/linuxwacom/input-wacom/issues/276
Fixes: 003f50ab67 (HID: wacom: Update last_slot_field during pre_report phase)
CC: stable@vger.kernel.org
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df03e9bd6d upstream.
AES hardware may internally re-classify a contact that it thought was
intentional as a palm. Intentional contacts are reported as "down" with
the confidence bit set. When this re-classification occurs, however, the
state transitions to "up" with the confidence bit cleared. This kind of
transition appears to be legal according to Microsoft docs, but we do
not handle it correctly. Because the confidence bit is clear, we don't
call `wacom_wac_finger_slot` and update userspace. This causes hung
touches that confuse userspace and interfere with pen arbitration.
This commit adds a special case to ignore the confidence flag if a contact
is reported as removed. This ensures we do not leave a hung touch if one
of these re-classification events occured. Ideally we'd have some way to
also let userspace know that the touch has been re-classified as a palm
and needs to be canceled, but that's not possible right now :)
Link: https://github.com/linuxwacom/input-wacom/issues/288
Fixes: 7fb0413baa (HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts)
CC: stable@vger.kernel.org
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 546e41ac99 upstream.
These two values go hand-in-hand and must be valid for the driver to
behave correctly. We are currently lazy about updating the values and
rely on the "expected" code flow to take care of making sure they're
valid at the point they're needed. The "expected" flow changed somewhat
with commit f8b6a74719 ("HID: wacom: generic: Support multiple tools
per report"), however. This led to problems with the DTH-2452 due (in
part) to *all* contacts being fully processed -- even those past the
expected contact count. Specifically, the received count gets reset to
0 once all expected fingers are processed, but not the expected count.
The rest of the contacts in the report are then *also* processed since
now the driver thinks we've only processed 0 of N expected contacts.
Later commits such as 7fb0413baa (HID: wacom: Use "Confidence" flag to
prevent reporting invalid contacts) worked around the DTH-2452 issue by
skipping the invalid contacts at the end of the report, but this is not
a complete fix. The confidence flag cannot be relied on when a contact
is removed (see the following patch), and dealing with that condition
re-introduces the DTH-2452 issue unless we also address this contact
count laziness. By resetting expected and received counts at the same
time we ensure the driver understands that there are 0 more contacts
expected in the report. Similarly, we also make sure to reset the
received count if for some reason we're out of sync in the pre-report
phase.
Link: https://github.com/linuxwacom/input-wacom/issues/288
Fixes: f8b6a74719 ("HID: wacom: generic: Support multiple tools per report")
CC: stable@vger.kernel.org
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ea5763fb7 upstream.
uhid has to run hid_add_device() from workqueue context while allowing
parallel use of the userspace API (which is protected with ->devlock).
But hid_add_device() can fail. Currently, that is handled by immediately
destroying the associated HID device, without using ->devlock - but if
there are concurrent requests from userspace, that's wrong and leads to
NULL dereferences and/or memory corruption (via use-after-free).
Fix it by leaving the HID device as-is in the worker. We can clean it up
later, either in the UHID_DESTROY command handler or in the ->release()
handler.
Cc: stable@vger.kernel.org
Fixes: 67f8ecc550 ("HID: uhid: fix timeout when probe races with IO")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2aac550da3 upstream.
The recent few quirk entries for Lenovo haven't been put in the right
order. Let's arrange the table again.
Fixes: ad7cc2d41b ("ALSA: hda/realtek: Quirks to enable speaker output...")
Fixes: 6dc8697622 ("ALSA: hda/realtek: Add speaker fixup for some Yoga 15ITL5 devices")
Fixes: 8f4c90427a ("ALSA: hda/realtek: Add quirk for Legion Y9000X 2020")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c193300867 upstream.
This patch addresses an issue where after rebooting from Windows into Linux
there would be no audio output.
It turns out that the Realtek Audio driver on Windows changes some coeffs
which are not being reset/reinitialized when rebooting the machine. As a
result, there is no audio output until these coeffs are being reset to
their initial state. This patch takes care of that by setting known-good
(initial) values to the coeffs.
We initially relied upon alc1220_fixup_clevo_p950() to fix some pins in the
connection list. However, it also sets coef 0x7 which does not need to be
touched. Furthermore, to prevent mixing device-specific quirks I introduced
a new alc1220_fixup_gb_x570() which is heavily based on
alc1220_fixup_clevo_p950() but does not set coeff 0x7 and fixes the coeffs
that are actually needed instead.
This new alc1220_fixup_gb_x570() is believed to also work for other boards,
like the Gigabyte X570 Aorus Extreme and the newer Gigabyte Aorus X570S
Master. However, as there is no way for me to test these I initially only
enable this new behaviour for the mainboard I have which is the Gigabyte
X570(non-S) Aorus Master.
I tested this patch on the 5.15 branch as well as on master and it is
working well for me.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205275
Signed-off-by: Christian Lachner <gladiac@gmail.com>
Fixes: 0d45e86d22 ("ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220103140517.30273-2-gladiac@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fb12fe5b9 upstream.
The fixed counter 3 is used for the Topdown metrics, which hasn't been
enabled for KVM guests. Userspace accessing to it will fail as it's not
included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines,
which have this counter.
To reproduce it on ICX+ machines, ./state_test reports:
==== Test Assertion Failure ====
lib/x86_64/processor.c:1078: r == nmsrs
pid=4564 tid=4564 - Argument list too long
1 0x000000000040b1b9: vcpu_save_state at processor.c:1077
2 0x0000000000402478: main at state_test.c:209 (discriminator 6)
3 0x00007fbe21ed5f92: ?? ??:0
4 0x000000000040264d: _start at ??:?
Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c)
With this patch, it works well.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: e2ada66ec4 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>