Patch series "Squashfs: handle missing pages decompressing into page
cache".
This patchset enables Squashfs to handle missing pages when directly
decompressing datablocks into the page cache.
Previously if the full set of pages needed was not available, Squashfs
would have to fall back to using an intermediate buffer (the older
method), which is slower, involving a memcopy, and it introduces
contention on a shared buffer.
The first patch extends the "page actor" code to handle missing pages.
The second patch updates Squashfs_readpage_block() to use the new
functionality, and removes the code that falls back to using an
intermediate buffer.
This patchset is independent of the readahead work, and it is standalone.
It can be merged on its own.
But the readahead patch for efficiency also needs this patch-set.
This patch (of 2):
This patch extends the "page actor" code to handle missing pages.
Previously if the full set of pages needed to decompress a Squashfs
datablock was unavailable, this would cause decompression to fail on the
missing pages.
In this case direct decompression into the page cache could not be
achieved and the code would fall back to using the older intermediate
buffer method.
With this patch, direct decompression into the page cache can be achieved
with missing pages.
For "multi-shot" decompressors (zlib, xz, zstd), the page actor will
allocate a temporary buffer which is passed to the decompressor, and then
freed by the page actor.
For "single shot" decompressors (lz4, lzo) which decompress into a
contiguous "bounce buffer", and which is then copied into the page cache,
it would be pointless to allocate a temporary buffer, memcpy into it, and
then free it. For these decompressors -ENOMEM is returned, which
signifies that the memcpy for that page should be skipped.
This also happens if the data block is uncompressed.
Link: https://lkml.kernel.org/r/20220611032133.5743-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20220611032133.5743-2-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Xiongwei Song <Xiongwei.Song@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 290900323
(cherry picked from commit f268eedddf)
Change-Id: I08906140517a56660e0c487cef49b2efb9c098be
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
[ Upstream commit 504a10d9e4 ]
On corrupt gfs2 file systems the evict code can try to reference the
journal descriptor structure, jdesc, after it has been freed and set to
NULL. The sequence of events is:
init_journal()
...
fail_jindex:
gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
if (gfs2_holder_initialized(&ji_gh))
gfs2_glock_dq_uninit(&ji_gh);
fail:
iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
evict()
gfs2_evict_inode()
evict_linked_inode()
ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
<------references the now freed/zeroed sd_jdesc pointer.
The call to gfs2_trans_begin is done because the truncate_inode_pages
call can cause gfs2 events that require a transaction, such as removing
journaled data (jdata) blocks from the journal.
This patch fixes the problem by adding a check for sdp->sd_jdesc to
function gfs2_evict_inode. In theory, this should only happen to corrupt
gfs2 file systems, when gfs2 detects the problem, reports it, then tries
to evict all the system inodes it has read in up to that point.
Bug: 289870854
Reported-by: Yang Lan <lanyang0908@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5ae4a618a1)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I501e8631e1b60479023f5e6ad957540f9e10bcd5
Presently the data buffer used to return the per-UID timeout description
is created based on information provided by the user. It is expected
that the user populates a variable called 'timeouts_array_size' which is
heavily scrutinised to ensure the value provided is appropriate i.e.
smaller than the largest possible value but large enough to contain all
of the data we wish to pass back.
The issue is that the aforementioned scrutiny is imposed on a different
variable to the one expected. Contrary to expectation, the data buffer
is actually being allocated to the size specified in a variable named
'timeouts_array_size_out'. A variable originally designed to only
contain the output information i.e. the size of the data actually copied
to the user for consumption. This value is also user provided and is
not given the same level of scrutiny as the former.
The fix in this case is simple. Ignore 'timeouts_array_size_out' until
it is time to populate (over-write) it ourselves and use
'timeouts_array_size' to shape the buffer as intended.
Bug: 281547360
Change-Id: I95e12879a33a2355f9e4bc0ce2bfc3f229141aa8
Signed-off-by: Lee Jones <joneslee@google.com>
(cherry picked from commit 5a4d20a3eb4e651f88ed2f1f08cee066639ca801)
The 'mmu' parameter to enter_vmid_context() represents the target MMU
to switch to, so we should stash away the current MMU for restoration
by exit_vmid_context() rather than the one we're about to switch to!
Bug: 291568386
Fixes: e815dfc6c6 ("ANDROID: KVM: arm64: Support TLB invalidation in guest context")
Tested-by: Mostafa Saleh <smostafa@google.com>
Reported-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I5d76c159424e32a6d70c598d0007f98ea80c1db4
KASAN suppresses reports for bad accesses done by the KASAN reporting
code. The reporting code might access poisoned memory for reporting
purposes.
Software KASAN modes do this by suppressing reports during reporting via
current->kasan_depth, the same way they suppress reports during accesses
to poisoned slab metadata.
Hardware Tag-Based KASAN does not use current->kasan_depth, and instead
resets pointer tags for accesses to poisoned memory done by the reporting
code.
Despite that, a recursive report can still happen:
1. On hardware with faulty MTE support. This was observed by Weizhao
Ouyang on a faulty hardware that caused memory tags to randomly change
from time to time.
2. Theoretically, due to a previous MTE-undetected memory corruption.
A recursive report can happen via:
1. Accessing a pointer with a non-reset tag in the reporting code, e.g.
slab->slab_cache, which is what Weizhao Ouyang observed.
2. Theoretically, via external non-annotated routines, e.g. stackdepot.
To resolve this issue, resetting tags for all of the pointers in the
reporting code and all the used external routines would be impractical.
Instead, disable tag checking done by the CPU for the duration of KASAN
reporting for Hardware Tag-Based KASAN.
Without this fix, Hardware Tag-Based KASAN reporting code might deadlock.
[andreyknvl@google.com: disable preemption instead of migration, fix comment typo]
Link: https://lkml.kernel.org/r/d14417c8bc5eea7589e99381203432f15c0f9138.1680114854.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/59f433e00f7fa985e8bf9f7caf78574db16b67ab.1678491668.git.andreyknvl@google.com
Fixes: 2e903b9147 ("kasan, arm64: implement HW_TAGS runtime")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Weizhao Ouyang <ouyangweizhao@zeku.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 254721825
(cherry picked from commit c6a690e0c9)
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: Ifc5daf66f57dd16e85de73257cc0966565836269
Adding the following symbols:
- __get_task_comm
Bug: 260174400
Change-Id: I25953c46a6eb5c0ea136101a77ded198b3f796ce
Signed-off-by: Robin Peng <robinpeng@google.com>
If gs_close has cleared port->port.tty and gs_start_io is called
afterwards, then the function tty_wakeup will attempt to access the value
of the pointer port->port.tty which will cause a null pointer
dereference error.
To avoid this, add a null pointer check to gs_start_io before attempting
to access the value of the pointer port->port.tty.
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Message-ID: <20230602070009.1353946-1-khtsai@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 283247551
(cherry picked from commit ffd603f214)
Change-Id: I782ef328b0d49810d3fb23c002a86439e6728542
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Commit 56124d6c87 ("fsverity: support enabling with tree block size <
PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read
the file's data, instead of direct pagecache accesses.
An unintended consequence of this is that the
'WARN_ON_ONCE(!(file->f_mode & FMODE_READ))' in __kernel_read() became
reachable by fuzz tests. This happens if FS_IOC_ENABLE_VERITY is called
on a fd opened with access mode 3, which means "ioctl access only".
Arguably, FS_IOC_ENABLE_VERITY should work on ioctl-only fds. But
ioctl-only fds are a weird Linux extension that is rarely used and that
few people even know about. (The documentation for FS_IOC_ENABLE_VERITY
even specifically says it requires O_RDONLY.) It's probably not
worthwhile to make the ioctl internally open a new fd just to handle
this case. Thus, just reject the ioctl on such fds for now.
Fixes: 56124d6c87 ("fsverity: support enabling with tree block size < PAGE_SIZE")
Reported-by: syzbot+51177e4144d764827c45@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=2281afcbbfa8fdb92f9887479cc0e4180f1c6b28
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230406215106.235829-1-ebiggers@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
(cherry picked from commit 0483913921)
Change-Id: I3043d7295d59c05f487c05258cb6bb0113357c6e
The new Merkle tree construction algorithm is a bit fragile in that it
may overflow the 'root_hash' array if the tree actually generated does
not match the calculated tree parameters.
This should never happen unless there is a filesystem bug that allows
the file size to change despite deny_write_access(), or a bug in the
Merkle tree logic itself. Regardless, it's fairly easy to check for
buffer overflow here, so let's do so.
This is a robustness improvement only; this case is not currently known
to be reachable. I've added a Fixes tag anyway, since I recommend that
this be included in kernels that have the mentioned commit.
Fixes: 56124d6c87 ("fsverity: support enabling with tree block size < PAGE_SIZE")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230328041505.110162-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
(cherry picked from commit 39049b69ec)
Change-Id: I248fd8686a806f0099bed1ac83d52362af3e194e
This reverts commit 21061b7d0f.
Let's use the upstream version.
Bug: 280545073
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: Idcdc94d6bd6b6272535a49c8639517ef1bddb246
[ Upstream commit 2b947f8769 ]
In renesas_usb3_probe, role_work is bound with renesas_usb3_role_work.
renesas_usb3_start will be called to start the work.
If we remove the driver which will call usbhs_remove, there may be
an unfinished work. The possible sequence is as follows:
CPU0 CPU1
renesas_usb3_role_work
renesas_usb3_remove
usb_role_switch_unregister
device_unregister
kfree(sw)
//free usb3->role_sw
usb_role_switch_set_role
//use usb3->role_sw
The usb3->role_sw could be freed under such circumstance and then
used in usb_role_switch_set_role.
This bug was found by static analysis. And note that removing a
driver is a root-only operation, and should never happen in normal
case. But the root user may directly remove the device which
will also trigger the remove function.
Fix it by canceling the work before cleanup in the renesas_usb3_remove.
Bug: 289003615
Fixes: 39facfa01c ("usb: gadget: udc: renesas_usb3: Add register of usb role switch")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20230320062931.505170-1-zyytlz.wz@163.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit df23805209)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I79a1dbeba9a90ee5daf94648ef6a32207b283561
[ Upstream commit 3228cec23b ]
In rkvdec_probe, rkvdec->watchdog_work is bound with
rkvdec_watchdog_func. Then rkvdec_vp9_run may
be called to start the work.
If we remove the module which will call rkvdec_remove
to make cleanup, there may be a unfinished work.
The possible sequence is as follows, which will
cause a typical UAF bug.
Fix it by canceling the work before cleanup in rkvdec_remove.
CPU0 CPU1
|rkvdec_watchdog_func
rkvdec_remove |
rkvdec_v4l2_cleanup|
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use
Bug: 289003637
Fixes: cd33c83044 ("media: rkvdec: Add the rkvdec driver")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6a17add9c6)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ibdf4667315d98ac1cd42545f61e271c291893edd
commit 43ec16f145 upstream.
There is a crash in relay_file_read, as the var from
point to the end of last subbuf.
The oops looks something like:
pc : __arch_copy_to_user+0x180/0x310
lr : relay_file_read+0x20c/0x2c8
Call trace:
__arch_copy_to_user+0x180/0x310
full_proxy_read+0x68/0x98
vfs_read+0xb0/0x1d0
ksys_read+0x6c/0xf0
__arm64_sys_read+0x20/0x28
el0_svc_common.constprop.3+0x84/0x108
do_el0_svc+0x74/0x90
el0_svc+0x1c/0x28
el0_sync_handler+0x88/0xb0
el0_sync+0x148/0x180
We get the condition by analyzing the vmcore:
1). The last produced byte and last consumed byte
both at the end of the last subbuf
2). A softirq calls function(e.g __blk_add_trace)
to write relay buffer occurs when an program is calling
relay_file_read_avail().
relay_file_read
relay_file_read_avail
relay_file_read_consume(buf, 0, 0);
//interrupted by softirq who will write subbuf
....
return 1;
//read_start point to the end of the last subbuf
read_start = relay_file_read_start_pos
//avail is equal to subsize
avail = relay_file_read_subbuf_avail
//from points to an invalid memory address
from = buf->start + read_start
//system is crashed
copy_to_user(buffer, from, avail)
Bug: 288957094
Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com
Fixes: 8d62fdebda ("relay file read: start-pos fix")
Signed-off-by: Zhang Zhengming <zhang.zhengming@h3c.com>
Reviewed-by: Zhao Lei <zhao_lei1@hoperun.com>
Reviewed-by: Zhou Kete <zhou.kete@h3c.com>
Reviewed-by: Pengcheng Yang <yangpc@wangsu.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f6ee841ff2)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ibbdf65d8bf2268c3e8c09520f595167a2ed41e8b
This reverts commit 1f86123b97 ("net: align SO_RCVMARK required
privileges with SO_MARK") because the reasoning in the commit message
is not really correct:
SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which
sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if
sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf
(either cgroup setsockopt hook, or via syscall filters)
then to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store
the network identifier a socket is bound to. Setting it is privileged,
but retrieving it is not. We'd like unprivileged userspace to be able
to read the network id of incoming packets (where mark is set via
iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether
setting SO_RCVMARK is privilged or not.
(or even a MASK of which bits in the mark can be exposed)
But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit e42c7beee7
("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Bug: 254441685
Fixes: 1f86123b97 ("net: align SO_RCVMARK required privileges with SO_MARK")
Cc: Larysa Zaremba <larysa.zaremba@intel.com>
Cc: Simon Horman <simon.horman@corigine.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick Rohr <prohr@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit a9628e8877)
[Lee: Fixed trivial merge conflict - result is the same]
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iee4d495734536509c1fc4db61879113a311e4033
When handling ESR_ELx_EC_WATCHPT_LOW, far_el2 member of struct
kvm_vcpu_fault_info will be copied to far member of struct
kvm_debug_exit_arch and exposed to the userspace. The userspace will
see stale values from older faults if the fault info does not get
populated.
Bug: 254441685
Fixes: 8fb2046180 ("KVM: arm64: Move early handlers to per-EC handlers")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230530024651.10014-1-akihiko.odaki@daynix.com
Cc: stable@vger.kernel.org
(cherry picked from commit 811154e234)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I3d6dfed43293fbcd60898943e41ef2e3f6697a9f
In kvm_vm_ioctl_create_vcpu(), add vcpu to vcpu_array iff it's safe to
access vcpu via kvm_get_vcpu() and kvm_for_each_vcpu(), i.e. when there's
no failure path requiring vcpu removal and destruction. Such order is
important because vcpu_array accessors may end up referencing vcpu at
vcpu_array[0] even before online_vcpus is set to 1.
When online_vcpus=0, any call to kvm_get_vcpu() goes through
array_index_nospec() and ends with an attempt to xa_load(vcpu_array, 0):
int num_vcpus = atomic_read(&kvm->online_vcpus);
i = array_index_nospec(i, num_vcpus);
return xa_load(&kvm->vcpu_array, i);
Similarly, when online_vcpus=0, a kvm_for_each_vcpu() does not iterate over
an "empty" range, but actually [0, ULONG_MAX]:
xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \
(atomic_read(&kvm->online_vcpus) - 1))
In both cases, such online_vcpus=0 edge case, even if leading to
unnecessary calls to XArray API, should not be an issue; requesting
unpopulated indexes/ranges is handled by xa_load() and xa_for_each_range().
However, this means that when the first vCPU is created and inserted in
vcpu_array *and* before online_vcpus is incremented, code calling
kvm_get_vcpu()/kvm_for_each_vcpu() already has access to that first vCPU.
This should not pose a problem assuming that once a vcpu is stored in
vcpu_array, it will remain there, but that's not the case:
kvm_vm_ioctl_create_vcpu() first inserts to vcpu_array, then requests a
file descriptor. If create_vcpu_fd() fails, newly inserted vcpu is removed
from the vcpu_array, then destroyed:
vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
kvm_get_kvm(kvm);
r = create_vcpu_fd(vcpu);
if (r < 0) {
xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
kvm_put_kvm_no_destroy(kvm);
goto unlock_vcpu_destroy;
}
atomic_inc(&kvm->online_vcpus);
This results in a possible race condition when a reference to a vcpu is
acquired (via kvm_get_vcpu() or kvm_for_each_vcpu()) moments before said
vcpu is destroyed.
Bug: 254441685
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20230510140410.1093987-2-mhal@rbox.co>
Cc: stable@vger.kernel.org
Fixes: c5b0775491 ("KVM: Convert the kvm->vcpus array to a xarray", 2021-12-08)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit afb2acb2e3)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I79735110d2e95dddb8181c72716a24cd87736094
Now that DVB_CORE can be a loadable module, pvrusb2 can run into
a link error:
ld.lld: error: undefined symbol: dvb_module_probe
>>> referenced by pvrusb2-devattr.c
>>> drivers/media/usb/pvrusb2/pvrusb2-devattr.o:(pvr2_lgdt3306a_attach) in archive vmlinux.a
ld.lld: error: undefined symbol: dvb_module_release
>>> referenced by pvrusb2-devattr.c
>>> drivers/media/usb/pvrusb2/pvrusb2-devattr.o:(pvr2_dual_fe_attach) in archive vmlinux.a
Refine the Kconfig dependencies to avoid this case.
Bug: 254441685
Link: https://lore.kernel.org/linux-media/20230117171055.2714621-1-arnd@kernel.org
Fixes: 7655c342db ("media: Kconfig: Make DVB_CORE=m possible when MEDIA_SUPPORT=y")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
(cherry picked from commit 53558de2b5)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ib47c2e59aff511becce09e81c71c9eeb01695a67
When ufshcd_err_handler() is executed, CQ event interrupt can enter waiting
for the same lock. This can happen in ufshcd_handle_mcq_cq_events() and
also in ufs_mtk_mcq_intr(). The following warning message will be generated
when &hwq->cq_lock is used in IRQ context with IRQ enabled. Use
ufshcd_mcq_poll_cqe_lock() with spin_lock_irqsave instead of spin_lock to
resolve the deadlock issue.
[name:lockdep&]WARNING: inconsistent lock state
[name:lockdep&]--------------------------------
[name:lockdep&]inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[name:lockdep&]kworker/u16:4/260 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffffff8028444600 (&hwq->cq_lock){?.-.}-{2:2}, at:
ufshcd_mcq_poll_cqe_lock+0x30/0xe0
[name:lockdep&]{IN-HARDIRQ-W} state was registered at:
lock_acquire+0x17c/0x33c
_raw_spin_lock+0x5c/0x7c
ufshcd_mcq_poll_cqe_lock+0x30/0xe0
ufs_mtk_mcq_intr+0x60/0x1bc [ufs_mediatek_mod]
__handle_irq_event_percpu+0x140/0x3ec
handle_irq_event+0x50/0xd8
handle_fasteoi_irq+0x148/0x2b0
generic_handle_domain_irq+0x4c/0x6c
gic_handle_irq+0x58/0x134
call_on_irq_stack+0x40/0x74
do_interrupt_handler+0x84/0xe4
el1_interrupt+0x3c/0x78
<snip>
Possible unsafe locking scenario:
CPU0
----
lock(&hwq->cq_lock);
<Interrupt>
lock(&hwq->cq_lock);
*** DEADLOCK ***
2 locks held by kworker/u16:4/260:
[name:lockdep&]
stack backtrace:
CPU: 7 PID: 260 Comm: kworker/u16:4 Tainted: G S W OE
6.1.17-mainline-android14-2-g277223301adb #1
Workqueue: ufs_eh_wq_0 ufshcd_err_handler
Call trace:
dump_backtrace+0x10c/0x160
show_stack+0x20/0x30
dump_stack_lvl+0x98/0xd8
dump_stack+0x20/0x60
print_usage_bug+0x584/0x76c
mark_lock_irq+0x488/0x510
mark_lock+0x1ec/0x25c
__lock_acquire+0x4d8/0xffc
lock_acquire+0x17c/0x33c
_raw_spin_lock+0x5c/0x7c
ufshcd_mcq_poll_cqe_lock+0x30/0xe0
ufshcd_poll+0x68/0x1b0
ufshcd_transfer_req_compl+0x9c/0xc8
ufshcd_err_handler+0x3bc/0xea0
process_one_work+0x2f4/0x7e8
worker_thread+0x234/0x450
kthread+0x110/0x134
ret_from_fork+0x10/0x20
Bug: 254441685
Fixes: ed975065c3 ("scsi: ufs: core: mcq: Add completion support in poll")
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Alice Chao <alice.chao@mediatek.com>
Link: https://lore.kernel.org/r/20230424080400.8955-1-alice.chao@mediatek.com
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 948afc6961)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If4af26c78561e0fd3f92bd039976380617cc3550
commit d082d48737 upstream.
KPTI keeps around two PGDs: one for userspace and another for the
kernel. Among other things, set_pgd() contains infrastructure to
ensure that updates to the kernel PGD are reflected in the user PGD
as well.
One side-effect of this is that set_pgd() expects to be passed whole
pages. Unfortunately, init_trampoline_kaslr() passes in a single entry:
'trampoline_pgd_entry'.
When KPTI is on, set_pgd() will update 'trampoline_pgd_entry' (an
8-Byte globally stored [.bss] variable) and will then proceed to
replicate that value into the non-existent neighboring user page
(located +4k away), leading to the corruption of other global [.bss]
stored variables.
Fix it by directly assigning 'trampoline_pgd_entry' and avoiding
set_pgd().
[ dhansen: tweak subject and changelog ]
Bug: 274115504
Fixes: 0925dda596 ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/20230614163859.924309-1-lee@kernel.org/g
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 364fdcbb03)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Idc1fc494d7ccb4a8a3765e1f46482583b528a584
[ Upstream commit 1240eb93f0 ]
In case of error when adding a new rule that refers to an anonymous set,
deactivate expressions via NFT_TRANS_PREPARE state, not NFT_TRANS_RELEASE.
Thus, the lookup expression marks anonymous sets as inactive in the next
generation to ensure it is not reachable in this transaction anymore and
decrement the set refcount as introduced by c1592a8994 ("netfilter:
nf_tables: deactivate anonymous set from preparation phase"). The abort
step takes care of undoing the anonymous set.
This is also consistent with rule deletion, where NFT_TRANS_PREPARE is
used. Note that this error path is exercised in the preparation step of
the commit protocol. This patch replaces nf_tables_rule_release() by the
deactivate and destroy calls, this time with NFT_TRANS_PREPARE.
Due to this incorrect error handling, it is possible to access a
dangling pointer to the anonymous set that remains in the transaction
list.
[1009.379054] BUG: KASAN: use-after-free in nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379106] Read of size 8 at addr ffff88816c4c8020 by task nft-rule-add/137110
[1009.379116] CPU: 7 PID: 137110 Comm: nft-rule-add Not tainted 6.4.0-rc4+ #256
[1009.379128] Call Trace:
[1009.379132] <TASK>
[1009.379135] dump_stack_lvl+0x33/0x50
[1009.379146] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379191] print_address_description.constprop.0+0x27/0x300
[1009.379201] kasan_report+0x107/0x120
[1009.379210] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379255] nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379302] nft_lookup_init+0xa5/0x270 [nf_tables]
[1009.379350] nf_tables_newrule+0x698/0xe50 [nf_tables]
[1009.379397] ? nf_tables_rule_release+0xe0/0xe0 [nf_tables]
[1009.379441] ? kasan_unpoison+0x23/0x50
[1009.379450] nfnetlink_rcv_batch+0x97c/0xd90 [nfnetlink]
[1009.379470] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
[1009.379485] ? __alloc_skb+0xb8/0x1e0
[1009.379493] ? __alloc_skb+0xb8/0x1e0
[1009.379502] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[1009.379509] ? unwind_get_return_address+0x2a/0x40
[1009.379517] ? write_profile+0xc0/0xc0
[1009.379524] ? avc_lookup+0x8f/0xc0
[1009.379532] ? __rcu_read_unlock+0x43/0x60
Bug: 289230343
Fixes: 958bee14d0 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4aaa3b730d)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ia62fea0e2c2c2cf944dde80751a9dfb85108e758
[ Upstream commit 4d56304e58 ]
If we send two TCA_FLOWER_KEY_ENC_OPTS_GENEVE packets and their total
size is 252 bytes(key->enc_opts.len = 252) then
key->enc_opts.len = opt->length = data_len / 4 = 0 when the third
TCA_FLOWER_KEY_ENC_OPTS_GENEVE packet enters fl_set_geneve_opt. This
bypasses the next bounds check and results in an out-of-bounds.
Bug: 288660424
Fixes: 0a6e77784f ("net/sched: allow flower to match tunnel options")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Link: https://lore.kernel.org/r/20230531102805.27090-1-hbh25y@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 45f47d2cf1)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I53c534b7d43f4c7da5a9f63556c79d35797aa598
Assignment of NVIDIA Ampere-based GPUs have seen a regression since the
below referenced commit, where the reduced D3hot transition delay appears
to introduce a small window where a D3hot->D0 transition followed by a bus
reset can wedge the device. The entire device is subsequently unavailable,
returning -1 on config space read and is unrecoverable without a host
reset.
This has been observed with RTX A2000 and A5000 GPU and audio functions
assigned to a Windows VM, where shutdown of the VM places the devices in
D3hot prior to vfio-pci performing a bus reset when userspace releases the
devices. The issue has roughly a 2-3% chance of occurring per shutdown.
Restoring the HDA controller d3hot_delay to the effective value before the
below commit has been shown to resolve the issue. NVIDIA confirms this
change should be safe for all of their HDA controllers.
Bug: 254441685
Fixes: 3e347969a5 ("PCI/PM: Reduce D3hot delay with usleep_range()")
Link: https://lore.kernel.org/r/20230413194042.605768-1-alex.williamson@redhat.com
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tarun Gupta <targupta@nvidia.com>
Cc: Abhishek Sahu <abhsahu@nvidia.com>
Cc: Tarun Gupta <targupta@nvidia.com>
(cherry picked from commit a5a6dd2624)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie8bb6c852e041ce16b4f9086c42030dc24375602