mirror of
https://github.com/hardkernel/linux.git
synced 2026-06-06 19:08:57 +09:00
ffc19fe2a7ade9c3bf84959bb91a49513be86792
1236709 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ffc19fe2a7 |
btrfs: send: add and use helper to rename current inode when processing refs
[ Upstream commit ec666c84deba56f714505b53556a97565f72db86 ] Extract the logic to rename the current inode at process_recorded_refs() into a helper function and use it, therefore removing duplicated logic and making it easier for an upcoming patch by avoiding yet more duplicated logic. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
bfed58ed10 |
btrfs: send: only use boolean variables at process_recorded_refs()
[ Upstream commit 9453fe329789073d9a971de01da5902c32c1a01a ] We have several local variables at process_recorded_refs() that are used as booleans, with some of them having a 'bool' type while two of them having an 'int' type. Change this to make them all use the 'bool' type which is more clear and to make everything more consistent. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
253557254f |
btrfs: send: factor out common logic when sending xattrs
[ Upstream commit 17f6a74d0b89092e38e3328b66eda1ab29a195d4 ] We always send xattrs for the current inode only and both callers of send_set_xattr() pass a path for the current inode. So move the path allocation and computation to send_set_xattr(), reducing duplicated code. This also facilitates an upcoming patch. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
b1947464c0 |
btrfs: populate otime when logging an inode item
[ Upstream commit 1ef94169db0958d6de39f9ea6e063ce887342e2d ]
[TEST FAILURE WITH EXPERIMENTAL FEATURES]
When running test case generic/508, the test case will fail with the new
btrfs shutdown support:
generic/508 - output mismatch (see /home/adam/xfstests/results//generic/508.out.bad)
# --- tests/generic/508.out 2022-05-11 11:25:30.806666664 +0930
# +++ /home/adam/xfstests/results//generic/508.out.bad 2025-07-02 14:53:22.401824212 +0930
# @@ -1,2 +1,6 @@
# QA output created by 508
# Silence is golden
# +Before:
# +After : stat.btime = Thu Jan 1 09:30:00 1970
# +Before:
# +After : stat.btime = Wed Jul 2 14:53:22 2025
# ...
# (Run 'diff -u /home/adam/xfstests/tests/generic/508.out /home/adam/xfstests/results//generic/508.out.bad' to see the entire diff)
Ran: generic/508
Failures: generic/508
Failed 1 of 1 tests
Please note that the test case requires shutdown support, thus the test
case will be skipped using the current upstream kernel, as it doesn't
have shutdown ioctl support.
[CAUSE]
The direct cause the 0 time stamp in the log tree:
leaf 30507008 items 2 free space 16057 generation 9 owner TREE_LOG
leaf 30507008 flags 0x1(WRITTEN) backref revision 1
checksum stored e522548d
checksum calced e522548d
fs uuid 57d45451-481e-43e4-aa93-289ad707a3a0
chunk uuid d52bd3fd-5163-4337-98a7-7986993ad398
item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160
generation 9 transid 9 size 0 nbytes 0
block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
atime 1751432947.492000000 (2025-07-02 14:39:07)
ctime 1751432947.492000000 (2025-07-02 14:39:07)
mtime 1751432947.492000000 (2025-07-02 14:39:07)
otime 0.0 (1970-01-01 09:30:00) <<<
But the old fs tree has all the correct time stamp:
btrfs-progs v6.12
fs tree key (FS_TREE ROOT_ITEM 0)
leaf 30425088 items 2 free space 16061 generation 5 owner FS_TREE
leaf 30425088 flags 0x1(WRITTEN) backref revision 1
checksum stored 48f6c57e
checksum calced 48f6c57e
fs uuid 57d45451-481e-43e4-aa93-289ad707a3a0
chunk uuid d52bd3fd-5163-4337-98a7-7986993ad398
item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
generation 3 transid 0 size 0 nbytes 16384
block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0
sequence 0 flags 0x0(none)
atime 1751432947.0 (2025-07-02 14:39:07)
ctime 1751432947.0 (2025-07-02 14:39:07)
mtime 1751432947.0 (2025-07-02 14:39:07)
otime 1751432947.0 (2025-07-02 14:39:07) <<<
The root cause is that fill_inode_item() in tree-log.c is only
populating a/c/m time, not the otime (or btime in statx output).
Part of the reason is that, the vfs inode only has a/c/m time, no native
btime support yet.
[FIX]
Thankfully btrfs has its otime stored in btrfs_inode::i_otime_sec and
btrfs_inode::i_otime_nsec.
So what we really need is just fill the otime time stamp in
fill_inode_item() of tree-log.c
There is another fill_inode_item() in inode.c, which is doing the proper
otime population.
Fixes:
|
||
|
|
d7badc2ba4 |
btrfs: constify more pointer parameters
[ Upstream commit ca283ea9920ac20ae23ed398b693db3121045019 ] Continue adding const to parameters. This is for clarity and minor addition to safety. There are some minor effects, in the assembly code and .ko measured on release config. Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
9dfdea89ce |
btrfs: fix ssd_spread overallocation
[ Upstream commit 807d9023e75fc20bfd6dd2ac0408ce4af53f1648 ]
If the ssd_spread mount option is enabled, then we run the so called
clustered allocator for data block groups. In practice, this results in
creating a btrfs_free_cluster which caches a block_group and borrows its
free extents for allocation.
Since the introduction of allocation size classes in 6.1, there has been
a bug in the interaction between that feature and ssd_spread.
find_free_extent() has a number of nested loops. The loop going over the
allocation stages, stored in ffe_ctl->loop and managed by
find_free_extent_update_loop(), the loop over the raid levels, and the
loop over all the block_groups in a space_info. The size class feature
relies on the block_group loop to ensure it gets a chance to see a
block_group of a given size class. However, the clustered allocator
uses the cached cluster block_group and breaks that loop. Each call to
do_allocation() will really just go back to the same cached block_group.
Normally, this is OK, as the allocation either succeeds and we don't
want to loop any more or it fails, and we clear the cluster and return
its space to the block_group.
But with size classes, the allocation can succeed, then later fail,
outside of do_allocation() due to size class mismatch. That latter
failure is not properly handled due to the highly complex multi loop
logic. The result is a painful loop where we continue to allocate the
same num_bytes from the cluster in a tight loop until it fails and
releases the cluster and lets us try a new block_group. But by then, we
have skipped great swaths of the available block_groups and are likely
to fail to allocate, looping the outer loop. In pathological cases like
the reproducer below, the cached block_group is often the very last one,
in which case we don't perform this tight bg loop but instead rip
through the ffe stages to LOOP_CHUNK_ALLOC and allocate a chunk, which
is now the last one, and we enter the tight inner loop until an
allocation failure. Then allocation succeeds on the final block_group
and if the next allocation is a size mismatch, the exact same thing
happens again.
Triggering this is as easy as mounting with -o ssd_spread and then
running:
mount -o ssd_spread $dev $mnt
dd if=/dev/zero of=$mnt/big bs=16M count=1 &>/dev/null
dd if=/dev/zero of=$mnt/med bs=4M count=1 &>/dev/null
sync
if you do the two writes + sync in a loop, you can force btrfs to spin
an excessive amount on semi-successful clustered allocations, before
ultimately failing and advancing to the stage where we force a chunk
allocation. This results in 2G of data allocated per iteration, despite
only using ~20M of data. By using a small size classed extent, the inner
loop takes longer and we can spin for longer.
The simplest, shortest term fix to unbreak this is to make the clustered
allocator size_class aware in the dumbest way, where it fails on size
class mismatch. This may hinder the operation of the clustered
allocator, but better hindered than completely broken and terribly
overallocating.
Further re-design improvements are also in the works.
Fixes:
|
||
|
|
9c64d6b07c |
btrfs: open code timespec64 in struct btrfs_inode
[ Upstream commit c6e8f898f56fae2cb5bc4396bec480f23cd8b066 ] The type of timespec64::tv_nsec is 'unsigned long', while we have only u32 for on-disk and in-memory. This wastes a few bytes in btrfs_inode. Add separate members for sec and nsec with the corresponding type width. This creates a 4 byte hole in btrfs_inode which can be utilized in the future. Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 1ef94169db09 ("btrfs: populate otime when logging an inode item") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
8f85af3269 |
xfs: fully decouple XFS_IBULK* flags from XFS_IWALK* flags
[ Upstream commit d2845519b0723c5d5a0266cbf410495f9b8fd65c ]
Fix up xfs_inumbers to now pass in the XFS_IBULK* flags into the flags
argument to xfs_inobt_walk, which expects the XFS_IWALK* flags.
Currently passing the wrong flags works for non-debug builds because
the only XFS_IWALK* flag has the same encoding as the corresponding
XFS_IBULK* flag, but in debug builds it can trigger an assert that no
incorrect flag is passed. Instead just extra the relevant flag.
Fixes:
|
||
|
|
4734255ef3 |
btrfs: abort transaction on unexpected eb generation at btrfs_copy_root()
[ Upstream commit 33e8f24b52d2796b8cfb28c19a1a7dd6476323a8 ] If we find an unexpected generation for the extent buffer we are cloning at btrfs_copy_root(), we just WARN_ON() and don't error out and abort the transaction, meaning we allow to persist metadata with an unexpected generation. Instead of warning only, abort the transaction and return -EUCLEAN. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Daniel Vacek <neelx@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
467dcf7e4a |
btrfs: always abort transaction on failure to add block group to free space tree
[ Upstream commit 1f06c942aa709d397cf6bed577a0d10a61509667 ] Only one of the callers of __add_block_group_free_space() aborts the transaction if the call fails, while the others don't do it and it's either never done up the call chain or much higher in the call chain. So make sure we abort the transaction at __add_block_group_free_space() if it fails, which brings a couple benefits: 1) If some call chain never aborts the transaction, we avoid having some metadata inconsistency because BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE is cleared when we enter __add_block_group_free_space() and therefore __add_block_group_free_space() is never called again to add the block group items to the free space tree, since the function is only called when that flag is set in a block group; 2) If the call chain already aborts the transaction, then we get a better trace that points to the exact step from __add_block_group_free_space() which failed, which is better for analysis. So abort the transaction at __add_block_group_free_space() if any of its steps fails. CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
8c5bf13905 |
btrfs: move transaction aborts to the error site in add_block_group_free_space()
[ Upstream commit b63c8c1ede4407835cb8c8bed2014d96619389f3 ] Transaction aborts should be done next to the place the error happens, which was not done in add_block_group_free_space(). Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 1f06c942aa70 ("btrfs: always abort transaction on failure to add block group to free space tree") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
b172535ccb |
btrfs: qgroup: fix race between quota disable and quota rescan ioctl
[ Upstream commit e1249667750399a48cafcf5945761d39fa584edf ] There's a race between a task disabling quotas and another running the rescan ioctl that can result in a use-after-free of qgroup records from the fs_info->qgroup_tree rbtree. This happens as follows: 1) Task A enters btrfs_ioctl_quota_rescan() -> btrfs_qgroup_rescan(); 2) Task B enters btrfs_quota_disable() and calls btrfs_qgroup_wait_for_completion(), which does nothing because at that point fs_info->qgroup_rescan_running is false (it wasn't set yet by task A); 3) Task B calls btrfs_free_qgroup_config() which starts freeing qgroups from fs_info->qgroup_tree without taking the lock fs_info->qgroup_lock; 4) Task A enters qgroup_rescan_zero_tracking() which starts iterating the fs_info->qgroup_tree tree while holding fs_info->qgroup_lock, but task B is freeing qgroup records from that tree without holding the lock, resulting in a use-after-free. Fix this by taking fs_info->qgroup_lock at btrfs_free_qgroup_config(). Also at btrfs_qgroup_rescan() don't start the rescan worker if quotas were already disabled. Reported-by: cen zhang <zzzccc427@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAFRLqsV+cMDETFuzqdKSHk_FDm6tneea45krsHqPD6B3FetLpQ@mail.gmail.com/ CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [ Check for BTRFS_FS_QUOTA_ENABLED, instead of btrfs_qgroup_full_accounting() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
d6f1a6c2f5 |
btrfs: don't ignore inode missing when replaying log tree
[ Upstream commit 7ebf381a69421a88265d3c49cd0f007ba7336c9d ]
During log replay, at add_inode_ref(), we return -ENOENT if our current
inode isn't found on the subvolume tree or if a parent directory isn't
found. The error comes from btrfs_iget_logging() <- btrfs_iget() <-
btrfs_read_locked_inode().
The single caller of add_inode_ref(), replay_one_buffer(), ignores an
-ENOENT error because it expects that error to mean only that a parent
directory wasn't found and that is ok.
Before commit 5f61b961599a ("btrfs: fix inode lookup error handling during
log replay") we were converting any error when getting a parent directory
to -ENOENT and any error when getting the current inode to -EIO, so our
caller would fail log replay in case we can't find the current inode.
After that commit however in case the current inode is not found we return
-ENOENT to the caller and therefore it ignores the critical fact that the
current inode was not found in the subvolume tree.
Fix this by converting -ENOENT to 0 when we don't find a parent directory,
returning -ENOENT when we don't find the current inode and making the
caller, replay_one_buffer(), not ignore -ENOENT anymore.
Fixes: 5f61b961599a ("btrfs: fix inode lookup error handling during log replay")
CC: stable@vger.kernel.org # 6.16
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ adapted btrfs_inode pointer usage to older inode API ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
a779d1dd7f |
usb: typec: fusb302: cache PD RX state
[ Upstream commit 1e61f6ab08786d66a11cfc51e13d6f08a6b06c56 ]
This patch fixes a race condition communication error, which ends up in
PD hard resets when losing the race. Some systems, like the Radxa ROCK
5B are powered through USB-C without any backup power source and use a
FUSB302 chip to do the PD negotiation. This means it is quite important
to avoid hard resets, since that effectively kills the system's
power-supply.
I've found the following race condition while debugging unplanned power
loss during booting the board every now and then:
1. lots of TCPM/FUSB302/PD initialization stuff
2. TCPM ends up in SNK_WAIT_CAPABILITIES (tcpm_set_pd_rx is enabled here)
3. the remote PD source does not send anything, so TCPM does a SOFT RESET
4. TCPM ends up in SNK_WAIT_CAPABILITIES for the second time
(tcpm_set_pd_rx is enabled again, even though it is still on)
At this point I've seen broken CRC good messages being send by the
FUSB302 with a logic analyzer sniffing the CC lines. Also it looks like
messages are being lost and things generally going haywire with one of
the two sides doing a hard reset once a broken CRC good message was send
to the bus.
I think the system is running into a race condition, that the FIFOs are
being cleared and/or the automatic good CRC message generation flag is
being updated while a message is already arriving.
Let's avoid this by caching the PD RX enabled state, as we have already
processed anything in the FIFOs and are in a good state. As a side
effect that this also optimizes I2C bus usage :)
As far as I can tell the problem theoretically also exists when TCPM
enters SNK_WAIT_CAPABILITIES the first time, but I believe this is less
critical for the following reason:
On devices like the ROCK 5B, which are powered through a TCPM backed
USB-C port, the bootloader must have done some prior PD communication
(initial communication must happen within 5 seconds after plugging the
USB-C plug). This means the first time the kernel TCPM state machine
reaches SNK_WAIT_CAPABILITIES, the remote side is not sending messages
actively. On other devices a hard reset simply adds some extra delay and
things should be good afterwards.
Fixes:
|
||
|
|
827f7b4fe0 |
PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
[ Upstream commit 6cff20ce3b92ffbf2fc5eb9e5a030b3672aa414a ]
pci_bridge_d3_possible() is called from both pcie_portdrv_probe() and
pcie_portdrv_remove() to determine whether runtime power management shall
be enabled (on probe) or disabled (on remove) on a PCIe port.
The underlying assumption is that pci_bridge_d3_possible() always returns
the same value, else a runtime PM reference imbalance would occur. That
assumption is not given if the PCIe port is inaccessible on remove due to
hot-unplug: pci_bridge_d3_possible() calls pciehp_is_native(), which
accesses Config Space to determine whether the port is Hot-Plug Capable.
An inaccessible port returns "all ones", which is converted to "all
zeroes" by pcie_capability_read_dword(). Hence the port no longer seems
Hot-Plug Capable on remove even though it was on probe.
The resulting runtime PM ref imbalance causes warning messages such as:
pcieport 0000:02:04.0: Runtime PM usage count underflow!
Avoid the Config Space access (and thus the runtime PM ref imbalance) by
caching the Hot-Plug Capable bit in struct pci_dev.
The struct already contains an "is_hotplug_bridge" flag, which however is
not only set on Hot-Plug Capable PCIe ports, but also Conventional PCI
Hot-Plug bridges and ACPI slots. The flag identifies bridges which are
allocated additional MMIO and bus number resources to allow for hierarchy
expansion.
The kernel is somewhat sloppily using "is_hotplug_bridge" in a number of
places to identify Hot-Plug Capable PCIe ports, even though the flag
encompasses other devices. Subsequent commits replace these occurrences
with the new flag to clearly delineate Hot-Plug Capable PCIe ports from
other kinds of hotplug bridges.
Document the existing "is_hotplug_bridge" and the new "is_pciehp" flag
and document the (non-obvious) requirement that pci_bridge_d3_possible()
always returns the same value across the entire lifetime of a bridge,
including its hot-removal.
Fixes:
|
||
|
|
80617b3926 |
block: Make REQ_OP_ZONE_FINISH a write operation
[ Upstream commit 3f66ccbaaef3a0c5bd844eab04e3207b4061c546 ]
REQ_OP_ZONE_FINISH is defined as "12", which makes
op_is_write(REQ_OP_ZONE_FINISH) return false, despite the fact that a
zone finish operation is an operation that modifies a zone (transition
it to full) and so should be considered as a write operation (albeit
one that does not transfer any data to the device).
Fix this by redefining REQ_OP_ZONE_FINISH to be an odd number (13), and
redefine REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL using sequential
odd numbers from that new value.
Fixes:
|
||
|
|
6c77d4e0f6 |
block: reject invalid operation in submit_bio_noacct
[ Upstream commit 1c042f8d4bc342b7985b1de3d76836f1a1083b65 ] submit_bio_noacct allows completely invalid operations, or operations that are not supported in the bio path. Extent the existing switch statement to rejcect all invalid types. Move the code point for REQ_OP_ZONE_APPEND so that it's not right in the middle of the zone management operations and the switch statement can follow the numerical order of the operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231221070538.1112446-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 3f66ccbaaef3 ("block: Make REQ_OP_ZONE_FINISH a write operation") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
3bbd52a413 |
fscrypt: Don't use problematic non-inline crypto engines
[ Upstream commit b41c1d8d07906786c60893980d52688f31d114a6 ]
Make fscrypt no longer use Crypto API drivers for non-inline crypto
engines, even when the Crypto API prioritizes them over CPU-based code
(which unfortunately it often does). These drivers tend to be really
problematic, especially for fscrypt's workload. This commit has no
effect on inline crypto engines, which are different and do work well.
Specifically, exclude drivers that have CRYPTO_ALG_KERN_DRIVER_ONLY or
CRYPTO_ALG_ALLOCATES_MEMORY set. (Later, CRYPTO_ALG_ASYNC should be
excluded too. That's omitted for now to keep this commit backportable,
since until recently some CPU-based code had CRYPTO_ALG_ASYNC set.)
There are two major issues with these drivers: bugs and performance.
First, these drivers tend to be buggy. They're fundamentally much more
error-prone and harder to test than the CPU-based code. They often
don't get tested before kernel releases, and even if they do, the crypto
self-tests don't properly test these drivers. Released drivers have
en/decrypted or hashed data incorrectly. These bugs cause issues for
fscrypt users who often didn't even want to use these drivers, e.g.:
- https://github.com/google/fscryptctl/issues/32
- https://github.com/google/fscryptctl/issues/9
- https://lore.kernel.org/r/PH0PR02MB731916ECDB6C613665863B6CFFAA2@PH0PR02MB7319.namprd02.prod.outlook.com
These drivers have also similarly caused issues for dm-crypt users,
including data corruption and deadlocks. Since Linux v5.10, dm-crypt
has disabled most of them by excluding CRYPTO_ALG_ALLOCATES_MEMORY.
Second, these drivers tend to be *much* slower than the CPU-based code.
This may seem counterintuitive, but benchmarks clearly show it. There's
a *lot* of overhead associated with going to a hardware driver, off the
CPU, and back again. To prove this, I gathered as many systems with
this type of crypto engine as I could, and I measured synchronous
encryption of 4096-byte messages (which matches fscrypt's workload):
Intel Emerald Rapids server:
AES-256-XTS:
xts-aes-vaes-avx512 16171 MB/s [CPU-based, Vector AES]
qat_aes_xts 289 MB/s [Offload, Intel QuickAssist]
Qualcomm SM8650 HDK:
AES-256-XTS:
xts-aes-ce 4301 MB/s [CPU-based, ARMv8 Crypto Extensions]
xts-aes-qce 73 MB/s [Offload, Qualcomm Crypto Engine]
i.MX 8M Nano LPDDR4 EVK:
AES-256-XTS:
xts-aes-ce 647 MB/s [CPU-based, ARMv8 Crypto Extensions]
xts(ecb-aes-caam) 20 MB/s [Offload, CAAM]
AES-128-CBC-ESSIV:
essiv(cbc-aes-caam,sha256-lib) 23 MB/s [Offload, CAAM]
STM32MP157F-DK2:
AES-256-XTS:
xts-aes-neonbs 13.2 MB/s [CPU-based, ARM NEON]
xts(stm32-ecb-aes) 3.1 MB/s [Offload, STM32 crypto engine]
AES-128-CBC-ESSIV:
essiv(cbc-aes-neonbs,sha256-lib)
14.7 MB/s [CPU-based, ARM NEON]
essiv(stm32-cbc-aes,sha256-lib)
3.2 MB/s [Offload, STM32 crypto engine]
Adiantum:
adiantum(xchacha12-arm,aes-arm,nhpoly1305-neon)
52.8 MB/s [CPU-based, ARM scalar + NEON]
So, there was no case in which the crypto engine was even *close* to
being faster. On the first three, which have AES instructions in the
CPU, the CPU was 30 to 55 times faster (!). Even on STM32MP157F-DK2
which has a Cortex-A7 CPU that doesn't have AES instructions, AES was
over 4 times faster on the CPU. And Adiantum encryption, which is what
actually should be used on CPUs like that, was over 17 times faster.
Other justifications that have been given for these non-inline crypto
engines (almost always coming from the hardware vendors, not actual
users) don't seem very plausible either:
- The crypto engine throughput could be improved by processing
multiple requests concurrently. Currently irrelevant to fscrypt,
since it doesn't do that. This would also be complex, and unhelpful
in many cases. 2 of the 4 engines I tested even had only one queue.
- Some of the engines, e.g. STM32, support hardware keys. Also
currently irrelevant to fscrypt, since it doesn't support these.
Interestingly, the STM32 driver itself doesn't support this either.
- Free up CPU for other tasks and/or reduce energy usage. Not very
plausible considering the "short" message length, driver overhead,
and scheduling overhead. There's just very little time for the CPU
to do something else like run another task or enter low-power state,
before the message finishes and it's time to process the next one.
- Some of these engines resist power analysis and electromagnetic
attacks, while the CPU-based crypto generally does not. In theory,
this sounds great. In practice, if this benefit requires the use of
an off-CPU offload that massively regresses performance and has a
low-quality, buggy driver, the price for this hardening (which is
not relevant to most fscrypt users, and tends to be incomplete) is
just too high. Inline crypto engines are much more promising here,
as are on-CPU solutions like RISC-V High Assurance Cryptography.
Fixes:
|
||
|
|
c8a1e1f029 |
leds: flash: leds-qcom-flash: Fix registry access after re-bind
[ Upstream commit fab15f57360b1e6620a1d0d6b0fbee896e6c1f07 ]
Driver in probe() updates each of 'reg_field' with 'reg_base':
for (i = 0; i < REG_MAX_COUNT; i++)
regs[i].reg += reg_base;
'reg_field' array (under variable 'regs' above) is statically allocated,
thus each re-bind would add another 'reg_base' leading to bogus
register addresses. Constify the local 'reg_field' array and duplicate
it in probe to solve this.
Fixes:
|
||
|
|
71230248a7 |
leds: flash: leds-qcom-flash: Limit LED current based on thermal condition
[ Upstream commit a0864cf32044233e56247fa0eed3ac660f15db9e ] The flash module has status bits to indicate different thermal conditions which are called as OTSTx. For each OTSTx status, there is a recommended total flash current for all channels to prevent the flash module entering into higher thermal level. For example, the total flash current should be limited to 1000mA/500mA respectively when the HW reaches the OTST1/OTST2 thermal level. Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com> Link: https://lore.kernel.org/r/20240705-qcom_flash_thermal_derating-v3-1-8e2e2783e3a6@quicinc.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: fab15f57360b ("leds: flash: leds-qcom-flash: Fix registry access after re-bind") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
84a24fb446 |
net/sched: ets: use old 'nbands' while purging unused classes
[ Upstream commit 87c6efc5ce9c126ae4a781bc04504b83780e3650 ] Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify() after recent changes from Lion [2]. The problem is: in ets_qdisc_change() we purge unused DWRR queues; the value of 'q->nbands' is the new one, and the cleanup should be done with the old one. The problem is here since my first attempts to fix ets_qdisc_change(), but it surfaced again after the recent qdisc len accounting fixes. Fix it purging idle DWRR queues before assigning a new value of 'q->nbands', so that all purge operations find a consistent configuration: - old 'q->nbands' because it's needed by ets_class_find() - old 'q->nstrict' because it's needed by ets_class_is_strict() BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary) Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab RSP: 0018:ffffba186009f400 EFLAGS: 00010202 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000 FS: 00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ets_class_qlen_notify+0x65/0x90 [sch_ets] qdisc_tree_reduce_backlog+0x74/0x110 ets_qdisc_change+0x630/0xa40 [sch_ets] __tc_modify_qdisc.constprop.0+0x216/0x7f0 tc_modify_qdisc+0x7c/0x120 rtnetlink_rcv_msg+0x145/0x3f0 netlink_rcv_skb+0x53/0x100 netlink_unicast+0x245/0x390 netlink_sendmsg+0x21b/0x470 ____sys_sendmsg+0x39d/0x3d0 ___sys_sendmsg+0x9a/0xe0 __sys_sendmsg+0x7a/0xd0 do_syscall_64+0x7d/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f2155114084 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0 </TASK> [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Cc: stable@vger.kernel.org Fixes: 103406b38c60 ("net/sched: Always pass notifications when child class becomes empty") Fixes: |
||
|
|
81505f137b |
net_sched: sch_ets: implement lockless ets_dump()
[ Upstream commit c5f1dde7f731e7bf2e7c169ca42cb4989fc2f8b9 ] Instead of relying on RTNL, ets_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in ets_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 87c6efc5ce9c ("net/sched: ets: use old 'nbands' while purging unused classes") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
00dc616e6c |
smb: client: fix netns refcount leak after net_passive changes
[ Upstream commit 59b33fab4ca4d7dacc03367082777627e05d0323 ]
After commit 5c70eb5c593d ("net: better track kernel sockets lifetime"),
kernel sockets now use net_passive reference counting. However, commit
95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
restored the manual socket refcount manipulation without adapting to this
new mechanism, causing a memory leak.
The issue can be reproduced by[1]:
1. Creating a network namespace
2. Mounting and Unmounting CIFS within the namespace
3. Deleting the namespace
Some memory leaks may appear after a period of time following step 3.
unreferenced object 0xffff9951419f6b00 (size 256):
comm "ip", pid 447, jiffies 4294692389 (age 14.730s)
hex dump (first 32 bytes):
1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 80 77 c2 44 51 99 ff ff .........w.DQ...
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
__kmalloc+0x52/0x120
net_alloc_generic+0x1d/0x30
copy_net_ns+0x86/0x200
create_new_namespaces+0x117/0x300
unshare_nsproxy_namespaces+0x60/0xa0
ksys_unshare+0x148/0x360
__x64_sys_unshare+0x12/0x20
do_syscall_64+0x59/0x110
entry_SYSCALL_64_after_hwframe+0x78/0xe2
...
unreferenced object 0xffff9951442e7500 (size 32):
comm "mount.cifs", pid 475, jiffies 4294693782 (age 13.343s)
hex dump (first 32 bytes):
40 c5 38 46 51 99 ff ff 18 01 96 42 51 99 ff ff @.8FQ......BQ...
01 00 00 00 6f 00 c5 07 6f 00 d8 07 00 00 00 00 ....o...o.......
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
kmalloc_trace+0x2a/0x90
ref_tracker_alloc+0x8e/0x1d0
sk_alloc+0x18c/0x1c0
inet_create+0xf1/0x370
__sock_create+0xd7/0x1e0
generic_ip_connect+0x1d4/0x5a0 [cifs]
cifs_get_tcp_session+0x5d0/0x8a0 [cifs]
cifs_mount_get_session+0x47/0x1b0 [cifs]
dfs_mount_share+0xfa/0xa10 [cifs]
cifs_mount+0x68/0x2b0 [cifs]
cifs_smb3_do_mount+0x10b/0x760 [cifs]
smb3_get_tree+0x112/0x2e0 [cifs]
vfs_get_tree+0x29/0xf0
path_mount+0x2d4/0xa00
__se_sys_mount+0x165/0x1d0
Root cause:
When creating kernel sockets, sk_alloc() calls net_passive_inc() for
sockets with sk_net_refcnt=0. The CIFS code manually converts kernel
sockets to user sockets by setting sk_net_refcnt=1, but doesn't call
the corresponding net_passive_dec(). This creates an imbalance in the
net_passive counter, which prevents the network namespace from being
destroyed when its last user reference is dropped. As a result, the
entire namespace and all its associated resources remain allocated.
Timeline of patches leading to this issue:
- commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.") in v6.12 fixed the original netns UAF by manually
managing socket refcounts
- commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") in v6.13 attempted to use kernel sockets but introduced
TCP timer issues
- commit 5c70eb5c593d ("net: better track kernel sockets lifetime")
in v6.14-rc5 introduced the net_passive mechanism with
sk_net_refcnt_upgrade() for proper socket conversion
- commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock
after rmmod"") in v6.15-rc3 reverted to manual refcount management
without adapting to the new net_passive changes
Fix this by using sk_net_refcnt_upgrade() which properly handles the
net_passive counter when converting kernel sockets to user sockets.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220343 [1]
Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
Cc: stable@vger.kernel.org
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
2668e03880 |
net: better track kernel sockets lifetime
[ Upstream commit 5c70eb5c593d64d93b178905da215a9fd288a4b5 ]
While kernel sockets are dismantled during pernet_operations->exit(),
their freeing can be delayed by any tx packets still held in qdisc
or device queues, due to skb_set_owner_w() prior calls.
This then trigger the following warning from ref_tracker_dir_exit() [1]
To fix this, make sure that kernel sockets own a reference on net->passive.
Add sk_net_refcnt_upgrade() helper, used whenever a kernel socket
is converted to a refcounted one.
[1]
[ 136.263918][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at
[ 136.263918][ T35] sk_alloc+0x2b3/0x370
[ 136.263918][ T35] inet6_create+0x6ce/0x10f0
[ 136.263918][ T35] __sock_create+0x4c0/0xa30
[ 136.263918][ T35] inet_ctl_sock_create+0xc2/0x250
[ 136.263918][ T35] igmp6_net_init+0x39/0x390
[ 136.263918][ T35] ops_init+0x31e/0x590
[ 136.263918][ T35] setup_net+0x287/0x9e0
[ 136.263918][ T35] copy_net_ns+0x33f/0x570
[ 136.263918][ T35] create_new_namespaces+0x425/0x7b0
[ 136.263918][ T35] unshare_nsproxy_namespaces+0x124/0x180
[ 136.263918][ T35] ksys_unshare+0x57d/0xa70
[ 136.263918][ T35] __x64_sys_unshare+0x38/0x40
[ 136.263918][ T35] do_syscall_64+0xf3/0x230
[ 136.263918][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 136.263918][ T35]
[ 136.343488][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at
[ 136.343488][ T35] sk_alloc+0x2b3/0x370
[ 136.343488][ T35] inet6_create+0x6ce/0x10f0
[ 136.343488][ T35] __sock_create+0x4c0/0xa30
[ 136.343488][ T35] inet_ctl_sock_create+0xc2/0x250
[ 136.343488][ T35] ndisc_net_init+0xa7/0x2b0
[ 136.343488][ T35] ops_init+0x31e/0x590
[ 136.343488][ T35] setup_net+0x287/0x9e0
[ 136.343488][ T35] copy_net_ns+0x33f/0x570
[ 136.343488][ T35] create_new_namespaces+0x425/0x7b0
[ 136.343488][ T35] unshare_nsproxy_namespaces+0x124/0x180
[ 136.343488][ T35] ksys_unshare+0x57d/0xa70
[ 136.343488][ T35] __x64_sys_unshare+0x38/0x40
[ 136.343488][ T35] do_syscall_64+0xf3/0x230
[ 136.343488][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes:
|
||
|
|
ac90037bf3 |
net: Add net_passive_inc() and net_passive_dec().
[ Upstream commit e57a6320215c3967f51ab0edeff87db2095440e4 ] net_drop_ns() is NULL when CONFIG_NET_NS is disabled. The next patch introduces a function that increments and decrements net->passive. As a prep, let's rename and export net_free() to net_passive_dec() and add net_passive_inc(). Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89i+oUCt2VGvrbrweniTendZFEh+nwS=uonc004-aPkWy-Q@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250217191129.19967-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 59b33fab4ca4 ("smb: client: fix netns refcount leak after net_passive changes") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
5dd481868e |
selftests/memfd: add test for mapping write-sealed memfd read-only
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [ Upstream commit ea0916e01d0b0f2cce1369ac1494239a79827270 ] Now we have reinstated the ability to map F_SEAL_WRITE mappings read-only, assert that we are able to do this in a test to ensure that we do not regress this again. Link: https://lkml.kernel.org/r/a6377ec470b14c0539b4600cf8fa24bf2e4858ae.1732804776.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Julian Orth <ju.orth@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2e4179698f |
mm: reinstate ability to map write-sealed memfd mappings read-only
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [ Upstream commit 8ec396d05d1b737c87311fb7311f753b02c2a6b1 ] Patch series "mm: reinstate ability to map write-sealed memfd mappings read-only". In commit 158978945f31 ("mm: perform the mapping_map_writable() check after call_mmap()") (and preceding changes in the same series) it became possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only. Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour") unintentionally undid this logic by moving the mapping_map_writable() check before the shmem_mmap() hook is invoked, thereby regressing this change. This series reworks how we both permit write-sealed mappings being mapped read-only and disallow mprotect() from undoing the write-seal, fixing this regression. We also add a regression test to ensure that we do not accidentally regress this in future. Thanks to Julian Orth for reporting this regression. This patch (of 2): In commit 158978945f31 ("mm: perform the mapping_map_writable() check after call_mmap()") (and preceding changes in the same series) it became possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only. This was previously unnecessarily disallowed, despite the man page documentation indicating that it would be, thereby limiting the usefulness of F_SEAL_WRITE logic. We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE seal (one which disallows future writes to the memfd) to also be used for F_SEAL_WRITE. For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a read-only mapping to disallow mprotect() from overriding the seal - an operation performed by seal_check_write(), invoked from shmem_mmap(), the f_op->mmap() hook used by shmem mappings. By extending this to F_SEAL_WRITE and critically - checking mapping_map_writable() to determine if we may map the memfd AFTER we invoke shmem_mmap() - the desired logic becomes possible. This is because mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will have cleared. Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour") unintentionally undid this logic by moving the mapping_map_writable() check before the shmem_mmap() hook is invoked, thereby regressing this change. We reinstate this functionality by moving the check out of shmem_mmap() and instead performing it in do_mmap() at the point at which VMA flags are being determined, which seems in any case to be a more appropriate place in which to make this determination. In order to achieve this we rework memfd seal logic to allow us access to this information using existing logic and eliminate the clearing of VM_MAYWRITE from seal_check_write() which we are performing in do_mmap() instead. Link: https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Julian Orth <ju.orth@gmail.com> Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/ Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
87a75f68ea |
mm: update memfd seal write check to include F_SEAL_WRITE
From: Lorenzo Stoakes <lstoakes@gmail.com> [ Upstream commit 28464bbb2ddc199433383994bcb9600c8034afa1 ] The seal_check_future_write() function is called by shmem_mmap() or hugetlbfs_file_mmap() to disallow any future writable mappings of an memfd sealed this way. The F_SEAL_WRITE flag is not checked here, as that is handled via the mapping->i_mmap_writable mechanism and so any attempt at a mapping would fail before this could be run. However we intend to change this, meaning this check can be performed for F_SEAL_WRITE mappings also. The logic here is equally applicable to both flags, so update this function to accommodate both and rename it accordingly. Link: https://lkml.kernel.org/r/913628168ce6cce77df7d13a63970bae06a526e0.1697116581.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
17c5d49beb |
mm: drop the assumption that VM_SHARED always implies writable
From: Lorenzo Stoakes <lstoakes@gmail.com> [ Upstream commit e8e17ee90eaf650c855adb0a3e5e965fd6692ff1 ] Patch series "permit write-sealed memfd read-only shared mappings", v4. The man page for fcntl() describing memfd file seals states the following about F_SEAL_WRITE:- Furthermore, trying to create new shared, writable memory-mappings via mmap(2) will also fail with EPERM. With emphasis on 'writable'. In turns out in fact that currently the kernel simply disallows all new shared memory mappings for a memfd with F_SEAL_WRITE applied, rendering this documentation inaccurate. This matters because users are therefore unable to obtain a shared mapping to a memfd after write sealing altogether, which limits their usefulness. This was reported in the discussion thread [1] originating from a bug report [2]. This is a product of both using the struct address_space->i_mmap_writable atomic counter to determine whether writing may be permitted, and the kernel adjusting this counter when any VM_SHARED mapping is performed and more generally implicitly assuming VM_SHARED implies writable. It seems sensible that we should only update this mapping if VM_MAYWRITE is specified, i.e. whether it is possible that this mapping could at any point be written to. If we do so then all we need to do to permit write seals to function as documented is to clear VM_MAYWRITE when mapping read-only. It turns out this functionality already exists for F_SEAL_FUTURE_WRITE - we can therefore simply adapt this logic to do the same for F_SEAL_WRITE. We then hit a chicken and egg situation in mmap_region() where the check for VM_MAYWRITE occurs before we are able to clear this flag. To work around this, perform this check after we invoke call_mmap(), with careful consideration of error paths. Thanks to Andy Lutomirski for the suggestion! [1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/ [2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238 This patch (of 3): There is a general assumption that VMAs with the VM_SHARED flag set are writable. If the VM_MAYWRITE flag is not set, then this is simply not the case. Update those checks which affect the struct address_space->i_mmap_writable field to explicitly test for this by introducing [vma_]is_shared_maywrite() helper functions. This remains entirely conservative, as the lack of VM_MAYWRITE guarantees that the VMA cannot be written to. Link: https://lkml.kernel.org/r/cover.1697116581.git.lstoakes@gmail.com Link: https://lkml.kernel.org/r/d978aefefa83ec42d18dfa964ad180dbcde34795.1697116581.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org [isaacmanjarres: resolved merge conflicts due to due to refactoring that happened in upstream commit 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour")] Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
440d36847a |
mptcp: pm: kernel: flush: do not reset ADD_ADDR limit
commit 68fc0f4b0d25692940cdc85c68e366cae63e1757 upstream.
A flush of the MPTCP endpoints should not affect the MPTCP limits. In
other words, 'ip mptcp endpoint flush' should not change 'ip mptcp
limits'.
But it was the case: the MPTCP_PM_ATTR_RCV_ADD_ADDRS (add_addr_accepted)
limit was reset by accident. Removing the reset of this counter during a
flush fixes this issue.
Fixes:
|
||
|
|
253b7d8baf |
mptcp: drop skb if MPTCP skb extension allocation fails
commit ccab044697980c6c01ab51f43f48f13b8a3e5c33 upstream.
When skb_ext_add(skb, SKB_EXT_MPTCP) fails in mptcp_incoming_options(),
we used to return true, letting the segment proceed through the TCP
receive path without a DSS mapping. Such segments can leave inconsistent
mapping state and trigger a mid-stream fallback to TCP, which in testing
collapsed (by artificially forcing failures in skb_ext_add) throughput
to zero.
Return false instead so the TCP input path drops the skb (see
tcp_data_queue() and step-7 processing). This is the safer choice
under memory pressure: it preserves MPTCP correctness and provides
backpressure to the sender.
Control packets remain unaffected: ACK updates and DATA_FIN handling
happen before attempting the extension allocation, and tcp_reset()
continues to ignore the return value.
With this change, MPTCP continues to work at high throughput if we
artificially inject failures into skb_ext_add.
Fixes:
|
||
|
|
cf0a88124e |
ACPI: pfr_update: Fix the driver update version check
commit 8151320c747efb22d30b035af989fed0d502176e upstream.
The security-version-number check should be used rather
than the runtime version check for driver updates.
Otherwise, the firmware update would fail when the update binary had
a lower runtime version number than the current one.
Fixes:
|
||
|
|
3ddd55cf19 |
ipv6: sr: Fix MAC comparison to be constant-time
commit a458b2902115b26a25d67393b12ddd57d1216aaa upstream.
To prevent timing attacks, MACs need to be compared in constant time.
Use the appropriate helper function for this.
Fixes:
|
||
|
|
b117c41b00 |
net, hsr: reject HSR frame if skb can't hold tag
commit 7af76e9d18a9fd6f8611b3313c86c190f9b6a5a7 upstream.
Receiving HSR frame with insufficient space to hold HSR tag in the skb
can result in a crash (kernel BUG):
[ 45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1
[ 45.392559] ------------[ cut here ]------------
[ 45.392912] kernel BUG at net/core/skbuff.c:211!
[ 45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef)
[ 45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 45.395273] RIP: 0010:skb_panic+0x15b/0x1d0
<snip registers, remove unreliable trace>
[ 45.402911] Call Trace:
[ 45.403105] <IRQ>
[ 45.404470] skb_push+0xcd/0xf0
[ 45.404726] br_dev_queue_push_xmit+0x7c/0x6c0
[ 45.406513] br_forward_finish+0x128/0x260
[ 45.408483] __br_forward+0x42d/0x590
[ 45.409464] maybe_deliver+0x2eb/0x420
[ 45.409763] br_flood+0x174/0x4a0
[ 45.410030] br_handle_frame_finish+0xc7c/0x1bc0
[ 45.411618] br_handle_frame+0xac3/0x1230
[ 45.413674] __netif_receive_skb_core.constprop.0+0x808/0x3df0
[ 45.422966] __netif_receive_skb_one_core+0xb4/0x1f0
[ 45.424478] __netif_receive_skb+0x22/0x170
[ 45.424806] process_backlog+0x242/0x6d0
[ 45.425116] __napi_poll+0xbb/0x630
[ 45.425394] net_rx_action+0x4d1/0xcc0
[ 45.427613] handle_softirqs+0x1a4/0x580
[ 45.427926] do_softirq+0x74/0x90
[ 45.428196] </IRQ>
This issue was found by syzkaller.
The panic happens in br_dev_queue_push_xmit() once it receives a
corrupted skb with ETH header already pushed in linear data. When it
attempts the skb_push() call, there's not enough headroom and
skb_push() panics.
The corrupted skb is put on the queue by HSR layer, which makes a
sequence of unintended transformations when it receives a specific
corrupted HSR frame (with incomplete TAG).
Fix it by dropping and consuming frames that are not long enough to
contain both ethernet and hsr headers.
Alternative fix would be to check for enough headroom before skb_push()
in br_dev_queue_push_xmit().
In the reproducer, this is injected via AF_PACKET, but I don't easily
see why it couldn't be sent over the wire from adjacent network.
Further Details:
In the reproducer, the following network interface chain is set up:
┌────────────────┐ ┌────────────────┐
│ veth0_to_hsr ├───┤ hsr_slave0 ┼───┐
└────────────────┘ └────────────────┘ │
│ ┌──────┐
├─┤ hsr0 ├───┐
│ └──────┘ │
┌────────────────┐ ┌────────────────┐ │ │┌────────┐
│ veth1_to_hsr ┼───┤ hsr_slave1 ├───┘ └┤ │
└────────────────┘ └────────────────┘ ┌┼ bridge │
││ │
│└────────┘
│
┌───────┐ │
│ ... ├──────┘
└───────┘
To trigger the events leading up to crash, reproducer sends a corrupted
HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'.
The first HSR-layer function to process this frame is
hsr_handle_frame(). It and then checks if the
protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls
skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that
the skb is long enough. For the crashing frame it is not, and hence the
skb->network_header and skb->mac_len fields are set incorrectly,
pointing after the end of the linear buffer.
I will call this a BUG#1 and it is what is addressed by this patch. In
the crashing scenario before the fix, the skb continues to go down the
hsr path as follows.
hsr_handle_frame() then calls this sequence
hsr_forward_skb()
fill_frame_info()
hsr->proto_ops->fill_frame_info()
hsr_fill_frame_info()
hsr_fill_frame_info() contains a check that intends to check whether the
skb actually contains the HSR header. But the check relies on the
skb->mac_len field which was erroneously setup due to BUG#1, so the
check passes and the execution continues back in the hsr_forward_skb():
hsr_forward_skb()
hsr_forward_do()
hsr->proto_ops->get_untagged_frame()
hsr_get_untagged_frame()
create_stripped_skb_hsr()
In create_stripped_skb_hsr(), a copy of the skb is created and is
further corrupted by operation that attempts to strip the HSR tag in a
call to __pskb_copy().
The skb enters create_stripped_skb_hsr() with ethernet header pushed in
linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of
ethernet header into the headroom, creating skb_in with a headroom of
size 8. The subsequent __pskb_copy() then creates an skb with headroom
of just 2 and skb->len of just 12, this is how it looks after the copy:
gdb) p skb->len
$10 = 12
(gdb) p skb->data
$11 = (unsigned char *) 0xffff888041e45382 "\252\252\252\252\252!\210\373",
(gdb) p skb->head
$12 = (unsigned char *) 0xffff888041e45380 ""
It seems create_stripped_skb_hsr() assumes that ETH header is pulled
in the headroom when it's entered, because it just pulls HSR header on
top. But that is not the case in our code-path and we end up with the
corrupted skb instead. I will call this BUG#2
*I got confused here because it seems that under no conditions can
create_stripped_skb_hsr() work well, the assumption it makes is not true
during the processing of hsr frames - since the skb_push() in
hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I
missed something here.*
Next, the execution arrives in hsr_deliver_master(). It calls
skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have
enough space for the pull (as it only has 12 bytes in total at this
point).
*The skb_pull() here further suggests that ethernet header is meant
to be pushed through the whole hsr processing and
create_stripped_skb_hsr() should pull it before doing the HSR header
pull.*
hsr_deliver_master() then puts the corrupted skb on the queue, it is
then picked up from there by bridge frame handling layer and finally
lands in br_dev_queue_push_xmit where it panics.
Cc: stable@kernel.org
Fixes:
|
||
|
|
3afdeb96dc |
drm/amd/display: Don't overwrite dce60_clk_mgr
commit 4db9cd554883e051df1840d4d58d636043101034 upstream.
dc_clk_mgr_create accidentally overwrites the dce60_clk_mgr
with the dce_clk_mgr, causing incorrect behaviour on DCE6.
Fix it by removing the extra dce_clk_mgr_construct.
Fixes:
|
||
|
|
39f187cbf7 |
drm/amd/display: Add primary plane to commits for correct VRR handling
commit 3477c1b0972dc1c8a46f78e8fb1fa6966095b5ec upstream. amdgpu_dm_commit_planes calls update_freesync_state_on_stream only for the primary plane. If a commit affects a CRTC but not its primary plane, it would previously not trigger a refresh cycle or affect LFC, violating current UAPI semantics. Fixes e.g. atomic commits affecting only the cursor plane being limited to the minimum refresh rate. Don't do this for the legacy cursor ioctls though, it would break the UAPI semantics for those. Suggested-by: Xaver Hugl <xaver.hugl@kde.org> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cc7bfba95966251b254cb970c21627124da3b7f4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
74ee7445c3 |
drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
commit 2e58401a24e7b2d4ec619104e1a76590c1284a4c upstream.
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0333052d90683d88531558dcfdbf2525cc37c233)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
627f30a328 |
drm/amdgpu: update mmhub 3.0.1 client id mappings
commit 0bae62cc989fa99ac9cb564eb573aad916d1eb61 upstream. Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
a3201e3b7c |
drm/amdgpu: Avoid extra evict-restore process.
commit 1f02f2044bda1db1fd995bc35961ab075fa7b5a2 upstream. If vm belongs to another process, this is fclose after fork, wait may enable signaling KFD eviction fence and cause parent process queue evicted. [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu] [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0 [677852.634820] dma_fence_wait_timeout+0x3a/0x140 [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl] [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu] [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu] [677852.635208] filp_flush+0x38/0x90 [677852.635213] filp_close+0x14/0x30 [677852.635216] do_close_on_exec+0xdd/0x130 [677852.635221] begin_new_exec+0x1da/0x490 [677852.635225] load_elf_binary+0x307/0xea0 [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5 [677852.635235] ? ima_bprm_check+0xa2/0xd0 [677852.635240] search_binary_handler+0xda/0x260 [677852.635245] exec_binprm+0x58/0x1a0 [677852.635249] bprm_execve.part.0+0x16f/0x210 [677852.635254] bprm_execve+0x45/0x80 [677852.635257] do_execveat_common.isra.0+0x190/0x200 Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Gang Ba <Gang.Ba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
0e7f5f9aa4 |
drm/amd: Restore cached power limit during resume
commit ed4efe426a49729952b3dc05d20e33b94409bdd1 upstream. The power limit will be cached in smu->current_power_limit but if the ASIC goes into S3 this value won't be restored. Restore the value during SMU resume. Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250725031222.3015095-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 26a609e053a6fc494403e95403bc6a2470383bec) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
002860142c |
media: venus: venc: Clamp param smaller than 1fps and bigger than 240
commit 417c01b92ec278a1118a05c6ad8a796eaa0c9c52 upstream.
The driver uses "whole" fps in all its calculations (e.g. in
load_per_instance()). Those calculation expect an fps bigger than 1, and
not big enough to overflow.
Clamp the param if the user provides a value that will result in an invalid
fps.
Reported-by: Hans Verkuil <hverkuil@xs4all.nl>
Closes: https://lore.kernel.org/linux-media/f11653a7-bc49-48cd-9cdb-1659147453e4@xs4all.nl/T/#m91cd962ac942834654f94c92206e2f85ff7d97f0
Fixes:
|
||
|
|
f866778583 |
media: venus: vdec: Clamp param smaller than 1fps and bigger than 240.
commit 377dc500d253f0b26732b2cb062e89668aef890a upstream.
The driver uses "whole" fps in all its calculations (e.g. in
load_per_instance()). Those calculation expect an fps bigger than 1, and
not big enough to overflow.
Clamp the value if the user provides a param that will result in an invalid
fps.
Reported-by: Hans Verkuil <hverkuil@xs4all.nl>
Closes: https://lore.kernel.org/linux-media/f11653a7-bc49-48cd-9cdb-1659147453e4@xs4all.nl/T/#m91cd962ac942834654f94c92206e2f85ff7d97f0
Fixes:
|
||
|
|
f54be97bc6 |
media: venus: protect against spurious interrupts during probe
commit 3200144a2fa4209dc084a19941b9b203b43580f0 upstream.
Make sure the interrupt handler is initialized before the interrupt is
registered.
If the IRQ is registered before hfi_create(), it's possible that an
interrupt fires before the handler setup is complete, leading to a NULL
dereference.
This error condition has been observed during system boot on Rb3Gen2.
Fixes:
|
||
|
|
041015a465 |
media: venus: hfi: explicitly release IRQ during teardown
commit 640803003cd903cea73dc6a86bf6963e238e2b3f upstream.
Ensure the IRQ is disabled - and all pending handlers completed - before
dismantling the interrupt routing and clearing related pointers.
This prevents any possibility of the interrupt triggering after the
handler context has been invalidated.
Fixes:
|
||
|
|
ba567c2e52 |
media: venus: Add a check for packet size after reading from shared memory
commit 49befc830daa743e051a65468c05c2ff9e8580e6 upstream.
Add a check to ensure that the packet size does not exceed the number of
available words after reading the packet header from shared memory. This
ensures that the size provided by the firmware is safe to process and
prevent potential out-of-bounds memory access.
Fixes:
|
||
|
|
03c0e00ccf |
media: qcom: camss: cleanup media device allocated resource on error path
commit 69080ec3d0daba8a894025476c98ab16b5a505a4 upstream.
A call to media_device_init() requires media_device_cleanup() counterpart
to complete cleanup and release any allocated resources.
This has been done in the driver .remove() right from the beginning, but
error paths on .probe() shall also be fixed.
Fixes:
|
||
|
|
3c0e4cc4f5 |
media: ivsc: Fix crash at shutdown due to missing mei_cldev_disable() calls
commit 0c92c49fc688cfadacc47ae99b06a31237702e9e upstream. Both the ACE and CSI driver are missing a mei_cldev_disable() call in their remove() function. This causes the mei_cl client to stay part of the mei_device->file_list list even though its memory is freed by mei_cl_bus_dev_release() calling kfree(cldev->cl). This leads to a use-after-free when mei_vsc_remove() runs mei_stop() which first removes all mei bus devices calling mei_ace_remove() and mei_csi_remove() followed by mei_cl_bus_dev_release() and then calls mei_cl_all_disconnect() which walks over mei_device->file_list dereferecing the just freed cldev->cl. And mei_vsc_remove() it self is run at shutdown because of the platform_device_unregister(tp->pdev) in vsc_tp_shutdown() When building a kernel with KASAN this leads to the following KASAN report: [ 106.634504] ================================================================== [ 106.634623] BUG: KASAN: slab-use-after-free in mei_cl_set_disconnected (drivers/misc/mei/client.c:783) mei [ 106.634683] Read of size 4 at addr ffff88819cb62018 by task systemd-shutdow/1 [ 106.634729] [ 106.634767] Tainted: [E]=UNSIGNED_MODULE [ 106.634770] Hardware name: Dell Inc. XPS 16 9640/09CK4V, BIOS 1.12.0 02/10/2025 [ 106.634773] Call Trace: [ 106.634777] <TASK> ... [ 106.634871] kasan_report (mm/kasan/report.c:221 mm/kasan/report.c:636) [ 106.634901] mei_cl_set_disconnected (drivers/misc/mei/client.c:783) mei [ 106.634921] mei_cl_all_disconnect (drivers/misc/mei/client.c:2165 (discriminator 4)) mei [ 106.634941] mei_reset (drivers/misc/mei/init.c:163) mei ... [ 106.635042] mei_stop (drivers/misc/mei/init.c:348) mei [ 106.635062] mei_vsc_remove (drivers/misc/mei/mei_dev.h:784 drivers/misc/mei/platform-vsc.c:393) mei_vsc [ 106.635066] platform_remove (drivers/base/platform.c:1424) Add the missing mei_cldev_disable() calls so that the mei_cl gets removed from mei_device->file_list before it is freed to fix this. Fixes: |
||
|
|
61f267b77d |
media: ov2659: Fix memory leaks in ov2659_probe()
commit 76142b137b968d47b35cdd8d1dc924677d319c8b upstream.
ov2659_probe() doesn't properly free control handler resources in failure
paths, causing memory leaks. Add v4l2_ctrl_handler_free() to prevent these
memory leaks and reorder the ctrl_handler assignment for better code flow.
Fixes:
|
||
|
|
1c2769dc80 |
media: rainshadow-cec: fix TOCTOU race condition in rain_interrupt()
commit 7af160aea26c7dc9e6734d19306128cce156ec40 upstream.
In the interrupt handler rain_interrupt(), the buffer full check on
rain->buf_len is performed before acquiring rain->buf_lock. This
creates a Time-of-Check to Time-of-Use (TOCTOU) race condition, as
rain->buf_len is concurrently accessed and modified in the work
handler rain_irq_work_handler() under the same lock.
Multiple interrupt invocations can race, with each reading buf_len
before it becomes full and then proceeding. This can lead to both
interrupts attempting to write to the buffer, incrementing buf_len
beyond its capacity (DATA_SIZE) and causing a buffer overflow.
Fix this bug by moving the spin_lock() to before the buffer full
check. This ensures that the check and the subsequent buffer modification
are performed atomically, preventing the race condition. An corresponding
spin_unlock() is added to the overflow path to correctly release the
lock.
This possible bug was found by an experimental static analysis tool
developed by our team.
Fixes:
|
||
|
|
3d83d0b5ae |
media: usbtv: Lock resolution while streaming
commit 7e40e0bb778907b2441bff68d73c3eb6b6cd319f upstream.
When an program is streaming (ffplay) and another program (qv4l2)
changes the TV standard from NTSC to PAL, the kernel crashes due to trying
to copy to unmapped memory.
Changing from NTSC to PAL increases the resolution in the usbtv struct,
but the video plane buffer isn't adjusted, so it overflows.
Fixes:
|