[ Upstream commit 54c3f1a814 ]
Anand hit a BUG() when pulling off headers on egress to a SW tunnel.
We get to skb_checksum_help() with an invalid checksum offset
(commit d7ea0d9df2 ("net: remove two BUG() from skb_checksum_help()")
converted those BUGs to WARN_ONs()).
He points out oddness in how skb_postpull_rcsum() gets used.
Indeed looks like we should pull before "postpull", otherwise
the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not
be able to do its job:
if (skb->ip_summed == CHECKSUM_PARTIAL &&
skb_checksum_start_offset(skb) < 0)
skb->ip_summed = CHECKSUM_NONE;
Reported-by: Anand Parthasarathy <anpartha@meta.com>
Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7dc753fe3 ]
A previous cleanup patch accidentally broke some conditional
expressions by replacing the safe "do {} while (0)" constructs
with empty macros. gcc points this out when extra warnings
are enabled:
drivers/net/wireless/ath/ath9k/hif_usb.c: In function 'ath9k_skb_queue_complete':
drivers/net/wireless/ath/ath9k/hif_usb.c:251:57: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
251 | TX_STAT_INC(hif_dev, skb_failed);
Make both sets of macros proper expressions again.
Fixes: d7fc76039b ("ath9k: htc: clean up statistics macros")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221215165553.1950307-1-arnd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b18cba09e3 ]
Commit 9130b8dbc6 ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.
When multiple upcalls with the same uid and different service are
ongoing, it could happen that __gss_find_upcall(), which returns the
first match found in the pipe->in_downcall list, could not find the
correct gss_msg corresponding to the downcall we are looking for.
Moreover, it might return a msg which is not sent to rpc.gssd yet.
We could see mount.nfs process hung in D state with multiple mount.nfs
are executed in parallel. The call trace below is of CentOS 7.9
kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/
elrepo kernel-ml-6.0.7-1.el7.
PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs"
#0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
#1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
#2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
#3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
[sunrpc]
#4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
#5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
#6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
#7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
#8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
#9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]
The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.
When rpc.gssd reads pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked
because only uid is used to find the msg and it is before the one for
B in pipe->in_downcall. And the process waiting for the msg
corresponding to service A will be woken up.
Actual scheduing of that process might be after rpc.gssd processes the
next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A).
The process is scheduled to see gss_msg->ctx == NULL and
gss_msg->msg.errno == 0, therefore it cannot break the loop in
gss_create_upcall and is never woken up after that.
This patch adds a simple check to ensure that a msg which is not
sent to rpc.gssd yet is not chosen as the matching upcall upon
receiving a downcall.
Signed-off-by: minoura makoto <minoura@valinux.co.jp>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 9130b8dbc6 ("SUNRPC: allow for upcalls for same uid but different gss service")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db0a4a7b8e ]
All error handling paths end to 'out', except this memory allocation
failure.
This is spurious. So branch to the error handling path also in this case.
It will add a call to:
memset(&root->defrag_progress, 0,
sizeof(root->defrag_progress));
Fixes: 6702ed490c ("Btrfs: Add run time btree defrag, and an ioctl to force btree defrag")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a7d86d14d ]
The PHY is powered on during phy-init by setting the SW_PWRDN bit in the
COM_POWER_DOWN_CTRL register and then setting the same bit in the in the
PCS_POWER_DOWN_CONTROL register that belongs to the USB part of the
PHY.
Currently, whether power on succeeds depends on probe order and having
the USB part of the PHY be initialised first. In case the DP part of the
PHY is instead initialised first, the intended power on of the USB block
results in a corrupted DP_PHY register (e.g. DP_PHY_AUX_CFG8).
Add a pointer to the USB part of the PHY to the driver data and use that
to power on the PHY also if the DP part of the PHY is initialised first.
Fixes: 52e013d0bf ("phy: qcom-qmp: Add support for DP in USB3+DP combo phy")
Cc: stable@vger.kernel.org # 5.10
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20221114081346.5116-5-johan+linaro@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2ba48b2004 upstream.
[BUG]
Even with commit 81d5d61454 ("btrfs: enhance unsupported compat RO
flags handling"), btrfs can still mount a fs with unsupported compat_ro
flags read-only, then remount it RW:
# btrfs ins dump-super /dev/loop0 | grep compat_ro_flags -A 3
compat_ro_flags 0x403
( FREE_SPACE_TREE |
FREE_SPACE_TREE_VALID |
unknown flag: 0x400 )
# mount /dev/loop0 /mnt/btrfs
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
^^^ RW mount failed as expected ^^^
# dmesg -t | tail -n5
loop0: detected capacity change from 0 to 1048576
BTRFS: device fsid cb5b82f5-0fdd-4d81-9b4b-78533c324afa devid 1 transid 7 /dev/loop0 scanned by mount (1146)
BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
BTRFS info (device loop0): using free space tree
BTRFS error (device loop0): cannot mount read-write because of unknown compat_ro features (0x403)
BTRFS error (device loop0): open_ctree failed
# mount /dev/loop0 -o ro /mnt/btrfs
# mount -o remount,rw /mnt/btrfs
^^^ RW remount succeeded unexpectedly ^^^
[CAUSE]
Currently we use btrfs_check_features() to check compat_ro flags against
our current mount flags.
That function get reused between open_ctree() and btrfs_remount().
But for btrfs_remount(), the super block we passed in still has the old
mount flags, thus btrfs_check_features() still believes we're mounting
read-only.
[FIX]
Replace the existing @sb argument with @is_rw_mount.
As originally we only use @sb to determine if the mount is RW.
Now it's callers' responsibility to determine if the mount is RW, and
since there are only two callers, the check is pretty simple:
- caller in open_ctree()
Just pass !sb_rdonly().
- caller in btrfs_remount()
Pass !(*flags & SB_RDONLY), as our check should be against the new
flags.
Now we can correctly reject the RW remount:
# mount /dev/loop0 -o ro /mnt/btrfs
# mount -o remount,rw /mnt/btrfs
mount: /mnt/btrfs: mount point not mounted or bad option.
dmesg(1) may have more information after failed mount system call.
# dmesg -t | tail -n 1
BTRFS error (device loop0: state M): cannot mount read-write because of unknown compat_ro features (0x403)
Reported-by: Chung-Chiang Cheng <shepjeng@gmail.com>
Fixes: 81d5d61454 ("btrfs: enhance unsupported compat RO flags handling")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f2e84ca60 upstream.
During lseek, when searching for delalloc in a range that represents a
hole and that range has a length of 1 byte, we end up not doing the actual
delalloc search in the inode's io tree, resulting in not correctly
reporting the offset with data or a hole. This actually only happens when
the start offset is 0 because with any other start offset we round it down
by sector size.
Reproducer:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ xfs_io -f -c "pwrite -q 0 1" /mnt/sdc/foo
$ xfs_io -c "seek -d 0" /mnt/sdc/foo
Whence Result
DATA EOF
It should have reported an offset of 0 instead of EOF.
Fix this by updating btrfs_find_delalloc_in_range() and count_range_bits()
to deal with inclusive ranges properly. These functions are already
supposed to work with inclusive end offsets, they just got it wrong in a
couple places due to off-by-one mistakes.
A test case for fstests will be added later.
Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/20221223020509.457113-1-joanbrugueram@gmail.com/
Fixes: b6e833567e ("btrfs: make hole and data seeking a lot more efficient")
CC: stable@vger.kernel.org # 6.1
Tested-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4e3ef5685 upstream.
If we delay sending End Transfer for Setup TRB to be prepared, we need
to check if the End Transfer was in preparation for a driver
teardown/soft-disconnect. In those cases, just send the End Transfer
command without delay.
In the case of soft-disconnect, there's a very small chance the command
may not go through immediately. But should it happen, the Setup TRB will
be prepared during the polling of the controller halted state, allowing
the command to go through then.
In the case of disabling endpoint due to reconfiguration (e.g.
set_interface(alt-setting) or usb reset), then it's driven by the host.
Typically the host wouldn't immediately cancel the control request and
send another control transfer to trigger the End Transfer command
timeout.
Fixes: 4db0fbb601 ("usb: dwc3: gadget: Don't delay End Transfer on delayed_status")
Cc: stable@vger.kernel.org
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/f1617a323e190b9cc408fb8b65456e32b5814113.1670546756.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7246210ecd upstream.
When the server interface for a channel is not active anymore,
we have the logic to select an alternative interface. However
this was not breaking out of the loop as soon as a new alternative
was found. As a result, some interfaces may get refcounted unintentionally.
There was also a bug in checking if we found an alternate iface.
Fixed that too.
Fixes: b54034a73b ("cifs: during reconnect, update interface if necessary")
Cc: stable@vger.kernel.org # 5.19+
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc7d79d4fa upstream.
The last fix to iface_count did fix the overcounting issue.
However, during each refresh, we could end up undercounting
the iface_count, if a match was found.
Fixing this by doing increments and decrements instead of
setting it to 0 before each parsing of server interfaces.
Fixes: 096bbeec7b ("smb3: interface count displayed incorrectly")
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 63d5429f68 ]
Using strncpy() on NUL-terminated strings are deprecated. To avoid
possible forming of non-terminated string strscpy() should be used.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e4db04f7d3 upstream.
There is issue as follows when do setxattr with inject fault:
[localhost]# fsck.ext4 -fn /dev/sda
e2fsck 1.46.6-rc1 (12-Sep-2022)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached zero-length inode 15. Clear? no
Unattached inode 15
Connect to /lost+found? no
Pass 5: Checking group summary information
/dev/sda: ********** WARNING: Filesystem still has errors **********
/dev/sda: 15/655360 files (0.0% non-contiguous), 66755/2621440 blocks
This occurs in 'ext4_xattr_inode_create()'. If 'ext4_mark_inode_dirty()'
fails, dropping i_nlink of the inode is needed. Or will lead to inode leak.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221208023233.1231330-5-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a44e84a9b7 upstream.
When manipulating xattr blocks, we can deadlock infinitely looping
inside ext4_xattr_block_set() where we constantly keep finding xattr
block for reuse in mbcache but we are unable to reuse it because its
reference count is too big. This happens because cache entry for the
xattr block is marked as reusable (e_reusable set) although its
reference count is too big. When this inconsistency happens, this
inconsistent state is kept indefinitely and so ext4_xattr_block_set()
keeps retrying indefinitely.
The inconsistent state is caused by non-atomic update of e_reusable bit.
e_reusable is part of a bitfield and e_reusable update can race with
update of e_referenced bit in the same bitfield resulting in loss of one
of the updates. Fix the problem by using atomic bitops instead.
This bug has been around for many years, but it became *much* easier
to hit after commit 65f8b80053 ("ext4: fix race when reusing xattr
blocks").
Cc: stable@vger.kernel.org
Fixes: 6048c64b26 ("mbcache: add reusable flag to cache entries")
Fixes: 65f8b80053 ("ext4: fix race when reusing xattr blocks")
Reported-and-tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Thilo Fromm <t-lo@linux.microsoft.com>
Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microsoft.com/
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b40ebaf638 upstream.
Commit fb0a387dcd ("ext4: limit block allocations for indirect-block
files to < 2^32") added code to try to allocate xattr block with 32-bit
block number for indirect block based files on the grounds that these
files cannot use larger block numbers. It also added BUG_ON when
allocated block could not fit into 32 bits. This is however bogus
reasoning because xattr block is stored in inode->i_file_acl and
inode->i_file_acl_hi and as such even indirect block based files can
happily use full 48 bits for xattr block number. The proper handling
seems to be there basically since 64-bit block number support was added.
So remove the bogus limitation and BUG_ON.
Cc: Eric Sandeen <sandeen@redhat.com>
Fixes: fb0a387dcd ("ext4: limit block allocations for indirect-block files to < 2^32")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221121130929.32031-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f49ec603a upstream.
In commit 9a8c5b0d06 ("ext4: update the backup superblock's at the end
of the online resize"), it is assumed that update_backups() only updates
backup superblocks, so each b_data is treated as a backupsuper block to
update its s_block_group_nr and s_checksum. However, update_backups()
also updates the backup group descriptors, which causes the backup group
descriptors to be corrupted.
The above commit fixes the problem of invalid checksum of the backup
superblock. The root cause of this problem is that the checksum of
ext4_update_super() is not set correctly. This problem has been fixed
in the previous patch ("ext4: fix bad checksum after online resize").
However, we do need to set block_group_nr for the backup superblock in
update_backups(). When a block is in a group that contains a backup
superblock, and the block is the first block in the group, the block is
definitely a superblock. We add a helper function that includes setting
s_block_group_nr and updating checksum, and then call it only when the
above conditions are met to prevent the backup group descriptors from
being incorrectly modified.
Fixes: 9a8c5b0d06 ("ext4: update the backup superblock's at the end of the online resize")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20221117040341.1380702-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a408f33e89 upstream.
When online resizing is performed twice consecutively, the error message
"Superblock checksum does not match superblock" is displayed for the
second time. Here's the reproducer:
mkfs.ext4 -F /dev/sdb 100M
mount /dev/sdb /tmp/test
resize2fs /dev/sdb 5G
resize2fs /dev/sdb 6G
To solve this issue, we moved the update of the checksum after the
es->s_overhead_clusters is updated.
Fixes: 026d0d27c4 ("ext4: reduce computation of overhead during resize")
Fixes: de394a8665 ("ext4: update s_overhead_clusters in the superblock during an on-line resize")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20221117040341.1380702-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0aeaa2559d upstream.
When a backup superblock is updated in update_backups(), the primary
superblock's offset in the group (that is, sbi->s_sbh->b_blocknr) is used
as the backup superblock's offset in its group. However, when the block
size is 1K and bigalloc is enabled, the two offsets are not equal. This
causes the backup group descriptors to be overwritten by the superblock
in update_backups(). Moreover, if meta_bg is enabled, the file system will
be corrupted because this feature uses backup group descriptors.
To solve this issue, we use a more accurate ext4_group_first_block_no() as
the offset of the backup superblock in its group.
Fixes: d77147ff44 ("ext4: add support for online resizing with bigalloc")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20221117040341.1380702-4-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 131294c35e upstream.
When converting files with inline data to extents, delayed allocations
made on a file system created with both the bigalloc and inline options
can result in invalid extent status cache content, incorrect reserved
cluster counts, kernel memory leaks, and potential kernel panics.
With bigalloc, the code that determines whether a block must be
delayed allocated searches the extent tree to see if that block maps
to a previously allocated cluster. If not, the block is delayed
allocated, and otherwise, it isn't. However, if the inline option is
also used, and if the file containing the block is marked as able to
store data inline, there isn't a valid extent tree associated with
the file. The current code in ext4_clu_mapped() calls
ext4_find_extent() to search the non-existent tree for a previously
allocated cluster anyway, which typically finds nothing, as desired.
However, a side effect of the search can be to cache invalid content
from the non-existent tree (garbage) in the extent status tree,
including bogus entries in the pending reservation tree.
To fix this, avoid searching the extent tree when allocating blocks
for bigalloc + inline files that are being converted from inline to
extent mapped.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20221117152207.2424-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48a6a66db8 upstream.
Due to several different off-by-one errors, or perhaps due to a late
change in design that wasn't fully reflected in the code that was
actually merged, there are several very strange constraints on how
fast-commit blocks are filled with tlv entries:
- tlvs must start at least 10 bytes before the end of the block, even
though the minimum tlv length is 8. Otherwise, the replay code will
ignore them. (BUG: ext4_fc_reserve_space() could violate this
requirement if called with a len of blocksize - 9 or blocksize - 8.
Fortunately, this doesn't seem to happen currently.)
- tlvs must end at least 1 byte before the end of the block. Otherwise
the replay code will consider them to be invalid. This quirk
contributed to a bug (fixed by an earlier commit) where uninitialized
memory was being leaked to disk in the last byte of blocks.
Also, strangely these constraints don't apply to the replay code in
e2fsprogs, which will accept any tlvs in the blocks (with no bounds
checks at all, but that is a separate issue...).
Given that this all seems to be a bug, let's fix it by just filling
blocks with tlv entries in the natural way.
Note that old kernels will be unable to replay fast-commit journals
created by kernels that have this commit.
Fixes: aa75f4d3da ("ext4: main fast-commit commit path")
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221106224841.279231-7-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fbcb5251f upstream.
fast-commit of create, link, and unlink operations in encrypted
directories is completely broken because the unencrypted filenames are
being written to the fast-commit journal instead of the encrypted
filenames. These operations can't be replayed, as encryption keys
aren't present at journal replay time. It is also an information leak.
Until if/when we can get this working properly, make encrypted directory
operations ineligible for fast-commit.
Note that fast-commit operations on encrypted regular files continue to
be allowed, as they seem to work.
Fixes: aa75f4d3da ("ext4: main fast-commit commit path")
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221106224841.279231-2-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 105c78e124 upstream.
Mounting a filesystem whose journal inode has the encrypt flag causes a
NULL dereference in fscrypt_limit_io_blocks() when the 'inlinecrypt'
mount option is used.
The problem is that when jbd2_journal_init_inode() calls bmap(), it
eventually finds its way into ext4_iomap_begin(), which calls
fscrypt_limit_io_blocks(). fscrypt_limit_io_blocks() requires that if
the inode is encrypted, then its encryption key must already be set up.
That's not the case here, since the journal inode is never "opened" like
a normal file would be. Hence the crash.
A reproducer is:
mkfs.ext4 -F /dev/vdb
debugfs -w /dev/vdb -R "set_inode_field <8> flags 0x80808"
mount /dev/vdb /mnt -o inlinecrypt
To fix this, make ext4 consider journal inodes with the encrypt flag to
be invalid. (Note, maybe other flags should be rejected on the journal
inode too. For now, this is just the minimal fix for the above issue.)
I've marked this as fixing the commit that introduced the call to
fscrypt_limit_io_blocks(), since that's what made an actual crash start
being possible. But this fix could be applied to any version of ext4
that supports the encrypt feature.
Reported-by: syzbot+ba9dac45bc76c490b7c3@syzkaller.appspotmail.com
Fixes: 38ea50daa7 ("ext4: support direct I/O with fscrypt using blk-crypto")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221102053312.189962-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 991ed014de upstream.
We got a issue as fllows:
==================================================================
kernel BUG at fs/ext4/extents_status.c:203!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 945 Comm: cat Not tainted 6.0.0-next-20221007-dirty #349
RIP: 0010:ext4_es_end.isra.0+0x34/0x42
RSP: 0018:ffffc9000143b768 EFLAGS: 00010203
RAX: 0000000000000000 RBX: ffff8881769cd0b8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8fc27cf7 RDI: 00000000ffffffff
RBP: ffff8881769cd0bc R08: 0000000000000000 R09: ffffc9000143b5f8
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881769cd0a0
R13: ffff8881768e5668 R14: 00000000768e52f0 R15: 0000000000000000
FS: 00007f359f7f05c0(0000)GS:ffff88842fd00000(0000)knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f359f5a2000 CR3: 000000017130c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__es_tree_search.isra.0+0x6d/0xf5
ext4_es_cache_extent+0xfa/0x230
ext4_cache_extents+0xd2/0x110
ext4_find_extent+0x5d5/0x8c0
ext4_ext_map_blocks+0x9c/0x1d30
ext4_map_blocks+0x431/0xa50
ext4_mpage_readpages+0x48e/0xe40
ext4_readahead+0x47/0x50
read_pages+0x82/0x530
page_cache_ra_unbounded+0x199/0x2a0
do_page_cache_ra+0x47/0x70
page_cache_ra_order+0x242/0x400
ondemand_readahead+0x1e8/0x4b0
page_cache_sync_ra+0xf4/0x110
filemap_get_pages+0x131/0xb20
filemap_read+0xda/0x4b0
generic_file_read_iter+0x13a/0x250
ext4_file_read_iter+0x59/0x1d0
vfs_read+0x28f/0x460
ksys_read+0x73/0x160
__x64_sys_read+0x1e/0x30
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
==================================================================
In the above issue, ioctl invokes the swap_inode_boot_loader function to
swap inode<5> and inode<12>. However, inode<5> contain incorrect imode and
disordered extents, and i_nlink is set to 1. The extents check for inode in
the ext4_iget function can be bypassed bacause 5 is EXT4_BOOT_LOADER_INO.
While links_count is set to 1, the extents are not initialized in
swap_inode_boot_loader. After the ioctl command is executed successfully,
the extents are swapped to inode<12>, in this case, run the `cat` command
to view inode<12>. And Bug_ON is triggered due to the incorrect extents.
When the boot loader inode is not initialized, its imode can be one of the
following:
1) the imode is a bad type, which is marked as bad_inode in ext4_iget and
set to S_IFREG.
2) the imode is good type but not S_IFREG.
3) the imode is S_IFREG.
The BUG_ON may be triggered by bypassing the check in cases 1 and 2.
Therefore, when the boot loader inode is bad_inode or its imode is not
S_IFREG, initialize the inode to avoid triggering the BUG.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221026042310.3839669-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 318cdc822c upstream.
In ext4_evict_inode(), if we evicting an inode in the 'no_delete' path,
it cannot be raced by another mark_inode_dirty(). If it happens,
someone else may accidentally dirty it without holding inode refcount
and probably cause use-after-free issues in the writeback procedure.
It's indiscoverable and hard to debug, so add an WARN_ON_ONCE() to
check and detect this issue in advance.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220629112647.4141034-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>