Commit Graph

784247 Commits

Author SHA1 Message Date
Chao Yu
4e2f733167 f2fs: clean up codes with {f2fs_,}data_blkaddr()
- rename datablock_addr() to data_blkaddr().
- wrap data_blkaddr() with f2fs_data_blkaddr() to clean up
parameters.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:54 -07:00
Jaegeuk Kim
eb8bca0262 f2fs: show mounted time
Let's show mounted time.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:54 -07:00
Takashi Iwai
b484631337 f2fs: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:54 -07:00
Chao Yu
8aa412921f f2fs: allow to clear F2FS_COMPR_FL flag
If regular inode has no compressed cluster, allow using 'chattr -c'
to remove its compress flag, recovering it to a non-compressed file.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:54 -07:00
Chao Yu
3da4dc032f f2fs: fix to check dirty pages during compressed inode conversion
Compressed cluster can be generated during dirty data writeback,
if there is dirty pages on compressed inode, it needs to disable
converting compressed inode to non-compressed one.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:54 -07:00
Chao Yu
34303c6d02 f2fs: fix to account compressed inode correctly
stat_inc_compr_inode() needs to check FI_COMPRESSED_FILE flag, so
in f2fs_disable_compressed_file(), we should call stat_dec_compr_inode()
before clearing FI_COMPRESSED_FILE flag.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:54 -07:00
Jaegeuk Kim
71abd2acf1 f2fs: fix wrong check on F2FS_IOC_FSSETXATTR
This fixes the incorrect failure when enabling project quota on casefold-enabled
file.

Cc: Daniel Rosenberg <drosen@google.com>
Cc: kernel-team@android.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:54 -07:00
Chao Yu
64f6384ef2 f2fs: fix to avoid use-after-free in f2fs_write_multi_pages()
In compress cluster, if physical block number is less than logic
page number, race condition will cause use-after-free issue as
described below:

- f2fs_write_compressed_pages
 - fio.page = cic->rpages[0];
 - f2fs_outplace_write_data
					- f2fs_compress_write_end_io
					 - kfree(cic->rpages);
					 - kfree(cic);
 - fio.page = cic->rpages[1];

f2fs_write_multi_pages+0xfd0/0x1a98
f2fs_write_data_pages+0x74c/0xb5c
do_writepages+0x64/0x108
__writeback_single_inode+0xdc/0x4b8
writeback_sb_inodes+0x4d0/0xa68
__writeback_inodes_wb+0x88/0x178
wb_writeback+0x1f0/0x424
wb_workfn+0x2f4/0x574
process_one_work+0x210/0x48c
worker_thread+0x2e8/0x44c
kthread+0x110/0x120
ret_from_fork+0x10/0x18

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:53 -07:00
Chao Yu
55a5997c03 f2fs: fix to avoid using uninitialized variable
In f2fs_vm_page_mkwrite(), if inode is compress one, and current mmapped
page locates in compressed cluster, we have to call f2fs_get_dnode_of_data()
to get its physical block address before f2fs_wait_on_block_writeback().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:53 -07:00
Chao Yu
82304e39e6 f2fs: fix inconsistent comments
Lack of maintenance on comments may mislead developers, fix them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:53 -07:00
Chao Yu
9f900c267f f2fs: remove i_sem lock coverage in f2fs_setxattr()
f2fs_inode.xattr_ver field was gone after commit d260081ccf
("f2fs: change recovery policy of xattr node block"), remove i_sem
lock coverage in f2fs_setxattr()

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:53 -07:00
Chao Yu
584ae9ec0f f2fs: cover last_disk_size update with spinlock
This change solves below hangtask issue:

INFO: task kworker/u16:1:58 blocked for more than 122 seconds.
      Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1   D    0    58      2 0x00000000
Workqueue: writeback wb_workfn (flush-179:0)
Backtrace:
 (__schedule) from [<c0913234>] (schedule+0x78/0xf4)
 (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0)
 (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70)
 (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac)
 (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4)
 (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c)
 (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4)
 (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454)
 (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0)
 (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4)
 (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338)
 (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c)
 (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544)
 (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574)
 (worker_thread) from [<c01564fc>] (kthread+0x144/0x170)
 (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)

Reported-and-tested-by: Ondřej Jirman <megi@xff.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:53 -07:00
Chao Yu
0f87f20d05 f2fs: fix to check i_compr_blocks correctly
inode.i_blocks counts based on 512byte sector, we need to convert
to 4kb sized block count before comparing to i_compr_blocks.

In addition, add to print message when sanity check on inode
compression configs failed.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-06 13:02:51 -07:00
Chao Yu
3e27aebf89 f2fs: fix to avoid potential deadlock
Using f2fs_trylock_op() in f2fs_write_compressed_pages() to avoid potential
deadlock like we did in f2fs_write_single_data_page().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-28 07:18:00 -07:00
Chao Yu
5063cb3447 f2fs: add missing function name in kernel message
Otherwise, we can not distinguish the exact location of messages,
when there are more than one places printing same message.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-28 07:18:00 -07:00
Chao Yu
cb54e0dd33 f2fs: recycle unused compress_data.chksum feild
In Struct compress_data, chksum field was never used, remove it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-28 07:18:00 -07:00
Chao Yu
3fff3163db f2fs: fix to avoid NULL pointer dereference
Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at f2fs_free_dic+0x60/0x2c8
LR is at f2fs_decompress_pages+0x3c4/0x3e8
f2fs_free_dic+0x60/0x2c8
f2fs_decompress_pages+0x3c4/0x3e8
__read_end_io+0x78/0x19c
f2fs_post_read_work+0x6c/0x94
process_one_work+0x210/0x48c
worker_thread+0x2e8/0x44c
kthread+0x110/0x120
ret_from_fork+0x10/0x18

In f2fs_free_dic(), we can not use f2fs_put_page(,1) to release dic->tpages[i],
as the page's mapping is NULL.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-28 07:17:59 -07:00
Eric Biggers
ee10cab2bf f2fs: fix leaking uninitialized memory in compressed clusters
When the compressed data of a cluster doesn't end on a page boundary,
the remainder of the last page must be zeroed in order to avoid leaking
uninitialized memory to disk.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-28 07:17:59 -07:00
Sahitya Tummala
2d41291c72 f2fs: fix the panic in do_checkpoint()
There could be a scenario where f2fs_sync_meta_pages() will not
ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
resulting in the below panic in do_checkpoint() -

f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
				!f2fs_cp_error(sbi));

This can happen in a low-memory condition, where shrinker could
also be doing the writepage operation (stack shown below)
at the same time when checkpoint is running on another core.

schedule
down_write
f2fs_submit_page_write -> by this time, this page in page cache is tagged
			as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
			is cleared, due to which f2fs_sync_meta_pages()
			cannot sync this page in do_checkpoint() path.
f2fs_do_write_meta_page
__f2fs_write_meta_page
f2fs_write_meta_page
shrink_page_list
shrink_inactive_list
shrink_node_memcg
shrink_node
kswapd

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-28 07:17:59 -07:00
Chao Yu
fc414c6c45 f2fs: fix to wait all node page writeback
There is a race condition that we may miss to wait for all node pages
writeback, fix it.

- fsync()				- shrink
 - f2fs_do_sync_file
					 - __write_node_page
					  - set_page_writeback(page#0)
					  : remove DIRTY/TOWRITE flag
  - f2fs_fsync_node_pages
  : won't find page #0 as TOWRITE flag was removeD
  - f2fs_wait_on_node_pages_writeback
  : wont' wait page #0 writeback as it was not in fsync_node_list list.
					   - f2fs_add_fsync_node_entry

Fixes: 50fa53eccf ("f2fs: fix to avoid broken of dnode block list")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-28 07:17:59 -07:00
Naohiro Aota
66f10f5d58 mm/swapfile.c: move inode_lock out of claim_swapfile
claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already swapfile (with -ebusy).  and, on the other error
cases, it does not lock the inode.

this inconsistency of the lock state and return value is quite confusing
and actually causing a bad unlock balance as below in the "bad_swap"
section of __do_sys_swapon().

this commit fixes this issue by moving the inode_lock() and is_swapfile
check out of claim_swapfile().  the inode is unlocked in
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap".  thus, error handling codes after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    warning: bad unlock balance detected!
    5.5.0-rc7+ #176 not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
    [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    cpu: 5 pid: 4294 comm: swapon not tainted 5.5.0-rc7-btrfs-zns+ #176
    hardware name: asus all series/h87-pro, bios 2102 07/29/2014
    call trace:
     dump_stack+0xa1/0xea
     ? __do_sys_swapon+0x94b/0x3550
     print_unlock_imbalance_bug.cold+0x114/0x123
     ? __do_sys_swapon+0x94b/0x3550
     lock_release+0x562/0xed0
     ? kvfree+0x31/0x40
     ? lock_downgrade+0x770/0x770
     ? kvfree+0x31/0x40
     ? rcu_read_lock_sched_held+0xa1/0xd0
     ? rcu_read_lock_bh_held+0xb0/0xb0
     up_write+0x2d/0x490
     ? kfree+0x293/0x2f0
     __do_sys_swapon+0x94b/0x3550
     ? putname+0xb0/0xf0
     ? kmem_cache_free+0x2e7/0x370
     ? do_sys_open+0x184/0x3e0
     ? generic_max_swapfile_size+0x40/0x40
     ? do_syscall_64+0x27/0x4b0
     ? entry_syscall_64_after_hwframe+0x49/0xbe
     ? lockdep_hardirqs_on+0x38c/0x590
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_syscall_64_after_hwframe+0x49/0xbe
    rip: 0033:0x7f15da0a0dc7

link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
fixes: 1638045c36 ("mm: set s_swapfile on blockdev swap devices")
signed-off-by: naohiro aota <naohiro.aota@wdc.com>
reviewed-by: andrew morton <akpm@linux-foundation.org>
reviewed-by: darrick j. wong <darrick.wong@oracle.com>
tested-by: qais youef <qais.yousef@arm.com>
cc: christoph hellwig <hch@infradead.org>
cc: <stable@vger.kernel.org>
2020-03-28 07:17:56 -07:00
Eric Biggers
5b2fc72a91 fscrypt: don't evict dirty inodes after removing key
After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the
filesystem and tries to get and put all inodes that were unlocked by the
key so that unused inodes get evicted via fscrypt_drop_inode().
Normally, the inodes are all clean due to the sync.

However, after the filesystem is sync'ed, userspace can modify and close
one of the files.  (Userspace is *supposed* to close the files before
removing the key.  But it doesn't always happen, and the kernel can't
assume it.)  This causes the inode to be dirtied and have i_count == 0.
Then, fscrypt_drop_inode() failed to consider this case and indicated
that the inode can be dropped, causing the write to be lost.

On f2fs, other problems such as a filesystem freeze could occur due to
the inode being freed while still on f2fs's dirty inode list.

Fix this bug by making fscrypt_drop_inode() only drop clean inodes.

I've written an xfstest which detects this bug on ext4, f2fs, and ubifs.

Fixes: b1c0ec3599 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://lore.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-03-11 14:08:50 -07:00
Eric Biggers
768209cdb6 fs-verity: use u64_to_user_ptr()
<linux/kernel.h> already provides a macro u64_to_user_ptr().
Use it instead of open-coding the two casts.

No change in behavior.

Link: https://lore.kernel.org/r/20191231175408.20524-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:14:20 -08:00
Eric Biggers
62810e0d59 fs-verity: use mempool for hash requests
When initializing an fs-verity hash algorithm, also initialize a mempool
that contains a single preallocated hash request object.  Then replace
the direct calls to ahash_request_alloc() and ahash_request_free() with
allocating and freeing from this mempool.

This eliminates the possibility of the allocation failing, which is
desirable for the I/O path.

This doesn't cause deadlocks because there's no case where multiple hash
requests are needed at a time to make forward progress.

Link: https://lore.kernel.org/r/20191231175545.20709-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:14:20 -08:00
Eric Biggers
dcd848f102 fs-verity: implement readahead of Merkle tree pages
When fs-verity verifies data pages, currently it reads each Merkle tree
page synchronously using read_mapping_page().

Therefore, when the Merkle tree pages aren't already cached, fs-verity
causes an extra 4 KiB I/O request for every 512 KiB of data (assuming
that the Merkle tree uses SHA-256 and 4 KiB blocks).  This results in
more I/O requests and performance loss than is strictly necessary.

Therefore, implement readahead of the Merkle tree pages.

For simplicity, we take advantage of the fact that the kernel already
does readahead of the file's *data*, just like it does for any other
file.  Due to this, we don't really need a separate readahead state
(struct file_ra_state) just for the Merkle tree, but rather we just need
to piggy-back on the existing data readahead requests.

We also only really need to bother with the first level of the Merkle
tree, since the usual fan-out factor is 128, so normally over 99% of
Merkle tree I/O requests are for the first level.

Therefore, make fsverity_verify_bio() enable readahead of the first
Merkle tree level, for up to 1/4 the number of pages in the bio, when it
sees that the REQ_RAHEAD flag is set on the bio.  The readahead size is
then passed down to ->read_merkle_tree_page() for the filesystem to
(optionally) implement if it sees that the requested page is uncached.

While we're at it, also make build_merkle_tree_level() set the Merkle
tree readahead size, since it's easy to do there.

However, for now don't set the readahead size in fsverity_verify_page(),
since currently it's only used to verify holes on ext4 and f2fs, and it
would need parameters added to know how much to read ahead.

This patch significantly improves fs-verity sequential read performance.
Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:

    On an ARM64 phone (using sha256-ce):
        Before: 217 MB/s
        After: 263 MB/s
        (compare to sha256sum of non-verity file: 357 MB/s)

    In an x86_64 VM (using sha256-avx2):
        Before: 173 MB/s
        After: 215 MB/s
        (compare to sha256sum of non-verity file: 223 MB/s)

Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:14:14 -08:00
Eric Biggers
9630eacb1e fs-verity: implement readahead for FS_IOC_ENABLE_VERITY
When it builds the first level of the Merkle tree, FS_IOC_ENABLE_VERITY
sequentially reads each page of the file using read_mapping_page().
This works fine if the file's data is already in pagecache, which should
normally be the case, since this ioctl is normally used immediately
after writing out the file.

But in any other case this implementation performs very poorly, since
only one page is read at a time.

Fix this by implementing readahead using the functions from
mm/readahead.c.

This improves performance in the uncached case by about 20x, as seen in
the following benchmarks done on a 250MB file (on x86_64 with SHA-NI):

    FS_IOC_ENABLE_VERITY uncached (before) 3.299s
    FS_IOC_ENABLE_VERITY uncached (after)  0.160s
    FS_IOC_ENABLE_VERITY cached            0.147s
    sha256sum uncached                     0.191s
    sha256sum cached                       0.145s

Note: we could instead switch to kernel_read().  But that would mean
we'd no longer be hashing the data directly from the pagecache, which is
a nice optimization of its own.  And using kernel_read() would require
allocating another temporary buffer, hashing the data and tree pages
separately, and explicitly zero-padding the last page -- so it wouldn't
really be any simpler than direct pagecache access, at least for now.

Link: https://lore.kernel.org/r/20200106205410.136707-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:14:14 -08:00
Daniel Rosenberg
86eb43f574 fscrypt: improve format of no-key names
When an encrypted directory is listed without the key, the filesystem
must show "no-key names" that uniquely identify directory entries, are
at most 255 (NAME_MAX) bytes long, and don't contain '/' or '\0'.
Currently, for short names the no-key name is the base64 encoding of the
ciphertext filename, while for long names it's the base64 encoding of
the ciphertext filename's dirhash and second-to-last 16-byte block.

This format has the following problems:

- Since it doesn't always include the dirhash, it's incompatible with
  directories that will use a secret-keyed dirhash over the plaintext
  filenames.  In this case, the dirhash won't be computable from the
  ciphertext name without the key, so it instead must be retrieved from
  the directory entry and always included in the no-key name.
  Casefolded encrypted directories will use this type of dirhash.

- It's ambiguous: it's possible to craft two filenames that map to the
  same no-key name, since the method used to abbreviate long filenames
  doesn't use a proper cryptographic hash function.

Solve both these problems by switching to a new no-key name format that
is the base64 encoding of a variable-length structure that contains the
dirhash, up to 149 bytes of the ciphertext filename, and (if any bytes
remain) the SHA-256 of the remaining bytes of the ciphertext filename.

This ensures that each no-key name contains everything needed to find
the directory entry again, contains only legal characters, doesn't
exceed NAME_MAX, is unambiguous unless there's a SHA-256 collision, and
that we only take the performance hit of SHA-256 on very long filenames.

Note: this change does *not* address the existing issue where users can
modify the 'dirhash' part of a no-key name and the filesystem may still
accept the name.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[EB: improved comments and commit message, fixed checking return value
 of base64_decode(), check for SHA-256 error, continue to set disk_name
 for short names to keep matching simpler, and many other cleanups]
Link: https://lore.kernel.org/r/20200120223201.241390-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:14:09 -08:00
Eric Biggers
eeb955a8ad ubifs: allow both hash and disk name to be provided in no-key names
In order to support a new dirhash method that is a secret-keyed hash
over the plaintext filenames (which will be used by encrypted+casefolded
directories on ext4 and f2fs), fscrypt will be switching to a new no-key
name format that always encodes the dirhash in the name.

UBIFS isn't happy with this because it has assertions that verify that
either the hash or the disk name is provided, not both.

Change it to use the disk name if one is provided, even if a hash is
available too; else use the hash.

Link: https://lore.kernel.org/r/20200120223201.241390-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:10:34 -08:00
Eric Biggers
869dc687a8 ubifs: don't trigger assertion on invalid no-key filename
If userspace provides an invalid fscrypt no-key filename which encodes a
hash value with any of the UBIFS node type bits set (i.e. the high 3
bits), gracefully report ENOENT rather than triggering ubifs_assert().

Test case with kvm-xfstests shell:

    . fs/ubifs/config
    . ~/xfstests/common/encrypt
    dev=$(__blkdev_to_ubi_volume /dev/vdc)
    ubiupdatevol $dev -t
    mount $dev /mnt -t ubifs
    mkdir /mnt/edir
    xfs_io -c set_encpolicy /mnt/edir
    rm /mnt/edir/_,,,,,DAAAAAAAAAAAAAAAAAAAAAAAAAA

With the bug, the following assertion fails on the 'rm' command:

    [   19.066048] UBIFS error (ubi0:0 pid 379): ubifs_assert_failed: UBIFS assert failed: !(hash & ~UBIFS_S_KEY_HASH_MASK), in fs/ubifs/key.h:170

Fixes: f4f61d2cc6 ("ubifs: Implement encrypted filenames")
Cc: <stable@vger.kernel.org> # v4.10+
Link: https://lore.kernel.org/r/20200120223201.241390-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:10:34 -08:00
Eric Biggers
338a1f52ae fscrypt: clarify what is meant by a per-file key
Now that there's sometimes a second type of per-file key (the dirhash
key), clarify some function names, macros, and documentation that
specifically deal with per-file *encryption* keys.

Link: https://lore.kernel.org/r/20200120223201.241390-4-ebiggers@kernel.org
Reviewed-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:10:33 -08:00
Daniel Rosenberg
7495f91bb5 fscrypt: derive dirhash key for casefolded directories
When we allow indexed directories to use both encryption and
casefolding, for the dirhash we can't just hash the ciphertext filenames
that are stored on-disk (as is done currently) because the dirhash must
be case insensitive, but the stored names are case-preserving.  Nor can
we hash the plaintext names with an unkeyed hash (or a hash keyed with a
value stored on-disk like ext4's s_hash_seed), since that would leak
information about the names that encryption is meant to protect.

Instead, if we can accept a dirhash that's only computable when the
fscrypt key is available, we can hash the plaintext names with a keyed
hash using a secret key derived from the directory's fscrypt master key.
We'll use SipHash-2-4 for this purpose.

Prepare for this by deriving a SipHash key for each casefolded encrypted
directory.  Make sure to handle deriving the key not only when setting
up the directory's fscrypt_info, but also in the case where the casefold
flag is enabled after the fscrypt_info was already set up.  (We could
just always derive the key regardless of casefolding, but that would
introduce unnecessary overhead for people not using casefolding.)

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[EB: improved commit message, updated fscrypt.rst, squashed with change
 that avoids unnecessarily deriving the key, and many other cleanups]
Link: https://lore.kernel.org/r/20200120223201.241390-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:10:33 -08:00
Daniel Rosenberg
f4951340a1 fscrypt: don't allow v1 policies with casefolding
Casefolded encrypted directories will use a new dirhash method that
requires a secret key.  If the directory uses a v2 encryption policy,
it's easy to derive this key from the master key using HKDF.  However,
v1 encryption policies don't provide a way to derive additional keys.

Therefore, don't allow casefolding on directories that use a v1 policy.
Specifically, make it so that trying to enable casefolding on a
directory that has a v1 policy fails, trying to set a v1 policy on a
casefolded directory fails, and trying to open a casefolded directory
that has a v1 policy (if one somehow exists on-disk) fails.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[EB: improved commit message, updated fscrypt.rst, and other cleanups]
Link: https://lore.kernel.org/r/20200120223201.241390-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:10:33 -08:00
Eric Biggers
2ad325daa7 fscrypt: add "fscrypt_" prefix to fname_encrypt()
fname_encrypt() is a global function, due to being used in both fname.c
and hooks.c.  So it should be prefixed with "fscrypt_", like all the
other global functions in fs/crypto/.

Link: https://lore.kernel.org/r/20200120071736.45915-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:10:33 -08:00
Eric Biggers
23ad6ce441 fscrypt: don't print name of busy file when removing key
When an encryption key can't be fully removed due to file(s) protected
by it still being in-use, we shouldn't really print the path to one of
these files to the kernel log, since parts of this path are likely to be
encrypted on-disk, and (depending on how the system is set up) the
confidentiality of this path might be lost by printing it to the log.

This is a trade-off: a single file path often doesn't matter at all,
especially if it's a directory; the kernel log might still be protected
in some way; and I had originally hoped that any "inode(s) still busy"
bugs (which are security weaknesses in their own right) would be quickly
fixed and that to do so it would be super helpful to always know the
file path and not have to run 'find dir -inum $inum' after the fact.

But in practice, these bugs can be hard to fix (e.g. due to asynchronous
process killing that is difficult to eliminate, for performance
reasons), and also not tied to specific files, so knowing a file path
doesn't necessarily help.

So to be safe, for now let's just show the inode number, not the path.
If someone really wants to know a path they can use 'find -inum'.

Fixes: b1c0ec3599 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://lore.kernel.org/r/20200120060732.390362-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:10:33 -08:00
Eric Biggers
6fe9354e00 fscrypt: document gfp_flags for bounce page allocation
Document that fscrypt_encrypt_pagecache_blocks() allocates the bounce
page from a mempool, and document what this means for the @gfp_flags
argument.

Link: https://lore.kernel.org/r/20191231181026.47400-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:04:00 -08:00
Eric Biggers
7df05e52cb fscrypt: optimize fscrypt_zeroout_range()
Currently fscrypt_zeroout_range() issues and waits on a bio for each
block it writes, which makes it very slow.

Optimize it to write up to 16 pages at a time instead.

Also add a function comment, and improve reliability by allowing the
allocations of the bio and the first ciphertext page to wait on the
corresponding mempools.

Link: https://lore.kernel.org/r/20191226160813.53182-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:04:00 -08:00
Eric Biggers
a9ae9e66a0 fscrypt: remove redundant bi_status check
submit_bio_wait() already returns bi_status translated to an errno.
So the additional check of bi_status is redundant and can be removed.

Link: https://lore.kernel.org/r/20191209204509.228942-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:04:00 -08:00
Herbert Xu
b504c7cead fscrypt: Allow modular crypto algorithms
The commit 643fa9612b ("fscrypt: remove filesystem specific
build config option") removed modular support for fs/crypto.  This
causes the Crypto API to be built-in whenever fscrypt is enabled.
This makes it very difficult for me to test modular builds of
the Crypto API without disabling fscrypt which is a pain.

As fscrypt is still evolving and it's developing new ties with the
fs layer, it's hard to build it as a module for now.

However, the actual algorithms are not required until a filesystem
is mounted.  Therefore we can allow them to be built as modules.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20191227024700.7vrzuux32uyfdgum@gondor.apana.org.au
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-13 15:03:58 -08:00
Eric Biggers
6f39a6b4de fscrypt: include <linux/ioctl.h> in UAPI header
<linux/fscrypt.h> defines ioctl numbers using the macros like _IOWR()
which are defined in <linux/ioctl.h>, so <linux/ioctl.h> should be
included as a prerequisite, like it is in many other kernel headers.

In practice this doesn't really matter since anyone referencing these
ioctl numbers will almost certainly include <sys/ioctl.h> too in order
to actually call ioctl().  But we might as well fix this.

Link: https://lore.kernel.org/r/20191219185624.21251-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:20 -08:00
Eric Biggers
11dd760288 fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()
fscrypt_get_encryption_info() returns 0 if the encryption key is
unavailable; it never returns ENOKEY.  So remove checks for ENOKEY.

Link: https://lore.kernel.org/r/20191209212348.243331-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:20 -08:00
Eric Biggers
bfc935af5b fscrypt: remove fscrypt_is_direct_key_policy()
fscrypt_is_direct_key_policy() is no longer used, so remove it.

Link: https://lore.kernel.org/r/20191209211829.239800-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:20 -08:00
Eric Biggers
51a6bbc53f fscrypt: move fscrypt_valid_enc_modes() to policy.c
fscrypt_valid_enc_modes() is only used by policy.c, so move it to there.

Also adjust the order of the checks to be more natural, matching the
numerical order of the constants and also keeping AES-256 (the
recommended default) first in the list.

No change in behavior.

Link: https://lore.kernel.org/r/20191209211829.239800-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:20 -08:00
Eric Biggers
6dad35d9e8 fscrypt: check for appropriate use of DIRECT_KEY flag earlier
FSCRYPT_POLICY_FLAG_DIRECT_KEY is currently only allowed with Adiantum
encryption.  But FS_IOC_SET_ENCRYPTION_POLICY allowed it in combination
with other encryption modes, and an error wasn't reported until later
when the encrypted directory was actually used.

Fix it to report the error earlier by validating the correct use of the
DIRECT_KEY flag in fscrypt_supported_policy(), similar to how we
validate the IV_INO_LBLK_64 flag.

Link: https://lore.kernel.org/r/20191209211829.239800-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:19 -08:00
Eric Biggers
cf17f4020b fscrypt: split up fscrypt_supported_policy() by policy version
Make fscrypt_supported_policy() call new functions
fscrypt_supported_v1_policy() and fscrypt_supported_v2_policy(), to
reduce the indentation level and make the code easier to read.

Also adjust the function comment to mention that whether the encryption
policy is supported can also depend on the inode.

No change in behavior.

Link: https://lore.kernel.org/r/20191209211829.239800-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:19 -08:00
Eric Biggers
3597e506e5 fscrypt: introduce fscrypt_needs_contents_encryption()
Add a function fscrypt_needs_contents_encryption() which takes an inode
and returns true if it's an encrypted regular file and the kernel was
built with fscrypt support.

This will allow replacing duplicated checks of IS_ENCRYPTED() &&
S_ISREG() on the I/O paths in ext4 and f2fs, while also optimizing out
unneeded code when !CONFIG_FS_ENCRYPTION.

Link: https://lore.kernel.org/r/20191209205021.231767-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:19 -08:00
Eric Biggers
b168e58523 fscrypt: move fscrypt_d_revalidate() to fname.c
fscrypt_d_revalidate() and fscrypt_d_ops really belong in fname.c, since
they're specific to filenames encryption.  crypto.c is for contents
encryption and general fs/crypto/ initialization and utilities.

Link: https://lore.kernel.org/r/20191209204359.228544-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:19 -08:00
Eric Biggers
bac335ab74 fscrypt: constify inode parameter to filename encryption functions
Constify the struct inode parameter to fscrypt_fname_disk_to_usr() and
the other filename encryption functions so that users don't have to pass
in a non-const inode when they are dealing with a const one, as in [1].

[1] https://lkml.kernel.org/linux-ext4/20191203051049.44573-6-drosen@google.com/

Cc: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20191215213947.9521-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:19 -08:00
Eric Biggers
38c2723e47 fscrypt: constify struct fscrypt_hkdf parameter to fscrypt_hkdf_expand()
Constify the struct fscrypt_hkdf parameter to fscrypt_hkdf_expand().
This makes it clearer that struct fscrypt_hkdf contains the key only,
not any per-request state.

Link: https://lore.kernel.org/r/20191209204054.227736-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:19 -08:00
Eric Biggers
7eabda806e fscrypt: verify that the crypto_skcipher has the correct ivsize
As a sanity check, verify that the allocated crypto_skcipher actually
has the ivsize that fscrypt is assuming it has.  This will always be the
case unless there's a bug.  But if there ever is such a bug (e.g. like
there was in earlier versions of the ESSIV conversion patch [1]) it's
preferable for it to be immediately obvious, and not rely on the
ciphertext verification tests failing due to uninitialized IV bytes.

[1] https://lkml.kernel.org/linux-crypto/20190702215517.GA69157@gmail.com/

Link: https://lore.kernel.org/r/20191209203918.225691-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:19 -08:00
Eric Biggers
17b10a9cf6 fscrypt: use crypto_skcipher_driver_name()
Crypto API users shouldn't really be accessing struct skcipher_alg
directly.  <crypto/skcipher.h> already has a function
crypto_skcipher_driver_name(), so use that instead.

No change in behavior.

Link: https://lore.kernel.org/r/20191209203810.225302-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-12 21:26:18 -08:00