[ Upstream commit 4b270a8cc5 ]
In a synchronous scenario such as checkpoint(), we flush dirty node pages
to the device synchronously. Writeback of a node page can easily fail
because trylock_page() fails, especially under intensive lock contention,
which can cause long checkpoint() latency. So let's use lock_page() in
the synchronous scenario to avoid this issue.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
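The locking choice described in the checkpoint() change above can be sketched in userspace C. This is only a model under stated assumptions: `struct wb_page`, `wb_trylock()`, `wb_lock()`, `wb_unlock()` and `lock_for_writeback()` are hypothetical names, and a C11 `atomic_flag` stands in for the kernel page lock. It is not the f2fs code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page lock; not the kernel's struct page. */
struct wb_page {
	atomic_flag locked;
};

/* Like trylock_page(): take the lock only if it is free right now. */
static bool wb_trylock(struct wb_page *p)
{
	return !atomic_flag_test_and_set(&p->locked);
}

/* Like lock_page(): wait (here: spin) until the lock is ours. */
static void wb_lock(struct wb_page *p)
{
	while (atomic_flag_test_and_set(&p->locked))
		;
}

static void wb_unlock(struct wb_page *p)
{
	atomic_flag_clear(&p->locked);
}

/*
 * Returns true if the page was locked and can be written back,
 * false if an asynchronous caller skipped it because it was busy.
 */
static bool lock_for_writeback(struct wb_page *p, bool sync)
{
	if (sync) {
		wb_lock(p);		/* synchronous: never skip the page */
		return true;
	}
	return wb_trylock(p);		/* asynchronous: skip busy pages */
}
```

In this model an asynchronous caller may leave a contended page for a later pass, while a synchronous caller always gets the page once the current holder unlocks it.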
This patch clears PageError on pages that were tagged by the read path: when
we write such pages with valid contents, writepage should clear the bit, as
ext4 does.
Change-Id: I434b22132f29f7243ab9170296a6e0b52e40701d
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit f453147e9315b3bc1050b590278a63d91fc2a681)
Cherry-pick from origin/upstream-f2fs-stable-linux-4.9.y:
commit 95459e9ebe19 ("Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"")
This patch reverts the copied f2fs_set_page_dirty_nobuffer and uses the generic
function instead, for stability.
This reverts commit fe76b796fc.
Change-Id: I2d31d13eb14b2672f8d3c74c16c759ebbb6a1d32
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull f2fs update from Jaegeuk Kim:
"In this round, we've mainly focused on performance tuning and critical
bug fixes for issues seen on low-end devices. Sheng Yong introduced the
lost_found feature to keep missing files during recovery instead of
discarding them. We're preparing for the coming fsverity implementation.
And, we've got more features to communicate with users for better
performance. On low-end devices, some memory-related issues were
fixed, and subtle race conditions and corner cases were addressed as
well.
Enhancements:
- large nat bitmaps for more free node ids
- add three block allocation policies to pass down write hints given by user
- expose extension list to user and introduce hot file extension
- tune small devices seamlessly for low-end devices
- set readdir_ra by default
- give more resources to discard and cleaning under gc_urgent mode
- introduce fsync_mode to enforce posix or not
- nowait aio support
- add lost_found feature to keep dangling inodes
- reserve bits for future fsverity feature
- add test_dummy_encryption for FBE
Bug fixes:
- don't use highmem for dentry pages
- align memory boundary for bitops
- truncate preallocated blocks in write errors
- guarantee i_times on fsync call
- clear CP_TRIMMED_FLAG correctly
- prevent node chain loop during recovery
- avoid data race between atomic write and background cleaning
- avoid unnecessary selinux violation warnings on resgid option
- GFP_NOFS to avoid deadlock in quota and read paths
- fix f2fs_skip_inode_update to allow i_size recovery
In addition to the above, there are several minor bug fixes and clean-ups"
Cherry-pick from origin/upstream-f2fs-stable-linux-4.9.y:
ac389af190fb f2fs: remain written times to update inode during fsync
270deeb87125 f2fs: make assignment of t->dentry_bitmap more readable
a4fa11c8da10 f2fs: truncate preallocated blocks in error case
4478970f0e73 f2fs: fix a wrong condition in f2fs_skip_inode_update
29cead58f5ea f2fs: reserve bits for fs-verity
848b293a5d95 f2fs: Add a segment type check in inplace write
2dc8f5a3a640 f2fs: no need to initialize zero value for GFP_F2FS_ZERO
83b9bb95a628 f2fs: don't track new nat entry in nat set
a33ce03ac477 f2fs: clean up with F2FS_BLK_ALIGN
a3f8ec8082e3 f2fs: check blkaddr more accuratly before issue a bio
034f11eadb16 f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
aa5bcfd8f488 f2fs: introduce a new mount option test_dummy_encryption
9b880fe6e6e2 f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
80d6489a08c1 f2fs: release locks before return in f2fs_ioc_gc_range()
9f1896c490eb f2fs: align memory boundary for bitops
c7930ee88334 f2fs: remove unneeded set_cold_node()
355d2346409a f2fs: add nowait aio support
e9a50e6b9479 f2fs: wrap all options with f2fs_sb_info.mount_opt
b6d2ec83e0c0 f2fs: Don't overwrite all types of node to keep node chain
9a954816298c f2fs: introduce mount option for fsync mode
4ce4eb697068 f2fs: fix to restore old mount option in ->remount_fs
8f711c344e61 f2fs: wrap sb_rdonly with f2fs_readonly
c07478ee84bf f2fs: avoid selinux denial on CAP_SYS_RESOURCE
ac734c416fa9 f2fs: support hot file extension
f4f10221accc f2fs: fix to avoid race in between atomic write and background GC
e87b13ec160b f2fs: do gc in greedy mode for whole range if gc_urgent mode is set
e9878588de94 f2fs: issue discard aggressively in the gc_urgent mode
ad3ce479e6e4 f2fs: set readdir_ra by default
5aae2026bbd2 f2fs: add auto tuning for small devices
78c1fc2d8f27 f2fs: add mount option for segment allocation policy
ecd02f564631 f2fs: don't stop GC if GC is contended
1e72cb27d2d6 f2fs: expose extension_list sysfs entry
061839d178ab f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
4951ebcbc4e2 f2fs: introduce sb_lock to make encrypt pwsalt update exclusive
939f6be0420f f2fs: remove redundant initialization of pointer 'p'
39bea4bc8ef2 f2fs: flush cp pack except cp pack 2 page at first
770611eb2ab4 f2fs: clean up f2fs_sb_has_xxx functions
4d8e4a8965f9 f2fs: remove redundant check of page type when submit bio
b57a37f01fda f2fs: fix to handle looped node chain during recovery
9ac5b8c54083 f2fs: handle quota for orphan inodes
87c18066016a f2fs: support passing down write hints to block layer with F2FS policy
bcdc571e8d8b f2fs: support passing down write hints given by users to block layer
92413bc12e32 f2fs: fix to clear CP_TRIMMED_FLAG
a1afb55f9784 f2fs: support large nat bitmap
636039140493 f2fs: fix to check extent cache in f2fs_drop_extent_tree
7de4fccdbce1 f2fs: restrict inline_xattr_size configuration
aae506a8b704 f2fs: fix heap mode to reset it back
8fa455bb6ea0 f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
9d9cb0ef73f9 fscrypt: fix build with pre-4.6 gcc versions
401052ffc6b4 fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
549b2061b3b5 fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
c440b5091a0c fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
7d82f0e1c39a ext4: switch to fscrypt ->symlink() helper functions
ba4efe560438 ext4: switch to fscrypt_get_symlink()
b0edc2f22d24 fscrypt: calculate NUL-padding length in one place only
62cfdd9868c7 fscrypt: move fscrypt_symlink_data to fscrypt_private.h
e4e6776522bc fscrypt: remove fscrypt_fname_usr_to_disk()
45028b5aaa4e f2fs: switch to fscrypt_get_symlink()
f62d3d31e0c7 f2fs: switch to fscrypt ->symlink() helper functions
da32a1633ad3 fscrypt: new helper function - fscrypt_get_symlink()
a7e05c731d11 fscrypt: new helper functions for ->symlink()
eb9c5fd896de fscrypt: trim down fscrypt.h includes
0a02472d8ae2 fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
9d51ca80274c fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
efbfa8c6a056 fscrypt: move fscrypt_operations declaration to fscrypt_supp.h
616dbd2bdc6a fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions
f0c472bcbf1c fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h
bc76f39109b1 fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h
b67b07ec4964 fscrypt: move fscrypt_control_page() to supp/notsupp headers
d8dfb89961d0 fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've followed up to support some generic features such
as cgroup, block reservation, linking fscrypt_ops, delivering
write_hints, and some ioctls. And, we could fix some corner cases in
terms of power-cut recovery and subtle deadlocks.
Enhancements:
- bitmap operations to handle NAT blocks
- readahead to improve readdir speed
- switch to use fscrypt_*
- apply write hints for direct IO
- add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid
- modify b_avail and b_free to consider root reserved blocks
- support cgroup writeback
- support FIEMAP_FLAG_XATTR for fiemap
- add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
- add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
- support inode creation time
Bug fixes:
- sysfile-based quota operations
- memory footprint accounting
- allow to write data on partial preallocation case
- fix deadlock case on fallocate
- fix to handle fill_super errors
- fix missing inode updates of fsync'ed file
- recover renamed file which was fsync'ed before
- drop inmemory pages in corner error case
- keep last_disk_size correctly
- recover missing i_inline flags during roll-forward
Various clean-up patches were added as well"
Cherry-pick from origin/upstream-f2fs-stable-linux-4.9.y:
71f8f0499e8b f2fs: support inode creation time
58dc6f6fcef7 f2fs: rebuild sit page from sit info in mem
6393cef3f112 f2fs: stop issuing discard if fs is readonly
742bc90e88fa f2fs: clean up duplicated assignment in init_discard_policy
cfabb6edfbc2 f2fs: use GFP_F2FS_ZERO for cleanup
111e8456a697 f2fs: allow to recover node blocks given updated checkpoint
36e041a57ccf f2fs: recover some i_inline flags
3127a7b67ca8 f2fs: correct removexattr behavior for null valued extended attribute
86f78c1e5523 f2fs: drop page cache after fs shutdown
1a3b0047597c f2fs: stop gc/discard thread after fs shutdown
62a91a5a489f f2fs: hanlde error case in f2fs_ioc_shutdown
66356ee5f94c f2fs: split need_inplace_update
5912fbae9da5 f2fs: fix to update last_disk_size correctly
3aa46e2c2187 f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
acdaca27aacb f2fs: clean up error path of fill_super
cf8821115c54 f2fs: avoid hungtask when GC encrypted block if io_bits is set
4be98c9805e4 f2fs: allow quota to use reserved blocks
2a6489c87ee0 f2fs: fix to drop all inmem pages correctly
fd214422395f f2fs: speed up defragment on sparse file
6bce96329c85 f2fs: support F2FS_IOC_PRECACHE_EXTENTS
9ce3d6bb6883 f2fs: add an ioctl to disable GC for specific file
9ef5e6568449 f2fs: prevent newly created inode from being dirtied incorrectly
08ddb1917e04 f2fs: support FIEMAP_FLAG_XATTR
aa9c1c1046e0 f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
92b8f9c726ef f2fs: check node page again in write end io
4992a3ca15b3 f2fs: fix to caclulate required free section correctly
d1a6b4f6c958 f2fs: handle newly created page when revoking inmem pages
462d762b205a f2fs: add resgid and resuid to reserve root blocks
cbd5e5af8cac f2fs: implement cgroup writeback support
5a5847421d31 f2fs: remove unused pend_list_tag
37d4ca7cd1c1 f2fs: avoid high cpu usage in discard thread
02cfdab8344f f2fs: make local functions static
5fee54098565 f2fs: add reserved blocks for root user
265974636ae0 f2fs: check segment type in __f2fs_replace_block
4f76d6acc6ff f2fs: update inode info to inode page for new file
52b452817452 f2fs: show precise # of blocks that user/root can use
ae0e1fa5a816 f2fs: clean up unneeded declaration
8fc74466298f f2fs: continue to do direct IO if we only preallocate partial blocks
162464df894e f2fs: enable quota at remount from r to w
e270976ff848 f2fs: skip stop_checkpoint for user data writes
d04736926fa7 f2fs: fix missing error number for xattr operation
211cb7bb2428 f2fs: recover directory operations by fsync
2648e735ffe5 f2fs: return error during fill_super
e2a0518d8c24 f2fs: fix an error case of missing update inode page
bf1750bafe86 f2fs: fix potential hangtask in f2fs_trace_pid
c804fcf3df1f f2fs: no need return value in restore summary process
fdd41a8793ad f2fs: use unlikely for release case
a74690b03e24 f2fs: don't return value in truncate_data_blocks_range
987892cc67aa f2fs: clean up f2fs_map_blocks
d7714cb2319a f2fs: clean up hash codes
e3d2a1e946df f2fs: fix error handling in fill_super
b02e72d2942c f2fs: spread f2fs_k{m,z}alloc
ead5259de34d f2fs: inject fault to kvmalloc
e585ca29dd7e f2fs: inject fault to kzalloc
8234ed56e748 f2fs: remove a redundant conditional expression
1a9d6a9c0046 f2fs: apply write hints to select the type of segment for direct write
955e7f58f67b f2fs: switch to fscrypt_prepare_setattr()
268c7f607cb8 f2fs: switch to fscrypt_prepare_lookup()
8dfa646f972c f2fs: switch to fscrypt_prepare_rename()
d5382ccb020a f2fs: switch to fscrypt_prepare_link()
3ccc177c9b8b f2fs: switch to fscrypt_file_open()
8b5674efdc35 f2fs: remove repeated f2fs_bug_on
ba4556cdf10c f2fs: remove an excess variable
46accc925145 f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
8933908c4f93 f2fs: remove unused parameter
76b6e8ed2058 f2fs: still write data if preallocate only partial blocks
1ed753392fe7 f2fs: introduce sysfs readdir_ra to readahead inode block in readdir
4e68a15eeebc f2fs: fix concurrent problem for updating free bitmap
9be6e7596232 f2fs: remove unneeded memory footprint accounting
923df752db37 f2fs: no need to read nat block if nat_block_bitmap is set
09234be262cb f2fs: reserve nid resource for quota sysfile
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Pull f2fs updates from Jaegeuk Kim:
"In this round, we introduce sysfile-based quota support which is
required for Android by default. In addition, we allow users to
reserve some blocks at runtime to mitigate performance drops
when free space is low.
Enhancements:
- assign proper data segments according to write_hints given by user
- issue cache_flush on dirty devices only among multiple devices
- exploit cp_error flag and add more faults to enhance fault
injection test
- conduct more readaheads during f2fs_readdir
- add a range for discard commands
Bug fixes:
- fix zero stat->st_blocks when inline_data is set
- drop crypto key and free stale memory pointer while evict_inode is
failing
- fix some corner cases in free space and segment management
- fix wrong last_disk_size
This series includes lots of clean-ups and code enhancement in terms
of xattr operations, discard/flush command control. In addition, it
adds versatile debugfs entries to monitor f2fs status"
Cherry-picked from origin/upstream-f2fs-stable-linux-4.9.y:
5b2b7f7dd87f f2fs: deny accessing encryption policy if encryption is off
05dac2e89867 f2fs: inject fault in inc_valid_node_count
2e08de4fda00 f2fs: fix to clear FI_NO_PREALLOC
931ecc22b402 f2fs: expose quota information in debugfs
45d6e702d3a9 f2fs: separate nat entry mem alloc from nat_tree_lock
8e2f721703b4 f2fs: validate before set/clear free nat bitmap
27d50282d073 f2fs: avoid opened loop codes in __add_ino_entry
b1823df0e68f f2fs: apply write hints to select the type of segments for buffered write
b561061c067b f2fs: introduce scan_curseg_cache for cleanup
5772e0c102b0 f2fs: optimize the way of traversing free_nid_bitmap
a51e85eae2c3 f2fs: keep scanning until enough free nids are acquired
d75eb8d7345e f2fs: trace checkpoint reason in fsync()
bed6cffdf7e4 f2fs: keep isize once block is reserved cross EOF
5f3fdd2afc9b f2fs: avoid race in between GC and block exchange
51cb399e7ead f2fs: save a multiplication for last_nid calculation
7f41aab3d61d f2fs: fix summary info corruption
148c518517fc f2fs: remove dead code in update_meta_page
c3bc6e5183f0 f2fs: remove unneeded semicolon
9e71a0321f32 f2fs: don't bother with inode->i_version
49f72728e708 f2fs: check curseg space before foreground GC
25d0becffa0a f2fs: use rw_semaphore to protect SIT cache
0108c481d7af f2fs: support quota sys files
d4c292db7b81 f2fs: add quota_ino feature infra
1033eee92c41 f2fs: optimize __update_nat_bits
247e8951164a f2fs: modify for accurate fggc node io stat
c7272f8aebe7 Revert "f2fs: handle dirty segments inside refresh_sit_entry"
068868fc7e26 f2fs: add a function to move nid
b9f73875af11 f2fs: export SSR allocation threshold
ab30204bb9d8 f2fs: give correct trimmed blocks in fstrim
b5db2de4623f f2fs: support bio allocation error injection
58ddec85e417 f2fs: support get_page error injection
ef216e610a14 f2fs: add missing sysfs description
68ab6f8dd541 f2fs: support soft block reservation
d7947e2a3118 f2fs: handle error case when adding xattr entry
50ffaa980f98 f2fs: support flexible inline xattr size
5a8ed073c7fa f2fs: show current cp state
d888fcd74c18 f2fs: add missing quota_initialize
af1cc1ea2309 f2fs: show # of dirty segments via sysfs
6663422a3642 f2fs: stop all the operations by cp_error flag
872d8e3af080 f2fs: remove several redundant assignments
bf823c82e3fe f2fs: avoid using timespec
c70ab1b99321 f2fs: fix to correct no_fggc_candidate
0e6275dc317b Revert "f2fs: return wrong error number on f2fs_quota_write"
41d59230e302 f2fs: remove obsolete pointer for truncate_xattr_node
8c12a10f2ee4 f2fs: retry ENOMEM for quota_read|write
35e13ca2e9d9 f2fs: limit # of inmemory pages
9ca57a7e96e0 f2fs: update ctx->pos correctly when hitting hole in directory
a04208e54b9c f2fs: relocate readahead codes in readdir()
905d0370e6ab f2fs: allow readdir() to be interrupted
2dfbda03f941 f2fs: trace f2fs_readdir
d67586ddf3e9 f2fs: trace f2fs_lookup
4c94f14b3c8b f2fs: skip searching non-exist range in truncate_hole
ac5d4b425739 f2fs: expose some sectors to user in inline data or dentry case
5ded3b82dc2b f2fs: avoid stale fi->gdirty_list pointer
f6b708e25fb5 f2fs/crypto: drop crypto key at evict_inode only
33fdebbb0e7e f2fs: fix to avoid race when accessing last_disk_size
595046758d8e f2fs: Fix bool initialization/comparison
1e5305afa81e f2fs: give up CP_TRIMMED_FLAG if it drops discards
8258fd3054c1 f2fs: trace f2fs_remove_discard
6c46b37d9b43 f2fs: reduce cmd_lock coverage in __issue_discard_cmd
daf437d37cff f2fs: split discard policy
69a596797adf f2fs: wrap discard policy
28e1023e8e8a f2fs: support issuing/waiting discard in range
fd6422ea9264 f2fs: fix to flush multiple device in checkpoint
f014be822ce7 f2fs: enhance multiple device flush
0597a6e4bdcd f2fs: fix to show ino management cache size correctly
cacc1ed0c46a f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
84af6aeceb49 f2fs: obsolete ALLOC_NID_LIST list
8456d343780d f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
3f01af786c84 f2fs: allow readpages with NULL file pointer
2f0df25e6529 f2fs: show flush list status in sysfs
20ef20fbf78e f2fs: introduce read_xattr_block
126221de375b f2fs: introduce read_inline_xattr
127faa71f6a6 Revert "f2fs: reuse nids more aggressively"
c19928e660fb Revert "f2fs: node segment is prior to data segment selected victim"
Change-Id: I2f892e6ee75c41e84241f37b1903e0c32387d95b
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Cherry-picked from upstream-f2fs-stable-linux-4.9.y
Changes include:
commit 30da3a4de96733 ("f2fs: hurry up to issue discard after io interruption")
commit d1c363b48398d4 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit e6b120d4d01ab0 ("f2fs/fscrypt: catch up to v4.12")
commit 4d7931d72758db ("KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()")
Signed-off-by: Hyojun Kim <hyojun@google.com>
Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces.
There are several single-topic branches left after this (rename2
series from Miklos, current_time series from Deepa Dinamani, xattr
series from Andreas, uaccess stuff from me) and I'd prefer to
send those separately"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
proc: switch auxv to use of __mem_open()
hpfs: support FIEMAP
cifs: get rid of unused arguments of CIFSSMBWrite()
posix_acl: uapi header split
posix_acl: xattr representation cleanups
fs/aio.c: eliminate redundant loads in put_aio_ring_file
fs/internal.h: add const to ns_dentry_operations declaration
compat: remove compat_printk()
fs/buffer.c: make __getblk_slow() static
proc: unsigned file descriptors
fs/file: more unsigned file descriptors
fs: compat: remove redundant check of nr_segs
cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
cifs: don't use memcpy() to copy struct iov_iter
get rid of separate multipage fault-in primitives
fs: Avoid premature clearing of capabilities
fs: Give dentry to inode_change_ok() instead of inode
fuse: Propagate dentry down to inode_change_ok()
ceph: Propagate dentry down to inode_change_ok()
xfs: Propagate dentry down to inode_change_ok()
...
In sync_node_pages, we don't check and commit the last merged pages in
f2fs's private bio cache. Since these pages are tagged as writeback, anyone
waiting for writeback of such a page is blocked until the cache is
committed by someone else.
We need to commit the node-type bio cache to avoid a potential deadlock or
a long delay while waiting for writeback.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we only supported a global fault injection configuration, so
configuring the type/rate of fault injection through sysfs or a mount
option affected every f2fs partition in use.
This does not make sense: it is inconvenient for a developer who wants
to test separate partitions with different fault injection rates/types
simultaneously, and it is impossible to enable fault injection on one
partition while disabling it on another.
From now on, we move the global fault injection configuration in the module
into the per-superblock structure, so injection testing can be more flexible.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
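A minimal sketch of what per-superblock fault injection state could look like, assuming a rate counter plus a type bitmask. All names here (`struct fi_info`, `struct fi_sb`, `fi_time_to_inject()`, the `FI_*` fault types) are hypothetical and only loosely modelled on the description above. The real f2fs structures and counters may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fault types; the real f2fs list is longer. */
enum fi_type { FI_KMALLOC, FI_PAGE_ALLOC, FI_MAX };

/* Per-instance fault settings: one copy per mounted filesystem. */
struct fi_info {
	unsigned int inject_rate;	/* inject every N eligible calls; 0 = off */
	unsigned int inject_type;	/* bitmask of enabled fi_type values */
	unsigned int inject_ops;	/* eligible calls since the last injection */
};

/* Hypothetical superblock model that owns its own fault settings. */
struct fi_sb {
	struct fi_info fault;
};

/* Decide whether this call on this filesystem should fail artificially. */
static bool fi_time_to_inject(struct fi_sb *sbi, enum fi_type type)
{
	struct fi_info *ffi = &sbi->fault;

	if (!ffi->inject_rate)
		return false;		/* injection disabled on this instance */
	if (!(ffi->inject_type & (1u << type)))
		return false;		/* this fault type is not enabled here */
	if (++ffi->inject_ops >= ffi->inject_rate) {
		ffi->inject_ops = 0;
		return true;
	}
	return false;
}
```

Because the state lives in each `struct fi_sb`, two mounted instances can use different rates and types, or one can be left disabled, without affecting the other.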
This patch improves the migration of dirty pages and allows migrating atomic
written pages that F2FS keeps in the page cache. Compared with the fallback path
that releases the page, it gives better performance for memory compaction, CMA and
other users of page migration. For dirty pages, there is no need to write back
first when migrating. For an atomic written page that has not been committed yet,
we can migrate the page and update the related 'inmem_pages' list at the same time.
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix some coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In available_free_memory, two identical conditions check for NAT excess;
remove one of them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
LKP reported a -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] also shows a slight regression for DWAL.
[1] https://github.com/sslab-gatech/fxmark
This reverts commit ec795418c4.
Pull f2fs updates from Jaegeuk Kim:
"The major change in this version is mitigating cpu overheads on write
paths by replacing redundant inode page updates with mark_inode_dirty
calls. We also tried to reduce lock contention to improve
filesystem scalability. Another feature is configuring F2FS automatically
when a host-managed SMR device is detected.
Enhancements:
- ioctl to move a range of data between files
- inject orphan inode errors
- avoid flush commands congestion
- support lazytime
Bug fixes:
- return proper results for some dentry operations
- fix deadlock in add_link failure
- disable extent_cache for fcollapse/finsert"
* tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: clean up coding style and redundancy
f2fs: get victim segment again after new cp
f2fs: handle error case with f2fs_bug_on
f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
f2fs: support an ioctl to move a range of data blocks
f2fs: fix to report error number of f2fs_find_entry
f2fs: avoid memory allocation failure due to a long length
f2fs: reset default idle interval value
f2fs: use blk_plug in all the possible paths
f2fs: fix to avoid data update racing between GC and DIO
f2fs: add maximum prefree segments
f2fs: disable extent_cache for fcollapse/finsert inodes
f2fs: refactor __exchange_data_block for speed up
f2fs: fix ERR_PTR returned by bio
f2fs: avoid mark_inode_dirty
f2fs: move i_size_write in f2fs_write_end
f2fs: fix to avoid redundant discard during fstrim
f2fs: avoid mismatching block range for discard
f2fs: fix incorrect f_bfree calculation in ->statfs
f2fs: use percpu_rw_semaphore
...
These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch reverts 19a5f5e2ef (f2fs: drop any block plugging),
and adds blk_plug in write paths additionally.
The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In a synchronous read, after sending out the read request, the reader tries
to lock the page, which waits until the device finishes the read and unlocks
the page. Meanwhile, a truncater can race with the reader. So after the
reader gets the page lock, it should check the page's mapping to detect
whether someone has truncated the page in the meantime, and retry if so.
Otherwise the read can fail on the earlier condition check.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
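The check order described for the synchronous read path above can be sketched as follows. This is a userspace model, not kernel code: `struct rd_page`, `enum rd_result` and `check_locked_page()` are hypothetical, and the `uptodate` flag is assumed to be the "previous condition check" the message refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the page state a reader sees after locking. */
struct rd_page {
	const void *mapping;	/* NULL once the page has been truncated */
	bool uptodate;		/* set when the device finished the read */
};

enum rd_result { RD_OK, RD_RETRY, RD_EIO };

/* Called with the page locked, i.e. after the read I/O has completed. */
static enum rd_result check_locked_page(const struct rd_page *page,
					const void *expected_mapping)
{
	/* Truncated while we waited for the lock: look the page up again. */
	if (page->mapping != expected_mapping)
		return RD_RETRY;
	/* Still our page, but the read itself did not succeed. */
	if (!page->uptodate)
		return RD_EIO;
	return RD_OK;
}
```

Checking the mapping first means a truncated page leads to a retry rather than being reported as an I/O error.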
Readahead NAT pages are more likely to be reclaimed quickly, so it is better
to gather more free nids in advance.
Also, let's keep as many free nids as possible.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit aaf9607516 ("f2fs: check node page
contents all the time") pointed out that "sometimes it was reported that
its contents was missing", so it checks the page's mapping and contents.
When "nid != nid_of_node(page)", ERR_PTR(-EIO) will be returned to the
caller. However, commit e1c51b9f1d ("f2fs:
clean up node page updating flow") moves "nid != nid_of_node(page)" test
to "f2fs_bug_on(sbi, nid != nid_of_node(page))", this will return a
wrong page to the caller when F2FS_CHECK_FS is off when "sometimes it
was reported that its contents was missing" happens.
This patch restores checking the node page contents all the time, and
returns the errno so that the caller knows something is wrong and avoids
using the page. This patch also moves f2fs_bug_on to its proper location.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
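The difference between a debug-only assertion and an always-on check, as discussed in the node page change above, can be illustrated with a small model. `struct np_page` and `np_check_nid()` are hypothetical names. The point is only that the mismatch is reported through the return value in every build.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a node page: only the nid in its footer. */
struct np_page {
	unsigned int nid;
};

/*
 * Always-on check: returns 0 if the page belongs to the requested nid,
 * -EIO otherwise, so the caller never uses a mismatched page even when
 * debug assertions are compiled out.
 */
static int np_check_nid(const struct np_page *page, unsigned int nid)
{
	if (page->nid != nid)
		return -EIO;
	return 0;
}
```

A caller would test the return value before touching the page contents, which is what the commit message means by letting the caller know that something is wrong.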
This patch reduces calls to the following functions across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()
Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that this is doable, since we call mark_inode_dirty_sync() for every
inode field change that needs to update the on-disk inode as well.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
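The idea of marking inodes dirty on each field change and flushing them once at checkpoint time can be sketched like this. It is a simplified userspace model: `struct ino_model`, `ino_set_mode()` and `ino_checkpoint_flush()` are hypothetical and track a single field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode model with one field and its on-disk copy. */
struct ino_model {
	unsigned int mode;	/* in-memory value */
	unsigned int disk_mode;	/* what the on-disk inode would hold */
	bool dirty;
};

/* Field update: change memory and mark dirty, but write nothing yet. */
static void ino_set_mode(struct ino_model *inode, unsigned int mode)
{
	inode->mode = mode;
	inode->dirty = true;
}

/* Checkpoint: flush each dirty inode once; returns how many were written. */
static size_t ino_checkpoint_flush(struct ino_model *inodes, size_t n)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (!inodes[i].dirty)
			continue;
		inodes[i].disk_mode = inodes[i].mode;
		inodes[i].dirty = false;
		written++;
	}
	return written;
}
```

In this model, several updates to the same inode between checkpoints cost one write instead of one per update.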
This patch calls mark_inode_dirty_sync() for the following on-disk inode
changes.
-> largest
-> ctime/mtime/atime
-> i_current_depth
-> i_xattr_nid
-> i_pino
-> i_advise
-> i_flags
-> i_mode
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch enables reading node blocks in advance when truncating large
data blocks.
> time rm $MNT/testfile (500GB) after drop_caches
Before : 9.422 s
After : 4.821 s
Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For foreground GC, we cache the node blocks in the victim section and set
them dirty, then call sync_node_pages to flush these node pages. But
node pages that are not located in the victim section get flushed
together with them, so more bandwidth and contiguous free space are
consumed.
So in this case, it's better to leave those unrelated node pages
in the cache for further write hits, and let CP or the VM flush them afterward.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
To provide atomic writes, we should consider a power failure during
sync_node_pages in fsync.
So, this patch sets the fsync flag only in the last dnode block.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
fsync_node_pages should return success or failure so that the caller can tell
whether fsync completed or not.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch splits the existing sync_node_pages into (f)sync_node_pages.
The fsync_node_pages is used for f2fs_sync_file only.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When fsync is called, sync_node_pages finds the proper direct node pages to flush.
But it also locks unrelated direct node pages unnecessarily.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a BUG_ON instead of the retry loop.
In the case of node pages, we already got this inode page, but unlocked it.
By the fact that we don't truncate any node pages in operations, the page's
mapping should be unchangeable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, after trylock_page succeeded, the page's mapping was not checked.
To fix that, we can just pass FGP_LOCK to pagecache_get_page.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And it is unlikely it ever will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If many threads call fsync with data writes, we don't need to flush every
bio that contains node page writes.
f2fs_wait_on_page_writeback will flush its bios when the page is really
needed.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
ra_node_page() is used to read ahead one node page. Compared to a regular
read, it's faster because it doesn't wait for IO completion.
But if it is called twice for the same block, and the IO request
from the first call hasn't completed before the second call, the second
call has to wait until the read is over.
Use the approach of __do_page_cache_readahead() to solve this problem:
do nothing when someone else has already put the page in the mapping. The
status of the page should be assured by whoever put it there.
This implementation also avoids altering the page reference count.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
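The "do nothing if the page is already in the mapping" behaviour described for ra_node_page() above can be modelled in a few lines. This is a sketch under assumptions: `struct ra_cache` and `ra_node_page_model()` are hypothetical, and a fixed-size boolean array stands in for the page cache lookup.

```c
#include <assert.h>
#include <stdbool.h>

#define RA_CACHE_SLOTS 8

/* Hypothetical page cache model: one slot per node id. */
struct ra_cache {
	bool present[RA_CACHE_SLOTS];	/* page already inserted in the mapping */
	unsigned int reads_issued;	/* number of read requests submitted */
};

/* Returns true if a read was submitted, false if nothing was done. */
static bool ra_node_page_model(struct ra_cache *cache, unsigned int nid)
{
	if (nid >= RA_CACHE_SLOTS)
		return false;		/* out of range for this model */
	if (cache->present[nid])
		return false;		/* someone else owns it; do not wait */
	cache->present[nid] = true;
	cache->reads_issued++;		/* submit the read and return at once */
	return true;
}
```

A second readahead call for the same nid returns immediately instead of blocking on the first call's I/O, which is the behaviour the commit message aims for.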