* aosp/upstream-f2fs-stable-linux-6.1.y:
fsverity: simplify fsverity_get_digest()
fsverity: stop using PG_error to track error status
f2fs: reset wait_ms to default if any of the victims have been selected
f2fs: fix some format WARNING in debug.c and sysfs.c
f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super()
f2fs: fix iostat parameter for discard
f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache
f2fs: add block_age-based extent cache
f2fs: allocate the extent_cache by default
f2fs: refactor extent_cache to support for read and more
f2fs: remove unnecessary __init_extent_tree
f2fs: move internal functions into extent_cache.c
f2fs: specify extent cache for read explicitly
f2fs: introduce f2fs_is_readonly() for readability
f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro
f2fs: do some cleanup for f2fs module init
MAINTAINERS: Add f2fs bug tracker link
f2fs: remove the unused flush argument to change_curseg
f2fs: open code allocate_segment_by_default
f2fs: remove struct segment_allocation default_salloc_ops
f2fs: introduce discard_urgent_util sysfs node
f2fs: define MIN_DISCARD_GRANULARITY macro
f2fs: init discard policy after thread wakeup
f2fs: avoid victim selection from previous victim section
f2fs: truncate blocks in batch in __complete_revoke_list()
f2fs: make __queue_discard_cmd() return void
f2fs: fix description about discard_granularity node
f2fs: move set_file_temperature into f2fs_new_inode
f2fs: fix to enable compress for newly created file if extension matches
f2fs: set zstd compress level correctly
f2fs: change type for 'sbi->readdir_ra'
f2fs: cleanup for 'f2fs_tuning_parameters' function
f2fs: fix to alloc_mode changed after remount on a small volume device
f2fs: remove submit label in __submit_discard_cmd()
f2fs: fix to do sanity check on i_extra_isize in is_alive()
f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACE
f2fs: fix to set flush_merge opt and show noflush_merge
f2fs: initialize locks earlier in f2fs_fill_super()
f2fs: optimize iteration over sparse directories
f2fs: fix to avoid accessing uninitialized spinlock
f2fs: correct i_size change for atomic writes
f2fs: add proc entry to show discard_plist info
f2fs: allow to read node block after shutdown
f2fs: replace ternary operator with max()
f2fs: replace gc_urgent_high_remaining with gc_remaining_trials
f2fs: add missing bracket in doc
f2fs: use sysfs_emit instead of sprintf
f2fs: introduce gc_mode sysfs node
f2fs: fix to destroy sbi->post_read_wq in error path of f2fs_fill_super()
f2fs: fix return val in f2fs_start_ckpt_thread()
f2fs: fix the msg data type
f2fs: fix the assign logic of iocb
f2fs: Fix typo in comments
f2fs: introduce max_ordered_discard sysfs node
f2fs: allow to set compression for inlined file
f2fs: add barrier mount option
f2fs: fix normal discard process
f2fs: cleanup in f2fs_create_flush_cmd_control()
f2fs: fix gc mode when gc_urgent_high_remaining is 1
f2fs: remove batched_trim_sections node
f2fs: support fault injection for f2fs_is_valid_blkaddr()
f2fs: fix to invalidate dcc->f2fs_issue_discard in error path
f2fs: Fix the race condition of resize flag between resizefs
f2fs: let's avoid to get cp_rwsem twice by f2fs_evict_inode by d_invalidate
f2fs: should put a page when checking the summary info
Bug: 256243893
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I84a5ebbfbfb58a1f89327ce003a298aaae7a42b9
This config helps address audio buffer underrun issues on arm64 targets.
Bug: 168521633
Signed-off-by: J. Avila <elavila@google.com>
[jstultz: Rebased to android-mainline, also enabled on x86]
Signed-off-by: John Stultz <jstultz@google.com>
Change-Id: Iffb39b2c1d55f5d88d1475f68d5ed5a3bba90a2b
Tasklets are supposed to finish their work quickly and
should not block the current running process, but it is not
guaranteed that. Currently softirq_entry/exit can be used to
know total tasklets execution time, but not helpful to track
individual tasklet's execution time. With that we can't find
any culprit tasklet function, which is taking more time.
Add tasklet_entry/exit trace point support to track
individual tasklet execution.
This patch has been carried in the Android tree for awhile
so I wanted to submit it for review upstream. Feedback would
be appreciated!
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Connor O'Brien <connoro@google.com>
Cc: kernel-team@android.com
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
[elavila: Port to android-mainline]
Signed-off-by: J. Avila <elavila@google.com>
[jstultz: Rebased to upstream, cut unused trace points, added
comments for the tracepoints, reworded commit]
Signed-off-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/lkml/20221213185310.1315794-1-jstultz@google.com/
Bug: 168521633
Change-Id: I3944fcedffae54a5f761d0b18ff1c41d2c3e4aeb
Defer the softirq processing to ksoftirqd if a RT task is
running or queued on the current CPU. This complements the RT
task placement algorithm which tries to find a CPU that is not
currently busy with softirqs.
Currently NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs are only
deferred as they can potentially run for long time.
Additionally, this patch stubs out ksoftirqd_running() logic,
in the CONFIG_RT_SOFTIRQ_AWARE_SCHED case, as deferring
potentially long-running softirqs will cause the logic to not
process shorter-running softirqs immediately. By stubbing it out
the potentially long running softirqs are deferred, but the
shorter running ones can still run immediately.
This patch includes folded-in fixes by:
Lingutla Chandrasekhar <clingutla@codeaurora.org>
Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
J. Avila <elavila@google.com>
Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qyousef@google.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: kernel-team@android.com
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution.]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: Port to mainline, squash with bugfix]
Signed-off-by: J. Avila <elavila@google.com>
[jstultz: Rebase to linus/HEAD, minor rearranging of code,
included bug fix Reported-by: Qais Yousef <qais.yousef@arm.com> ]
Signed-off-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/lkml/20221116075929.453876-4-jstultz@google.com/
Change-Id: Iedb590216e373788950243165ff83f4c43015f31
Bug: 168521633
---
v4:
* Fix commit message to accurately note long-running softirqs
(suggested by Qais)
* Switch to using rt_task(current) (suggested by Qais)
v5:
* Switch to using CONFIG_RT_SOFTIRQ_AWARE_SCHED (suggested by
Joel Fernandes <joel@joelfernandes.org>)
In certain audio use cases, scheduling RT threads on cores that
are handling softirqs can lead to glitches. Prevent this
behavior in cases where the softirq is likely to take a long
time. To avoid unnecessary migrations, the old behavior is
preserved for RCU, SCHED and TIMER irqs which are expected to be
relatively quick.
This patch reworks and combines two related changes originally
by John Dias <joaodias@google.com>
Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qyousef@google.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: kernel-team@android.com
Signed-off-by: John Dias <joaodias@google.com>
[elavila: Port to mainline, amend commit text]
Signed-off-by: J. Avila <elavila@google.com>
[connoro: Reworked, simplified, and merged two patches together]
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: Further simplified and fixed issues, reworded commit
message, removed arm64-isms]
Signed-off-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/lkml/20221116075929.453876-3-jstultz@google.com/
Change-Id: Iea6719e1d900be4a3492949685f1218f299f29e8
Bug: 168521633
---
v2:
* Reformatted Kconfig entry to match coding style
(Reported-by: Randy Dunlap <rdunlap@infradead.org>)
* Made rt_task_fits_capacity_and_may_preempt static to
avoid warnings (Reported-by: kernel test robot <lkp@intel.com>)
* Rework to use preempt_count and drop kconfig dependency on ARM64
v3:
* Use introduced __cpu_softirq_pending() to avoid s390 build
issues (Reported-by: kernel test robot <lkp@intel.com>)
v4:
* Drop TASKLET_SOFTIRQ from LONG_SOFTIRQS (suggested by Qais)
* Depend on !PREEMPT_RT (Suggested by Qais)
* Larger simplification of logic (suggested by Qais)
* Rework LONG_SOFTIRQS to use BIT() macros
* Rename task_may_preempt() to cpu_busy_with_softirqs()
v5:
* Conditionalize active_softirqs handling (suggested by Alexander
Gordeev <agordeev@linux.ibm.com>)
* Reorder rt_task_fits_cpu to have the "fast" function first
(Suggested by Alexander Gordeev <agordeev@linux.ibm.com>)
* Fix bug I introduced in v2 condensing
task_thread_info(task)->preempt_count to preempt_count()
(Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>)
* Tweak comment discription to remove the vauge "slow"
descriptor of softirqs being run by ksoftirqd
(Suggested by Alexander Gordeev <agordeev@linux.ibm.com>)
* Switch to using CONFIG_RT_SOFTIRQ_AWARE_SCHED (suggested by
Joel Fernandes <joel@joelfernandes.org>)
* Simplify cpu_busy_with_softirqs() logic as pointed out by
Alexander Gordeev <agordeev@linux.ibm.com>
* Switch to using IS_ENABLED rather then defining my own macro
(suggsted by Joel Fernandes <joel@joelfernandes.org>)
Vendors might want to change tasks affinity settings when they are
moving from one cpuset into the other. Add vendor hook to give control
to vendor to implement what they need.
This feature is necessary to control hotplug behaviour within Qualcomm's
proprietary load tracking scheme, WALT.
This reverts commit 034ddf86f7 ("Revert "ANDROID: sched/cpuset: Add
vendor hook to change tasks affinity"") to effectively bring back the
original change.
Bug: 174125747
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Signed-off-by: Sai Harshini Nimmala <quic_snimmala@quicinc.com>
Change-Id: Id4e9c3e47e3b4e041804bdf10cbd9e36179bc172
This deliberately changes the behavior of the per-cpuset
cpus file to not be effected by hotplug. When a cpu is offlined, it will
be removed from the cpuset/cpus file. When a cpu is onlined, if the cpuset
originally requested that cpu be a part of the cpuset, that cpu will be
restored to the cpuset. The cpus files still have to be hierachical, but
the ranges no longer have to be out of the currently online cpus, just the
physically present cpus.
This reverts commit 3fc3fe757f ("Revert "ANDROID: cpuset: Make cpusets
restore on hotplug""). Reverting the revert effectively bringing back
the original change.
Bug: 174125747
Bug: 120444281
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Refactored original changes to align with upstream commit
201af4c0fa ("cgroup: move cgroup files under kernel/cgroup/")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[satyap@codeaurora.org: port to android-mainline kernel]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[SaiHarshiniN: Resolved merge conflict post 6.1-rc1]
Signed-off-by: Sai Harshini Nimmala <quic_snimmala@quicinc.com>
Change-Id: I588f6172c15b48ecadb85f161dae948ce9aeca93
Instead of looking up the algorithm by name in hash_algo_name[] to get
its hash_algo ID, just store the hash_algo ID in the fsverity_hash_alg
struct. Verify at boot time that every fsverity_hash_alg has a valid
hash_algo ID with matching digest size.
Remove an unnecessary memset() of the whole digest array to 0 before the
digest is copied into it.
Finally, remove the pr_debug statement. There is already a pr_debug for
the fsverity digest when the file is opened.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20221129045139.69803-1-ebiggers@kernel.org
As a step towards freeing the PG_error flag for other uses, change ext4
and f2fs to stop using PG_error to track verity errors. Instead, if a
verity error occurs, just mark the whole bio as failed. The coarser
granularity isn't really a problem since it isn't any worse than what
the block layer provides, and errors from a multi-page readahead aren't
reported to applications unless a single-page read fails too.
f2fs supports compression, which makes the f2fs changes a bit more
complicated than desired, but the basic premise still works.
Note: there are still a few uses of PageError in f2fs, but they are on
the write path, so they are unrelated and this patch doesn't touch them.
Reviewed-by: Chao Yu <chao@kernel.org>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org
In non-foreground gc mode, if no victim is selected, the gc process
will wait for no_gc_sleep_time before waking up again. In this
subsequent time, even though a victim will be selected, the gc process
still waits for no_gc_sleep_time before waking up. The configuration
of wait_ms is not reasonable.
After any of the victims have been selected, we need to reset wait_ms to
default sleep time from no_gc_sleep_time.
Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
To fix:
WARNING: function definition argument 'struct f2fs_attr *' should also have an identifier name
+ ssize_t (*show)(struct f2fs_attr *, struct f2fs_sb_info *, char *);
WARNING: return sysfs_emit(...) formats should include a terminating newline
+ return sysfs_emit(buf, "(none)");
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+ unsigned npages = NODE_MAPPING(sbi)->nrpages;
WARNING: Missing a blank line after declarations
+ unsigned npages = COMPRESS_MAPPING(sbi)->nrpages;
+ si->page_mem += (unsigned long long)npages << PAGE_SHIFT;
WARNING: quoted string split across lines
+ seq_printf(s, "CP merge (Queued: %4d, Issued: %4d, Total: %4d, "
+ "Cur time: %4d(ms), Peak time: %4d(ms))\n",
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
No need to call f2fs_issue_discard_timeout() in f2fs_put_super,
when no discard command requires issue. Since the caller of
f2fs_issue_discard_timeout() usually judges the number of discard
commands before using it. Let's move this logic to
f2fs_issue_discard_timeout().
By the way, use f2fs_realtime_discard_enable to simplify the code.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Just like other data we count uses the number of bytes as the basic unit,
but discard uses the number of cmds as the statistical unit. In fact the
discard command contains the number of blocks, so let's change to the
number of bytes as the base unit.
Fixes: b0af6d491a ("f2fs: add app/fs io stat")
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.
Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
- It records total data blocks allocated since mount;
- When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
- Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.
Test and result:
- Prepare: create about 30000 files
* 3% for cold files (with cold file extension like .apk, from 3M to 10M)
* 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
* 47% for hot files (with hot file extension like .db, from 1K to 256K)
- create(5%)/random update(90%)/delete(5%) the files
* total write amount is about 70G
* fsync will be called for .db files, and buffered write will be used
for other files
The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.
Benefit: dirty segment count increment reduce about 14%
- before: Dirty +21110
- after: Dirty +18286
Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() have never
been used since they were introduced by this commit
76f105a2dbcd("f2fs: add feature facility in superblock").
So let's remove them. BTW, convert f2fs_sb_has_##name to return bool.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As f2fs component in bugzilla.kernel.org was created and used since
2018-7.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
allocate_segment_by_default has just two callers, which use very
different code pathes inside it based on the force paramter. Just
open code the logic in the two callers using a new helper to decided
if a new segment should be allocated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There is only single instance of these ops, so remove the indirection
and call allocate_segment_by_default directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Through this node, you can control the background discard
to run more aggressively or not aggressively when reach the
utilization rate of the space.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Do cleanup in f2fs_tuning_parameters() and __init_discard_policy(),
let's use macro instead of number.
Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Under the current logic, after the discard thread wakes up, it will not
run according to the expected policy, but will use the expected policy
before sleep. Move the strategy selection to after the thread wakes up,
so that the running state of the thread meets expectations.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When f2fs chooses GC victim in large section & LFS mode,
next_victim_seg[gc_type] is referenced first. After segment is freed,
next_victim_seg[gc_type] has the next segment number.
However, next_victim_seg[gc_type] still has the last segment number
even after the last segment of section is freed. In this case, when f2fs
chooses a victim for the next GC round, the last segment of previous victim
section is chosen as a victim.
Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in
large section.
Fixes: e3080b0120 ("f2fs: support subsectional garbage collection")
Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use f2fs_do_truncate_blocks() to truncate all blocks in-batch in
__complete_revoke_list().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since __queue_discard_cmd() never returns an error,
let's make it return void.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Let's fix the inconsistency in the text description.
Default discard granularity is 16. For small devices,
default value is 1.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since the file name has already passed to f2fs_new_inode(), let's
move set_file_temperature() into f2fs_new_inode().
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If compress_extension is set, and a newly created file matches the
extension, the file could be marked as compression file. However,
if inline_data is also enabled, there is no chance to check its
extension since f2fs_should_compress() always returns false.
This patch moves set_compress_inode(), which do extension check, in
f2fs_should_compress() to check extensions before setting inline
data flag.
Fixes: 7165841d57 ("f2fs: fix to check inline_data during compressed inode conversion")
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Before this patch, the varibale 'readdir_ra' takes effect if it's equal
to '1' or not, so we can change type for it from 'int' to 'bool'.
Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The commit 84b89e5d94 ("f2fs: add auto tuning for small devices") add
tuning for small volume device, now support to tune alloce_mode to 'reuse'
if it's small size. But the alloc_mode will change to 'default' when do
remount on this small size dievce. This patch fo fix alloc_mode changed
when do remount for a small volume device.
Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Complaint from Matthew Wilcox in another similar place:
"submit? You don't submit anything at the 'submit' label.
it should be called 'skip' or something. But I think this
is just badly written and you don't need a goto at all."
Let's remove submit label for readability.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
introduce a new ioctl to replace the whole content of a file atomically,
which means it induces truncate and content update at the same time.
We can start it with F2FS_IOC_START_ATOMIC_REPLACE and complete it with
F2FS_IOC_COMMIT_ATOMIC_WRITE. Or abort it with
F2FS_IOC_ABORT_ATOMIC_WRITE.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Some minor modifications to flush_merge and related parameters:
1.The FLUSH_MERGE opt is set by default only in non-ro mode.
2.When ro and merge are set at the same time, an error is reported.
3.Display noflush_merge mount opt.
Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wei Chen reports a kernel bug as blew:
INFO: task syz-executor.0:29056 blocked for more than 143 seconds.
Not tainted 5.15.0-rc5 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:14632 pid:29056 ppid: 6574 flags:0x00000004
Call Trace:
__schedule+0x4a1/0x1720
schedule+0x36/0xe0
rwsem_down_write_slowpath+0x322/0x7a0
fscrypt_ioctl_set_policy+0x11f/0x2a0
__f2fs_ioctl+0x1a9f/0x5780
f2fs_ioctl+0x89/0x3a0
__x64_sys_ioctl+0xe8/0x140
do_syscall_64+0x34/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Eric did some investigation on this issue, quoted from reply of Eric:
"Well, the quality of this bug report has a lot to be desired (not on
upstream kernel, reproducer is full of totally irrelevant stuff, not
sent to the mailing list of the filesystem whose disk image is being
fuzzed, etc.). But what is going on is that f2fs_empty_dir() doesn't
consider the case of a directory with an extremely large i_size on a
malicious disk image.
Specifically, the reproducer mounts an f2fs image with a directory
that has an i_size of 14814520042850357248, then calls
FS_IOC_SET_ENCRYPTION_POLICY on it.
That results in a call to f2fs_empty_dir() to check whether the
directory is empty. f2fs_empty_dir() then iterates through all
3616826182336513 blocks the directory allegedly contains to check
whether any contain anything. i_rwsem is held during this, so
anything else that tries to take it will hang."
In order to solve this issue, let's use f2fs_get_next_page_offset()
to speed up iteration by skipping holes for all below functions:
- f2fs_empty_dir
- f2fs_readdir
- find_in_level
The way why we can speed up iteration was described in
'commit 3cf4574705 ("f2fs: introduce get_next_page_offset to speed
up SEEK_DATA")'.
Meanwhile, in f2fs_empty_dir(), let's use f2fs_find_data_page()
instead f2fs_get_lock_data_page(), due to i_rwsem was held in
caller of f2fs_empty_dir(), there shouldn't be any races, so it's
fine to not lock dentry page during lookuping dirents in the page.
Link: https://lore.kernel.org/lkml/536944df-a0ae-1dd8-148f-510b476e1347@kernel.org/T/
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We need to make sure i_size doesn't change until atomic write commit is
successful and restore it when commit is failed.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>