The following BUG_ON issue is triggered when running xfstest-generic-503:
WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0
Modules linked in:
CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
Call Trace:
evict+0x129/0x2d0
dispose_list+0x4f/0xb0
evict_inodes+0x204/0x230
generic_shutdown_super+0x5b/0x1e0
kill_block_super+0x29/0x80
kill_f2fs_super+0xe6/0x140
deactivate_locked_super+0x44/0xc0
deactivate_super+0x79/0x90
cleanup_mnt+0x114/0x1a0
__cleanup_mnt+0x16/0x20
task_work_run+0x98/0x100
exit_to_user_mode_prepare+0x3d0/0x3e0
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x42/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Function flow analysis when BUG occurs:
f2fs_fallocate                    mmap
                                  do_page_fault
                                    pte_spinlock  // ---lock_pte
                                    do_wp_page
                                      wp_page_shared
                                        pte_unmap_unlock  // unlock_pte
                                        do_page_mkwrite
                                          f2fs_vm_page_mkwrite
                                            down_read(invalidate_lock)
                                            lock_page
                                            if (PageMappedToDisk(page))
                                              goto out;
                                            // set_page_dirty  --NOT RUN
                                            out: up_read(invalidate_lock);
                                        finish_mkwrite_fault  // unlock_pte
f2fs_collapse_range
  down_write(i_mmap_sem)
  truncate_pagecache
    unmap_mapping_pages
      i_mmap_lock_write  // down_write(i_mmap_rwsem)
        ......
        zap_pte_range
          pte_offset_map_lock  // ---lock_pte
          set_page_dirty
            f2fs_dirty_data_folio
              if (!folio_test_dirty(folio)) {
                                        fault_dirty_shared_page
                                          set_page_dirty
                                            f2fs_dirty_data_folio
                                              if (!folio_test_dirty(folio)) {
                                                filemap_dirty_folio
                                                f2fs_update_dirty_folio  // ++
                                              }
                                          unlock_page
                filemap_dirty_folio
                f2fs_update_dirty_folio  // page count++
              }
          pte_unmap_unlock  // --unlock_pte
      i_mmap_unlock_write  // up_write(i_mmap_rwsem)
  truncate_inode_pages
  up_write(i_mmap_sem)
When the mmap path (do_page_fault -> wp_page_shared) races with the
fallocate path (truncate_pagecache -> zap_pte_range), zap_pte_range
calls set_page_dirty without holding the page lock. Moreover, although
truncate_pagecache holds the i_mmap and pte locks, wp_page_shared calls
fault_dirty_shared_page without holding either of them. In this case,
the two threads race in f2fs_dirty_data_folio: the page is set dirty
only ONCE, but the dirty count is incremented TWICE, because both
threads go on to call filemap_dirty_folio and f2fs_update_dirty_folio.
As a result, the dirty page count no longer matches the real number of
dirty pages.
The following solution handles the race without taking any additional
lock. Since folio_test_set_dirty in filemap_dirty_folio is atomic,
checking its return value is not subject to the race.
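The idea can be illustrated with a minimal userspace sketch. It is not the
actual patch: it uses C11 atomics, and every name in it (sketch_folio,
sketch_test_set_dirty, sketch_dirty_data_folio, nr_dirty_pages) is a
hypothetical stand-in. It shows only that counting on the return value of an
atomic test-and-set increments the counter once per clean -> dirty transition,
whereas a separate "test, then set" leaves a window in which two callers can
both see a clean page.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; only the dirty flag is modelled. */
struct sketch_folio {
	atomic_bool dirty;
};

/* Hypothetical stand-in for the per-inode dirty page counter. */
static atomic_int nr_dirty_pages;

/*
 * Atomically set the dirty flag and report whether this caller performed
 * the clean -> dirty transition. atomic_exchange() returns the old value,
 * so at most one racing caller can observe "was clean".
 */
static bool sketch_test_set_dirty(struct sketch_folio *folio)
{
	return !atomic_exchange(&folio->dirty, true);
}

/* Count the page only if this caller actually dirtied it. */
static bool sketch_dirty_data_folio(struct sketch_folio *folio)
{
	if (sketch_test_set_dirty(folio)) {
		atomic_fetch_add(&nr_dirty_pages, 1);
		return true;
	}
	return false;
}
```

Calling sketch_dirty_data_folio() twice on the same clean folio returns true
and then false, and leaves nr_dirty_pages at 1.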
Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Just use the defined COMPRESS_MAPPING to get the compress cache
mapping instead of accessing it directly by name.
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, the range and default value of NR_CPUS are too restrictive
for high-end RISC-V systems with a large number of HARTs. The latest
QEMU virt machine supports up to 512 CPUs, so the current NR_CPUS is
restrictive for QEMU as well. Other major architectures (such as
ARM64, x86_64, MIPS, etc) have a much higher range and default
value of NR_CPUS.
This patch increases NR_CPUS range to 2-512 and default value to
XLEN (i.e. 32 for RV32 and 64 for RV64).
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Link: https://lore.kernel.org/r/20220420112408.155561-1-apatel@ventanamicro.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Update the _EXPORT_DEV_PM_OPS() internal macro. It was not used anywhere
outside pm.h and pm_runtime.h, so it is safe to update it.
Before, this macro would take a few parameters to be used as sleep and
runtime callbacks. This made it unsuitable to use with different
callbacks, for instance the "noirq" ones.
It is now semantically different: instead of creating a conditionally
exported dev_pm_ops structure, it only contains part of the definition.
This macro should however never be used directly (hence the leading
underscore). Instead, the following four macros are provided:
- EXPORT_DEV_PM_OPS(name)
- EXPORT_GPL_DEV_PM_OPS(name)
- EXPORT_NS_DEV_PM_OPS(name, ns)
- EXPORT_NS_GPL_DEV_PM_OPS(name, ns)
For instance, it is now possible to conditionally export noirq
suspend/resume PM functions like this:
    EXPORT_GPL_DEV_PM_OPS(foo_pm_ops) = {
        NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
    };
The existing helper macros EXPORT_*_SIMPLE_DEV_PM_OPS() and
EXPORT_*_RUNTIME_DEV_PM_OPS() have been updated to use these new macros.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull landlock updates from Mickaël Salaün:
"Improve user help for Landlock (documentation and sample)"
* tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Fix documentation style
landlock: Slightly improve documentation and fix spelling
samples/landlock: Print hints about ABI versions
Many of the mdev drivers use a simple counter for keeping track of the
available instances. Move this code to the core code and store the counter
in the mdev_parent. Implement it using correct locking, fixing mdpy.
Drivers just provide the value in the mdev_driver at registration time
and the core code takes care of maintaining it and exposing the value in
sysfs.
[hch: count instances per-parent instead of per-type, use an atomic_t
to avoid taking mdev_list_lock in the show method]
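A lock-free instance counter of this kind can be sketched in userspace as
follows. This is only an illustration with C11 atomics, not the mdev core
code: sketch_parent, sketch_take_instance, sketch_put_instance and
sketch_show_available are hypothetical names. Taking an instance decrements
the counter only while it is positive, and the "show" path is a plain atomic
load that needs no lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a parent device with a fixed instance budget. */
struct sketch_parent {
	atomic_int available_instances;
};

/* Take one instance if any is left; never lets the counter go negative. */
static bool sketch_take_instance(struct sketch_parent *parent)
{
	int cur = atomic_load(&parent->available_instances);

	while (cur > 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&parent->available_instances,
						 &cur, cur - 1))
			return true;
	}
	return false;
}

/* Give an instance back when a device is removed. */
static void sketch_put_instance(struct sketch_parent *parent)
{
	atomic_fetch_add(&parent->available_instances, 1);
}

/* Reader side: a single atomic load, no lock required. */
static int sketch_show_available(struct sketch_parent *parent)
{
	return atomic_load(&parent->available_instances);
}
```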
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-15-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull audit updates from Paul Moore:
"Six audit patches for v6.1. Most are pretty trivial, but a quick list
of the highlights is below:
- Only free the audit proctitle information on task exit. This allows
us to cache the information and improve performance slightly.
- Use the time_after() macro to do time comparisons instead of doing
it directly and potentially causing ourselves problems when the
timer wraps.
- Convert an audit_context state comparison from a relative enum
comparison, e.g. (x < y), to a not-equal comparison to ensure that
we are not caught out at some unknown point in the future by an
enum shuffle.
- A handful of small cleanups such as tidying up comments and
removing unused declarations"
* tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove selinux_audit_rule_update() declaration
audit: use time_after to compare time
audit: free audit_proctitle only on task exit
audit: explicitly check audit_context->context enum value
audit: audit_context pid unused, context enum comment fix
audit: fix repeated words in comments
- Miscellaneous of_node_put() fixes
- Nuke dt-bindings/clk path (again) by moving headers to dt-bindings/clock
- Convert gpio-clk-gate binding to YAML
- Various fixes to AMD/Xilinx Zynqmp clk driver
- Graduate AMD/Xilinx "clocking wizard" driver from staging
* clk-ofnode:
clk: ti: Balance of_node_get() calls for of_find_node_by_name()
clk: tegra20: Fix refcount leak in tegra20_clock_init
clk: tegra: Fix refcount leak in tegra114_clock_init
clk: tegra: Fix refcount leak in tegra210_clock_init
clk: sprd: Hold reference returned by of_get_parent()
clk: berlin: Add of_node_put() for of_get_parent()
clk: at91: dt-compat: Hold reference returned by of_get_parent()
clk: qoriq: Hold reference returned by of_get_parent()
clk: oxnas: Hold reference returned by of_get_parent()
clk: st: Hold reference returned by of_get_parent()
clk: tegra: Add missing of_node_put()
clk: meson: Hold reference returned by of_get_parent()
clk: nomadik: Add missing of_node_put()
* clk-bindings:
dt-bindings: clock: drop minItems equal to maxItems
dt-bindings: clock: gpio-gate-clock: Convert to json-schema
dt-bindings: clock: Move versaclock.h to dt-bindings/clock
dt-bindings: clock: Move lochnagar.h to dt-bindings/clock
* clk-cleanup:
clk: allow building lan966x as a module
clk: clk-xgene: simplify if-if to if-else
clk: nxp: fix typo in comment
clk: mvebu: armada-37xx-tbg: Remove the unneeded result variable
clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
clkdev: Simplify devm_clk_hw_register_clkdev() function
clkdev: Remove never used devm_clk_release_clkdev()
clk: Remove never used devm_of_clk_del_provider()
clk: pistachio: Fix initconst confusion
clk: clk-npcm7xx: Remove unused struct npcm7xx_clk_gate_data and npcm7xx_clk_div_fixed_data
clk: do not initialize ret
clk: remove extra empty line
clk: Fix comment typo
clk: move from strlcpy with unused retval to strscpy
* clk-zynq:
clk: zynqmp: pll: rectify rate rounding in zynqmp_pll_round_rate
clk: zynqmp: Check the return type zynqmp_pm_query_data
clk: zynqmp: Add a check for NULL pointer
clk: zynqmp: Replaced strncpy() with strscpy()
clk: zynqmp: Fix stack-out-of-bounds in strncpy`
clk: zynqmp: make bestdiv unsigned
* clk-xilinx:
clk: clocking-wizard: Depend on HAS_IOMEM
clk: clocking-wizard: Use dev_err_probe() helper
clk: clocking-wizard: Update the compatible
clk: clocking-wizard: Fix the reconfig for 5.2
clk: clocking-wizard: Rename nr-outputs to xlnx,nr-outputs
clk: clocking-wizard: Move clocking-wizard out
dt-bindings: add documentation of xilinx clocking wizard
This PLL frequency needs a UL postfix to avoid compiler warnings on
32-bit architectures.
Fixes: 184fdd873d ("clk: qcom: Add global clock controller driver for SM6375")
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pull x86 cleanups from Borislav Petkov:
- The usual round of smaller fixes and cleanups all over the tree
* tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
x86/uaccess: Improve __try_cmpxchg64_user_asm() for x86_32
x86: Fix various duplicate-word comment typos
x86/boot: Remove superfluous type casting from arch/x86/boot/bitops.h
Sage's git tree has not been pushed to in years, and it was removed in
commit 3a5ccecd9a ("MAINTAINERS: remove myself as ceph co-maintainer"),
so it is better to remove it in the documentation too.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When unlinking a file, the kclient sends an unlink request to the MDS
while holding a reference to the dentry. The MDS then returns two
replies: an unsafe reply and a deferred safe reply.
Once the unsafe reply is received, the kernel returns and reports the
unlink request to user space apps as successful.
The dentry's reference is released only when the safe reply is
received; until then, the dentry is merely unhashed from the dcache.
But if open_by_handle_at() then tries to open the unlinked file, it
will succeed.
The inode->i_count cannot be used to check whether the inode is
opened or not.
Link: https://tracker.ceph.com/issues/56524
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Tested-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When the client has enough caps to satisfy a setattr locally without
having to talk to the server, we currently do the setattr without
incrementing the change attribute.
Ensure that if the ctime changes locally, then the change attribute
does too.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Prefer using kcalloc(a, b) over kzalloc(a * b) as this improves
semantics since kcalloc is intended for allocating an array of memory.
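A userspace sketch of the difference, under the assumption (not stated in
this commit) that the array form also guards the n * size multiplication
against overflow, as kcalloc() does in the kernel. sketch_calloc() is a
hypothetical stand-in, not the kernel implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical analogue of kcalloc(n, size, flags): allocate a zeroed
 * array of n elements, refusing requests whose total size would overflow.
 * An open-coded zalloc(n * size) would silently wrap instead.
 */
static void *sketch_calloc(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size does not fit in size_t */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```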
Signed-off-by: Kenneth Lee <klee33@uw.edu>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
For writes, when trying to get the Fwb caps we need to keep waiting
on the transition from WRBUFFER|WR -> WR, to prevent a new WR sync
write from going ahead before a prior buffered writeback has happened.
For reads, there is no need to wait on the transition from
RDCACHE|RD -> RD; we can just exclude the caps being revoked and
force a sync read.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
ceph_msg_data_next is always passed a NULL pointer for this field. Some
of the "next" operations look at it in order to determine the length,
but we can just take the min of the data on the page or cursor->resid.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull x86 cache resource control updates from Borislav Petkov:
- More work by James Morse to disentangle the resctrl filesystem
generic code from the architectural one with the endgoal of plugging
ARM's MPAM implementation into it too so that the user interface
remains the same
- Properly restore the MSR_MISC_FEATURE_CONTROL value instead of
blindly overwriting it to 0
* tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
x86/resctrl: Rename and change the units of resctrl_cqm_threshold
x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()
x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
x86/resctrl: Abstract __rmid_read()
x86/resctrl: Allow per-rmid arch private storage to be reset
x86/resctrl: Add per-rmid arch private storage for overflow and chunks
x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
x86/resctrl: Allow update_mba_bw() to update controls directly
x86/resctrl: Remove architecture copy of mbps_val
x86/resctrl: Switch over to the resctrl mbps_val list
x86/resctrl: Create mba_sc configuration in the rdt_domain
x86/resctrl: Abstract and use supports_mba_mbps()
x86/resctrl: Remove set_mba_sc()s control array re-initialisation
x86/resctrl: Add domain offline callback for resctrl work
x86/resctrl: Group struct rdt_hw_domain cleanup
x86/resctrl: Add domain online callback for resctrl work
x86/resctrl: Merge mon_capable and mon_enabled
...
Pull x86 microcode loader updates from Borislav Petkov:
- Get rid of a single ksize() usage
- By popular demand, print the previous microcode revision an update
was done over
- Remove more code related to the now gone MICROCODE_OLD_INTERFACE
- Document the problems stemming from microcode late loading
* tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Track patch allocation size explicitly
x86/microcode: Print previous version of microcode after reload
x86/microcode: Remove ->request_microcode_user()
x86/microcode: Document the whole late loading problem
Pull x86 paravirt fix from Borislav Petkov:
- Ensure paravirt patching site descriptors are aligned properly so
that code can do proper arithmetic with their addresses
* tag 'x86_paravirt_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Ensure proper alignment
Pull misc x86 fixes from Borislav Petkov:
- Drop misleading "RIP" from the opcodes dumping message
- Correct APM entry's Kconfig help text
* tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Don't mention RIP in "Code: "
x86/Kconfig: Specify idle=poll instead of no-hlt
Pull x86 asm update from Borislav Petkov:
- Use the __builtin_ffs/ctzl() compiler builtins for the constant
argument case in the kernel's optimized ffs()/ffz() helpers in order
to make use of the compiler's constant folding optimization passes.
* tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions
x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions
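The dispatch pattern described in this pull can be sketched as below for GCC
or Clang. The names sketch_variable_ffs, sketch_ffs and sketch_ffz are
hypothetical, and the portable loop merely stands in for the kernel's
optimized assembly path. When the argument is a compile-time constant, the
builtin lets the compiler fold the whole expression to a constant.

```c
/* Stand-in for the optimized runtime path (inline asm in the kernel). */
static inline int sketch_variable_ffs(int x)
{
	unsigned int u = (unsigned int)x;
	int pos = 1;

	if (u == 0)
		return 0;
	while (!(u & 1u)) {
		u >>= 1;
		pos++;
	}
	return pos;
}

/*
 * ffs(): 1-based index of the least significant set bit, 0 if none.
 * Constant arguments go to the builtin so they can be folded at
 * compile time; everything else takes the runtime path.
 */
#define sketch_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : sketch_variable_ffs(x))

/*
 * ffz(): 0-based index of the first zero bit. The result is undefined
 * if word has no zero bit, because __builtin_ctzl(0) is undefined.
 */
static inline unsigned long sketch_ffz(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(~word);
}
```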
Pull x86 core fixes from Borislav Petkov:
- Make sure an INT3 is slapped after every unconditional retpoline JMP
as both vendors suggest
- Clean up pciserial a bit
* tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86,retpoline: Be sure to emit INT3 after JMP *%\reg
x86/earlyprintk: Clean up pciserial
Pull x86 APIC update from Borislav Petkov:
- Add support for locking the APIC in X2APIC mode to prevent SGX
enclave leaks
* tag 'x86_apic_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Don't disable x2APIC if locked