linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 19:30:30 +09:00

Author	SHA1	Message	Date
Uwe Kleine-König	feea30e075	soc/mediatek: mtk-devapc: Convert to platform remove callback returning void [ Upstream commit a129ac3555c0dca6f04ae404dc0f0790656587fb ] The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230925095532.1984344-15-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Stable-dep-of: c9c0036c1990 ("soc: mediatek: mtk-devapc: Fix leaking IO map on driver remove") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Krzysztof Kozlowski	3cce694e7e	arm64: dts: qcom: sm8550: Fix ADSP memory base and length [ Upstream commit a6a8f54bc2af555738322783ba1e990c2ae7f443 ] The address space in ADSP PAS (Peripheral Authentication Service) remoteproc node should point to the QDSP PUB address space (QDSP6...SS_PUB): 0x0680_0000 with length of 0x10000. 0x3000_0000, value used so far, is the main region of CDSP. Downstream DTS uses 0x0300_0000, which is oddly similar to 0x3000_0000, yet quite different and points to unused area. Correct the base address and length, which also moves the node to different place to keep things sorted by unit address. The diff looks big, but only the unit address and "reg" property were changed. This should have no functional impact on Linux users, because PAS loader does not use this address space at all. Fixes: `d0c061e366` ("arm64: dts: qcom: sm8550: add adsp, cdsp & mdss nodes") Cc: stable@vger.kernel.org Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20241213-dts-qcom-cdsp-mpss-base-address-v3-7-2e0036fccd8d@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Neil Armstrong	5d285b468e	arm64: dts: qcom: sm8550: add missing qcom,non-secure-domain property [ Upstream commit 49c50ad9e6cbaa6a3da59cdd85d4ffb354ef65f4 ] By default the DSP domains are non secure, add the missing qcom,non-secure-domain property to mark them as non-secure. Fixes: `d0c061e366` ("arm64: dts: qcom: sm8550: add adsp, cdsp & mdss nodes") Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240227-topic-sm8x50-upstream-fastrpc-non-secure-domain-v1-2-15c4c864310f@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: a6a8f54bc2af ("arm64: dts: qcom: sm8550: Fix ADSP memory base and length") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Ling Xu	5369d3b31f	arm64: dts: qcom: sm8550: Add dma-coherent property [ Upstream commit 4a03b85b8491d8bfe84a26ff979507b6ae7122c1 ] Add dma-coherent property to fastRPC context bank nodes to pass dma sequence test in fastrpc sanity test, ensure that data integrity is maintained during DMA operations. Signed-off-by: Ling Xu <quic_lxu5@quicinc.com> Link: https://lore.kernel.org/r/20240125102413.3016-2-quic_lxu5@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: a6a8f54bc2af ("arm64: dts: qcom: sm8550: Fix ADSP memory base and length") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Krzysztof Kozlowski	5a8f1613a1	arm64: dts: qcom: sm8450: Fix ADSP memory base and length [ Upstream commit 13c96bee5d5e5b61a9d8d000c9bb37bb9a2a0551 ] The address space in ADSP PAS (Peripheral Authentication Service) remoteproc node should point to the QDSP PUB address space (QDSP6...SS_PUB): 0x0300_0000 with length of 0x10000, which also matches downstream DTS. 0x3000_0000, value used so far, was in datasheet is the region of CDSP. Correct the base address and length, which also moves the node to different place to keep things sorted by unit address. The diff looks big, but only the unit address and "reg" property were changed. This should have no functional impact on Linux users, because PAS loader does not use this address space at all. Fixes: `1172729576` ("arm64: dts: qcom: sm8450: Add remoteproc enablers and instances") Cc: stable@vger.kernel.org Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20241213-dts-qcom-cdsp-mpss-base-address-v3-4-2e0036fccd8d@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Neil Armstrong	e96ddc4f00	arm64: dts: qcom: sm8450: add missing qcom,non-secure-domain property [ Upstream commit 033fbfa0eb60e519f50e97ef93baec270cd28a88 ] By default the DSP domains are non secure, add the missing qcom,non-secure-domain property to mark them as non-secure. Fixes: `91d70eb708` ("arm64: dts: qcom: sm8450: add fastrpc nodes") Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240227-topic-sm8x50-upstream-fastrpc-non-secure-domain-v1-1-15c4c864310f@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: 13c96bee5d5e ("arm64: dts: qcom: sm8450: Fix ADSP memory base and length") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Igor Pylypiv	3cfce644d8	scsi: core: Do not retry I/Os during depopulation [ Upstream commit 9ff7c383b8ac0c482a1da7989f703406d78445c6 ] Fail I/Os instead of retry to prevent user space processes from being blocked on the I/O completion for several minutes. Retrying I/Os during "depopulation in progress" or "depopulation restore in progress" results in a continuous retry loop until the depopulation completes or until the I/O retry loop is aborted due to a timeout by the scsi_cmd_runtime_exceeced(). Depopulation is slow and can take 24+ hours to complete on 20+ TB HDDs. Most I/Os in the depopulation retry loop end up taking several minutes before returning the failure to user space. Cc: stable@vger.kernel.org # 4.18.x: 2bbeb8d scsi: core: Handle depopulation and restoration in progress Cc: stable@vger.kernel.org # 4.18.x Fixes: `e37c7d9a03` ("scsi: core: sanitize++ in progress") Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Link: https://lore.kernel.org/r/20250131184408.859579-1-ipylypiv@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Douglas Gilbert	7f818ac0ac	scsi: core: Handle depopulation and restoration in progress [ Upstream commit 2bbeb8d12404cf0603f513fc33269ef9abfbb396 ] The default handling of the NOT READY sense key is to wait for the device to become ready. The "wait" is assumed to be relatively short. However there is a sub-class of NOT READY that have the "... in progress" phrase in their additional sense code and these can take much longer. Following on from commit `505aa4b6a8` ("scsi: sd: Defer spinning up drive while SANITIZE is in progress") we now have element depopulation and restoration that can take a long time. For example, over 24 hours for a 20 TB, 7200 rpm hard disk to depopulate 1 of its 20 elements. Add handling of ASC/ASCQ: 0x4,0x24 (depopulation in progress) and ASC/ASCQ: 0x4,0x25 (depopulation restoration in progress) to sd.c . The scsi_lib.c has incomplete handling of these two messages, so complete it. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Link: https://lore.kernel.org/r/20231015050650.131145-1-dgilbert@interlog.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: 9ff7c383b8ac ("scsi: core: Do not retry I/Os during depopulation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:46 -08:00
Krzysztof Kozlowski	b11052c8c4	firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available() [ Upstream commit 0a744cceebd0480cb39587b3b1339d66a9d14063 ] Commit 2e4955167ec5 ("firmware: qcom: scm: Fix __scm and waitq completion variable initialization") introduced a write barrier in probe function to store global '__scm' variable. It also claimed that it added a read barrier, because as we all known barriers are paired (see memory-barriers.txt: "Note that write barriers should normally be paired with read or address-dependency barriers"), however it did not really add it. The offending commit used READ_ONCE() to access '__scm' global which is not a barrier. The barrier is needed so the store to '__scm' will be properly visible. This is most likely not fatal in current driver design, because missing read barrier would mean qcom_scm_is_available() callers will access old value, NULL. Driver does not support unbinding and does not correctly handle probe failures, thus there is no risk of stale or old pointer in '__scm' variable. However for code correctness, readability and to be sure that we did not mess up something in this tricky topic of SMP barriers, add a read barrier for accessing '__scm'. Change also comment from useless/obvious what does barrier do, to what is expected: which other parts of the code are involved here. Fixes: 2e4955167ec5 ("firmware: qcom: scm: Fix __scm and waitq completion variable initialization") Cc: stable@vger.kernel.org Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20241209-qcom-scm-missing-barriers-and-all-sort-of-srap-v2-1-9061013c8d92@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Dan Carpenter	67f04c52e5	ASoC: renesas: rz-ssi: Add a check for negative sample_space [ Upstream commit 82a0a3e6f8c02b3236b55e784a083fa4ee07c321 ] My static checker rule complains about this code. The concern is that if "sample_space" is negative then the "sample_space >= runtime->channels" condition will not work as intended because it will be type promoted to a high unsigned int value. strm->fifo_sample_size is SSI_FIFO_DEPTH (32). The SSIFSR_TDC_MASK is 0x3f. Without any further context it does seem like a reasonable warning and it can't hurt to add a check for negatives. Cc: stable@vger.kernel.org Fixes: `03e786bd43` ("ASoC: sh: Add RZ/G2L SSIF-2 driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/e07c3dc5-d885-4b04-a742-71f42243f4fd@stanley.mountain Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Dmitry Torokhov	a2cbcd7013	Input: synaptics - fix crash when enabling pass-through port [ Upstream commit 08bd5b7c9a2401faabdaa1472d45c7de0755fd7e ] When enabling a pass-through port an interrupt might come before psmouse driver binds to the pass-through port. However synaptics sub-driver tries to access psmouse instance presumably associated with the pass-through port to figure out if only 1 byte of response or entire protocol packet needs to be forwarded to the pass-through port and may crash if psmouse instance has not been attached to the port yet. Fix the crash by introducing open() and close() methods for the port and check if the port is open before trying to access psmouse instance. Because psmouse calls serio_open() only after attaching psmouse instance to serio port instance this prevents the potential crash. Reported-by: Takashi Iwai <tiwai@suse.de> Fixes: `100e16959c` ("Input: libps2 - attach ps2dev instances as serio port's drvdata") Link: https://bugzilla.suse.com/show_bug.cgi?id=1219522 Cc: stable@vger.kernel.org Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/Z4qSHORvPn7EU2j1@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Dmitry Torokhov	c02d630398	Input: serio - define serio_pause_rx guard to pause and resume serio ports [ Upstream commit 0e45a09a1da0872786885c505467aab8fb29b5b4 ] serio_pause_rx() and serio_continue_rx() are usually used together to temporarily stop receiving interrupts/data for a given serio port. Define "serio_pause_rx" guard for this so that the port is always resumed once critical section is over. Example: scoped_guard(serio_pause_rx, elo->serio) { elo->expected_packet = toupper(packet[0]); init_completion(&elo->cmd_done); } Link: https://lore.kernel.org/r/20240905041732.2034348-2-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Stable-dep-of: 08bd5b7c9a24 ("Input: synaptics - fix crash when enabling pass-through port") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Zijun Hu	ab8b6bf2bf	Bluetooth: qca: Fix poor RF performance for WCN6855 [ Upstream commit a2fad248947d702ed3dcb52b8377c1a3ae201e44 ] For WCN6855, board ID specific NVM needs to be downloaded once board ID is available, but the default NVM is always downloaded currently. The wrong NVM causes poor RF performance, and effects user experience for several types of laptop with WCN6855 on the market. Fix by downloading board ID specific NVM if board ID is available. Fixes: `095327fede` ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6855") Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Steev Klimaszewski <steev@kali.org> #Thinkpad X13s Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Cheng Jiang	ae2d111c09	Bluetooth: qca: Update firmware-name to support board specific nvm [ Upstream commit a4c5a468c6329bde7dfd46bacff2cbf5f8a8152e ] Different connectivity boards may be attached to the same platform. For example, QCA6698-based boards can support either a two-antenna or three-antenna solution, both of which work on the sa8775p-ride platform. Due to differences in connectivity boards and variations in RF performance from different foundries, different NVM configurations are used based on the board ID. Therefore, in the firmware-name property, if the NVM file has an extension, the NVM file will be used. Otherwise, the system will first try the .bNN (board ID) file, and if that fails, it will fall back to the .bin file. Possible configurations: firmware-name = "QCA6698/hpnv21"; firmware-name = "QCA6698/hpnv21.bin"; Signed-off-by: Cheng Jiang <quic_chejiang@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: a2fad248947d ("Bluetooth: qca: Fix poor RF performance for WCN6855") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Zijun Hu	e68d2b880e	Bluetooth: qca: Support downloading board id specific NVM for WCN7850 [ Upstream commit e41137d8bd1a8e8bab8dcbfe3ec056418db3df18 ] Download board id specific NVM instead of default for WCN7850 if board id is available. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: a2fad248947d ("Bluetooth: qca: Fix poor RF performance for WCN6855") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Andreas Kemnade	5d8ba57800	cpufreq: fix using cpufreq-dt as module [ Upstream commit f1f010c9d9c62c865d9f54e94075800ba764b4d9 ] This driver can be built as a module since commit `3b062a0869` ("cpufreq: dt-platdev: Support building as module"), but unfortunately this caused a regression because the cputfreq-dt-platdev.ko module does not autoload. Usually, this is solved by just using the MODULE_DEVICE_TABLE() macro to export all the device IDs as module aliases. But this driver is special due how matches with devices and decides what platform supports. There are two of_device_id lists, an allow list that are for CPU devices that always match and a deny list that's for devices that must not match. The driver registers a cpufreq-dt platform device for all the CPU device nodes that either are in the allow list or contain an operating-points-v2 property and are not in the deny list. Enforce builtin compile of cpufreq-dt-platdev to make autoload work. Fixes: `3b062a0869` ("cpufreq: dt-platdev: Support building as module") Link: https://lore.kernel.org/all/20241104201424.2a42efdd@akair/ Link: https://lore.kernel.org/all/20241119111918.1732531-1-javierm@redhat.com/ Cc: stable@vger.kernel.org Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Reported-by: Radu Rendec <rrendec@redhat.com> Reported-by: Javier Martinez Canillas <javierm@redhat.com> [ Viresh: Picked commit log from Javier, updated tags ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Jeff Johnson	a9b868213e	cpufreq: dt-platdev: add missing MODULE_DESCRIPTION() macro [ Upstream commit 64e018d7a8990c11734704a0767c47fd8efd5388 ] make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/cpufreq/cpufreq-dt-platdev.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Stable-dep-of: f1f010c9d9c6 ("cpufreq: fix using cpufreq-dt as module") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Chen Ridong	972486d371	memcg: fix soft lockup in the OOM process [ Upstream commit ade81479c7dda1ce3eedb215c78bc615bbd04f06 ] A soft lockup issue was found in the product with about 56,000 tasks were in the OOM cgroup, it was traversing them when the soft lockup was triggered. watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066] CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G Hardware name: Huawei Cloud OpenStack Nova, BIOS RIP: 0010:console_unlock+0x343/0x540 RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247 RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040 R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0 R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: vprintk_emit+0x193/0x280 printk+0x52/0x6e dump_task+0x114/0x130 mem_cgroup_scan_tasks+0x76/0x100 dump_header+0x1fe/0x210 oom_kill_process+0xd1/0x100 out_of_memory+0x125/0x570 mem_cgroup_out_of_memory+0xb5/0xd0 try_charge+0x720/0x770 mem_cgroup_try_charge+0x86/0x180 mem_cgroup_try_charge_delay+0x1c/0x40 do_anonymous_page+0xb5/0x390 handle_mm_fault+0xc4/0x1f0 This is because thousands of processes are in the OOM cgroup, it takes a long time to traverse all of them. As a result, this lead to soft lockup in the OOM process. To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks' function per 1000 iterations. For global OOM, call 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue. Link: https://lkml.kernel.org/r/20241224025238.3768787-1-chenridong@huaweicloud.com Fixes: `9cbb78bb31` ("mm, memcg: introduce own oom handler to iterate only over its own threads") Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Carlos Galo	0a657f6e7f	mm: update mark_victim tracepoints fields [ Upstream commit 72ba14deb40a9e9668ec5e66a341ed657e5215c2 ] The current implementation of the mark_victim tracepoint provides only the process ID (pid) of the victim process. This limitation poses challenges for userspace tools requiring real-time OOM analysis and intervention. Although this information is available from the kernel logs, it’s not the appropriate format to provide OOM notifications. In Android, BPF programs are used with the mark_victim trace events to notify userspace of an OOM kill. For consistency, update the trace event to include the same information about the OOMed victim as the kernel logs. - UID In Android each installed application has a unique UID. Including the `uid` assists in correlating OOM events with specific apps. - Process Name (comm) Enables identification of the affected process. - OOM Score Will allow userspace to get additional insight of the relative kill priority of the OOM victim. In Android, the oom_score_adj is used to categorize app state (foreground, background, etc.), which aids in analyzing user-perceptible impacts of OOM events [1]. - Total VM, RSS Stats, and pgtables Amount of memory used by the victim that will, potentially, be freed up by killing it. [1] `246dc8fc95`:frameworks/base/services/core/java/com/android/server/am/ProcessList.java;l=188-283 Signed-off-by: Carlos Galo <carlosgalo@google.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: ade81479c7dd ("memcg: fix soft lockup in the OOM process") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:45 -08:00
Yu Kuai	52848a095b	md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime [ Upstream commit 8d28d0ddb986f56920ac97ae704cc3340a699a30 ] After commit ec6bb299c7c3 ("md/md-bitmap: add 'sync_size' into struct md_bitmap_stats"), following panic is reported: Oops: general protection fault, probably for non-canonical address RIP: 0010:bitmap_get_stats+0x2b/0xa0 Call Trace: <TASK> md_seq_show+0x2d2/0x5b0 seq_read_iter+0x2b9/0x470 seq_read+0x12f/0x180 proc_reg_read+0x57/0xb0 vfs_read+0xf6/0x380 ksys_read+0x6c/0xf0 do_syscall_64+0x82/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Root cause is that bitmap_get_stats() can be called at anytime if mddev is still there, even if bitmap is destroyed, or not fully initialized. Deferenceing bitmap in this case can crash the kernel. Meanwhile, the above commit start to deferencing bitmap->storage, make the problem easier to trigger. Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex. Cc: stable@vger.kernel.org # v6.12+ Fixes: `32a7627cf3` ("[PATCH] md: optimised resync using Bitmap based intent logging") Reported-and-tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Closes: https://lore.kernel.org/linux-raid/ca3a91a2-50ae-4f68-b317-abd9889f3907@oracle.com/T/#m6e5086c95201135e4941fe38f9efa76daf9666c5 Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250124092055.4050195-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Yu Kuai	754fffa651	md/md-bitmap: add 'sync_size' into struct md_bitmap_stats [ Upstream commit ec6bb299c7c3dd4ca1724d13d5f5fae3ee54fc65 ] To avoid dereferencing bitmap directly in md-cluster to prepare inventing a new bitmap. BTW, also fix following checkpatch warnings: WARNING: Deprecated use of 'kmap_atomic', prefer 'kmap_local_page' instead WARNING: Deprecated use of 'kunmap_atomic', prefer 'kunmap_local' instead Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-7-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Yu Kuai	023d5bc950	md/md-cluster: fix spares warnings for __le64 [ Upstream commit 82697ccf7e495c1ba81e315c2886d6220ff84c2c ] drivers/md/md-cluster.c:1220:22: warning: incorrect type in assignment (different base types) drivers/md/md-cluster.c:1220:22: expected unsigned long my_sync_size drivers/md/md-cluster.c:1220:22: got restricted __le64 [usertype] sync_size drivers/md/md-cluster.c:1252:35: warning: incorrect type in assignment (different base types) drivers/md/md-cluster.c:1252:35: expected unsigned long sync_size drivers/md/md-cluster.c:1252:35: got restricted __le64 [usertype] sync_size drivers/md/md-cluster.c:1253:41: warning: restricted __le64 degrades to integer Fix the warnings by using le64_to_cpu() to convet __le64 to integer. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-6-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Yu Kuai	ba9e0f0578	md/md-bitmap: replace md_bitmap_status() with a new helper md_bitmap_get_stats() [ Upstream commit 38f287d7e495ae00d4481702f44ff7ca79f5c9bc ] There are no functional changes, and the new helper will be used in multiple places in following patches to avoid dereferencing bitmap directly. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Yu Kuai	87ebc90e84	md: simplify md_seq_ops [ Upstream commit cf1b6d4441fffd0ba8ae4ced6a12f578c95ca049 ] Before this patch, the implementation is hacky and hard to understand: 1) md_seq_start set pos to 1; 2) md_seq_show found pos is 1, then print Personalities; 3) md_seq_next found pos is 1, then it update pos to the first mddev; 4) md_seq_show found pos is not 1 or 2, show mddev; 5) md_seq_next found pos is not 1 or 2, update pos to next mddev; 6) loop 4-5 until the last mddev, then md_seq_next update pos to 2; 7) md_seq_show found pos is 2, then print unused devices; 8) md_seq_next found pos is 2, stop; This patch remove the magic value and use seq_list_start/next/stop() directly, and move printing "Personalities" to md_seq_start(), "unsed devices" to md_seq_stop(): 1) md_seq_start print Personalities, and then set pos to first mddev; 2) md_seq_show show mddev; 3) md_seq_next update pos to next mddev; 4) loop 2-3 until the last mddev; 5) md_seq_stop print unsed devices; Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230927061241.1552837-3-yukuai1@huaweicloud.com Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Yu Kuai	452f508079	md: factor out a helper from mddev_put() [ Upstream commit 3d8d32873c7b6d9cec5b40c2ddb8c7c55961694f ] There are no functional changes, prepare to simplify md_seq_ops in next patch. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230927061241.1552837-2-yukuai1@huaweicloud.com Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Yu Kuai	13231893fb	md: use separate work_struct for md_start_sync() [ Upstream commit ac619781967bd5663c29606246b50dbebd8b3473 ] It's a little weird to borrow 'del_work' for md_start_sync(), declare a new work_struct 'sync_work' for md_start_sync(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-2-yukuai1@huaweicloud.com Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Darrick J. Wong	4534162e09	xfs: don't over-report free space or inodes in statvfs [ Upstream commit 4b8d867ca6e2fc6d152f629fdaf027053b81765a ] Emmanual Florac reports a strange occurrence when project quota limits are enabled, free space is lower than the remaining quota, and someone runs statvfs: # mkfs.xfs -f /dev/sda # mount /dev/sda /mnt -o prjquota # xfs_quota -x -c 'limit -p bhard=2G 55' /mnt # mkdir /mnt/dir # xfs_io -c 'chproj 55' -c 'chattr +P' -c 'stat -vvvv' /mnt/dir # fallocate -l 19g /mnt/a # df /mnt /mnt/dir Filesystem Size Used Avail Use% Mounted on /dev/sda 20G 20G 345M 99% /mnt /dev/sda 2.0G 0 2.0G 0% /mnt I think the bug here is that xfs_fill_statvfs_from_dquot unconditionally assigns to f_bfree without checking that the filesystem has enough free space to fill the remaining project quota. However, this is a longstanding behavior of xfs so it's unclear what to do here. Cc: <stable@vger.kernel.org> # v2.6.18 Fixes: `932f2c3231` ("[XFS] statvfs component of directory/project quota support, code originally by Glen.") Reported-by: Emmanuel Florac <eflorac@intellique.com> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Darrick J. Wong	1603b0b657	xfs: report realtime block quota limits on realtime directories [ Upstream commit 9a17ebfea9d0c7e0bb7409dcf655bf982a5d6e52 ] On the data device, calling statvfs on a projinherit directory results in the block and avail counts being curtailed to the project quota block limits, if any are set. Do the same for realtime files or directories, only use the project quota rt block limits. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Stable-dep-of: 4b8d867ca6e2 ("xfs: don't over-report free space or inodes in statvfs") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-27 04:10:44 -08:00
Ojaswin Mujoo	b887d2fe4a	xfs: Check for delayed allocations before setting extsize commit 2a492ff66673c38a77d0815d67b9a8cce2ef57f8 upstream. Extsize should only be allowed to be set on files with no data in it. For this, we check if the files have extents but miss to check if delayed extents are present. This patch adds that check. While we are at it, also refactor this check into a helper since it's used in some other places as well like xfs_inactive() or xfs_ioctl_setattr_xflags() Without the patch (SUCCEEDS) $ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536' wrote 1024/1024 bytes at offset 0 1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec) With the patch (FAILS as expected) $ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536' wrote 1024/1024 bytes at offset 0 1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec) xfs_io: FS_IOC_FSSETXATTR testfile: Invalid argument Fixes: `e94af02a9c` ("[XFS] fix old xfs_setattr mis-merge from irix; mostly harmless esp if not using xfs rt") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:44 -08:00
Christoph Hellwig	067ee59f11	xfs: streamline xfs_filestream_pick_ag commit 81a1e1c32ef474c20ccb9f730afe1ac25b1c62a4 upstream. Directly return the error from xfs_bmap_longest_free_extent instead of breaking from the loop and handling it there, and use a done label to directly jump to the exist when we found a suitable perag structure to reduce the indentation level and pag/max_pag check complexity in the tail of the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:44 -08:00
Chi Zhiling	1fe5c2aa39	xfs: Reduce unnecessary searches when searching for the best extents commit 3ef22684038aa577c10972ee9c6a2455f5fac941 upstream. Recently, we found that the CPU spent a lot of time in xfs_alloc_ag_vextent_size when the filesystem has millions of fragmented spaces. The reason is that we conducted much extra searching for extents that could not yield a better result, and these searches would cost a lot of time when there were millions of extents to search through. Even if we get the same result length, we don't switch our choice to the new one, so we can definitely terminate the search early. Since the result length cannot exceed the found length, when the found length equals the best result length we already have, we can conclude the search. We did a test in that filesystem: [root@localhost ~]# xfs_db -c freesp /dev/vdb from to extents blocks pct 1 1 215 215 0.01 2 3 994476 1988952 99.99 Before this patch: 0) \| xfs_alloc_ag_vextent_size [xfs]() { 0) * 15597.94 us \| } After this patch: 0) \| xfs_alloc_ag_vextent_size [xfs]() { 0) 19.176 us \| } Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	c904df6599	xfs: update the pag for the last AG at recovery time commit 4a201dcfa1ff0dcfe4348c40f3ad8bd68b97eb6c upstream. Currently log recovery never updates the in-core perag values for the last allocation group when they were grown by growfs. This leads to btree record validation failures for the alloc, ialloc or finotbt trees if a transaction references this new space. Found by Brian's new growfs recovery stress test. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	7a2c24661d	xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag commit 069cf5e32b700f94c6ac60f6171662bdfb04f325 upstream. [backport: uses kmem_zalloc instead of kzalloc] __GFP_RETRY_MAYFAIL increases the likelyhood of allocations to fail, which isn't really helpful during log recovery. Remove the flag and stick to the default GFP_KERNEL policies. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	5a9f827998	xfs: error out when a superblock buffer update reduces the agcount commit b882b0f8138ffa935834e775953f1630f89bbb62 upstream. XFS currently does not support reducing the agcount, so error out if a logged sb buffer tries to shrink the agcount. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	a9c1ebae75	xfs: update the file system geometry after recoverying superblock buffers commit 6a18765b54e2e52aebcdb84c3b4f4d1f7cb2c0ca upstream. Primary superblock buffers that change the file system geometry after a growfs operation can affect the operation of later CIL checkpoints that make use of the newly added space and allocation groups. Apply the changes to the in-memory structures as part of recovery pass 2, to ensure recovery works fine for such cases. In the future we should apply the logic to other updates such as features bits as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	bb305f888d	xfs: pass the exact range to initialize to xfs_initialize_perag commit 82742f8c3f1a93787a05a00aca50c2a565231f84 upstream. [backport: dependency of 6a18765b] Currently only the new agcount is passed to xfs_initialize_perag, which requires lookups of existing AGs to skip them and complicates error handling. Also pass the previous agcount so that the range that xfs_initialize_perag operates on is exactly defined. That way the extra lookups can be avoided, and error handling can clean up the exact range from the old count to the last added perag structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Zhang Zekun	5a9e3dbb0b	xfs: Remove empty declartion in header file commit f6225eebd76f371dab98b4d1c1a7c1e255190aef upstream. The definition of xfs_attr_use_log_assist() has been removed since commit `d9c61ccb3b` ("xfs: move xfs_attr_use_log_assist out of xfs_log.c"). So, Remove the empty declartion in header files. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Uros Bizjak	b5d917a639	xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate() commit 20195d011c840b01fa91a85ebcd099ca95fbf8fc upstream. Use !try_cmpxchg instead of cmpxchg (ptr, old, new) != old in xlog_cil_insert_pcp_aggregate(). x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg. Also, try_cmpxchg implicitly assigns old ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. Note that the value from *ptr should be read using READ_ONCE to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	9716ff8824	xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc commit 6aac77059881e4419df499392c995bf02fb9630b upstream. Currently the debug-only xfs_bmap_exact_minlen_extent_alloc allocation variant fails to drop into the lowmode last resort allocator, and thus can sometimes fail allocations for which the caller has a transaction block reservation. Fix this by using xfs_bmap_btalloc_low_space to do the actual allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	a8a80b75b4	xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc commit 405ee87c6938f67e6ab62a3f8f85b3c60a093886 upstream. [backport: dependency of 6aac770] xfs_bmap_exact_minlen_extent_alloc duplicates the args setup in xfs_bmap_btalloc. Switch to call it from xfs_bmap_btalloc after doing the basic setup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	479e112ddf	xfs: don't ifdef around the exact minlen allocations commit b611fddc0435738e64453bbf1dadd4b12a801858 upstream. Exact minlen allocations only exist as an error injection tool for debug builds. Currently this is implemented using ifdefs, which means the code isn't even compiled for non-XFS_DEBUG builds. Enhance the compile test coverage by always building the code and use the compilers' dead code elimination to remove it from the generated binary instead. The only downside is that the alloc_minlen_only field is unconditionally added to struct xfs_alloc_args now, but by moving it around and packing it tightly this doesn't actually increase the size of the structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:43 -08:00
Christoph Hellwig	41e7f8ffee	xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate commit 865469cd41bce2b04bef9539cbf70676878bc8df upstream. [backport: dependency of 6aac770] Userdata and metadata allocations end up in the same allocation helpers. Remove the separate xfs_bmap_alloc_userdata function to make this more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Christoph Hellwig	f37a5f0e91	xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname commit b3f4e84e2f438a119b7ca8684a25452b3e57c0f0 upstream. Just like xfs_attr3_leaf_split, xfs_attr_node_try_addname can return -ENOSPC both for an actual failure to allocate a disk block, but also to signal the caller to convert the format of the attr fork. Use magic 1 to ask for the conversion here as well. Note that unlike the similar issue in xfs_attr3_leaf_split, this one was only found by code review. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Christoph Hellwig	512a911142	xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split commit a5f73342abe1f796140f6585e43e2aa7bc1b7975 upstream. xfs_attr3_leaf_split propagates the need for an extra btree split as -ENOSPC to it's only caller, but the same return value can also be returned from xfs_da_grow_inode when it fails to find free space. Distinguish the two cases by returning 1 for the extra split case instead of overloading -ENOSPC. This can be triggered relatively easily with the pending realtime group support and a file system with a lot of small zones that use metadata space on the main device. In this case every about 5-10th run of xfs/538 runs into the following assert: ASSERT(oldblk->magic == XFS_ATTR_LEAF_MAGIC); in xfs_attr3_leaf_split caused by an allocation failure. Note that the allocation failure is caused by another bug that will be fixed subsequently, but this commit at least sorts out the error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Christoph Hellwig	702e1ac48f	xfs: return bool from xfs_attr3_leaf_add commit 346c1d46d4c631c0c88592d371f585214d714da4 upstream. [backport: dependency of a5f7334 and b3f4e84] xfs_attr3_leaf_add only has two potential return values, indicating if the entry could be added or not. Replace the errno return with a bool so that ENOSPC from it can't easily be confused with a real ENOSPC. Remove the return value from the xfs_attr3_leaf_add_work helper entirely, as it always return 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Christoph Hellwig	3d58507d6c	xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname commit b1c649da15c2e4c86344c8e5af69c8afa215efec upstream. [backport: dependency of a5f7334 and b3f4e84] xfs_attr_leaf_try_add is only called by xfs_attr_leaf_addname, and merging the two will simplify a following error handling fix. To facilitate this move the remote block state save/restore helpers up in the file so that they don't need forward declarations now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Brian Foster	7b5b119191	xfs: don't free cowblocks from under dirty pagecache on unshare commit 4390f019ad7866c3791c3d768d2ff185d89e8ebe upstream. fallocate unshare mode explicitly breaks extent sharing. When a command completes, it checks the data fork for any remaining shared extents to determine whether the reflink inode flag and COW fork preallocation can be removed. This logic doesn't consider in-core pagecache and I/O state, however, which means we can unsafely remove COW fork blocks that are still needed under certain conditions. For example, consider the following command sequence: xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \ -c "pwrite 0 32k" -c "funshare 0 1k" <file> This allocates a data block at offset 0, shares it, and then overwrites it with a larger buffered write. The overwrite triggers COW fork preallocation, 32 blocks by default, which maps the entire 32k write to delalloc in the COW fork. All but the shared block at offset 0 remains hole mapped in the data fork. The unshare command redirties and flushes the folio at offset 0, removing the only shared extent from the inode. Since the inode no longer maps shared extents, unshare purges the COW fork before the remaining 28k may have written back. This leaves dirty pagecache backed by holes, which writeback quietly skips, thus leaving clean, non-zeroed pagecache over holes in the file. To verify, fiemap shows holes in the first 32k of the file and reads return different data across a remount: $ xfs_io -c "fiemap -v" <file> <file>: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS ... 1: [8..511]: hole 504 ... $ xfs_io -c "pread -v 4k 8" <file> 00001000: cd cd cd cd cd cd cd cd ........ $ umount <mnt>; mount <dev> <mnt> $ xfs_io -c "pread -v 4k 8" <file> 00001000: 00 00 00 00 00 00 00 00 ........ To avoid this problem, make unshare follow the same rules used for background cowblock scanning and never purge the COW fork for inodes with dirty pagecache or in-flight I/O. Fixes: `46afb0628b` ("xfs: only flush the unshared range in xfs_reflink_unshare") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Brian Foster	f56db9ce3c	xfs: skip background cowblock trims on inodes open for write commit 90a71daaf73f5d39bb0cbb3c7ab6af942fe6233e upstream. The background blockgc scanner runs on a 5m interval by default and trims preallocation (post-eof and cow fork) from inodes that are otherwise idle. Idle effectively means that iolock can be acquired without blocking and that the inode has no dirty pagecache or I/O in flight. This simple mechanism and heuristic has worked fairly well for post-eof speculative preallocations. Support for reflink and COW fork preallocations came sometime later and plugged into the same mechanism, with similar heuristics. Some recent testing has shown that COW fork preallocation may be notably more sensitive to blockgc processing than post-eof preallocation, however. For example, consider an 8GB reflinked file with a COW extent size hint of 1MB. A worst case fully randomized overwrite of this file results in ~8k extents of an average size of ~1MB. If the same workload is interrupted a couple times for blockgc processing (assuming the file goes idle), the resulting extent count explodes to over 100k extents with an average size <100kB. This is significantly worse than ideal and essentially defeats the COW extent size hint mechanism. While this particular test is instrumented, it reflects a fairly reasonable pattern in practice where random I/Os might spread out over a large period of time with varying periods of (in)activity. For example, consider a cloned disk image file for a VM or container with long uptime and variable and bursty usage. A background blockgc scan that races and processes the image file when it happens to be clean and idle can have a significant effect on the future fragmentation level of the file, even when still in use. To help combat this, update the heuristic to skip cowblocks inodes that are currently opened for write access during non-sync blockgc scans. This allows COW fork preallocations to persist for as long as possible unless otherwise needed for functional purposes (i.e. a sync scan), the file is idle and closed, or the inode is being evicted from cache. While here, update the comments to help distinguish performance oriented heuristics from the logic that exists to maintain functional correctness. Suggested-by: Darrick Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Andrew Kreimer	3e2f7c2051	xfs: fix a typo commit 77bfe1b11ea0c0c4b0ce19b742cd1aa82f60e45d upstream. Fix a typo in comments. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00
Darrick J. Wong	a6790b50bf	xfs: fix a sloppy memory handling bug in xfs_iroot_realloc commit de55149b6639e903c4d06eb0474ab2c05060e61d upstream. While refactoring code, I noticed that when xfs_iroot_realloc tries to shrink a bmbt root block, it allocates a smaller new block and then copies "records" and pointers to the new block. However, bmbt root blocks cannot ever be leaves, which means that it's not technically correct to copy records. We /should/ be copying keys. Note that this has never resulted in actual memory corruption because sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However, this will no longer be true when we start adding realtime rmap stuff, so fix this now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:42 -08:00

1 2 3 4 5 ...

1232559 Commits