linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-06 10:58:48 +09:00

Author	SHA1	Message	Date
Hongbo Li	6d09bbbca3	irqchip/sifive-plic: Make use of __assign_bit() [ Upstream commit 40d7af5375a4e27d8576d9d11954ac213d06f09e ] Replace the open coded if (foo) __set_bit(n, bar); else __clear_bit(n, bar); with __assign_bit(). No functional change intended. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/all/20240902130824.2878644-1-lihongbo22@huawei.com Stable-dep-of: f75e07bf5226 ("irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-19 16:31:01 +02:00
Matthieu Baerts (NGI0)	63c44fa29e	mptcp: pm: in-kernel: usable client side with C-flag commit 4b1ff850e0c1aacc23e923ed22989b827b9808f9 upstream. When servers set the C-flag in their MP_CAPABLE to tell clients not to create subflows to the initial address and port, clients will likely not use their other endpoints. That's because the in-kernel path-manager uses the 'subflow' endpoints to create subflows only to the initial address and port. If the limits have not been modified to accept ADD_ADDR, the client doesn't try to establish new subflows. If the limits accept ADD_ADDR, the routing routes will be used to select the source IP. The C-flag is typically set when the server is operating behind a legacy Layer 4 load balancer, or using anycast IP address. Clients having their different 'subflow' endpoints setup, don't end up creating multiple subflows as expected, and causing some deployment issues. A special case is then added here: when servers set the C-flag in the MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted. The 'subflows' endpoints will then be used with this new remote IP and port. This exception is only allowed when the ADD_ADDR is sent immediately after the 3WHS, and makes the client switching to the 'fully established' mode. After that, 'select_local_address()' will not be able to find any subflows, because 'id_avail_bitmap' will be filled in mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully established' mode. Fixes: `df377be387` ("mptcp: add deny_join_id0 in mptcp_options_received") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536 Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-1-ad126cc47c6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflict in pm.c, because commit 498d7d8b75f1 ("mptcp: pm: remove '_nl' from mptcp_pm_nl_is_init_remote_addr") renamed an helper in the context, and it is not in this version. The same new code can be applied at the same place. Conflict in pm_kernel.c, because the modified code has been moved from pm_netlink.c to pm_kernel.c in commit 8617e85e04bd ("mptcp: pm: split in-kernel PM specific code"), which is not in this version. The resolution is easy: simply by applying the patch where 'pm_kernel.c' has been replaced 'pm_netlink.c'. Conflict in pm_netlink.c, because commit b83fbca1b4c9 ("mptcp: pm: reduce entries iterations on connect") is not in this version. Instead of using the 'locals' variable (struct mptcp_pm_local ) from the new version and embedding a "struct mptcp_addr_info", we can simply continue to use the 'addrs' variable (struct mptcp_addr_info ). Conflict in protocol.h, because commit af3dc0ad3167 ("mptcp: Remove unused declaration mptcp_sockopt_sync()") is not in this version and it removed one line in the context. The resolution is easy because the new function can still be added at the same place. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:01 +02:00
Lance Yang	06d82c3a1f	selftests/mm: skip soft-dirty tests when CONFIG_MEM_SOFT_DIRTY is disabled commit 0389c305ef56cbadca4cbef44affc0ec3213ed30 upstream. The madv_populate and soft-dirty kselftests currently fail on systems where CONFIG_MEM_SOFT_DIRTY is disabled. Introduce a new helper softdirty_supported() into vm_util.c/h to ensure tests are properly skipped when the feature is not enabled. Link: https://lkml.kernel.org/r/20250917133137.62802-1-lance.yang@linux.dev Fixes: `9f3265db6a` ("selftests: vm: add test for Soft-Dirty PTE bit") Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Gabriel Krisman Bertazi <krisman@collabora.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:01 +02:00
Ilya Leoshkevich	ef8a0b37f1	s390/bpf: Write back tail call counter for BPF_TRAMP_F_CALL_ORIG commit bc3905a71f02511607d3ccf732360580209cac4c upstream. The tailcall_bpf2bpf_hierarchy_fentry test hangs on s390. Its call graph is as follows: entry() subprog_tail() trampoline() fentry() the rest of subprog_tail() # via BPF_TRAMP_F_CALL_ORIG return to entry() The problem is that the rest of subprog_tail() increments the tail call counter, but the trampoline discards the incremented value. This results in an astronomically large number of tail calls. Fix by making the trampoline write the incremented tail call counter back. Fixes: `528eb2cb87` ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250813121016.163375-4-iii@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:01 +02:00
Ilya Leoshkevich	1527222f35	s390/bpf: Write back tail call counter for BPF_PSEUDO_CALL commit c861a6b147137d10b5ff88a2c492ba376cd1b8b0 upstream. The tailcall_bpf2bpf_hierarchy_1 test hangs on s390. Its call graph is as follows: entry() subprog_tail() bpf_tail_call_static(0) -> entry + tail_call_start subprog_tail() bpf_tail_call_static(0) -> entry + tail_call_start entry() copies its tail call counter to the subprog_tail()'s frame, which then increments it. However, the incremented result is discarded, leading to an astronomically large number of tail calls. Fix by writing the incremented counter back to the entry()'s frame. Fixes: `dd691e847d` ("s390/bpf: Implement bpf_jit_supports_subprog_tailcalls()") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250813121016.163375-3-iii@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Ilya Leoshkevich	2c768a9d1c	s390/bpf: Describe the frame using a struct instead of constants commit e26d523edf2a62b142d2dd2dd9b87f61ed92f33a upstream. Currently the caller-allocated portion of the stack frame is described using constants, hardcoded values, and an ASCII drawing, making it harder than necessary to ensure that everything is in sync. Declare a struct and use offsetof() and offsetofend() macros to refer to various values stored within the frame. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250624121501.50536-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Ilya Leoshkevich	10358217e3	s390/bpf: Centralize frame offset calculations commit b2268d550d20ff860bddfe3a91b1aec00414689a upstream. The calculation of the distance from %r15 to the caller-allocated portion of the stack frame is copy-pasted into multiple places in the JIT code. Move it to bpf_jit_prog() and save the result into bpf_jit::frame_off, so that the other parts of the JIT can use it. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250624121501.50536-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Ilya Leoshkevich	9b378246e7	s390/bpf: Change seen_reg to a mask commit 7ba4f43e16de351fe9821de80e15d88c884b2967 upstream. Using a mask instead of an array saves a small amount of memory and allows marking multiple registers as seen with a simple "or". Another positive side-effect is that it speeds up verification with jitterbug. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240703005047.40915-2-iii@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Rafael J. Wysocki	63d2008aff	ACPI: property: Do not pass NULL handles to acpi_attach_data() [ Upstream commit baf60d5cb8bc6b85511c5df5f0ad7620bb66d23c ] In certain circumstances, the ACPI handle of a data-only node may be NULL, in which case it does not make sense to attempt to attach that node to an ACPI namespace object, so update the code to avoid attempts to do so. This prevents confusing and unuseful error messages from being printed. Also document the fact that the ACPI handle of a data-only node may be NULL and when that happens in a code comment. In addition, make acpi_add_nondev_subnodes() print a diagnostic message for each data-only node with an unknown ACPI namespace scope. Fixes: `1d52f10917` ("ACPI: property: Tie data nodes to acpi handles") Cc: 6.0+ <stable@vger.kernel.org> # 6.0+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Rafael J. Wysocki	af0ff085cd	ACPI: property: Add code comments explaining what is going on [ Upstream commit 737c3a09dcf69ba2814f3674947ccaec1861c985 ] In some places in the ACPI device properties handling code, it is unclear why the code is what it is. Some assumptions are not documented and some pieces of code are based on knowledge that is not mentioned anywhere. Add code comments explaining these things. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Stable-dep-of: baf60d5cb8bc ("ACPI: property: Do not pass NULL handles to acpi_attach_data()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Rafael J. Wysocki	156819a047	ACPI: property: Disregard references in data-only subnode lists [ Upstream commit d06118fe9b03426484980ed4c189a8c7b99fa631 ] Data-only subnode links following the ACPI data subnode GUID in a _DSD package are expected to point to named objects returning _DSD-equivalent packages. If a reference to such an object is used in the target field of any of those links, that object will be evaluated in place (as a named object) and its return data will be embedded in the outer _DSD package. For this reason, it is not expected to see a subnode link with the target field containing a local reference (that would mean pointing to a device or another object that cannot be evaluated in place and therefore cannot return a _DSD-equivalent package). Accordingly, simplify the code parsing data-only subnode links to simply print a message when it encounters a local reference in the target field of one of those links. Moreover, since acpi_nondev_subnode_data_ok() would only have one caller after the change above, fold it into that caller. Link: https://lore.kernel.org/linux-acpi/CAJZ5v0jVeSrDO6hrZhKgRZrH=FpGD4vNUjFD8hV9WwN9TLHjzQ@mail.gmail.com/ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Stable-dep-of: baf60d5cb8bc ("ACPI: property: Do not pass NULL handles to acpi_attach_data()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Rafael J. Wysocki	1ed161347a	ACPI: battery: Add synchronization between interface updates [ Upstream commit 399dbcadc01ebf0035f325eaa8c264f8b5cd0a14 ] There is no synchronization between different code paths in the ACPI battery driver that update its sysfs interface or its power supply class device interface. In some cases this results to functional failures due to race conditions. One example of this is when two ACPI notifications: - ACPI_BATTERY_NOTIFY_STATUS (0x80) - ACPI_BATTERY_NOTIFY_INFO (0x81) are triggered (by the platform firmware) in a row with a little delay in between after removing and reinserting a laptop battery. Both notifications cause acpi_battery_update() to be called and if the delay between them is sufficiently small, sysfs_add_battery() can be re-entered before battery->bat is set which leads to a duplicate sysfs entry error: sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT1' CPU: 1 UID: 0 PID: 185 Comm: kworker/1:4 Kdump: loaded Not tainted 6.12.38+deb13-amd64 #1 Debian 6.12.38-1 Hardware name: Gateway NV44 /SJV40-MV , BIOS V1.3121 04/08/2009 Workqueue: kacpi_notify acpi_os_execute_deferred Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 sysfs_warn_dup.cold+0x17/0x23 sysfs_create_dir_ns+0xce/0xe0 kobject_add_internal+0xba/0x250 kobject_add+0x96/0xc0 ? get_device_parent+0xde/0x1e0 device_add+0xe2/0x870 __power_supply_register.part.0+0x20f/0x3f0 ? wake_up_q+0x4e/0x90 sysfs_add_battery+0xa4/0x1d0 [battery] acpi_battery_update+0x19e/0x290 [battery] acpi_battery_notify+0x50/0x120 [battery] acpi_ev_notify_dispatch+0x49/0x70 acpi_os_execute_deferred+0x1a/0x30 process_one_work+0x177/0x330 worker_thread+0x251/0x390 ? __pfx_worker_thread+0x10/0x10 kthread+0xd2/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> kobject: kobject_add_internal failed for BAT1 with -EEXIST, don't try to register things with the same name in the same directory. There are also other scenarios in which analogous issues may occur. Address this by using a common lock in all of the code paths leading to updates of driver interfaces: ACPI Notify () handler, system resume callback and post-resume notification, device addition and removal. This new lock replaces sysfs_lock that has been used only in sysfs_remove_battery() which now is going to be always called under the new lock, so it doesn't need any internal locking any more. Fixes: `1066625155` ("ACPI: battery: Install Notify() handler directly") Closes: https://lore.kernel.org/linux-acpi/20250910142653.313360-1-luogf2025@163.com/ Reported-by: GuangFei Luo <luogf2025@163.com> Tested-by: GuangFei Luo <luogf2025@163.com> Cc: 6.6+ <stable@vger.kernel.org> # 6.6+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Andy Shevchenko	8f03b1bf2b	ACPI: battery: Check for error code from devm_mutex_init() call [ Upstream commit 815daedc318b2f9f1b956d0631377619a0d69d96 ] Even if it's not critical, the avoidance of checking the error code from devm_mutex_init() call today diminishes the point of using devm variant of it. Tomorrow it may even leak something. Add the missed check. Fixes: 0710c1ce5045 ("ACPI: battery: initialize mutexes through devm_ APIs") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20241030162754.2110946-1-andriy.shevchenko@linux.intel.com [ rjw: Added 2 empty code lines ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 399dbcadc01e ("ACPI: battery: Add synchronization between interface updates") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:31:00 +02:00
Thomas Weißschuh	e6c83bbb01	ACPI: battery: initialize mutexes through devm_ APIs [ Upstream commit 0710c1ce50455ed0db91bffa0eebbaa4f69b1773 ] Simplify the cleanup logic a bit. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20240904-acpi-battery-cleanups-v1-3-a3bf74f22d40@weissschuh.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 399dbcadc01e ("ACPI: battery: Add synchronization between interface updates") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Thomas Weißschuh	346975d626	ACPI: battery: allocate driver data through devm_ APIs [ Upstream commit 909dfc60692331e1599d5e28a8f08a611f353aef ] Simplify the cleanup logic a bit. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20240904-acpi-battery-cleanups-v1-2-a3bf74f22d40@weissschuh.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 399dbcadc01e ("ACPI: battery: Add synchronization between interface updates") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Catalin Marinas	b8c7d40b4c	arm64: mte: Do not flag the zero page as PG_mte_tagged [ Upstream commit f620d66af3165838bfa845dcf9f5f9b4089bf508 ] Commit `68d54ceeec` ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") attempted to fix ptrace() reading of tags from the zero page by marking it as PG_mte_tagged during cpu_enable_mte(). The same commit also changed the ptrace() tag access permission check to the VM_MTE vma flag while turning the page flag test into a WARN_ON_ONCE(). Attempting to set the PG_mte_tagged flag early with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled may either hang (after commit `d77e59a8fc` "arm64: mte: Lock a page for MTE tag initialisation") or have the flags cleared later during page_alloc_init_late(). In addition, pages_identical() -> memcmp_pages() will reject any comparison with the zero page as it is marked as tagged. Partially revert the above commit to avoid setting PG_mte_tagged on the zero page. Update the __access_remote_tags() warning on untagged pages to ignore the zero page since it is known to have the tags initialised. Note that all user mapping of the zero page are marked as pte_special(). The arm64 set_pte_at() will not call mte_sync_tags() on such pages, so PG_mte_tagged will remain cleared. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: `68d54ceeec` ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") Reported-by: Gergely Kovacs <Gergely.Kovacs2@arm.com> Cc: stable@vger.kernel.org # 5.10.x Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lance Yang <lance.yang@linux.dev> Acked-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Lance Yang <lance.yang@linux.dev> Signed-off-by: Will Deacon <will@kernel.org> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Yang Shi	7f4f031e66	arm64: kprobes: call set_memory_rox() for kprobe page [ Upstream commit 195a1b7d8388c0ec2969a39324feb8bebf9bb907 ] The kprobe page is allocated by execmem allocator with ROX permission. It needs to call set_memory_rox() to set proper permission for the direct map too. It was missed. Fixes: `10d5e97c1b` ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page") Cc: <stable@vger.kernel.org> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org> [ kept existing __vmalloc_node_range() instead of upstream's execmem_alloc() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Guenter Roeck	ca603d1576	ipmi: Fix handling of messages with provided receive message pointer commit e2c69490dda5d4c9f1bfbb2898989c8f3530e354 upstream Prior to commit b52da4054ee0 ("ipmi: Rework user message limit handling"), i_ipmi_request() used to increase the user reference counter if the receive message is provided by the caller of IPMI API functions. This is no longer the case. However, ipmi_free_recv_msg() is still called and decreases the reference counter. This results in the reference counter reaching zero, the user data pointer is released, and all kinds of interesting crashes are seen. Fix the problem by increasing user reference counter if the receive message has been provided by the caller. Fixes: b52da4054ee0 ("ipmi: Rework user message limit handling") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Message-ID: <20251006201857.3433837-1-linux@roeck-us.net> Signed-off-by: Corey Minyard <corey@minyard.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Corey Minyard	348121b295	ipmi: Rework user message limit handling commit b52da4054ee0bf9ecb44996f2c83236ff50b3812 upstream This patch required quite a bit of work to backport due to a number of unrelated changes that do not make sense to backport. This has been run against my test suite and passes all tests. The limit on the number of user messages had a number of issues, improper counting in some cases and a use after free. Restructure how this is all done to handle more in the receive message allocation routine, so all refcouting and user message limit counts are done in that routine. It's a lot cleaner and safer. Reported-by: Gilles BULOZ <gilles.buloz@kontron.com> Closes: https://lore.kernel.org/lkml/aLsw6G0GyqfpKs2S@mail.minyard.net/ Fixes: `8e76741c3d` ("ipmi: Add a limit on the number of users that may use IPMI") Cc: <stable@vger.kernel.org> # 4.19 Signed-off-by: Corey Minyard <corey@minyard.net> Tested-by: Gilles BULOZ <gilles.buloz@kontron.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Sean Christopherson	aafae78e6d	KVM: SVM: Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 [ Upstream commit 68e61f6fd65610e73b17882f86fedfd784d99229 ] Emulate PERF_CNTR_GLOBAL_STATUS_SET when PerfMonV2 is enumerated to the guest, as the MSR is supposed to exist in all AMD v2 PMUs. Fixes: `4a2771895c` ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support") Cc: stable@vger.kernel.org Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20250711172746.1579423-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> [ changed global_status_rsvd field to global_status_mask ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Thomas Gleixner	d0d9fa88d7	rseq: Protect event mask against membarrier IPI [ Upstream commit 6eb350a2233100a283f882c023e5ad426d0ed63b ] rseq_need_restart() reads and clears task::rseq_event_mask with preemption disabled to guard against the scheduler. But membarrier() uses an IPI and sets the PREEMPT bit in the event mask from the IPI, which leaves that RMW operation unprotected. Use guard(irq) if CONFIG_MEMBARRIER is enabled to fix that. Fixes: `2a36ab717e` ("rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: stable@vger.kernel.org [ Applied changes to include/linux/sched.h instead of include/linux/rseq.h ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:59 +02:00
Qu Wenruo	1810b6084a	btrfs: fix the incorrect max_bytes value for find_lock_delalloc_range() [ Upstream commit 7b26da407420e5054e3f06c5d13271697add9423 ] [BUG] With my local branch to enable bs > ps support for btrfs, sometimes I hit the following ASSERT() inside submit_one_sector(): ASSERT(block_start != EXTENT_MAP_HOLE); Please note that it's not yet possible to hit this ASSERT() in the wild yet, as it requires btrfs bs > ps support, which is not even in the development branch. But on the other hand, there is also a very low chance to hit above ASSERT() with bs < ps cases, so this is an existing bug affect not only the incoming bs > ps support but also the existing bs < ps support. [CAUSE] Firstly that ASSERT() means we're trying to submit a dirty block but without a real extent map nor ordered extent map backing it. Furthermore with extra debugging, the folio triggering such ASSERT() is always larger than the fs block size in my bs > ps case. (8K block size, 4K page size) After some more debugging, the ASSERT() is trigger by the following sequence: extent_writepage() \| We got a 32K folio (4 fs blocks) at file offset 0, and the fs block \| size is 8K, page size is 4K. \| And there is another 8K folio at file offset 32K, which is also \| dirty. \| So the filemap layout looks like the following: \| \| "\|\|" is the filio boundary in the filemap. \| "//\| is the dirty range. \| \| 0 8K 16K 24K 32K 40K \| \|////////\| \|//////////////////////\|\|////////\| \| \|- writepage_delalloc() \| \|- find_lock_delalloc_range() for [0, 8K) \| \| Now range [0, 8K) is properly locked. \| \| \| \|- find_lock_delalloc_range() for [16K, 40K) \| \| \|- btrfs_find_delalloc_range() returned range [16K, 40K) \| \| \|- lock_delalloc_folios() locked folio 0 successfully \| \| \| \| \| \| The filemap range [32K, 40K) got dropped from filemap. \| \| \| \| \| \|- lock_delalloc_folios() failed with -EAGAIN on folio 32K \| \| \| As the folio at 32K is dropped. \| \| \| \| \| \|- loops = 1; \| \| \|- max_bytes = PAGE_SIZE; \| \| \|- goto again; \| \| \| This will re-do the lookup for dirty delalloc ranges. \| \| \| \| \| \|- btrfs_find_delalloc_range() called with @max_bytes == 4K \| \| \| This is smaller than block size, so \| \| \| btrfs_find_delalloc_range() is unable to return any range. \| \| \- return false; \| \| \| \- Now only range [0, 8K) has an OE for it, but for dirty range \| [16K, 32K) it's dirty without an OE. \| This breaks the assumption that writepage_delalloc() will find \| and lock all dirty ranges inside the folio. \| \|- extent_writepage_io() \|- submit_one_sector() for [0, 8K) \| Succeeded \| \|- submit_one_sector() for [16K, 24K) Triggering the ASSERT(), as there is no OE, and the original extent map is a hole. Please note that, this also exposed the same problem for bs < ps support. E.g. with 64K page size and 4K block size. If we failed to lock a folio, and falls back into the "loops = 1;" branch, we will re-do the search using 64K as max_bytes. Which may fail again to lock the next folio, and exit early without handling all dirty blocks inside the folio. [FIX] Instead of using the fixed size PAGE_SIZE as @max_bytes, use @sectorsize, so that we are ensured to find and lock any remaining blocks inside the folio. And since we're here, add an extra ASSERT() to before calling btrfs_find_delalloc_range() to make sure the @max_bytes is at least no smaller than a block to avoid false negative. Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [ adapted folio terminology and API calls to page-based equivalents ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Shin'ichiro Kawasaki	0c5ce6b6cc	PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before release [ Upstream commit 85afa9ea122dd9d4a2ead104a951d318975dcd25 ] The fields dma_chan_tx and dma_chan_rx of the struct pci_epf_test can be NULL even after EPF initialization. Then it is prudent to check that they have non-NULL values before releasing the channels. Add the checks in pci_epf_test_clean_dma_chan(). Without the checks, NULL pointer dereferences happen and they can lead to a kernel panic in some cases: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Call trace: dma_release_channel+0x2c/0x120 (P) pci_epf_test_epc_deinit+0x94/0xc0 [pci_epf_test] pci_epc_deinit_notify+0x74/0xc0 tegra_pcie_ep_pex_rst_irq+0x250/0x5d8 irq_thread_fn+0x34/0xb8 irq_thread+0x18c/0x2e8 kthread+0x14c/0x210 ret_from_fork+0x10/0x20 Fixes: `8353813c88` ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities") Fixes: `5ebf3fc59b` ("PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> [mani: trimmed the stack trace] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250916025756.34807-1-shinichiro.kawasaki@wdc.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Wang Jiang	93b8a612db	PCI: endpoint: Remove surplus return statement from pci_epf_test_clean_dma_chan() [ Upstream commit 9b80bdb10aee04ce7289896e6bdad13e33972636 ] Remove a surplus return statement from the void function that has been added in the commit commit `8353813c88` ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities"). Especially, as an empty return statements at the end of a void functions serve little purpose. This fixes the following checkpatch.pl script warning: WARNING: void function return statements are not generally useful #296: FILE: drivers/pci/endpoint/functions/pci-epf-test.c:296: + return; +} Link: https://lore.kernel.org/r/tencent_F250BEE2A65745A524E2EFE70CF615CA8F06@qq.com Signed-off-by: Wang Jiang <jiangwang@kylinos.cn> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Stable-dep-of: 85afa9ea122d ("PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before release") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Ling Xu	0fa2185104	misc: fastrpc: Save actual DMA size in fastrpc_map structure [ Upstream commit 8b5b456222fd604079b5cf2af1f25ad690f54a25 ] For user passed fd buffer, map is created using DMA calls. The map related information is stored in fastrpc_map structure. The actual DMA size is not stored in the structure. Store the actual size of buffer and check it against the user passed size. Fixes: `c68cfb718c` ("misc: fastrpc: Add support for context Invoke method") Cc: stable@kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Co-developed-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com> Signed-off-by: Ling Xu <quic_lxu5@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250912131236.303102-2-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Ekansh Gupta	78e5fa79ed	misc: fastrpc: Add missing dev_err newlines [ Upstream commit a150c68ae6369ea65b786fefd0b8aa0b075c041a ] Few dev_err calls are missing newlines. This can result in unrelated lines getting appended which might make logs difficult to understand. Add trailing newlines to avoid this. Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20240705075900.424100-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8b5b456222fd ("misc: fastrpc: Save actual DMA size in fastrpc_map structure") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Namjae Jeon	bc718d0bd8	ksmbd: add max ip connections parameter [ Upstream commit d8b6dc9256762293048bf122fc11c4e612d0ef5d ] This parameter set the maximum number of connections per ip address. The default is 8. Cc: stable@vger.kernel.org Fixes: c0d41112f1a5 ("ksmbd: extend the connection limiting mechanism to support IPv6") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> [ Adjust reserved room ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Sean Christopherson	cd3efb9367	KVM: SVM: Skip fastpath emulation on VM-Exit if next RIP isn't valid [ Upstream commit 0910dd7c9ad45a2605c45fd2bf3d1bcac087687c ] Skip the WRMSR and HLT fastpaths in SVM's VM-Exit handler if the next RIP isn't valid, e.g. because KVM is running with nrips=false. SVM must decode and emulate to skip the instruction if the CPU doesn't provide the next RIP, and getting the instruction bytes to decode requires reading guest memory. Reading guest memory through the emulator can fault, i.e. can sleep, which is disallowed since the fastpath handlers run with IRQs disabled. BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:106 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 32611, name: qemu preempt_count: 1, expected: 0 INFO: lockdep is turned off. irq event stamp: 30580 hardirqs last enabled at (30579): [<ffffffffc08b2527>] vcpu_run+0x1787/0x1db0 [kvm] hardirqs last disabled at (30580): [<ffffffffb4f62e32>] __schedule+0x1e2/0xed0 softirqs last enabled at (30570): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210 softirqs last disabled at (30568): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210 CPU: 298 UID: 0 PID: 32611 Comm: qemu Tainted: G U 6.16.0-smp--e6c618b51cfe-sleep #782 NONE Tainted: [U]=USER Hardware name: Google Astoria-Turin/astoria, BIOS 0.20241223.2-0 01/17/2025 Call Trace: <TASK> dump_stack_lvl+0x7d/0xb0 __might_resched+0x271/0x290 __might_fault+0x28/0x80 kvm_vcpu_read_guest_page+0x8d/0xc0 [kvm] kvm_fetch_guest_virt+0x92/0xc0 [kvm] __do_insn_fetch_bytes+0xf3/0x1e0 [kvm] x86_decode_insn+0xd1/0x1010 [kvm] x86_emulate_instruction+0x105/0x810 [kvm] __svm_skip_emulated_instruction+0xc4/0x140 [kvm_amd] handle_fastpath_invd+0xc4/0x1a0 [kvm] vcpu_run+0x11a1/0x1db0 [kvm] kvm_arch_vcpu_ioctl_run+0x5cc/0x730 [kvm] kvm_vcpu_ioctl+0x578/0x6a0 [kvm] __se_sys_ioctl+0x6d/0xb0 do_syscall_64+0x8a/0x2c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f479d57a94b </TASK> Note, this is essentially a reapply of commit `5c30e8101e` ("KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"), but with different justification (KVM now grabs SRCU when skipping the instruction for other reasons). Fixes: `b439eb8ab5` ("Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250805190526.1453366-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> [ adapted switch-based MSR/HLT fastpath to if-based MSR-only check ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Donet Tom	ad25061d1d	mm/ksm: fix incorrect KSM counter handling in mm_struct during fork [ Upstream commit 4d6fc29f36341d7795db1d1819b4c15fe9be7b23 ] Patch series "mm/ksm: Fix incorrect accounting of KSM counters during fork", v3. The first patch in this series fixes the incorrect accounting of KSM counters such as ksm_merging_pages, ksm_rmap_items, and the global ksm_zero_pages during fork. The following patch add a selftest to verify the ksm_merging_pages counter was updated correctly during fork. Test Results ============ Without the first patch ----------------------- # [RUN] test_fork_ksm_merging_page_count not ok 10 ksm_merging_page in child: 32 With the first patch -------------------- # [RUN] test_fork_ksm_merging_page_count ok 10 ksm_merging_pages is not inherited after fork This patch (of 2): Currently, the KSM-related counters in `mm_struct`, such as `ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited by the child process during fork. This results in inconsistent accounting. When a process uses KSM, identical pages are merged and an rmap item is created for each merged page. The `ksm_merging_pages` and `ksm_rmap_items` counters are updated accordingly. However, after a fork, these counters are copied to the child while the corresponding rmap items are not. As a result, when the child later triggers an unmerge, there are no rmap items present in the child, so the counters remain stale, leading to incorrect accounting. A similar issue exists with `ksm_zero_pages`, which maintains both a global counter and a per-process counter. During fork, the per-process counter is inherited by the child, but the global counter is not incremented. Since the child also references zero pages, the global counter should be updated as well. Otherwise, during zero-page unmerge, both the global and per-process counters are decremented, causing the global counter to become inconsistent. To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during fork, and the global ksm_zero_pages counter is updated with the per-process ksm_zero_pages value inherited by the child. This ensures that KSM statistics remain accurate and reflect the activity of each process correctly. Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.1758648700.git.donettom@linux.ibm.com Fixes: `7609385337` ("ksm: count ksm merging pages for each process") Fixes: `cb4df4cae4` ("ksm: count allocated ksm rmap_items for each process") Fixes: `e2942062e0` ("ksm: count all zero pages placed by KSM") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: <stable@vger.kernel.org> [6.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ changed mm_flags_test() to test_bit() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Yuan Chen	1a301228c0	tracing: Fix race condition in kprobe initialization causing NULL pointer dereference [ Upstream commit 9cf9aa7b0acfde7545c1a1d912576e9bab28dc6f ] There is a critical race condition in kprobe initialization that can lead to NULL pointer dereference and kernel crash. [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000 ... [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO) [1135630.269239] pc : kprobe_perf_func+0x30/0x260 [1135630.277643] lr : kprobe_dispatcher+0x44/0x60 [1135630.286041] sp : ffffaeff4977fa40 [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400 [1135630.302837] x27: 0000000000000000 x26: 0000000000000000 [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528 [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50 [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50 [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000 [1135630.349985] x17: 0000000000000000 x16: 0000000000000000 [1135630.359285] x15: 0000000000000000 x14: 0000000000000000 [1135630.368445] x13: 0000000000000000 x12: 0000000000000000 [1135630.377473] x11: 0000000000000000 x10: 0000000000000000 [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000 [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000 [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000 [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006 [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000 [1135630.429410] Call trace: [1135630.434828] kprobe_perf_func+0x30/0x260 [1135630.441661] kprobe_dispatcher+0x44/0x60 [1135630.448396] aggr_pre_handler+0x70/0xc8 [1135630.454959] kprobe_breakpoint_handler+0x140/0x1e0 [1135630.462435] brk_handler+0xbc/0xd8 [1135630.468437] do_debug_exception+0x84/0x138 [1135630.475074] el1_dbg+0x18/0x8c [1135630.480582] security_file_permission+0x0/0xd0 [1135630.487426] vfs_write+0x70/0x1c0 [1135630.493059] ksys_write+0x5c/0xc8 [1135630.498638] __arm64_sys_write+0x24/0x30 [1135630.504821] el0_svc_common+0x78/0x130 [1135630.510838] el0_svc_handler+0x38/0x78 [1135630.516834] el0_svc+0x8/0x1b0 kernel/trace/trace_kprobe.c: 1308 0xffff3df8995039ec <kprobe_perf_func+0x2c>: ldr x21, [x24,#120] include/linux/compiler.h: 294 0xffff3df8995039f0 <kprobe_perf_func+0x30>: ldr x1, [x21,x0] kernel/trace/trace_kprobe.c 1308: head = this_cpu_ptr(call->perf_events); 1309: if (hlist_empty(head)) 1310: return 0; crash> struct trace_event_call -o struct trace_event_call { ... [120] struct hlist_head *perf_events; //(call->perf_event) ... } crash> struct trace_event_call ffffaf015340e528 struct trace_event_call { ... perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0 ... } Race Condition Analysis: The race occurs between kprobe activation and perf_events initialization: CPU0 CPU1 ==== ==== perf_kprobe_init perf_trace_event_init tp_event->perf_events = list;(1) tp_event->class->reg (2)← KPROBE ACTIVE Debug exception triggers ... kprobe_dispatcher kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE) head = this_cpu_ptr(call->perf_events)(3) (perf_events is still NULL) Problem: 1. CPU0 executes (1) assigning tp_event->perf_events = list 2. CPU0 executes (2) enabling kprobe functionality via class->reg() 3. CPU1 triggers and reaches kprobe_dispatcher 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed) 5. CPU1 calls kprobe_perf_func() and crashes at (3) because call->perf_events is still NULL CPU1 sees that kprobe functionality is enabled but does not see that perf_events has been assigned. Add pairing read and write memory barriers to guarantee that if CPU1 sees that kprobe functionality is enabled, it must also see that perf_events has been assigned. Link: https://lore.kernel.org/all/20251001022025.44626-1-chenyuan_fl@163.com/ Fixes: `50d7805607` ("tracing/kprobes: Add probe handler dispatcher to support perf and ftrace concurrent use") Cc: stable@vger.kernel.org Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:58 +02:00
Hans de Goede	8d2a77ccef	mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flag [ Upstream commit 64e0d839c589f4f2ecd2e3e5bdb5cee6ba6bade9 ] Testing has shown that reading multiple registers at once (for 10-bit ADC values) does not work. Set the use_single_read regmap_config flag to make regmap split these for us. This should fix temperature opregion accesses done by drivers/acpi/pmic/intel_pmic_chtdc_ti.c and is also necessary for the upcoming drivers for the ADC and battery MFD cells. Fixes: `6bac0606fd` ("mfd: Add support for Cherry Trail Dollar Cove TI PMIC") Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko <andy@kernel.org> Signed-off-by: Hans de Goede <hansg@kernel.org> Link: https://lore.kernel.org/r/20250804133240.312383-1-hansg@kernel.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Andy Shevchenko	3cb4b35687	mfd: intel_soc_pmic_chtdc_ti: Drop unneeded assignment for cache_type [ Upstream commit 9eb99c08508714906db078b5efbe075329a3fb06 ] REGCACHE_NONE is the default type of the cache when not provided. Drop unneeded explicit assignment to it. Note, it's defined to 0, and if ever be redefined, it will break literally a lot of the drivers, so it very unlikely to happen. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250129152823.1802273-1-andriy.shevchenko@linux.intel.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: 64e0d839c589 ("mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flag") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Hans de Goede	d7b038045e	mfd: intel_soc_pmic_chtdc_ti: Fix invalid regmap-config max_register value [ Upstream commit 70e997e0107e5ed85c1a3ef2adfccbe351c29d71 ] The max_register = 128 setting in the regmap config is not valid. The Intel Dollar Cove TI PMIC has an eeprom unlock register at address 0x88 and a number of EEPROM registers at 0xF?. Increase max_register to 0xff so that these registers can be accessed. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20241208150028.325349-1-hdegoede@redhat.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: 64e0d839c589 ("mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flag") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Edward Adam Davis	7bd4e5367d	media: mc: Clear minor number before put device [ Upstream commit 8cfc8cec1b4da88a47c243a11f384baefd092a50 ] The device minor should not be cleared after the device is released. Fixes: 9e14868dc952 ("media: mc: Clear minor number reservation at unregistration time") Cc: stable@vger.kernel.org Reported-by: syzbot+031d0cfd7c362817963f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=031d0cfd7c362817963f Tested-by: syzbot+031d0cfd7c362817963f@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> [ moved clear_bit from media_devnode_release callback to media_devnode_unregister before put_device ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Phillip Lougher	8c7aad7675	Squashfs: reject negative file sizes in squashfs_read_inode() [ Upstream commit 9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b ] Syskaller reports a "WARNING in ovl_copy_up_file" in overlayfs. This warning is ultimately caused because the underlying Squashfs file system returns a file with a negative file size. This commit checks for a negative file size and returns EINVAL. [phillip@squashfs.org.uk: only need to check 64 bit quantity] Link: https://lkml.kernel.org/r/20250926222305.110103-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20250926215935.107233-1-phillip@squashfs.org.uk Fixes: `6545b246a2` ("Squashfs: inode operations") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+f754e01116421e9754b9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68d580e5.a00a0220.303701.0019.GAE@google.com/ Cc: Amir Goldstein <amir73il@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Phillip Lougher	f5a1b04e5d	Squashfs: add additional inode sanity checking [ Upstream commit 9ee94bfbe930a1b39df53fa2d7b31141b780eb5a ] Patch series "Squashfs: performance improvement and a sanity check". This patchset adds an additional sanity check when reading regular file inodes, and adds support for SEEK_DATA/SEEK_HOLE lseek() whence values. This patch (of 2): Add an additional sanity check when reading regular file inodes. A regular file if the file size is an exact multiple of the filesystem block size cannot have a fragment. This is because by definition a fragment block stores tailends which are not a whole block in size. Link: https://lkml.kernel.org/r/20250923220652.568416-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20250923220652.568416-2-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 9f1c14c1de1b ("Squashfs: reject negative file sizes in squashfs_read_inode()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Nathan Chancellor	edb6425f59	lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older commit 2f13daee2a72bb962f5fd356c3a263a6f16da965 upstream. After commit 6f110a5e4f99 ("Disable SLUB_TINY for build testing"), which causes CONFIG_KASAN to be enabled in allmodconfig again, arm64 allmodconfig builds with clang-17 and older show an instance of -Wframe-larger-than (which breaks the build with CONFIG_WERROR=y): lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (2336) exceeds limit (2048) in 'curve25519_generic' [-Werror,-Wframe-larger-than] 757 \| void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE], \| ^ When KASAN is disabled, the stack usage is roughly quartered: lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (608) exceeds limit (128) in 'curve25519_generic' [-Werror,-Wframe-larger-than] 757 \| void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE], \| ^ Using '-Rpass-analysis=stack-frame-layout' shows the following variables and many, many 8-byte spills when KASAN is enabled: Offset: [SP-144], Type: Variable, Align: 8, Size: 40 Offset: [SP-464], Type: Variable, Align: 8, Size: 320 Offset: [SP-784], Type: Variable, Align: 8, Size: 320 Offset: [SP-864], Type: Variable, Align: 32, Size: 80 Offset: [SP-896], Type: Variable, Align: 32, Size: 32 Offset: [SP-1016], Type: Variable, Align: 8, Size: 120 When KASAN is disabled, there are still spills but not at many and the variables list is smaller: Offset: [SP-192], Type: Variable, Align: 32, Size: 80 Offset: [SP-224], Type: Variable, Align: 32, Size: 32 Offset: [SP-344], Type: Variable, Align: 8, Size: 120 Disable KASAN for this file when using clang-17 or older to avoid blowing out the stack, clearing up the warning. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250609-curve25519-hacl64-disable-kasan-clang-v1-1-08ea0ac5ccff@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Jan Kara	abdfc4704e	ext4: free orphan info with kvfree commit 971843c511c3c2f6eda96c6b03442913bfee6148 upstream. Orphan info is now getting allocated with kvmalloc_array(). Free it with kvfree() instead of kfree() to avoid complaints from mm. Reported-by: Chris Mason <clm@meta.com> Fixes: 0a6ce20c1564 ("ext4: verify orphan file size is not too big") Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-ID: <20251007134936.7291-2-jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:57 +02:00
Huacai Chen	f775f821de	ACPICA: Allow to skip Global Lock initialization commit feb8ae81b2378b75a99c81d315602ac8918ed382 upstream. Introduce acpi_gbl_use_global_lock, which allows to skip the Global Lock initialization. This is useful for systems without Global Lock (such as loong_arch), so as to avoid error messages during boot phase: ACPI Error: Could not enable global_lock event (20240827/evxfevnt-182) ACPI Error: No response from Global Lock hardware, disabling lock (20240827/evglock-59) Link: https://github.com/acpica/acpica/commit/463cb0fe Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Deepanshu Kartikey	720a66fdaa	ext4: validate ea_ino and size in check_xattrs commit 44d2a72f4d64655f906ba47a5e108733f59e6f28 upstream. During xattr block validation, check_xattrs() processes xattr entries without validating that entries claiming to use EA inodes have non-zero sizes. Corrupted filesystems may contain xattr entries where e_value_size is zero but e_value_inum is non-zero, indicating invalid xattr data. Add validation in check_xattrs() to detect this corruption pattern early and return -EFSCORRUPTED, preventing invalid xattr entries from causing issues throughout the ext4 codebase. Cc: stable@kernel.org Suggested-by: Theodore Ts'o <tytso@mit.edu> Reported-by: syzbot+4c9d23743a2409b80293@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=4c9d23743a2409b80293 Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Message-ID: <20250923133245.1091761-1-kartikey406@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Ahmet Eray Karadag	79ea7f3e11	ext4: guard against EA inode refcount underflow in xattr update commit 57295e835408d8d425bef58da5253465db3d6888 upstream. syzkaller found a path where ext4_xattr_inode_update_ref() reads an EA inode refcount that is already <= 0 and then applies ref_change (often -1). That lets the refcount underflow and we proceed with a bogus value, triggering errors like: EXT4-fs error: EA inode <n> ref underflow: ref_count=-1 ref_change=-1 EXT4-fs warning: ea_inode dec ref err=-117 Make the invariant explicit: if the current refcount is non-positive, treat this as on-disk corruption, emit ext4_error_inode(), and fail the operation with -EFSCORRUPTED instead of updating the refcount. Delete the WARN_ONCE() as negative refcounts are now impossible; keep error reporting in ext4_error_inode(). This prevents the underflow and the follow-on orphan/cleanup churn. Reported-by: syzbot+0be4f339a8218d2a5bb1@syzkaller.appspotmail.com Fixes: https://syzbot.org/bug?extid=0be4f339a8218d2a5bb1 Cc: stable@kernel.org Co-developed-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Signed-off-by: Ahmet Eray Karadag <eraykrdg1@gmail.com> Message-ID: <20250920021342.45575-1-eraykrdg1@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Zhang Yi	9e642ab8e5	ext4: fix an off-by-one issue during moving extents commit 12e803c8827d049ae8f2c743ef66ab87ae898375 upstream. During the movement of a written extent, mext_page_mkuptodate() is called to read data in the range [from, to) into the page cache and to update the corresponding buffers. Therefore, we should not wait on any buffer whose start offset is >= 'to'. Otherwise, it will return -EIO and fail the extents movement. $ for i in `seq 3 -1 0`; \ do xfs_io -fs -c "pwrite -b 1024 $((i * 1024)) 1024" /mnt/foo; \ done $ umount /mnt && mount /dev/pmem1s /mnt # drop cache $ e4defrag /mnt/foo e4defrag 1.47.0 (5-Feb-2023) ext4 defragmentation for /mnt/foo [1/1]/mnt/foo: 0% [ NG ] Success: [0/1] Cc: stable@kernel.org Fixes: a40759fb16ae ("ext4: remove array of buffer_heads from mext_page_mkuptodate()") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20250912105841.1886799-1-yi.zhang@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Ojaswin Mujoo	d1e681c0bb	ext4: correctly handle queries for metadata mappings commit 46c22a8bb4cb03211da1100d7ee4a2005bf77c70 upstream. Currently, our handling of metadata is _ambiguous_ in some scenarios, that is, we end up returning unknown if the range only covers the mapping partially. For example, in the following case: $ xfs_io -c fsmap -d 0: 254:16 [0..7]: static fs metadata 8 1: 254:16 [8..15]: special 102:1 8 2: 254:16 [16..5127]: special 102:2 5112 3: 254:16 [5128..5255]: special 102:3 128 4: 254:16 [5256..5383]: special 102:4 128 5: 254:16 [5384..70919]: inodes 65536 6: 254:16 [70920..70967]: unknown 48 ... $ xfs_io -c fsmap -d 24 33 0: 254:16 [24..39]: unknown 16 <--- incomplete reporting $ xfs_io -c fsmap -d 24 33 (With patch) 0: 254:16 [16..5127]: special 102:2 5112 This is because earlier in ext4_getfsmap_meta_helper, we end up ignoring any extent that starts before our queried range, but overlaps it. While the man page [1] is a bit ambiguous on this, this fix makes the output make more sense since we are anyways returning an "unknown" extent. This is also consistent to how XFS does it: $ xfs_io -c fsmap -d ... 6: 254:16 [104..127]: free space 24 7: 254:16 [128..191]: inodes 64 ... $ xfs_io -c fsmap -d 137 150 0: 254:16 [128..191]: inodes 64 <-- full extent returned [1] https://man7.org/linux/man-pages/man2/ioctl_getfsmap.2.html Reported-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: stable@kernel.org Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <023f37e35ee280cd9baac0296cbadcbe10995cab.1757058211.git.ojaswin@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Yongjian Sun	871b6894a3	ext4: increase i_disksize to offset + len in ext4_update_disksize_before_punch() commit 9d80eaa1a1d37539224982b76c9ceeee736510b9 upstream. After running a stress test combined with fault injection, we performed fsck -a followed by fsck -fn on the filesystem image. During the second pass, fsck -fn reported: Inode 131512, end of extent exceeds allowed value (logical block 405, physical block 1180540, len 2) This inode was not in the orphan list. Analysis revealed the following call chain that leads to the inconsistency: ext4_da_write_end() //does not update i_disksize ext4_punch_hole() //truncate folio, keep size ext4_page_mkwrite() ext4_block_page_mkwrite() ext4_block_write_begin() ext4_get_block() //insert written extent without update i_disksize journal commit echo 1 > /sys/block/xxx/device/delete da-write path updates i_size but does not update i_disksize. Then ext4_punch_hole truncates the da-folio yet still leaves i_disksize unchanged(in the ext4_update_disksize_before_punch function, the condition offset + len < size is met). Then ext4_page_mkwrite sees ext4_nonda_switch return 1 and takes the nodioread_nolock path, the folio about to be written has just been punched out, and it’s offset sits beyond the current i_disksize. This may result in a written extent being inserted, but again does not update i_disksize. If the journal gets committed and then the block device is yanked, we might run into this. It should be noted that replacing ext4_punch_hole with ext4_zero_range in the call sequence may also trigger this issue, as neither will update i_disksize under these circumstances. To fix this, we can modify ext4_update_disksize_before_punch to increase i_disksize to min(i_size, offset + len) when both i_size and (offset + len) are greater than i_disksize. Cc: stable@kernel.org Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun1@huawei.com> Message-ID: <20250911133024.1841027-1-sunyongjian@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Jan Kara	304fc34ff6	ext4: verify orphan file size is not too big commit 0a6ce20c156442a4ce2a404747bb0fb05d54eeb3 upstream. In principle orphan file can be arbitrarily large. However orphan replay needs to traverse it all and we also pin all its buffers in memory. Thus filesystems with absurdly large orphan files can lead to big amounts of memory consumed. Limit orphan file size to a sane value and also use kvmalloc() for allocating array of block descriptor structures to avoid large order allocations for sane but large orphan files. Reported-by: syzbot+0b92850d68d9b12934f5@syzkaller.appspotmail.com Fixes: `02f310fcf4` ("ext4: Speedup ext4 orphan inode handling") Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-ID: <20250909112206.10459-2-jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Olga Kornievskaia	e7e0e3eae0	nfsd: nfserr_jukebox in nlm_fopen should lead to a retry commit a082e4b4d08a4a0e656d90c2c05da85f23e6d0c9 upstream. When v3 NLM request finds a conflicting delegation, it triggers a delegation recall and nfsd_open fails with EAGAIN. nfsd_open then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead of returning nlm_failed for when there is a conflicting delegation, drop this NLM request so that the client retries. Once delegation is recalled and if a local lock is claimed, a retry would lead to nfsd returning a nlm_lck_blocked error or a successful nlm lock. Fixes: `d343fce148` ("[PATCH] knfsd: Allow lockd to drop replies as appropriate") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
Thorsten Blum	925ed83efb	NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() commit ab1c282c010c4f327bd7addc3c0035fd8e3c1721 upstream. Commit `5304877936` ("NFSD: Fix strncpy() fortify warning") replaced strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy() already guaranteed NUL-termination of the destination buffer and subtracting one byte potentially truncated the source string. The incorrect size was then carried over in commit `72f78ae00a` ("NFSD: move from strlcpy with unused retval to strscpy") when switching from strlcpy() to strscpy(). Fix this off-by-one error by using the full size of the destination buffer again. Cc: stable@vger.kernel.org Fixes: `5304877936` ("NFSD: Fix strncpy() fortify warning") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:56 +02:00
SeongJae Park	677ebfe5d0	mm/damon/vaddr: do not repeat pte_offset_map_lock() until success commit b93af2cc8e036754c0d9970d9ddc47f43cc94b9f upstream. DAMON's virtual address space operation set implementation (vaddr) calls pte_offset_map_lock() inside the page table walk callback function. This is for reading and writing page table accessed bits. If pte_offset_map_lock() fails, it retries by returning the page table walk callback function with ACTION_AGAIN. pte_offset_map_lock() can continuously fail if the target is a pmd migration entry, though. Hence it could cause an infinite page table walk if the migration cannot be done until the page table walk is finished. This indeed caused a soft lockup when CPU hotplugging and DAMON were running in parallel. Avoid the infinite loop by simply not retrying the page table walk. DAMON is promising only a best-effort accuracy, so missing access to such pages is no problem. Link: https://lkml.kernel.org/r/20250930004410.55228-1-sj@kernel.org Fixes: `7780d04046` ("mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Xinyu Zheng <zhengxinyu6@huawei.com> Closes: https://lore.kernel.org/20250918030029.2652607-1-zhengxinyu6@huawei.com Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:55 +02:00
Li RongQing	bb5ef60ee8	mm/hugetlb: early exit from hugetlb_pages_alloc_boot() when max_huge_pages=0 commit b322e88b3d553e85b4e15779491c70022783faa4 upstream. Optimize hugetlb_pages_alloc_boot() to return immediately when max_huge_pages is 0, avoiding unnecessary CPU cycles and the below log message when hugepages aren't configured in the kernel command line. [ 3.702280] HugeTLB: allocation took 0ms with hugepage_allocation_threads=32 Link: https://lkml.kernel.org/r/20250814102333.4428-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Tested-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:55 +02:00
Thadeu Lima de Souza Cascardo	81a6d6011a	mm/page_alloc: only set ALLOC_HIGHATOMIC for __GPF_HIGH allocations commit 6a204d4b14c99232e05d35305c27ebce1c009840 upstream. Commit `524c48072e` ("mm/page_alloc: rename ALLOC_HIGH to ALLOC_MIN_RESERVE") is the start of a series that explains how __GFP_HIGH, which implies ALLOC_MIN_RESERVE, is going to be used instead of __GFP_ATOMIC for high atomic reserves. Commit `eb2e2b425c` ("mm/page_alloc: explicitly record high-order atomic allocations in alloc_flags") introduced ALLOC_HIGHATOMIC for such allocations of order higher than 0. It still used __GFP_ATOMIC, though. Then, commit `1ebbb21811` ("mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves") just turned that check for !__GFP_DIRECT_RECLAIM, ignoring that high atomic reserves were expected to test for __GFP_HIGH. This leads to high atomic reserves being added for high-order GFP_NOWAIT allocations and others that clear __GFP_DIRECT_RECLAIM, which is unexpected. Later, those reserves lead to 0-order allocations going to the slow path and starting reclaim. From /proc/pagetypeinfo, without the patch: Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA32, type HighAtomic 1 8 10 9 7 3 0 0 0 0 0 Node 0, zone Normal, type HighAtomic 64 20 12 5 0 0 0 0 0 0 0 With the patch: Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0 Link: https://lkml.kernel.org/r/20250814172245.1259625-1-cascardo@igalia.com Fixes: `1ebbb21811` ("mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Tested-by: Helen Koike <koike@igalia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:30:55 +02:00

1 2 3 4 5 ...

1237731 Commits