linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 03:15:31 +09:00

Author	SHA1	Message	Date
Huang Ying	dde5e5343d	mm: restrict the pcp batch scale factor to avoid too long latency [ Upstream commit 52166607ecc980391b1fffbce0be3074a96d0c7b ]

In page allocator, PCP (Per-CPU Pageset) is refilled and drained in batches to increase page allocation throughput, reduce page allocation/freeing latency per page, and reduce zone lock contention. But too large a batch size will cause too long maximal allocation/freeing latency, which may punish arbitrary users. So the default batch size is chosen carefully (in zone_batchsize(), the value is 63 for zone > 1GB) to avoid that.

In commit `3b12e7e979` ("mm/page_alloc: scale the number of pages that are batch freed"), the batch size will be scaled for large number of page freeing to improve page freeing performance and reduce zone lock contention. Similar optimization can be used for large number of pages allocation too.

To find out a suitable max batch scale factor (that is, max effective batch size), some tests and measurement on some machines were done as follows.

A set of debug patches are implemented as follows,

- Set PCP high to be 2 * batch to reduce the effect of PCP high
- Disable free batch size scaling to get the raw performance.
- The code with zone lock held is extracted from rmqueue_bulk() and free_pcppages_bulk() to 2 separate functions to make it easy to measure the function run time with ftrace function_graph tracer.
- The batch size is hard coded to be 63 (default), 127, 255, 511, 1023, 2047, 4095.

Then will-it-scale/page_fault1 is used to generate the page allocation/freeing workload. The page allocation/freeing throughput (page/s) is measured via will-it-scale. The page allocation/freeing average latency (alloc/free latency avg, in us) and allocation/freeing latency at 99 percentile (alloc/free latency 99%, in us) are measured with ftrace function_graph tracer.

The test results are as follows,

Sapphire Rapids Server
======================
Batch  throughput  free latency  free latency  alloc latency  alloc latency
           page/s      avg / us      99% / us       avg / us       99% / us
-----  ----------  ------------  ------------  -------------  -------------
   63    513633.4          2.33          3.57           2.67           6.83
  127    517616.7          4.35          6.65           4.22          13.03
  255    520822.8          8.29         13.32           7.52          25.24
  511    524122.0         15.79         23.42          14.02          49.35
 1023    525980.5         30.25         44.19          25.36          94.88
 2047    526793.6         59.39         84.50          45.22         140.81

Ice Lake Server
===============
Batch  throughput  free latency  free latency  alloc latency  alloc latency
           page/s      avg / us      99% / us       avg / us       99% / us
-----  ----------  ------------  ------------  -------------  -------------
   63    620210.3          2.21          3.68           2.02           4.35
  127    627003.0          4.09          6.86           3.51           8.28
  255    630777.5          7.70         13.50           6.17          15.97
  511    633651.5         14.85         22.62          11.66          31.08
 1023    637071.1         28.55         42.02          20.81          54.36
 2047    638089.7         56.54         84.06          39.28          91.68

Cascade Lake Server
===================
Batch  throughput  free latency  free latency  alloc latency  alloc latency
           page/s      avg / us      99% / us       avg / us       99% / us
-----  ----------  ------------  ------------  -------------  -------------
   63    404706.7          3.29          5.03           3.53           4.75
  127    422475.2          6.12          9.09           6.36           8.76
  255    411522.2         11.68         16.97          10.90          16.39
  511    428124.1         22.54         31.28          19.86          32.25
 1023    414718.4         43.39         62.52          40.00          66.33
 2047    429848.7         86.64        120.34          71.14         106.08

Comet Lake Desktop
==================
Batch  throughput  free latency  free latency  alloc latency  alloc latency
           page/s      avg / us      99% / us       avg / us       99% / us
-----  ----------  ------------  ------------  -------------  -------------
   63   795183.13          2.18          3.55           2.03           3.05
  127   803067.85          3.91          6.56           3.85           5.52
  255   812771.10          7.35         10.80           7.14          10.20
  511   817723.48         14.17         27.54          13.43          30.31
 1023   818870.19         27.72         40.10          27.89          46.28

Coffee Lake Desktop
===================
Batch  throughput  free latency  free latency  alloc latency  alloc latency
           page/s      avg / us      99% / us       avg / us       99% / us
-----  ----------  ------------  ------------  -------------  -------------
   63    510542.8          3.13          4.40           2.48           3.43
  127    514288.6          5.97          7.89           4.65           6.04
  255    516889.7         11.86         15.58           8.96          12.55
  511    519802.4         23.10         28.81          16.95          26.19
 1023    520802.7         45.30         52.51          33.19          45.95
 2047    519997.1         90.63        104.00          65.26          81.74

From the above data, to restrict the allocation/freeing latency to be less than 100 us most of the time, the max batch scale factor needs to be less than or equal to 5.

Although it is reasonable to use 5 as max batch scale factor for the systems tested, there are also slower systems, where a smaller value should be used to constrain the page allocation/freeing latency.

So, in this patch, a new kconfig option (PCP_BATCH_SCALE_MAX) is added to set the max batch scale factor. Its default value is 5, and users can reduce it when necessary.

Link: https://lkml.kernel.org/r/20231016053002.756205-5-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 66eca1021a42 ("mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()")
Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
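To make the scale factor in the mm commit above concrete, here is a minimal userspace sketch of how a maximum scale factor bounds the effective batch size. It assumes the scaled batch is the base batch shifted left by the scale factor; the helper name `pcp_effective_batch()` and its parameters are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code. Assumption: the effective batch
 * is the base batch shifted left by the current scale factor, and the
 * scale factor is clamped to a configured maximum (the role that
 * PCP_BATCH_SCALE_MAX plays in the commit message).
 */
static unsigned int pcp_effective_batch(unsigned int batch,
                                        unsigned int scale,
                                        unsigned int scale_max)
{
    /* Never scale beyond the configured maximum. */
    if (scale > scale_max)
        scale = scale_max;

    /* Guard that keeps the shift well within the range of unsigned int. */
    assert(scale < 16);

    return batch << scale;
}
```

Under that assumption, the default batch of 63 with a maximum factor of 5 gives an upper bound of 63 << 5 = 2016 pages, which sits just below the 2047 rows in the tables, where the 99th percentile latencies are around 100 us.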
Thomas Zimmermann	340bbe90cc	fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes [ Upstream commit c2bc958b2b03e361f14df99983bc64a39a7323a3 ] Test the vesa_attributes field in struct screen_info for compatibility with VGA hardware. Vesafb currently tests bit 1 in screen_info's capabilities field which indicates a 64-bit lfb address and is unrelated to VGA compatibility. Section 4.4 of the Vesa VBE 2.0 specifications defines that bit 5 in the mode's attributes field signals VGA compatibility. The mode is compatible with VGA hardware if the bit is clear. In that case, the driver can access VGA state of the VBE's underlying hardware. The vesafb driver uses this feature to program the color LUT in palette modes. Without, colors might be incorrect. The problem got introduced in commit `89ec4c238e` ("[PATCH] vesafb: Fix incorrect logo colors in x86_64"). It incorrectly stores the mode attributes in the screen_info's capabilities field and updates vesafb accordingly. Later, commit `5e8ddcbe86` ("Video mode probing support for the new x86 setup code") fixed the screen_info, but did not update vesafb. Color output still tends to work, because bit 1 in capabilities is usually 0. Besides fixing the bug in vesafb, this commit introduces a helper that reads the correct bit from screen_info. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: `5e8ddcbe86` ("Video mode probing support for the new x86 setup code") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Cc: <stable@vger.kernel.org> # v2.6.23+ Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
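The bit test described in the vesafb commit above can be sketched as follows. This is an illustration only: the macro and function names are hypothetical and are not the helper the commit introduces; the only fact taken from the message is that bit 5 of the mode attributes is clear when the mode is VGA compatible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: bit 5 of the VBE mode attributes, set when the
 * mode is NOT compatible with VGA hardware. */
#define VBE_MODE_ATTR_NOT_VGA_COMPATIBLE (1u << 5)

/* Compile-time check that the mask really selects bit 5. */
static_assert(VBE_MODE_ATTR_NOT_VGA_COMPATIBLE == 0x20, "mask must be bit 5");

/* Returns true if the mode attributes signal VGA compatibility,
 * i.e. bit 5 is clear. */
static bool vbe_mode_is_vga_compatible(uint16_t vesa_attributes)
{
    return !(vesa_attributes & VBE_MODE_ATTR_NOT_VGA_COMPATIBLE);
}
```

Note that a value with only bit 1 set still counts as VGA compatible here, which shows why testing bit 1 of a different field, as the old code did, answers a different question.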
Thomas Zimmermann	a168da3182	firmware/sysfb: Update screen_info for relocated EFI framebuffers [ Upstream commit 78aa89d1dfba1e3cf4a2e053afa3b4c4ec622371 ] On ARM PCI systems, the PCI hierarchy might be reconfigured during boot and the firmware framebuffer might move as a result of that. The values in screen_info will then be invalid. Work around this problem by tracking the framebuffer's initial location before it gets relocated; then fix the screen_info state between relocation and creating the firmware framebuffer's device. This functionality has been lifted from efifb. See the commit message of commit `55d728a40d` ("efi/fb: Avoid reconfiguration of BAR that covers the framebuffer") for more information. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240212090736.11464-8-tzimmermann@suse.de Stable-dep-of: c2bc958b2b03 ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Thomas Zimmermann	f5dce77f3f	video: Provide screen_info_get_pci_dev() to find screen_info's PCI device [ Upstream commit 036105e3a776b6fc2fe0d262896a23ff2cc2e6b1 ] Add screen_info_get_pci_dev() to find the PCI device of an instance of screen_info. Does nothing on systems without PCI bus. v3: * search PCI device with pci_get_base_class() (Sui) v2: * remove ret from screen_info_pci_dev() (Javier) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240212090736.11464-3-tzimmermann@suse.de Stable-dep-of: c2bc958b2b03 ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Thomas Zimmermann	5b4d995dfd	video: Add helpers for decoding screen_info [ Upstream commit 75fa9b7e375e35739663cde0252d31e586c6314a ] The plain values as stored in struct screen_info need to be decoded before being used. Add helpers that decode the type of video output and the framebuffer I/O aperture. Old or non-x86 systems may not set the type of video directly, but only indicate the presence by storing 0x01 in orig_video_isVGA. The decoding logic in screen_info_video_type() takes this into account. It then follows similar code in vgacon's vgacon_startup() to detect the video type from the given values. A call to screen_info_resources() returns all known resources of the given screen_info. The resources' values have been taken from existing code in vgacon and vga16fb. These drivers can later be converted to use the new interfaces. v2: * return ssize_t from screen_info_resources() * don't call __screen_info_has_lfb() unnecessarily Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240212090736.11464-2-tzimmermann@suse.de Stable-dep-of: c2bc958b2b03 ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Thomas Zimmermann	bab0a82854	fbdev/vesafb: Replace references to global screen_info by local pointer [ Upstream commit 3218286bbb78cac3dde713514529e0480d678173 ] Get the global screen_info's address once and access the data via this pointer. Limits the use of global state. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231206135153.2599-4-tzimmermann@suse.de Stable-dep-of: c2bc958b2b03 ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Sui Jingfeng	ccab04dc57	PCI: Add pci_get_base_class() helper [ Upstream commit d427da2323b093a65d8317783e76ab8fad2e2ef0 ] There is no function to get all PCI devices in a system by matching against the base class code only, ignoring the sub-class code and the programming interface. Add pci_get_base_class() to suit the need. For example, if a driver wants to process all PCI display devices in a system, it can do so like this: pdev = NULL; while ((pdev = pci_get_base_class(PCI_BASE_CLASS_DISPLAY, pdev))) { do_something_for_pci_display_device(pdev); } Link: https://lore.kernel.org/r/20230825062714.6325-2-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> [bhelgaas: reword commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: c2bc958b2b03 ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Sean Christopherson	43e73206cf	KVM: nVMX: Check for pending posted interrupts when looking for nested events [ Upstream commit 27c4fa42b11af780d49ce704f7fa67b3c2544df4 ] Check for pending (and notified!) posted interrupts when checking if L2 has a pending wake event, as fully posted/notified virtual interrupt is a valid wake event for HLT. Note that KVM must check vmx->nested.pi_pending to avoid prematurely waking L2, e.g. even if KVM sees a non-zero PID.PIR and PID.ON=1, the virtual interrupt won't actually be recognized until a notification IRQ is received by the vCPU or the vCPU does (nested) VM-Enter. Fixes: `26844fee6a` ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Reported-by: Jim Mattson <jmattson@google.com> Closes: https://lore.kernel.org/all/20231207010302.2240506-1-jmattson@google.com Link: https://lore.kernel.org/r/20240607172609.3205077-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Sean Christopherson	459403bc66	KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector [ Upstream commit d83c36d822be44db4bad0c43bea99c8908f54117 ] Add a helper to retrieve the highest pending vector given a Posted Interrupt descriptor. While the actual operation is straightforward, it's surprisingly easy to mess up, e.g. if one tries to reuse lapic.c's find_highest_vector(), which doesn't work with PID.PIR due to the APIC's IRR and ISR component registers being physically discontiguous (they're 4-byte registers aligned at 16-byte intervals). To make PIR handling more consistent with respect to IRR and ISR handling, return -1 to indicate "no interrupt pending". Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
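The lookup described in the KVM commit above can be illustrated with a small userspace model. It assumes the posted-interrupt request bitmap covers 256 vectors stored as eight contiguous 32-bit words; the function name `pir_highest_vector()` and the array layout are hypothetical and do not reproduce the kernel helper. Only the "return -1 when nothing is pending" convention comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define PIR_WORDS 8 /* assumption: 256 vectors, 32 bits per word */

static_assert(PIR_WORDS * 32 == 256, "bitmap must cover 256 vectors");

/*
 * Hypothetical model: scan a contiguous 256-bit bitmap from the top
 * word down and return the number of the highest set bit, or -1 if no
 * bit is set ("no interrupt pending").
 */
static int pir_highest_vector(const uint32_t pir[PIR_WORDS])
{
    for (int word = PIR_WORDS - 1; word >= 0; word--) {
        uint32_t bits = pir[word];

        if (!bits)
            continue;

        /* Walk down from bit 31 to the first set bit in this word. */
        int bit = 31;
        while (!(bits & (1u << bit)))
            bit--;

        return word * 32 + bit;
    }

    return -1;
}
```

Because the words are contiguous here, the vector number is simply `word * 32 + bit`; the commit message points out that the APIC's IRR and ISR registers are not laid out this way, which is why reusing a scanner written for them goes wrong.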
Jacob Pan	65b2514e03	KVM: VMX: Move posted interrupt descriptor out of VMX code [ Upstream commit 699f67512f04cbaee965fad872702c06eaf440f6 ] To prepare native usage of posted interrupts, move the PID declarations out of VMX code such that they can be shared. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20240423174114.526704-2-jacob.jun.pan@linux.intel.com Stable-dep-of: d83c36d822be ("KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Vitaly Kuznetsov	ebfed7bebd	KVM: VMX: Split off vmx_onhyperv.{ch} from hyperv.{ch} [ Upstream commit 50a82b0eb88c108d1ebc73a97f5b81df0d5918e0 ] hyperv.{ch} is currently a mix of stuff which is needed by both Hyper-V on KVM and KVM on Hyper-V. As a preparation to making Hyper-V emulation optional, put KVM-on-Hyper-V specific code into dedicated files. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231205103630.1391318-4-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: d83c36d822be ("KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Thomas Weißschuh	93ac74cd6f	leds: triggers: Flush pending brightness before activating trigger [ Upstream commit ab477b766edd3bfb6321a6e3df4c790612613fae ] The race fixed in timer_trig_activate() between a blocking set_brightness() call and trigger->activate() can affect any trigger. So move the call to flush_work() into led_trigger_set() where it can avoid the race for all triggers. Fixes: `0db37915d9` ("leds: avoid races with workqueue") Fixes: `8c0f693c6e` ("leds: avoid flush_work in atomic context") Cc: stable@vger.kernel.org Tested-by: Dustin L. Howett <dustin@howett.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20240613-led-trigger-flush-v2-1-f4f970799d77@weissschuh.net Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Hans de Goede	9ce3c14f0d	leds: trigger: Call synchronize_rcu() before calling trig->activate() [ Upstream commit b1bbd20f35e19774ea01989320495e09ac44fba3 ] Some triggers call led_trigger_event() from their activate() callback to initialize the brightness of the LED for which the trigger is being activated. In order for the LED's initial state to be set correctly this requires that the led_trigger_event() call uses the new version of trigger->led_cdevs, which has the new LED. AFAICT led_trigger_event() will always use the new version when it is running on the same CPU as where the list_add_tail_rcu() call was made, which is why the missing synchronize_rcu() has not led to bug reports. But if activate() is pre-empted, sleeps or uses a worker then the led_trigger_event() call may run on another CPU which may still use the old trigger->led_cdevs list. Add a synchronize_rcu() call to ensure that any led_trigger_event() calls done from activate() always use the new list. Triggers using led_trigger_event() from their activate() callback are: net/bluetooth/leds.c, net/rfkill/core.c and drivers/tty/vt/keyboard.c. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240531120124.75662-1-hdegoede@redhat.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: ab477b766edd ("leds: triggers: Flush pending brightness before activating trigger") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Heiner Kallweit	587cf9c0f7	leds: trigger: Store brightness set by led_trigger_event() [ Upstream commit 822c91e72eac568ed8d83765634f00decb45666c ] If a simple trigger is assigned to a LED, then the LED may be off until the next led_trigger_event() call. This may be an issue for simple triggers with rare led_trigger_event() calls, e.g. power supply charging indicators (drivers/power/supply/power_supply_leds.c). Therefore persist the brightness value of the last led_trigger_event() call and use this value if the trigger is assigned to a LED. In addition add a getter for the trigger brightness value. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/b1358b25-3f30-458d-8240-5705ae007a8a@gmail.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: ab477b766edd ("leds: triggers: Flush pending brightness before activating trigger") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Heiner Kallweit	73a26eada5	leds: trigger: Remove unused function led_trigger_rename_static() [ Upstream commit c82a1662d4548c454de5343b88f69b9fc82266b3 ] This function was added with `a8df7b1ab7` ("leds: add led_trigger_rename function") 11 yrs ago, but it has no users. So remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/d90f30be-f661-4db7-b0b5-d09d07a78a68@gmail.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: ab477b766edd ("leds: triggers: Flush pending brightness before activating trigger") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Javier Carrasco	e3fd01a810	cpufreq: qcom-nvmem: fix memory leaks in probe error paths [ Upstream commit d01c84b97f19f1137211e90b0a910289a560019e ] The code refactoring added new error paths between the np device node allocation and the call to of_node_put(), which leads to memory leaks if any of those errors occur. Add the missing of_node_put() in the error paths that require it. Cc: stable@vger.kernel.org Fixes: `57f2f8b4aa` ("cpufreq: qcom: Refactor the driver to make it easier to extend") Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Stephan Gerhold	51a45209a8	cpufreq: qcom-nvmem: Simplify driver data allocation [ Upstream commit 2a5d46c3ad6b0e62d2b04356ad999d504fb564e0 ] Simplify the allocation and cleanup of driver data by using devm together with a flexible array. Prepare for adding additional per-CPU data by defining a struct qcom_cpufreq_drv_cpu instead of storing the opp_tokens directly. Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Stable-dep-of: d01c84b97f19 ("cpufreq: qcom-nvmem: fix memory leaks in probe error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	df7363307e	ext4: check the extent status again before inserting delalloc block [ Upstream commit 0ea6560abb3bac1ffcfa4bf6b2c4d344fdc27b3c ] ext4_da_map_blocks looks up for any extent entry in the extent status tree (w/o i_data_sem) and then looks up for any ondisk extent mapping (with i_data_sem in read mode). If it finds a hole in the extent status tree or if it couldn't find any entry at all, it then takes the i_data_sem in write mode to add a da entry into the extent status tree. This can actually race with page mkwrite & fallocate path. Note that this is ok between 1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the folio lock 2. ext4 buffered write path v/s ext4 fallocate because of the inode lock. But this can race between ext4_page_mkwrite() & ext4 fallocate path ext4_page_mkwrite() ext4_fallocate() block_page_mkwrite() ext4_da_map_blocks() //find hole in extent status tree ext4_alloc_file_blocks() ext4_map_blocks() //allocate block and unwritten extent ext4_insert_delayed_block() ext4_da_reserve_space() //reserve one more block ext4_es_insert_delayed_block() //drop unwritten extent and add delayed extent by mistake Then, the delalloc extent is wrong until writeback and the extra reserved block can't be released any more and it triggers below warning: EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared! Fix the problem by looking up extent status tree again while the i_data_sem is held in write mode. If it still can't find any entry, then we insert a new da entry into the extent status tree. Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240517124005.347221-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	f12fbb9599	ext4: factor out a common helper to query extent map [ Upstream commit 8e4e5cdf2fdeb99445a468b6b6436ad79b9ecb30 ] Factor out a new common helper ext4_map_query_blocks() from ext4_da_map_blocks(); it queries and returns the extent map status on the inode's extent path. No logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20240517124005.347221-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0ea6560abb3b ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	c6cba59072	ext4: convert to exclusive lock while inserting delalloc extents [ Upstream commit acf795dc161f3cf481db20f05db4250714e375e5 ] ext4_da_map_blocks() only holds i_data_sem in shared mode and i_rwsem when inserting delalloc extents, so it can race with another querying path of ext4_map_blocks() that runs without i_rwsem, e.g. the buffered read path. Suppose we buffered read a file containing just a hole, and without any cached extents tree, then it is raced by another delayed buffered write to the same area or the near area belongs to the same hole, and the new delalloc extent could be overwritten to a hole extent. pread() pwrite() filemap_read_folio() ext4_mpage_readpages() ext4_map_blocks() down_read(i_data_sem) ext4_ext_determine_hole() //find hole ext4_ext_put_gap_in_cache() ext4_es_find_extent_range() //no delalloc extent ext4_da_map_blocks() down_read(i_data_sem) ext4_insert_delayed_block() //insert delalloc extent ext4_es_insert_extent() //overwrite delalloc extent to hole This race could lead to inconsistent delalloc extents tree and incorrect reserved space counter. Fix this by converting to hold i_data_sem in exclusive mode when adding a new delalloc extent in ext4_da_map_blocks(). Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0ea6560abb3b ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	7849e9b5ba	ext4: refactor ext4_da_map_blocks() [ Upstream commit 3fcc2b887a1ba4c1f45319cd8c54daa263ecbc36 ] Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary parameters and branches, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0ea6560abb3b ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Thomas Weißschuh	ffde3af4b2	sysctl: always initialize i_uid/i_gid [ Upstream commit 98ca62ba9e2be5863c7d069f84f7166b45a5b2f4 ] Always initialize i_uid/i_gid inside the sysfs core so set_ownership() can safely skip setting them. Commit `5ec27ec735` ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") added defaults for i_uid/i_gid when set_ownership() was not implemented. It also missed adjusting net_ctl_set_ownership() to use the same default values in case the computation of a better value failed. Fixes: `5ec27ec735` ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Thomas Weißschuh	96f1d909cd	sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table) [ Upstream commit 520713a93d550406dae14d49cdb8778d70cecdfd ] Remove the 'table' argument from set_ownership as it is never used. This change is a step towards putting "struct ctl_table" into .rodata and eventually having sysctl core only use "const struct ctl_table". The patch was created with the following coccinelle script: @@ identifier func, head, table, uid, gid; @@ void func( struct ctl_table_header head, - struct ctl_table table, kuid_t uid, kgid_t gid) { ... } No additional occurrences of 'set_ownership' were found after doing a tree-wide search. Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com> Stable-dep-of: 98ca62ba9e2b ("sysctl: always initialize i_uid/i_gid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Alexey Gladkov	13886221ad	sysctl: allow to change limits for posix messages queues [ Upstream commit f9436a5d0497f759330d07e1189565edd4456be8 ] All parameters of posix messages queues (queues_max/msg_max/msgsize_max) end up being limited by RLIMIT_MSGQUEUE. The code in mqueue_get_inode is where that limiting happens. The RLIMIT_MSGQUEUE is bound to the user namespace and is counted hierarchically. We can allow root in the user namespace to modify the posix messages queues parameters. Link: https://lkml.kernel.org/r/6ad67f23d1459a4f4339f74aa73bac0ecf3995e1.1705333426.git.legion@kernel.org Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Link: https://lkml.kernel.org/r/7eb21211c8622e91d226e63416b1b93c079f60ee.1663756794.git.legion@kernel.org Cc: Christian Brauner <brauner@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Joel Granados <joel.granados@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 98ca62ba9e2b ("sysctl: always initialize i_uid/i_gid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Alexey Gladkov	8d5b1a9ff8	sysctl: allow change system v ipc sysctls inside ipc namespace [ Upstream commit 50ec499b9a43e46200c9f7b7d723ab2e4af540b3 ] Patch series "Allow to change ipc/mq sysctls inside ipc namespace", v3. Right now ipc and mq limits count as per ipc namespace, but only real root can change them. By default, the current values of these limits are such that it can only be reduced. Since only root can change the values, it is impossible to reduce these limits in the rootless container. We can allow limit changes within ipc namespace because mq parameters are limited by RLIMIT_MSGQUEUE and ipc parameters are not limited to anything other than cgroups. This patch (of 3): Rootless containers are not allowed to modify kernel IPC parameters. All default limits are set to such high values that in fact there are no limits at all. All limits are not inherited and are initialized to default values when a new ipc_namespace is created. For new ipc_namespace: size_t ipc_ns.shm_ctlmax = SHMMAX; // (ULONG_MAX - (1UL << 24)) size_t ipc_ns.shm_ctlall = SHMALL; // (ULONG_MAX - (1UL << 24)) int ipc_ns.shm_ctlmni = IPCMNI; // (1 << 15) int ipc_ns.shm_rmid_forced = 0; unsigned int ipc_ns.msg_ctlmax = MSGMAX; // 8192 unsigned int ipc_ns.msg_ctlmni = MSGMNI; // 32000 unsigned int ipc_ns.msg_ctlmnb = MSGMNB; // 16384 The shm_tot (total amount of shared pages) has also ceased to be global, it is located in ipc_namespace and is not inherited from anywhere. In such conditions, it cannot be said that these limits limit anything. The real limiter for them is cgroups. If we allow rootless containers to change these parameters, then it can only be reduced. Link: https://lkml.kernel.org/r/cover.1705333426.git.legion@kernel.org Link: https://lkml.kernel.org/r/d2f4603305cbfed58a24755aa61d027314b73a45.1705333426.git.legion@kernel.org Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Link: https://lkml.kernel.org/r/e2d84d3ec0172cfff759e6065da84ce0cc2736f8.1663756794.git.legion@kernel.org Cc: Christian Brauner <brauner@kernel.org> Cc: Joel Granados <joel.granados@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 98ca62ba9e2b ("sysctl: always initialize i_uid/i_gid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Krzysztof Kozlowski	34e788045d	thermal/drivers/broadcom: Fix race between removal and clock disable [ Upstream commit e90c369cc2ffcf7145a46448de101f715a1f5584 ] During the probe, driver enables clocks necessary to access registers (in get_temp()) and then registers thermal zone with managed-resources (devm) interface. Removal of device is not done in reversed order, because: 1. Clock will be disabled in driver remove() callback - thermal zone is still registered and accessible to users, 2. devm interface will unregister thermal zone. This leaves short window between (1) and (2) for accessing the get_temp() callback with disabled clock. Fix this by enabling clock also via devm-interface, so entire cleanup path will be in proper, reversed order. Fixes: `8454c8c09c` ("thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-1-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
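The ordering argument in the thermal commit above, that managed resources are released in the reverse order of their registration, can be modelled in userspace as a small LIFO stack of cleanup actions. Everything below is a hypothetical sketch: `cleanup_push()`, `cleanup_run()` and `struct fake_device` are illustration names and do not correspond to the kernel's devm API or to the bcm2835 driver.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*cleanup_fn)(void *);

/* Hypothetical stand-in for a per-device list of managed resources. */
struct cleanup_stack {
    cleanup_fn fn[MAX_ACTIONS];
    void *arg[MAX_ACTIONS];
    size_t count;
};

/* Register a cleanup action; returns 0 on success, -1 if the stack is full. */
static int cleanup_push(struct cleanup_stack *s, cleanup_fn fn, void *arg)
{
    assert(s != NULL && fn != NULL);
    if (s->count == MAX_ACTIONS)
        return -1;
    s->fn[s->count] = fn;
    s->arg[s->count] = arg;
    s->count++;
    return 0;
}

/* Run all actions in reverse registration order (last in, first out). */
static void cleanup_run(struct cleanup_stack *s)
{
    assert(s != NULL);
    while (s->count > 0) {
        s->count--;
        s->fn[s->count](s->arg[s->count]);
    }
}

/* Toy device state used to observe the teardown order. */
struct fake_device {
    int clock_enabled;
    int zone_registered;
    int clock_on_at_unregister; /* records clock state when the zone goes away */
};

static void disable_clock(void *p)
{
    struct fake_device *dev = p;
    dev->clock_enabled = 0;
}

static void unregister_zone(void *p)
{
    struct fake_device *dev = p;
    dev->clock_on_at_unregister = dev->clock_enabled;
    dev->zone_registered = 0;
}
```

If the clock's cleanup is pushed first and the zone's cleanup second, mirroring the probe order, `cleanup_run()` unregisters the zone while the clock is still enabled and only then disables the clock, so the window described in the commit message does not occur in this model.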
Uwe Kleine-König	103881e636	thermal: bcm2835: Convert to platform remove callback returning void [ Upstream commit f29ecd3748a28d0b52512afc81b3c13fd4a00c9b ] The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: e90c369cc2ff ("thermal/drivers/broadcom: Fix race between removal and clock disable") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Krishna Kurapati	0b4e4da51e	arm64: dts: qcom: sdm845: Disable SS instance in Parkmode for USB [ Upstream commit cf4d6d54eadb60d2ee4d31c9d92299f5e8dcb55c ] For Gen-1 targets like SDM845, it is seen that stressing out the controller in host mode results in HC died error: xhci-hcd.12.auto: xHCI host not responding to stop endpoint command xhci-hcd.12.auto: xHCI host controller not responding, assume dead xhci-hcd.12.auto: HC died; cleaning up And at this instant only restarting the host mode fixes it. Disable SuperSpeed instance in park mode for SDM845 to mitigate this issue. Cc: stable@vger.kernel.org Fixes: `ca4db2b538` ("arm64: dts: qcom: sdm845: Add USB-related nodes") Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240704152848.3380602-9-quic_kriskura@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Dmitry Baryshkov	a27753e685	arm64: dts: qcom: sdm845: switch USB QMP PHY to new style of bindings [ Upstream commit ca5ca568d7388b38039c8d658735fc539352b1db ] Change the USB QMP PHY to use newer style of QMP PHY bindings (single resource region, no per-PHY subnodes). Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230824211952.1397699-12-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: cf4d6d54eadb ("arm64: dts: qcom: sdm845: Disable SS instance in Parkmode for USB") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Dmitry Baryshkov	affc4de945	arm64: dts: qcom: sdm845: switch USB+DP QMP PHY to new style of bindings [ Upstream commit a9ecdec45a3a59057a68cf61ba4569d34caea5fc ] Change the USB QMP PHY to use newer style of QMP PHY bindings (single resource region, no per-PHY subnodes). Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230711120916.4165894-9-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: cf4d6d54eadb ("arm64: dts: qcom: sdm845: Disable SS instance in Parkmode for USB") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Krishna Kurapati	1a0bff67f4	arm64: dts: qcom: ipq8074: Disable SS instance in Parkmode for USB [ Upstream commit dc6ba95c6c4400a84cca5b419b34ae852a08cfb5 ] For Gen-1 targets like IPQ8074, it is seen that stressing out the controller in host mode results in HC died error: xhci-hcd.12.auto: xHCI host not responding to stop endpoint command xhci-hcd.12.auto: xHCI host controller not responding, assume dead xhci-hcd.12.auto: HC died; cleaning up And at this instant only restarting the host mode fixes it. Disable SuperSpeed instance in park mode for IPQ8074 to mitigate this issue. Cc: stable@vger.kernel.org Fixes: `5e09bc51d0` ("arm64: dts: ipq8074: enable USB support") Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240704152848.3380602-3-quic_kriskura@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:11 +02:00
Krishna Kurapati	cd4f3ad55b	arm64: dts: qcom: msm8998: Disable SS instance in Parkmode for USB [ Upstream commit 0046325ae52079b46da13a7f84dd7b2a6f7c38f8 ] For Gen-1 targets like MSM8998, it is seen that stressing out the controller in host mode results in HC died error: xhci-hcd.12.auto: xHCI host not responding to stop endpoint command xhci-hcd.12.auto: xHCI host controller not responding, assume dead xhci-hcd.12.auto: HC died; cleaning up And at this instant only restarting the host mode fixes it. Disable SuperSpeed instance in park mode for MSM8998 to mitigate this issue. Cc: stable@vger.kernel.org Fixes: `026dad8f58` ("arm64: dts: qcom: msm8998: Add USB-related nodes") Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240704152848.3380602-4-quic_kriskura@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:11 +02:00
Dmitry Baryshkov	267a485c15	arm64: dts: qcom: msm8998: switch USB QMP PHY to new style of bindings [ Upstream commit b7efebfeb2e8ad8187cdabba5f0212ba2e6c1069 ] Change the USB QMP PHY to use newer style of QMP PHY bindings (single resource region, no per-PHY subnodes). Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230824211952.1397699-11-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: 0046325ae520 ("arm64: dts: qcom: msm8998: Disable SS instance in Parkmode for USB") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:11 +02:00
Krishna Kurapati	5bf33793d1	arm64: dts: qcom: sc7280: Disable SuperSpeed instances in park mode [ Upstream commit 3d930f1750ce30a6c36dbc71f8ff7e20322b94d7 ] On SC7280, in host mode, it is observed that stressing out controller results in HC died error: xhci-hcd.12.auto: xHCI host not responding to stop endpoint command xhci-hcd.12.auto: xHCI host controller not responding, assume dead xhci-hcd.12.auto: HC died; cleaning up And at this instant only restarting the host mode fixes it. Disable SuperSpeed instances in park mode for SC7280 to mitigate this issue. Reported-by: Doug Anderson <dianders@google.com> Cc: stable@vger.kernel.org Fixes: `bb9efa59c6` ("arm64: dts: qcom: sc7280: Add USB related nodes") Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240604060659.1449278-3-quic_kriskura@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:11 +02:00
Dmitry Baryshkov	f879a83086	arm64: dts: qcom: sc7280: switch USB+DP QMP PHY to new style of bindings [ Upstream commit 36888ed83f998c3335272f9e353eaf6d109e2429 ] Change the USB QMP PHY to use newer style of QMP PHY bindings (single resource region, no per-PHY subnodes). Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230711120916.4165894-8-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: 3d930f1750ce ("arm64: dts: qcom: sc7280: Disable SuperSpeed instances in park mode") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:11 +02:00
Krishna Kurapati	fde0434035	arm64: dts: qcom: sc7180: Disable SuperSpeed instances in park mode [ Upstream commit 5b8baed4b88132c12010ce6ca1b56f00d122e376 ] On SC7180, in host mode, it is observed that stressing out controller results in HC died error: xhci-hcd.12.auto: xHCI host not responding to stop endpoint command xhci-hcd.12.auto: xHCI host controller not responding, assume dead xhci-hcd.12.auto: HC died; cleaning up And at this instant only restarting the host mode fixes it. Disable SuperSpeed instances in park mode for SC7180 to mitigate this issue. Reported-by: Doug Anderson <dianders@google.com> Cc: stable@vger.kernel.org Fixes: `0b766e7fe5` ("arm64: dts: qcom: sc7180: Add USB related nodes") Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240604060659.1449278-2-quic_kriskura@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:11 +02:00
Dmitry Baryshkov	2359355ddf	arm64: dts: qcom: sc7180: switch USB+DP QMP PHY to new style of bindings [ Upstream commit ebb840b00b7f9fc15153b37a7d9ec5b47a5308c1 ] Change the USB QMP PHY to use newer style of QMP PHY bindings (single resource region, no per-PHY subnodes). Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230711120916.4165894-6-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: 5b8baed4b881 ("arm64: dts: qcom: sc7180: Disable SuperSpeed instances in park mode") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:11 +02:00
Greg Kroah-Hartman	7213910600	Linux 6.6.44 Link: https://lore.kernel.org/r/20240730151639.792277039@linuxfoundation.org Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Allen Pais <apais@linux.microsoft.com> Tested-by: SeongJae Park <sj@kernel.org> Tested-by: Ron Economos <re@w6rz.net> Tested-by: kernelci.org bot <bot@kernelci.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-03 08:54:42 +02:00
Seth Forshee (DigitalOcean)	acbd66c10d	fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT [ Upstream commit e1c5ae59c0f22f7fe5c07fb5513a29e4aad868c9 ] Christian noticed that it is possible for a privileged user to mount most filesystems with a non-initial user namespace in sb->s_user_ns. When fsopen() is called in a non-init namespace the caller's namespace is recorded in fs_context->user_ns. If the returned file descriptor is then passed to a process privileged in init_user_ns, that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock with sb->s_user_ns set to the namespace of the process which called fsopen(). This is problematic. We cannot assume that any filesystem which does not set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in mind, increasing the risk for bugs and security issues. Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is not set for the filesystem and a non-initial user namespace will be used. sget() does not need to be updated as it always uses the user namespace of the current context, or the initial user namespace if SB_SUBMOUNT is set. Fixes: `cb50b348c7` ("convenience helpers: vfs_get_super() and sget_fc()") Reported-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Link: https://lore.kernel.org/r/20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:41 +02:00
Leon Romanovsky	77848b379e	nvme-pci: add missing condition check for existence of mapped data [ Upstream commit c31fad1470389666ac7169fe43aa65bf5b7e2cfd ] nvme_map_data() is called when request has physical segments, hence the nvme_unmap_data() should have same condition to avoid dereference. Fixes: `4aedb70543` ("nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:41 +02:00
Pavel Begunkov	766b0e807e	io_uring: fix io_match_task must_hold [ Upstream commit e142e9cd8891b0c6f277ac2c2c254199a6aa56e3 ] The __must_hold annotation in io_match_task() uses a non existing parameter "req", fix it. Fixes: `6af3f48bf6` ("io_uring: fix link traversal locking") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3e65ee7709e96507cef3d93291746f2c489f2307.1721819383.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:41 +02:00
Artem Chernyshev	b62841e49a	iommu: sprd: Avoid NULL deref in sprd_iommu_hw_en [ Upstream commit 630482ee0653decf9e2482ac6181897eb6cde5b8 ] In sprd_iommu_cleanup() before calling function sprd_iommu_hw_en() dom->sdev is equal to NULL, which leads to null dereference. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `9afea57384` ("iommu/sprd: Release dma buffer to avoid memory leak") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Chunyan Zhang <zhang.lyra@gmail.com> Link: https://lore.kernel.org/r/20240716125522.3690358-1-artem.chernyshev@red-soft.ru Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:41 +02:00
Thomas Richter	97dfb89415	s390/cpum_cf: Fix endless loop in CF_DIAG event stop [ Upstream commit e6ce1f12d777f6ee22b20e10ae6a771e7e6f44f5 ] Event CF_DIAG reads out complete counter sets using stcctm instruction. This is done at event start time when the process starts execution and at event stop time when the process is removed from the CPU. During removal the difference of each counter in the counter sets is calculated and saved as raw data in the ring buffer. This works fine unless the number of counters in a counter set is zero. This may happen for the extended counter set. This set is machine specific and the size of the counter set can be zero even when extended counter set is authorized for read access. This case is not handled. cfdiag_diffctr() checks authorization of the extended counter set. If true, the function assumes the extended counter set has been saved in a data buffer. However, this is not the case: cfdiag_getctrset() does not save a counter set with counter set size of zero. This mismatch causes an endless loop in the counter set readout during event stop handling. The calculation of the difference of the counters in each counter now verifies the size of the counter set is non-zero. A counter set with size zero is skipped. Fixes: `a029a4eab3` ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:41 +02:00
Gerd Bayer	b4d781ddae	s390/pci: Allow allocation of more than 1 MSI interrupt [ Upstream commit ab42fcb511fd9d241bbab7cc3ca04e34e9fc0666 ] On a PCI adapter that provides up to 8 MSI interrupt sources the s390 implementation of PCI interrupts refused to accommodate them, although the underlying hardware is able to support that. For MSI-X it is sufficient to allocate a single irq_desc per msi_desc, but for MSI multiple irq descriptors are attached to and controlled by a single msi descriptor. Add the appropriate loops to maintain multiple irq descriptors and tie/untie them to/from the appropriate AIBV bit, if a device driver allocates more than 1 MSI interrupt. Common PCI code passes on requests to allocate a number of interrupt vectors based on the device drivers' demand and the PCI functions' capabilities. However, the root-complex of s390 systems supports just a limited number of interrupt vectors per PCI function. Produce a kernel log message to inform about any architecture-specific capping that might be done. With this change, we had a PCI adapter successfully raising interrupts to its device driver via all 8 sources. Fixes: `a384c8924a` ("s390/PCI: Fix single MSI only check") Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:41 +02:00
Gerd Bayer	3eab85f45f	s390/pci: Refactor arch_setup_msi_irqs() [ Upstream commit 5fd11b96b43708f2f6e3964412c301c1bd20ec0f ] Factor out adapter interrupt allocation from arch_setup_msi_irqs() in preparation for enabling registration of multiple MSIs. Code movement only, no change of functionality intended. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Stable-dep-of: ab42fcb511fd ("s390/pci: Allow allocation of more than 1 MSI interrupt") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:41 +02:00
ethanwu	da0a3ebf9a	ceph: fix incorrect kmalloc size of pagevec mempool [ Upstream commit 03230edb0bd831662a7c08b6fef66b2a9a817774 ] The kmalloc size of pagevec mempool is incorrectly calculated. It misses the size of page pointer and only accounts the number for the array. Fixes: `a0102bda5b` ("ceph: move sb->wb_pagevec_pool to be a global mempool") Signed-off-by: ethanwu <ethanwu@synology.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:40 +02:00
Dan Carpenter	6d98741dbd	ASoC: TAS2781: Fix tasdev_load_calibrated_data() [ Upstream commit 92c78222168e9035a9bfb8841c2e56ce23e51f73 ] This function has a reversed if statement so it's either a no-op or it leads to a NULL dereference. Fixes: b195acf5266d ("ASoC: tas2781: Fix wrong loading calibrated data sequence") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/18a29b68-cc85-4139-b7c7-2514e8409a42@stanley.mountain Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:40 +02:00
Pierre-Louis Bossart	be6d86df47	ASoC: Intel: use soc_intel_is_byt_cr() only when IOSF_MBI is reachable [ Upstream commit 9931f7d5d251882a147cc5811060097df43e79f5 ] The Intel kbuild bot reports a link failure when IOSF_MBI is built-in but the Merrifield driver is configured as a module. The soc-intel-quirks.h is included for Merrifield platforms, but IOSF_MBI is not selected for that platform. ld.lld: error: undefined symbol: iosf_mbi_read >>> referenced by atom.c >>> sound/soc/sof/intel/atom.o:(atom_machine_select) in archive vmlinux.a This patch forces the use of the fallback static inline when IOSF_MBI is not reachable. Fixes: `536cfd2f37` ("ASoC: Intel: use common helpers to detect CPUs") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407160704.zpdhJ8da-lkp@intel.com/ Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20240722083002.10800-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:40 +02:00
Conor Dooley	af1125de16	spi: spidev: add correct compatible for Rohm BH2228FV [ Upstream commit fc28d1c1fe3b3e2fbc50834c8f73dda72f6af9fc ] When Maxime originally added the BH2228FV to the spidev driver, he spelt it incorrectly - the d should have been a b. Add the correctly spelt compatible to the driver. Although the majority of users of this compatible are abusers, there is at least one board that validly uses the incorrectly spelt compatible, so keep it in the driver to avoid breaking the few real users it has. Fixes: `8fad805bdc` ("spi: spidev: Add Rohm DH2228FV DAC compatible string") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patch.msgid.link/20240717-ventricle-strewn-a7678c509e85@spud Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:40 +02:00
Venkata Prasad Potturu	6443a40285	ASoC: sof: amd: fix for firmware reload failure in Vangogh platform [ Upstream commit f2038c12e8133bf4c6bd4d1127a23310d55d9e21 ] Setting ACP ACLK as the clock source when ACP enters the D0 state causes a firmware load failure; as per the design, the clock source should be the internal clock. Remove the acp_clkmux_sel field so that ACP will use the internal clock source when ACP enters the D0 state. Fixes: `d0dab6b76a` ("ASoC: SOF: amd: Add sof support for vangogh platform") Signed-off-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com> Link: https://patch.msgid.link/20240718062004.581685-1-venkataprasad.potturu@amd.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:54:40 +02:00


1226566 Commits