linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 19:30:30 +09:00

Author	SHA1	Message	Date
Cédric Le Goater	abf104b64c	powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts commit `b67a95f2ab` upstream. The PCI INTx interrupts and other LSI interrupts are handled differently under a sPAPR platform. When the interrupt source characteristics are queried, the hypervisor returns an H_INT_ESB flag to inform the OS that it should be using the H_INT_ESB hcall for interrupt management and not loads and stores on the interrupt ESB pages. A default -1 value is returned for the addresses of the ESB pages. The driver ignores this condition today and performs a bogus IO mapping. Recent changes and the DEBUG_VM configuration option make the bug visible with : kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA pSeries Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-0.rc6.git0.1.fc32.ppc64le #1 NIP: c000000000f63294 LR: c000000000f62e44 CTR: 0000000000000000 REGS: c0000000fa45f0d0 TRAP: 0700 Not tainted (5.4.0-0.rc6.git0.1.fc32.ppc64le) ... NIP ioremap_page_range+0x4c4/0x6e0 LR ioremap_page_range+0x74/0x6e0 Call Trace: ioremap_page_range+0x74/0x6e0 (unreliable) do_ioremap+0x8c/0x120 __ioremap_caller+0x128/0x140 ioremap+0x30/0x50 xive_spapr_populate_irq_data+0x170/0x260 xive_irq_domain_map+0x8c/0x170 irq_domain_associate+0xb4/0x2d0 irq_create_mapping+0x1e0/0x3b0 irq_create_fwspec_mapping+0x27c/0x3e0 irq_create_of_mapping+0x98/0xb0 of_irq_parse_and_map_pci+0x168/0x230 pcibios_setup_device+0x88/0x250 pcibios_setup_bus_devices+0x54/0x100 __of_scan_bus+0x160/0x310 pcibios_scan_phb+0x330/0x390 pcibios_init+0x8c/0x128 do_one_initcall+0x60/0x2c0 kernel_init_freeable+0x290/0x378 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x80 Fixes: `bed81ee181` ("powerpc/xive: introduce H_INT_ESB hcall") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191203163642.2428-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:16 +01:00
Alastair D'Silva	f80e2ee414	powerpc: Allow flush_icache_range to work across ranges >4GB commit `29430fae82` upstream. When calling flush_icache_range with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191104023305.9581-2-alastair@au1.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:16 +01:00
Cédric Le Goater	87682db0a1	powerpc/xive: Prevent page fault issues in the machine crash handler commit `1ca3dec2b2` upstream. When the machine crash handler is invoked, all interrupts are masked but interrupts which have not been started yet do not have an ESB page mapped in the Linux address space. This crashes the 'crash kexec' sequence on sPAPR guests. To fix, force the mapping of the ESB page when an interrupt is being mapped in the Linux IRQ number space. This is done by setting the initial state of the interrupt to OFF which is not necessarily the case on PowerNV. Fixes: `243e25112d` ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191031063100.3864-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:15 +01:00
Alastair D'Silva	e0dd31b9e5	powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB commit `f9ec111653` upstream. When calling __kernel_sync_dicache with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191104023305.9581-3-alastair@au1.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:14 +01:00
Arnd Bergmann	919fc442cf	ppdev: fix PPGETTIME/PPSETTIME ioctls commit `998174042d` upstream. Going through the uses of timeval in the user space API, I noticed two bugs in ppdev that were introduced in the y2038 conversion: * The range check was accidentally moved from ppsettime to ppgettime * On sparc64, the microseconds are in the other half of the 64-bit word. Fix both, and mark the fix for stable backports. Cc: stable@vger.kernel.org Fixes: `3b9ab374a1` ("ppdev: convert to y2038 safe") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20191108203435.112759-8-arnd@arndb.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:13 +01:00
Jarkko Nikula	c85b9f7deb	ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity commit `287897f9aa` upstream. The MMC card detection GPIO polarity is active low on TAO3530, like in many other similar boards. Now the card is not detected and it is unable to mount rootfs from an SD card. Fix this by using the correct polarity. This incorrect polarity was defined already in the commit `30d95c6d70` ("ARM: dts: omap3: Add Technexion TAO3530 SOM omap3-tao3530.dtsi") in v3.18 kernel and later changed to use defined GPIO constants in v4.4 kernel by the commit `3a637e008e` ("ARM: dts: Use defined GPIO constants in flags cell for OMAP2+ boards"). While the latter commit did not introduce the issue I'm marking it with Fixes tag due the v4.4 kernels still being maintained. Fixes: `3a637e008e` ("ARM: dts: Use defined GPIO constants in flags cell for OMAP2+ boards") Cc: linux-stable <stable@vger.kernel.org> # 4.4+ Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:13 +01:00
H. Nikolaus Schaller	6c44b58d95	mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card commit `f6498b922e` upstream. Pandora_wl1251_init_card was used to do special pdata based setup of the sdio mmc interface. This does no longer work with v4.7 and later. A fix requires a device tree based mmc3 setup. Therefore we move the special setup to omap_hsmmc.c instead of calling some pdata supplied init_card function. The new code checks for a DT child node compatible to wl1251 so it will not affect other MMC3 use cases. Generally, this code was and still is a hack and should be moved to mmc core to e.g. read such properties from optional DT child nodes. Fixes: `81eef6ca92` ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+ [Ulf: Fixed up some checkpatch complaints] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:11 +01:00
Krzysztof Kozlowski	8e059f7bda	pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init commit `7f028caadf` upstream. In s3c64xx_eint_eint0_init() the for_each_child_of_node() loop is used with a break to find a matching child node. Although each iteration of for_each_child_of_node puts the previous node, but early exit from loop misses it. This leads to leak of device node. Cc: <stable@vger.kernel.org> Fixes: `61dd726131` ("pinctrl: Add pinctrl-s3c64xx driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:11 +01:00
Krzysztof Kozlowski	1d9501c2ce	pinctrl: samsung: Fix device node refcount leaks in init code commit `a322b3377f` upstream. Several functions use for_each_child_of_node() loop with a break to find a matching child node. Although each iteration of for_each_child_of_node puts the previous node, but early exit from loop misses it. This leads to leak of device node. Cc: <stable@vger.kernel.org> Fixes: `9a2c1c3b91` ("pinctrl: samsung: Allow grouping multiple pinmux/pinconf nodes") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:09 +01:00
Krzysztof Kozlowski	fcfaf12587	pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init commit `6fbbcb0508` upstream. In s3c24xx_eint_init() the for_each_child_of_node() loop is used with a break to find a matching child node. Although each iteration of for_each_child_of_node puts the previous node, but early exit from loop misses it. This leads to leak of device node. Cc: <stable@vger.kernel.org> Fixes: `af99a75074` ("pinctrl: Add pinctrl-s3c24xx driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:08 +01:00
Krzysztof Kozlowski	1aaf409ebd	pinctrl: samsung: Fix device node refcount leaks in Exynos wakeup controller init commit `5c7f48dd14` upstream. In exynos_eint_wkup_init() the for_each_child_of_node() loop is used with a break to find a matching child node. Although each iteration of for_each_child_of_node puts the previous node, but early exit from loop misses it. This leads to leak of device node. Cc: <stable@vger.kernel.org> Fixes: `43b169db18` ("pinctrl: add exynos4210 specific extensions for samsung pinctrl driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:08 +01:00
Nishka Dasgupta	0c72a9f1f9	pinctrl: samsung: Add of_node_put() before return in error path commit `3d2557ab75` upstream. Each iteration of for_each_child_of_node puts the previous node, but in the case of a return from the middle of the loop, there is no put, thus causing a memory leak. Hence add an of_node_put before the return of exynos_eint_wkup_init() error path. Issue found with Coccinelle. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Cc: <stable@vger.kernel.org> Fixes: `14c255d35b` ("pinctrl: exynos: Add irq_chip instance for Exynos7 wakeup interrupts") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:07 +01:00
Gregory CLEMENT	935057b020	pinctrl: armada-37xx: Fix irq mask access in armada_37xx_irq_set_type() commit `04fb02757a` upstream. As explained in the following commit `a9a1a48336` ("pinctrl: armada-37xx: Fix gpio interrupt setup") the armada_37xx_irq_set_type() function can be called before the initialization of the mask field. That means that we can't use this field in this function and need to workaround it using hwirq. Fixes: `30ac0d3b07` ("pinctrl: armada-37xx: Add edge both type gpio irq support") Cc: stable@vger.kernel.org Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/r/20191115155752.2562-1-gregory.clement@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:07 +01:00
Rafael J. Wysocki	24f4153491	ACPI: PM: Avoid attaching ACPI PM domain to certain devices commit `b9ea0bae26` upstream. Certain ACPI-enumerated devices represented as platform devices in Linux, like fans, require special low-level power management handling implemented by their drivers that is not in agreement with the ACPI PM domain behavior. That leads to problems with managing ACPI fans during system-wide suspend and resume. For this reason, make acpi_dev_pm_attach() skip the affected devices by adding a list of device IDs to avoid to it and putting the IDs of the affected devices into that list. Fixes: `e5cc8ef312` (ACPI / PM: Provide ACPI PM callback routines for subsystems) Reported-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:06 +01:00
Vamshi K Sthambamkadi	1309b43650	ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data() commit `627ead724e` upstream. kmemleak reported backtrace: [<bbee0454>] kmem_cache_alloc_trace+0x128/0x260 [<6677f215>] i2c_acpi_install_space_handler+0x4b/0xe0 [<1180f4fc>] i2c_register_adapter+0x186/0x400 [<6083baf7>] i2c_add_adapter+0x4e/0x70 [<a3ddf966>] intel_gmbus_setup+0x1a2/0x2c0 [i915] [<84cb69ae>] i915_driver_probe+0x8d8/0x13a0 [i915] [<81911d4b>] i915_pci_probe+0x48/0x160 [i915] [<4b159af1>] pci_device_probe+0xdc/0x160 [<b3c64704>] really_probe+0x1ee/0x450 [<bc029f5a>] driver_probe_device+0x142/0x1b0 [<d8829d20>] device_driver_attach+0x49/0x50 [<de71f045>] __driver_attach+0xc9/0x150 [<df33ac83>] bus_for_each_dev+0x56/0xa0 [<80089bba>] driver_attach+0x19/0x20 [<cc73f583>] bus_add_driver+0x177/0x220 [<7b29d8c7>] driver_register+0x56/0xf0 In i2c_acpi_remove_space_handler(), a leak occurs whenever the "data" parameter is initialized to 0 before being passed to acpi_bus_get_private_data(). This is because the NULL pointer check in acpi_bus_get_private_data() (condition->if(!*data)) returns EINVAL and, in consequence, memory is never freed in i2c_acpi_remove_space_handler(). Fix the NULL pointer check in acpi_bus_get_private_data() to follow the analogous check in acpi_get_data_full(). Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> [ rjw: Subject & changelog ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:05 +01:00
Francesco Ruggeri	b81b6f35fa	ACPI: OSL: only free map once in osl.c commit `833a426cc4` upstream. acpi_os_map_cleanup checks map->refcount outside of acpi_ioremap_lock before freeing the map. This creates a race condition the can result in the map being freed more than once. A panic can be caused by running for ((i=0; i<10; i++)) do for ((j=0; j<100000; j++)) do cat /sys/firmware/acpi/tables/data/BERT >/dev/null done & done This patch makes sure that only the process that drops the reference to 0 does the freeing. Fixes: `b7c1fadd6c` ("ACPI: Do not use krefs under a mutex in osl.c") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:05 +01:00
Mika Westerberg	9f5ee70685	ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge commit `77adf93553` upstream. Valerio and others reported that commit `84c8b58ed3` ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") prevents some recent LG and HP laptops from booting with endless loop of: ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835) ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835) ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835) ... What seems to happen is that during boot, after the initial PCI enumeration when EC is enabled the platform triggers ACPI Notify() to one of the root ports. The root port itself looks like this: pci 0000:00:1b.0: PCI bridge to [bus 02-3a] pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff] pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref] The BIOS has configured the root port so that it does not have I/O bridge window. Now when the ACPI Notify() is triggered ACPI hotplug handler calls acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() to allocate resources. The device connected to the root port is a PCIe switch (Thunderbolt controller) with two hotplug downstream ports. Because of the hotplug ports __pci_bus_size_bridges() tries to add "additional I/O" of 256 bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k as that's the minimum I/O window size so each hotplug port gets 4k I/O window and the same happens for the root port (which is also hotplug port). This means 3 * 4k = 12k I/O window. Because of this pci_assign_unassigned_bridge_resources() ends up opening a I/O bridge window for the root port at first available I/O address which seems to be in range 0x1000 - 0x3fff. Normally this range is used for ACPI stuff such as GPE bits (below is part of /proc/ioports): 1800-1803 : ACPI PM1a_EVT_BLK 1804-1805 : ACPI PM1a_CNT_BLK 1808-180b : ACPI PM_TMR 1810-1815 : ACPI CPU throttle 1850-1850 : ACPI PM2_CNT_BLK 1854-1857 : pnp 00:05 1860-187f : ACPI GPE0_BLK However, when the ACPI Notify() happened this range was not yet reserved for ACPI/PNP (that happens later) so PCI gets it. It then starts writing to this range and accidentally stomps over GPE bits among other things causing the endless stream of messages about missing GPE handler. This problem does not happen if "pci=hpiosize=0" is passed in the kernel command line. The reason is that then the kernel does not try to allocate the additional 256 bytes for each hotplug port. Fix this by allocating resources directly below the non-hotplug bridges where a new device may appear as a result of ACPI Notify(). This avoids the hotplug bridges and prevents opening the additional I/O window. Fixes: `84c8b58ed3` ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") Link: https://bugzilla.kernel.org/show_bug.cgi?id=203617 Link: https://lore.kernel.org/r/20191030150545.19885-1-mika.westerberg@linux.intel.com Reported-by: Valerio Passini <passini.valerio@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:04 +01:00
John Hubbard	2324a66924	cpufreq: powernv: fix stack bloat and hard limit on number of CPUs commit `db0d32d840` upstream. The following build warning occurred on powerpc 64-bit builds: drivers/cpufreq/powernv-cpufreq.c: In function 'init_chip_info': drivers/cpufreq/powernv-cpufreq.c:1070:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] This is with a cross-compiler based on gcc 8.1.0, which I got from: https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/ The warning is due to putting 1024 bytes on the stack: unsigned int chip[256]; ...and it's also undesirable to have a hard limit on the number of CPUs here. Fix both problems by dynamically allocating based on num_possible_cpus, as recommended by Michael Ellerman. Fixes: `053819e0bf` ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.10+ <stable@vger.kernel.org> # 4.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:03 +01:00
Leonard Crestez	b1d06da384	PM / devfreq: Lock devfreq in trans_stat_show commit `2abb0d5268` upstream. There is no locking in this sysfs show function so stats printing can race with a devfreq_update_status called as part of freq switching or with initialization. Also add an assert in devfreq_update_status to make it clear that lock must be held by caller. Fixes: `39688ce6fa` ("PM / devfreq: account suspend/resume for stats") Cc: stable@vger.kernel.org Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:03 +01:00
Alexander Shishkin	1fce7e81b9	intel_th: pci: Add Tiger Lake CPU support commit `6e6c18bcb7` upstream. This adds support for the Trace Hub in Tiger Lake CPU. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191120130806.44028-4-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:02 +01:00
Alexander Shishkin	fc8e3ca541	intel_th: pci: Add Ice Lake CPU support commit `6a1743422a` upstream. This adds support for the Trace Hub in Ice Lake CPU. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191120130806.44028-3-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:01 +01:00
Alexander Shishkin	aa5d849be1	intel_th: Fix a double put_device() in error path commit `512592779a` upstream. Commit `a753bfcfdb` ("intel_th: Make the switch allocate its subdevices") factored out intel_th_subdevice_alloc() from intel_th_populate(), but got the error path wrong, resulting in two instances of a double put_device() on a freshly initialized, but not 'added' device. Fix this by only doing one put_device() in the error path. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Fixes: `a753bfcfdb` ("intel_th: Make the switch allocate its subdevices") Reported-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org # v4.14+ Link: https://lore.kernel.org/r/20191120130806.44028-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:01 +01:00
Gao Xiang	b56d00ace1	erofs: zero out when listxattr is called with no xattr commit `926d165017` upstream. As David reported [1], ENODATA returns when attempting to modify files by using EROFS as an overlayfs lower layer. The root cause is that listxattr could return unexpected -ENODATA by mistake for inodes without xattr. That breaks listxattr return value convention and it can cause copy up failure when used with overlayfs. Resolve by zeroing out if no xattr is found for listxattr. [1] https://lore.kernel.org/r/CAEvUa7nxnby+rxK-KRMA46=exeOMApkDMAV08AjMkkPnTPV4CQ@mail.gmail.com Link: https://lore.kernel.org/r/20191201084040.29275-1-hsiangkao@aol.com Fixes: `cadf1ccf1b` ("staging: erofs: add error handling for xattr submodule") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:00 +01:00
Zhenzhong Duan	c96788214e	cpuidle: Do not unset the driver if it is there already commit `918c1fe9fb` upstream. Fix __cpuidle_set_driver() to check if any of the CPUs in the mask has a driver different from drv already and, if so, return -EBUSY before updating any cpuidle_drivers per-CPU pointers. Fixes: `82467a5a88` ("cpuidle: simplify multiple driver support") Cc: 3.11+ <stable@vger.kernel.org> # 3.11+ Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:35:00 +01:00
Hans Verkuil	aedc1c75ff	media: cec.h: CEC_OP_REC_FLAG_ values were swapped commit `806e0cdfee` upstream. CEC_OP_REC_FLAG_NOT_USED is 0 and CEC_OP_REC_FLAG_USED is 1, not the other way around. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reported-by: Jiunn Chang <c0d1n61at3@gmail.com> Cc: <stable@vger.kernel.org> # for v4.10 and up Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:59 +01:00
Johan Hovold	acdb4a6b24	media: radio: wl1273: fix interrupt masking on release commit `1091eb8306` upstream. If a process is interrupted while accessing the radio device and the core lock is contended, release() could return early and fail to update the interrupt mask. Note that the return value of the v4l2 release file operation is ignored. Fixes: `87d1a50ce4` ("[media] V4L2: WL1273 FM Radio: TI WL1273 FM radio driver") Cc: stable <stable@vger.kernel.org> # 2.6.38 Cc: Matti Aaltonen <matti.j.aaltonen@nokia.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:59 +01:00
Johan Hovold	2f86d5af05	media: bdisp: fix memleak on release commit `11609a7e21` upstream. If a process is interrupted while accessing the video device and the device lock is contended, release() could return early and fail to free related resources. Note that the return value of the v4l2 release file operation is ignored. Fixes: `28ffeebbb7` ("[media] bdisp: 2D blitter driver using v4l2 mem2mem framework") Cc: stable <stable@vger.kernel.org> # 4.2 Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:58 +01:00
Gerald Schaefer	4ca41aa4c6	s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported commit `ab874f22d3` upstream. On older HW or under a hypervisor, w/o the instruction-execution- protection (IEP) facility, and also w/o EDAT-1, a translation-specification exception may be recognized when bit 55 of a pte is one (_PAGE_NOEXEC). The current code tries to prevent setting _PAGE_NOEXEC in such cases, by removing it within set_pte_at(). However, ptep_set_access_flags() will modify a pte directly, w/o using set_pte_at(). There is at least one scenario where this can result in an active pte with _PAGE_NOEXEC set, which would then lead to a panic due to a translation-specification exception (write to swapped out page): do_swap_page pte = mk_pte (with _PAGE_NOEXEC bit) set_pte_at (will remove _PAGE_NOEXEC bit in page table, but keep it in local variable pte) vmf->orig_pte = pte (pte still contains _PAGE_NOEXEC bit) do_wp_page wp_page_reuse entry = vmf->orig_pte (still with _PAGE_NOEXEC bit) ptep_set_access_flags (writes entry with _PAGE_NOEXEC bit) Fix this by clearing _PAGE_NOEXEC already in mk_pte_phys(), where the pgprot value is applied, so that no pte with _PAGE_NOEXEC will ever be visible, if it is not supported. The check in set_pte_at() can then also be removed. Cc: <stable@vger.kernel.org> # 4.11+ Fixes: `57d7f939e7` ("s390: add no-execute support") Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:57 +01:00
Denis Efremov	0cc303ba19	ar5523: check NULL before memcpy() in ar5523_cmd() commit `315cee426f` upstream. memcpy() call with "idata == NULL && ilen == 0" results in undefined behavior in ar5523_cmd(). For example, NULL is passed in callchain "ar5523_stat_work() -> ar5523_cmd_write() -> ar5523_cmd()". This patch adds ilen check before memcpy() call in ar5523_cmd() to prevent an undefined behavior. Cc: Pontus Fuchs <pontus.fuchs@gmail.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:57 +01:00
Aleksa Sarai	a1de70aa86	cgroup: pids: use atomic64_t for pids->limit commit `a713af394c` upstream. Because pids->limit can be changed concurrently (but we don't want to take a lock because it would be needlessly expensive), use atomic64_ts instead. Fixes: commit `49b786ea14` ("cgroup: implement the PIDs subsystem") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:56 +01:00
Ming Lei	317c80c672	blk-mq: avoid sysfs buffer overflow with too many CPU cores commit `8962842ca5` upstream. It is reported that sysfs buffer overflow can be triggered if the system has too many CPU cores(>841 on 4K PAGE_SIZE) when showing CPUs of hctx via /sys/block/$DEV/mq/$N/cpu_list. Use snprintf to avoid the potential buffer overflow. This version doesn't change the attribute format, and simply stops showing CPU numbers if the buffer is going to overflow. Cc: stable@vger.kernel.org Fixes: 676141e48af7("blk-mq: don't dump CPU -> hw queue map on driver load") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:55 +01:00
David Jeffery	a12c768df3	md: improve handling of bio with REQ_PREFLUSH in md_flush_request() commit `775d78319f` upstream. If pers->make_request fails in md_flush_request(), the bio is lost. To fix this, pass back a bool to indicate if the original make_request call should continue to handle the I/O and instead of assuming the flush logic will push it to completion. Convert md_flush_request to return a bool and no longer calls the raid driver's make_request function. If the return is true, then the md flush logic has or will complete the bio and the md make_request call is done. If false, then the md make_request function needs to keep processing like it is a normal bio. Let the original call to md_handle_request handle any need to retry sending the bio to the raid driver's make_request function should it be needed. Also mark md_flush_request and the make_request function pointer as __must_check to issue warnings should these critical return values be ignored. Fixes: `2bc13b83e6` ("md: batch flush requests.") Cc: stable@vger.kernel.org # # v4.19+ Cc: NeilBrown <neilb@suse.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:55 +01:00
Pawel Harlozinski	d88d9321e8	ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report commit `8f157d4ff0` upstream. Check for existance of jack before tracing. NULL pointer dereference has been reported by KASAN while unloading machine driver (snd_soc_cnl_rt274). Signed-off-by: Pawel Harlozinski <pawel.harlozinski@linux.intel.com> Link: https://lore.kernel.org/r/20191112130237.10141-1-pawel.harlozinski@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:55 +01:00
Jacob Rasmussen	29674d00ca	ASoC: rt5645: Fixed typo for buddy jack support. commit `fe23be2d85` upstream. Had a typo in `e7cfd867fd` that resulted in buddy jack support not being fixed. Fixes: `e7cfd867fd` ("ASoC: rt5645: Fixed buddy jack support.") Signed-off-by: Jacob Rasmussen <jacobraz@google.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Cc: <jacobraz@google.com> CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191114232011.165762-1-jacobraz@google.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:54 +01:00
Jacob Rasmussen	fcea88b2ac	ASoC: rt5645: Fixed buddy jack support. commit `e7cfd867fd` upstream. The headphone jack on buddy was broken with the following commit: commit `6b5da66322` ("ASoC: rt5645: read jd1_1 status for jd detection"). This changes the jd_mode for buddy to 4 so buddy can read from the same register that was used in the working version of this driver without affecting any other devices that might use this, since no other device uses jd_mode = 4. To test this I plugged and uplugged the headphone jack, verifying audio works. Signed-off-by: Jacob Rasmussen <jacobraz@google.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Link: https://lore.kernel.org/r/20191111185957.217244-1-jacobraz@google.com Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:54 +01:00
Tejun Heo	ebd9fbf9e7	workqueue: Fix pwq ref leak in rescuer_thread() commit `e66b39af00` upstream. `008847f66c` ("workqueue: allow rescuer thread to do more work.") made the rescuer worker requeue the pwq immediately if there may be more work items which need rescuing instead of waiting for the next mayday timer expiration. Unfortunately, it doesn't check whether the pwq is already on the mayday list and unconditionally gets the ref and moves it onto the list. This doesn't corrupt the list but creates an additional reference to the pwq. It got queued twice but will only be removed once. This leak later can trigger pwq refcnt warning on workqueue destruction and prevent freeing of the workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Williams, Gerald S" <gerald.s.williams@intel.com> Cc: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:54 +01:00
Tejun Heo	7c43540e88	workqueue: Fix spurious sanity check failures in destroy_workqueue() commit `def98c84b6` upstream. Before actually destrying a workqueue, destroy_workqueue() checks whether it's actually idle. If it isn't, it prints out a bunch of warning messages and leaves the workqueue dangling. It unfortunately has a couple issues. * Mayday list queueing increments pwq's refcnts which gets detected as busy and fails the sanity checks. However, because mayday list queueing is asynchronous, this condition can happen without any actual work items left in the workqueue. * Sanity check failure leaves the sysfs interface behind too which can lead to init failure of newer instances of the workqueue. This patch fixes the above two by * If a workqueue has a rescuer, disable and kill the rescuer before sanity checks. Disabling and killing is guaranteed to flush the existing mayday list. * Remove sysfs interface before sanity checks. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Marcin Pawlowski <mpawlowski@fb.com> Reported-by: "Williams, Gerald S" <gerald.s.williams@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:53 +01:00
Dmitry Fomichev	56a8302490	dm zoned: reduce overhead of backing device checks commit `e7fad909b6` upstream. Commit `75d66ffb48` added backing device health checks and as a part of these checks, check_events() block ops template call is invoked in dm-zoned mapping path as well as in reclaim and flush path. Calling check_events() with ATA or SCSI backing devices introduces a blocking scsi_test_unit_ready() call being made in sd_check_events(). Even though the overhead of calling scsi_test_unit_ready() is small for ATA zoned devices, it is much larger for SCSI and it affects performance in a very negative way. Fix this performance regression by executing check_events() only in case of any I/O errors. The function dmz_bdev_is_dying() is modified to call only blk_queue_dying(), while calls to check_events() are made in a new helper function, dmz_check_bdev(). Reported-by: zhangxiaoxu <zhangxiaoxu5@huawei.com> Fixes: `75d66ffb48` ("dm zoned: properly handle backing device failure") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:53 +01:00
Maged Mokhtar	10b9bf59ba	dm writecache: handle REQ_FUA commit `c1005322ff` upstream. Call writecache_flush() on REQ_FUA in writecache_map(). Cc: stable@vger.kernel.org # 4.18+ Signed-off-by: Maged Mokhtar <mmokhtar@petasan.org> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:52 +01:00
Sumit Garg	7c07d02673	hwrng: omap - Fix RNG wait loop timeout commit `be867f987a` upstream. Existing RNG data read timeout is 200us but it doesn't cover EIP76 RNG data rate which takes approx. 700us to produce 16 bytes of output data as per testing results. So configure the timeout as 1000us to also take account of lack of udelay()'s reliability. Fixes: `383212425c` ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K") Cc: <stable@vger.kernel.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:52 +01:00
Amir Goldstein	f785f33c23	ovl: relax WARN_ON() on rename to self commit `6889ee5a53` upstream. In ovl_rename(), if new upper is hardlinked to old upper underneath overlayfs before upper dirs are locked, user will get an ESTALE error and a WARN_ON will be printed. Changes to underlying layers while overlayfs is mounted may result in unexpected behavior, but it shouldn't crash the kernel and it shouldn't trigger WARN_ON() either, so relax this WARN_ON(). Reported-by: syzbot+bb1836a212e69f8e201a@syzkaller.appspotmail.com Fixes: `804032fabb` ("ovl: don't check rename to self") Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:51 +01:00
Amir Goldstein	3e929ddf18	ovl: fix corner case of non-unique st_dev;st_ino commit `9c6d8f13e9` upstream. On non-samefs overlay without xino, non pure upper inodes should use a pseudo_dev assigned to each unique lower fs and pure upper inodes use the real upper st_dev. It is fine for an overlay pure upper inode to use the same st_dev;st_ino values as the real upper inode, because the content of those two different filesystem objects is always the same. In this case, however: - two filesystems, A and B - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B Non pure upper overlay inode, whose origin is in layer 1 will have the same st_dev;st_ino values as the real lower inode. This may result with a false positive results of 'diff' between the real lower and copied up overlay inode. Fix this by using the upper st_dev;st_ino values in this case. This breaks the property of constant st_dev;st_ino across copy up of this case. This breakage will be fixed by a later patch. Fixes: `5148626b80` ("ovl: allocate anon bdev per unique lower fs") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:51 +01:00
Greg Kroah-Hartman	458f77a499	lib: raid6: fix awk build warnings commit `702600eef7` upstream. Newer versions of awk spit out these fun warnings: awk: ../lib/raid6/unroll.awk:16: warning: regexp escape sequence `\#' is not a known regexp operator As commit `700c1018b8` ("x86/insn: Fix awk regexp warnings") showed, it turns out that there are a number of awk strings that do not need to be escaped and newer versions of awk now warn about this. Fix the string up so that no warning is produced. The exact same kernel module gets created before and after this patch, showing that it wasn't needed. Link: https://lore.kernel.org/r/20191206152600.GA75093@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:51 +01:00
Larry Finger	4ee6af20c2	rtlwifi: rtl8192de: Fix missing enable interrupt flag commit `330bb71171` upstream. In commit `38506ecefa` ("rtlwifi: rtl_pci: Start modification for new drivers"), the flag that indicates that interrupts are enabled was never set. In addition, there are several places when enable/disable interrupts were commented out are restored. A sychronize_interrupts() call is removed. Fixes: `38506ecefa` ("rtlwifi: rtl_pci: Start modification for new drivers") Cc: Stable <stable@vger.kernel.org> # v3.18+ Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:50 +01:00
Larry Finger	0aa2570917	rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer commit `3155db7613` upstream. In commit `38506ecefa` ("rtlwifi: rtl_pci: Start modification for new drivers"), a callback needed to check if the hardware has released a buffer indicating that a DMA operation is completed was not added. Fixes: `38506ecefa` ("rtlwifi: rtl_pci: Start modification for new drivers") Cc: Stable <stable@vger.kernel.org> # v3.18+ Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:50 +01:00
Larry Finger	56a35a3f5a	rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address commit `0e531cc575` upstream. In commit `38506ecefa` ("rtlwifi: rtl_pci: Start modification for new drivers"), a callback to get the RX buffer address was added to the PCI driver. Unfortunately, driver rtl8192de was not modified appropriately and the code runs into a WARN_ONCE() call. The use of an incorrect array is also fixed. Fixes: `38506ecefa` ("rtlwifi: rtl_pci: Start modification for new drivers") Cc: Stable <stable@vger.kernel.org> # 3.18+ Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:49 +01:00
Josef Bacik	8862b80bd5	btrfs: record all roots for rename exchange on a subvol commit `3e1740993e` upstream. Testing with the new fsstress support for subvolumes uncovered a pretty bad problem with rename exchange on subvolumes. We're modifying two different subvolumes, but we only start the transaction on one of them, so the other one is not added to the dirty root list. This is caught by btrfs_cow_block() with a warning because the root has not been updated, however if we do not modify this root again we'll end up pointing at an invalid root because the root item is never updated. Fix this by making sure we add the destination root to the trans list, the same as we do with normal renames. This fixes the corruption. Fixes: `cdd1fedf82` ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:49 +01:00
Filipe Manana	f803185361	Btrfs: send, skip backreference walking for extents with many references commit `fd0ddbe250` upstream. Backreference walking, which is used by send to figure if it can issue clone operations instead of write operations, can be very slow and use too much memory when extents have many references. This change simply skips backreference walking when an extent has more than 64 references, in which case we fallback to a write operation instead of a clone operation. This limit is conservative and in practice I observed no signicant slowdown with up to 100 references and still low memory usage up to that limit. This is a temporary workaround until there are speedups in the backref walking code, and as such it does not attempt to add extra interfaces or knobs to tweak the threshold. Reported-by: Atemu <atemu.main@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAE4GHgkvqVADtS4AzcQJxo0Q1jKQgKaW3JGp3SGdoinVo=C9eQ@mail.gmail.com/T/#me55dc0987f9cc2acaa54372ce0492c65782be3fa CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:48 +01:00
Qu Wenruo	dc2a320dc2	btrfs: Remove btrfs_bio::flags member commit `34b127aecd` upstream. The last user of btrfs_bio::flags was removed in commit `326e1dbb57` ("block: remove management of bi_remaining when restoring original bi_end_io"), remove it. (Tagged for stable as the structure is heavily used and space savings are desirable.) CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:48 +01:00
Tejun Heo	dfca82a7ab	btrfs: Avoid getting stuck during cyclic writebacks commit `f7bddf1e27` upstream. During a cyclic writeback, extent_write_cache_pages() uses done_index to update the writeback_index after the current run is over. However, instead of current index + 1, it gets to to the current index itself. Unfortunately, this, combined with returning on EOF instead of looping back, can lead to the following pathlogical behavior. 1. There is a single file which has accumulated enough dirty pages to trigger balance_dirty_pages() and the writer appending to the file with a series of short writes. 2. balance_dirty_pages kicks in, wakes up background writeback and sleeps. 3. Writeback kicks in and the cursor is on the last page of the dirty file. Writeback is started or skipped if already in progress. As it's EOF, extent_write_cache_pages() returns and the cursor is set to done_index which is pointing to the last page. 4. Writeback is done. Nothing happens till balance_dirty_pages finishes, at which point we go back to #1. This can almost completely stall out writing back of the file and keep the system over dirty threshold for a long time which can mess up the whole system. We encountered this issue in production with a package handling application which can reliably reproduce the issue when running under tight memory limits. Reading the comment in the error handling section, this seems to be to avoid accidentally skipping a page in case the write attempt on the page doesn't succeed. However, this concern seems bogus. On each page, the code either: * Skips and moves onto the next page. * Fails issue and sets done_index to index + 1. * Successfully issues and continue to the next page if budget allows and not EOF. IOW, as long as it's not EOF and there's budget, the code never retries writing back the same page. Only when a page happens to be the last page of a particular run, we end up retrying the page, which can't possibly guarantee anything data integrity related. Besides, cyclic writes are only used for non-syncing writebacks meaning that there's no data integrity implication to begin with. Fix it by always setting done_index past the current page being processed. Note that this problem exists in other writepages too. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-17 20:34:47 +01:00

1 2 3 4 5 ...

793746 Commits