linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 19:30:30 +09:00

Author	SHA1	Message	Date
Tyrel Datwyler	802d704669	powerpc/pseries/vio: Fix iommu_table use-after-free refcount warning commit `aff8c8242b` upstream. Commit `e5afdf9dd5` ("powerpc/vfio_spapr_tce: Add reference counting to iommu_table") missed an iommu_table allocation in the pseries vio code. The iommu_table is allocated with kzalloc and as a result the associated kref gets a value of zero. This has the side effect that during a DLPAR remove of the associated virtual IOA the iommu_tce_table_put() triggers a use-after-free underflow warning. Call Trace: [c0000002879e39f0] [c00000000071ecb4] refcount_warn_saturate+0x184/0x190 (unreliable) [c0000002879e3a50] [c0000000000500ac] iommu_tce_table_put+0x9c/0xb0 [c0000002879e3a70] [c0000000000f54e4] vio_dev_release+0x34/0x70 [c0000002879e3aa0] [c00000000087cfa4] device_release+0x54/0xf0 [c0000002879e3b10] [c000000000d64c84] kobject_cleanup+0xa4/0x240 [c0000002879e3b90] [c00000000087d358] put_device+0x28/0x40 [c0000002879e3bb0] [c0000000007a328c] dlpar_remove_slot+0x15c/0x250 [c0000002879e3c50] [c0000000007a348c] remove_slot_store+0xac/0xf0 [c0000002879e3cd0] [c000000000d64220] kobj_attr_store+0x30/0x60 [c0000002879e3cf0] [c0000000004ff13c] sysfs_kf_write+0x6c/0xa0 [c0000002879e3d10] [c0000000004fde4c] kernfs_fop_write+0x18c/0x260 [c0000002879e3d60] [c000000000410f3c] __vfs_write+0x3c/0x70 [c0000002879e3d80] [c000000000415408] vfs_write+0xc8/0x250 [c0000002879e3dd0] [c0000000004157dc] ksys_write+0x7c/0x120 [c0000002879e3e20] [c00000000000b278] system_call+0x5c/0x68 Further, since the refcount was always zero the iommu_tce_table_put() fails to call the iommu_table release function resulting in a leak. Fix this issue be initilizing the iommu_table kref immediately after allocation. Fixes: `e5afdf9dd5` ("powerpc/vfio_spapr_tce: Add reference counting to iommu_table") Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1579558202-26052-1-git-send-email-tyreld@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Zhengyuan Liu	63a01158cf	tools/power/acpi: fix compilation error commit `1985f8c7f9` upstream. If we compile tools/acpi target in the top source directory, we'd get a compilation error showing as bellow: # make tools/acpi DESCEND power/acpi DESCEND tools/acpidbg CC tools/acpidbg/acpidbg.o Assembler messages: Fatal error: can't create /home/lzy/kernel-upstream/power/acpi/\ tools/acpidbg/acpidbg.o: No such file or directory ../../Makefile.rules:26: recipe for target '/home/lzy/kernel-upstream/\ power/acpi/tools/acpidbg/acpidbg.o' failed make[3]: * [/home/lzy/kernel-upstream//power/acpi/tools/acpidbg/\ acpidbg.o] Error 1 Makefile:19: recipe for target 'acpidbg' failed make[2]: * [acpidbg] Error 2 Makefile:54: recipe for target 'acpi' failed make[1]: * [acpi] Error 2 Makefile:1607: recipe for target 'tools/acpi' failed make: * [tools/acpi] Error 2 Fixes: `d5a4b1a540` ("tools/power/acpi: Remove direct kernel source include reference") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Alexandre Belloni	939d63b8e2	ARM: dts: at91: sama5d3: define clock rate range for tcb1 commit `a7e0f3fc01` upstream. The clock rate range for the TCB1 clock is missing. define it in the device tree. Reported-by: Karl Rudbæk Olsen <karl@micro-technic.com> Fixes: `d2e8190b79` ("ARM: at91/dt: define sama5d3 clocks") Link: https://lore.kernel.org/r/20200110172007.1253659-2-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Alexandre Belloni	7afef92485	ARM: dts: at91: sama5d3: fix maximum peripheral clock rates commit `ee0aa926dd` upstream. Currently the maximum rate for peripheral clock is calculated based on a typical 133MHz MCK. The maximum frequency is defined in the datasheet as a ratio to MCK. Some sama5d3 platforms are using a 166MHz MCK. Update the device trees to match the maximum rate based on 166MHz. Reported-by: Karl Rudbæk Olsen <karl@micro-technic.com> Fixes: `d2e8190b79` ("ARM: at91/dt: define sama5d3 clocks") Link: https://lore.kernel.org/r/20200110172007.1253659-1-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Tero Kristo	95427f9a16	ARM: dts: am43xx: add support for clkout1 clock commit `01053dadb7` upstream. clkout1 clock node and its generation tree was missing. Add this based on the data on TRM and PRCM functional spec. commit `664ae1ab25` ("ARM: dts: am43xx: add clkctrl nodes") effectively reverted this commit `8010f13a40` ("ARM: dts: am43xx: add support for clkout1 clock") which is needed for the ov2659 camera sensor clock definition hence it is being re-applied here. Note that because of the current dts node name dependency for mapping to clock domain, we must still use "clkout1-*ck" naming instead of generic "clock@" naming for the node. And because of this, it's probably best to apply the dts node addition together along with the other clock changes. Fixes: `664ae1ab25` ("ARM: dts: am43xx: add clkctrl nodes") Signed-off-by: Tero Kristo <t-kristo@ti.com> Tested-by: Benoit Parrot <bparrot@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Benoit Parrot <bparrot@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Ingo van Lil	27866f3676	ARM: dts: at91: Reenable UART TX pull-ups commit `9d39d86cd4` upstream. Pull-ups for SAM9 UART/USART TX lines were disabled in a previous commit. However, several chips in the SAM9 family require pull-ups to prevent the TX lines from falling (and causing an endless break condition) when the transceiver is disabled. From the SAM9G20 datasheet, 32.5.1: "To prevent the TXD line from falling when the USART is disabled, the use of an internal pull up is mandatory.". This commit reenables the pull-ups for all chips having that sentence in their datasheets. Fixes: `5e04822f7d` ("ARM: dts: at91: fixes uart pinctrl, set pullup on rx, clear pullup on tx") Signed-off-by: Ingo van Lil <inguin@gmx.de> Cc: Peter Rosin <peda@axentia.se> Link: https://lore.kernel.org/r/20191203142147.875227-1-inguin@gmx.de Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Mika Westerberg	73124cba81	platform/x86: intel_mid_powerbtn: Take a copy of ddata commit `5e0c94d3ae` upstream. The driver gets driver_data from memory that is marked as const (which is probably put to read-only memory) and it then modifies it. This likely causes some sort of fault to happen. Fix this by taking a copy of the structure. Fixes: `c94a8ff14d` ("platform/x86: intel_mid_powerbtn: make mid_pb_ddata const") Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Jose Abreu	f2b5542920	ARC: [plat-axs10x]: Add missing multicast filter number to GMAC node commit `7980dff398` upstream. Add a missing property to GMAC node so that multicast filtering works correctly. Fixes: `556cc1c5f5` ("ARC: [axs101] Add support for AXS101 SDP (software development platform)") Acked-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:25 -05:00
Andy Shevchenko	25992fd9d8	rtc: cmos: Stop using shared IRQ commit `b6da197a2e` upstream. As reported by Guilherme G. Piccoli: ---8<---8<---8<--- The rtc-cmos interrupt setting was changed in the commit `079062b28f` ("rtc: cmos: prevent kernel warning on IRQ flags mismatch") in order to allow shared interrupts; according to that commit's description, some machine got kernel warnings due to the interrupt line being shared between rtc-cmos and other hardware, and rtc-cmos didn't allow IRQ sharing that time. After the aforementioned commit though it was observed a huge increase in lost HPET interrupts in some systems, observed through the following kernel message: [...] hpet1: lost 35 rtc interrupts After investigation, it was narrowed down to the shared interrupts usage when having the kernel option "irqpoll" enabled. In this case, all IRQ handlers are called for non-timer interrupts, if such handlers are setup in shared IRQ lines. The rtc-cmos IRQ handler could be set to hpet_rtc_interrupt(), which will produce the kernel "lost interrupts" message after doing work - lots of readl/writel to HPET registers, which are known to be slow. Although "irqpoll" is not a default kernel option, it's used in some contexts, one being the kdump kernel (which is an already "impaired" kernel usually running with 1 CPU available), so the performance burden could be considerable. Also, the same issue would happen (in a shorter extent though) when using "irqfixup" kernel option. In a quick experiment, a virtual machine with uptime of 2 minutes produced >300 calls to hpet_rtc_interrupt() when "irqpoll" was set, whereas without sharing interrupts this number reduced to 1 interrupt. Machines with more hardware than a VM should generate even more unnecessary HPET interrupts in this scenario. ---8<---8<---8<--- After looking into the rtc-cmos driver history and DSDT table from the Microsoft Surface 3, we may notice that Hans de Goede submitted a correct fix (see dependency below). Thus, we simply revert the culprit commit. Fixes: `079062b28f` ("rtc: cmos: prevent kernel warning on IRQ flags mismatch") Depends-on: `a1e23a42f1` ("rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs") Reported-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200123131437.28157-1-andriy.shevchenko@linux.intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:24 -05:00
Paul Kocialkowski	3abc9c46a5	rtc: hym8563: Return -EINVAL if the time is known to be invalid commit `f236a2a2eb` upstream. The current code returns -EPERM when the voltage loss bit is set. Since the bit indicates that the time value is not valid, return -EINVAL instead, which is the appropriate error code for this situation. Fixes: `dcaf038493` ("rtc: add hym8563 rtc-driver") Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Link: https://lore.kernel.org/r/20191212153111.966923-1-paul.kocialkowski@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:24 -05:00
Geert Uytterhoeven	490ab7fce1	spi: spi-mem: Fix inverted logic in op sanity check [ Upstream commit `aea3877e24` ] On r8a7791/koelsch: m25p80 spi0.0: error -22 reading 9f m25p80: probe of spi0.0 failed with error -22 Apparently the logic in spi_mem_check_op() is wrong, rejecting the spi-mem operation if any buswidth is valid, instead of invalid. Fixes: `380583227c` ("spi: spi-mem: Add extra sanity checks on the op param") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-14 16:33:24 -05:00
Boris Brezillon	b237f078cb	spi: spi-mem: Add extra sanity checks on the op param commit `380583227c` upstream Some combinations are simply not valid and should be rejected before the op is passed to the SPI controller driver. Add an spi_mem_check_op() helper and use it in spi_mem_exec_op() and spi_mem_supports_op() to make sure the spi-mem operation is valid. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable <stable@vger.kernel.org> # 4.19 Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-14 16:33:24 -05:00
Brandon Maier	1503649d8f	gpio: zynq: Report gpio direction at boot commit `6169005ceb` upstream The Zynq's gpios can be configured by the bootloader. But Linux will erroneously report all gpios as inputs unless we implement get_direction(). Signed-off-by: Brandon Maier <Brandon.Maier@collins.com> Tested-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: stable <stable@vger.kernel.org> # 4.19 Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-14 16:33:24 -05:00
Shubhrajyoti Datta	4d905fc227	serial: uartps: Add a timeout to the tx empty wait commit `277375b864` upstream In case the cable is not connected then the target gets into an infinite wait for tx empty. Add a timeout to the tx empty wait. Reported-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> # 4.19 Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-14 16:33:24 -05:00
Robert Milkowski	070818b71d	NFSv4: try lease recovery on NFS4ERR_EXPIRED commit `924491f2e4` upstream. Currently, if an nfs server returns NFS4ERR_EXPIRED to open(), we return EIO to applications without even trying to recover. Fixes: `272289a3df` ("NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid") Signed-off-by: Robert Milkowski <rmilkowski@gmail.com> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:24 -05:00
Trond Myklebust	056d165670	NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes() commit `221203ce64` upstream. Instead of making assumptions about the commit verifier contents, change the commit code to ensure we always check that the verifier was set by the XDR code. Fixes: `f54bcf2ece` ("pnfs: Prepare for flexfiles by pulling out common code") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:24 -05:00
Trond Myklebust	3060883146	NFS: Revalidate the file size on a fatal write error commit `0df68ced55` upstream. If we suffer a fatal error upon writing a file, which causes us to need to revalidate the entire mapping, then we should also revalidate the file size. Fixes: `d2ceb7e570` ("NFS: Don't use page_file_mapping after removing the page") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:24 -05:00
Geert Uytterhoeven	008ff93dee	nfs: NFS_SWAP should depend on SWAP commit `474c4f306e` upstream. If CONFIG_SWAP=n, it does not make much sense to offer the user the option to enable support for swapping over NFS, as that will still fail at run time: # swapon /swap swapon: /swap: swapon failed: Function not implemented Fix this by adding a dependency on CONFIG_SWAP. Fixes: `a564b8f039` ("nfs: enable swap on NFS") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Logan Gunthorpe	412cb7a7b0	PCI: Don't disable bridge BARs when assigning bus resources commit `9db8dc6d07` upstream. Some PCI bridges implement BARs in addition to bridge windows. For example, here's a PLX switch: 04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0 Memory at 90a00000 (32-bit, non-prefetchable) [size=256K] Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0 I/O behind bridge: 00002000-00003fff Memory behind bridge: 90000000-909fffff Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff Previously, when the kernel assigned resource addresses (with the pci=realloc command line parameter, for example) it could clear the struct resource corresponding to the BAR. When this happened, lspci would report this BAR as "ignored": Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K] This is because the kernel reports a zero start address and zero flags in the corresponding sysfs resource file and in /proc/bus/pci/devices. Investigation with 'lspci -x', however, shows the BIOS-assigned address will still be programmed in the device's BAR registers. It's clearly a bug that the kernel lost track of the BAR value, but in most cases, this still won't result in a visible issue because nothing uses the memory, so nothing is affected. However, when an IOMMU is in use, it will not reserve this space in the IOVA because the kernel no longer thinks the range is valid. (See dmar_init_reserved_ranges() for the Intel implementation of this.) Without the proper reserved range, a DMA mapping may allocate an IOVA that matches a bridge BAR, which results in DMA accesses going to the BAR instead of the intended RAM. The problem was in pci_assign_unassigned_root_bus_resources(). When any resource from a bridge device fails to get assigned, the code set the resource's flags to zero. This makes sense for bridge windows, as they will be re-enabled later, but for regular BARs, it makes the kernel permanently lose track of the fact that they decode address space. Change pci_assign_unassigned_root_bus_resources() and pci_assign_unassigned_bridge_resources() so they only clear "res->flags" for bridge windows, not bridge BARs. Fixes: `da7822e5ad` ("PCI: update bridge resources to get more big ranges when allocating space (again)") Link: https://lore.kernel.org/r/20200108213208.4612-1-logang@deltatee.com [bhelgaas: commit log, check for pci_is_bridge()] Reported-by: Kit Chow <kchow@gigaio.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Logan Gunthorpe	239514d16f	PCI/switchtec: Fix vep_vector_number ioread width commit `9375646b4c` upstream. vep_vector_number is actually a 16 bit register which should be read with ioread16() instead of ioread32(). Fixes: `080b47def5` ("MicroSemi Switchtec management interface driver") Link: https://lore.kernel.org/r/20200106190337.2428-3-logang@deltatee.com Reported-by: Doug Meyer <dmeyer@gigaio.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Bryan O'Donoghue	654ba8dcc3	ath10k: pci: Only dump ATH10K_MEM_REGION_TYPE_IOREG when safe commit `d239380196` upstream. ath10k_pci_dump_memory_reg() will try to access memory of type ATH10K_MEM_REGION_TYPE_IOREG however, if a hardware restart is in progress this can crash a system. Individual ioread32() time has been observed to jump from 15-20 ticks to > 80k ticks followed by a secure-watchdog bite and a system reset. Work around this corner case by only issuing the read transaction when the driver state is ATH10K_STATE_ON. Tested-on: QCA9988 PCI 10.4-3.9.0.2-00044 Fixes: `219cc084c6` ("ath10k: add memory dump support QCA9984") Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Navid Emamdoost	93b5c76198	PCI/IOV: Fix memory leak in pci_iov_add_virtfn() commit `8c386cc817` upstream. In the implementation of pci_iov_add_virtfn() the allocated virtfn is leaked if pci_setup_device() fails. The error handling is not calling pci_stop_and_remove_bus_device(). Change the goto label to failed2. Fixes: `156c55325d` ("PCI: Check for pci_setup_device() failure in pci_iov_add_virtfn()") Link: https://lore.kernel.org/r/20191125195255.23740-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Bean Huo	960e54416b	scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails commit `b9fc532021` upstream. A non-zero error value likely being returned by ufshcd_scsi_add_wlus() in case of failure of adding the WLs, but ufshcd_probe_hba() doesn't use this value, and doesn't report this failure to upper caller. This patch is to fix this issue. Fixes: `2a8fa60044` ("ufs: manually add well known logical units") Link: https://lore.kernel.org/r/20200120130820.1737-2-huobean@gmail.com Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Michael Guralnik	eb22d5e307	RDMA/uverbs: Verify MR access flags commit `ca95c14111` upstream. Verify that MR access flags that are passed from user are all supported ones, otherwise an error is returned. Fixes: `4fca037783` ("IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi") Link: https://lore.kernel.org/r/1578506740-22188-6-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Jason Gunthorpe	14d37518a9	RDMA/core: Fix locking in ib_uverbs_event_read commit `14e23bd6d2` upstream. This should not be using ib_dev to test for disassociation, during disassociation is_closed is set under lock and the waitq is triggered. Instead check is_closed and be sure to re-obtain the lock to test the value after the wait_event returns. Fixes: `036b106357` ("IB/uverbs: Enable device removal when there are active user space applications") Link: https://lore.kernel.org/r/1578504126-9400-12-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:23 -05:00
Håkon Bugge	ab40fc36d6	RDMA/netlink: Do not always generate an ACK for some netlink operations commit `a242c36951` upstream. In rdma_nl_rcv_skb(), the local variable err is assigned the return value of the supplied callback function, which could be one of ib_nl_handle_resolve_resp(), ib_nl_handle_set_timeout(), or ib_nl_handle_ip_res_resp(). These three functions all return skb->len on success. rdma_nl_rcv_skb() is merely a copy of netlink_rcv_skb(). The callback functions used by the latter have the convention: "Returns 0 on success or a negative error code". In particular, the statement (equal for both functions): if (nlh->nlmsg_flags & NLM_F_ACK \|\| err) implies that rdma_nl_rcv_skb() always will ack a message, independent of the NLM_F_ACK being set in nlmsg_flags or not. The fix could be to change the above statement, but it is better to keep the two *_rcv_skb() functions equal in this respect and instead change the three callback functions in the rdma subsystem to the correct convention. Fixes: `2ca546b92a` ("IB/sa: Route SA pathrecord query through netlink") Fixes: `ae43f82867` ("IB/core: Add IP to GID netlink offload") Link: https://lore.kernel.org/r/20191216120436.3204814-1-haakon.bugge@oracle.com Suggested-by: Mark Haywood <mark.haywood@oracle.com> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Mark Haywood <mark.haywood@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:22 -05:00
Jack Morgenstein	6ddcb30256	IB/mlx4: Fix memory leak in add_gid error flow commit `eaad647e5c` upstream. In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW gid table, there is a memory leak in the driver's copy of the gid table: the gid entry's context buffer is not freed. If such an error occurs, free the entry's context buffer, and mark the entry as available (by setting its context pointer to NULL). Fixes: `e26be1bfef` ("IB/mlx4: Implement ib_device callbacks") Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-14 16:33:22 -05:00
Sunil Muthuswamy	9a7f8a176a	hv_sock: Remove the accept port restriction [ Upstream commit `c742c59e1f` ] Currently, hv_sock restricts the port the guest socket can accept connections on. hv_sock divides the socket port namespace into two parts for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF (there are no restrictions on client port namespace). The first part (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted. The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports for the peer (host) socket, once a connection is accepted. This reservation of the port namespace is specific to hv_sock and not known by the generic vsock library (ex: af_vsock). This is problematic because auto-binds/ephemeral ports are handled by the generic vsock library and it has no knowledge of this port reservation and could allocate a port that is not compatible with hv_sock (and legitimately so). The issue hasn't surfaced so far because the auto-bind code of vsock (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and start assigning ports. That will take a large number of iterations to hit 0x7FFFFFFF. But, after the above change to randomize port selection, the issue has started coming up more frequently. There has really been no good reason to have this port reservation logic in hv_sock from the get go. Reserving a local port for peer ports is not how things are handled generally. Peer ports should reflect the peer port. This fixes the issue by lifting the port reservation, and also returns the right peer port. Since the code converts the GUID to the peer port (by using the first 4 bytes), there is a possibility of conflicts, but that seems like a reasonable risk to take, given this is limited to vsock and that only applies to all local sockets. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-14 16:33:22 -05:00
Ranjani Sridharan	e7751a4bb7	ASoC: pcm: update FE/BE trigger order based on the command [ Upstream commit `acbf27746e` ] Currently, the trigger orders SND_SOC_DPCM_TRIGGER_PRE/POST determine the order in which FE DAI and BE DAI are triggered. In the case of SND_SOC_DPCM_TRIGGER_PRE, the FE DAI is triggered before the BE DAI and in the case of SND_SOC_DPCM_TRIGGER_POST, the BE DAI is triggered before the FE DAI. And this order remains the same irrespective of the trigger command. In the case of the SOF driver, during playback, the FW expects the BE DAI to be triggered before the FE DAI during the START trigger. The BE DAI trigger handles the starting of Link DMA and so it must be started before the FE DAI is started to prevent xruns during pause/release. This can be addressed by setting the trigger order for the FE dai link to SND_SOC_DPCM_TRIGGER_POST. But during the STOP trigger, the FW expects the FE DAI to be triggered before the BE DAI. Retaining the same order during the START and STOP commands, results in FW error as the DAI component in the FW is still active. The issue can be fixed by mirroring the trigger order of FE and BE DAI's during the START and STOP trigger. So, with the trigger order set to SND_SOC_DPCM_TRIGGER_PRE, the FE DAI will be trigger first during SNDRV_PCM_TRIGGER_START/STOP/RESUME and the BE DAI will be triggered first during the STOP/SUSPEND/PAUSE commands. Conversely, with the trigger order set to SND_SOC_DPCM_TRIGGER_POST, the BE DAI will be triggered first during the SNDRV_PCM_TRIGGER_START/STOP/RESUME commands and the FE DAI will be triggered first during the SNDRV_PCM_TRIGGER_STOP/SUSPEND/PAUSE commands. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191104224812.3393-2-ranjani.sridharan@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-14 16:33:21 -05:00
Greg Kroah-Hartman	357668399c	Linux 4.19.103	2020-02-11 04:34:19 -08:00
David Howells	06748661c7	rxrpc: Fix service call disconnection [ Upstream commit `b39a934ec7` ] The recent patch that substituted a flag on an rxrpc_call for the connection pointer being NULL as an indication that a call was disconnected puts the set_bit in the wrong place for service calls. This is only a problem if a call is implicitly terminated by a new call coming in on the same connection channel instead of a terminating ACK packet. In such a case, rxrpc_input_implicit_end_call() calls __rxrpc_disconnect_call(), which is now (incorrectly) setting the disconnection bit, meaning that when rxrpc_release_call() is later called, it doesn't call rxrpc_disconnect_call() and so the call isn't removed from the peer's error distribution list and the list gets corrupted. KASAN finds the issue as an access after release on a call, but the position at which it occurs is confusing as it appears to be related to a different call (the call site is where the latter call is being removed from the error distribution list and either the next or pprev pointer points to a previously released call). Fix this by moving the setting of the flag from __rxrpc_disconnect_call() to rxrpc_disconnect_call() in the same place that the connection pointer was being cleared. Fixes: `5273a191dc` ("rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:19 -08:00
Song Liu	a3623db43a	perf/core: Fix mlock accounting in perf_mmap() commit `003461559e` upstream. Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of a perf ring buffer may lead to an integer underflow in locked memory accounting. This may lead to the undesired behaviors, such as failures in BPF map creation. Address this by adjusting the accounting logic to take into account the possibility that the amount of already locked memory may exceed the current limit. Fixes: `c4b7547974` ("perf/core: Make the mlock accounting simple again") Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20200123181146.2238074-1-songliubraving@fb.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:34:19 -08:00
Konstantin Khlebnikov	6284d30e96	clocksource: Prevent double add_timer_on() for watchdog_timer commit `febac332a8` upstream. Kernel crashes inside QEMU/KVM are observed: kernel BUG at kernel/time/timer.c:1154! BUG_ON(timer_pending(timer) \|\| !timer->function) in add_timer_on(). At the same time another cpu got: general protection fault: 0000 [#1] SMP PTI of poinson pointer 0xdead000000000200 in: __hlist_del at include/linux/list.h:681 (inlined by) detach_timer at kernel/time/timer.c:818 (inlined by) expire_timers at kernel/time/timer.c:1355 (inlined by) __run_timers at kernel/time/timer.c:1686 (inlined by) run_timer_softirq at kernel/time/timer.c:1699 Unfortunately kernel logs are badly scrambled, stacktraces are lost. Printing the timer->function before the BUG_ON() pointed to clocksource_watchdog(). The execution of clocksource_watchdog() can race with a sequence of clocksource_stop_watchdog() .. clocksource_start_watchdog(): expire_timers() detach_timer(timer, true); timer->entry.pprev = NULL; raw_spin_unlock_irq(&base->lock); call_timer_fn clocksource_watchdog() clocksource_watchdog_kthread() or clocksource_unbind() spin_lock_irqsave(&watchdog_lock, flags); clocksource_stop_watchdog(); del_timer(&watchdog_timer); watchdog_running = 0; spin_unlock_irqrestore(&watchdog_lock, flags); spin_lock_irqsave(&watchdog_lock, flags); clocksource_start_watchdog(); add_timer_on(&watchdog_timer, ...); watchdog_running = 1; spin_unlock_irqrestore(&watchdog_lock, flags); spin_lock(&watchdog_lock); add_timer_on(&watchdog_timer, ...); BUG_ON(timer_pending(timer) \|\| !timer->function); timer_pending() -> true BUG() I.e. inside clocksource_watchdog() watchdog_timer could be already armed. Check timer_pending() before calling add_timer_on(). This is sufficient as all operations are synchronized by watchdog_lock. Fixes: `75c5158f70` ("timekeeping: Update clocksource with stop_machine") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/158048693917.4378.13823603769948933793.stgit@buzz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:34:18 -08:00
Thomas Gleixner	032a2bf978	x86/apic/msi: Plug non-maskable MSI affinity race commit `6f1a4891a5` upstream. Evan tracked down a subtle race between the update of the MSI message and the device raising an interrupt internally on PCI devices which do not support MSI masking. The update of the MSI message is non-atomic and consists of either 2 or 3 sequential 32bit wide writes to the PCI config space. - Write address low 32bits - Write address high 32bits (If supported by device) - Write data When an interrupt is migrated then both address and data might change, so the kernel attempts to mask the MSI interrupt first. But for MSI masking is optional, so there exist devices which do not provide it. That means that if the device raises an interrupt internally between the writes then a MSI message is sent built from half updated state. On x86 this can lead to spurious interrupts on the wrong interrupt vector when the affinity setting changes both address and data. As a consequence the device interrupt can be lost causing the device to become stuck or malfunctioning. Evan tried to handle that by disabling MSI accross an MSI message update. That's not feasible because disabling MSI has issues on its own: If MSI is disabled the PCI device is routing an interrupt to the legacy INTx mechanism. The INTx delivery can be disabled, but the disablement is not working on all devices. Some devices lose interrupts when both MSI and INTx delivery are disabled. Another way to solve this would be to enforce the allocation of the same vector on all CPUs in the system for this kind of screwed devices. That could be done, but it would bring back the vector space exhaustion problems which got solved a few years ago. Fortunately the high address (if supported by the device) is only relevant when X2APIC is enabled which implies interrupt remapping. In the interrupt remapping case the affinity setting is happening at the interrupt remapping unit and the PCI MSI message is programmed only once when the PCI device is initialized. That makes it possible to solve it with a two step update: 1) Target the MSI msg to the new vector on the current target CPU 2) Target the MSI msg to the new vector on the new target CPU In both cases writing the MSI message is only changing a single 32bit word which prevents the issue of inconsistency. After writing the final destination it is necessary to check whether the device issued an interrupt while the intermediate state #1 (new vector, current CPU) was in effect. This is possible because the affinity change is always happening on the current target CPU. The code runs with interrupts disabled, so the interrupt can be detected by checking the IRR of the local APIC. If the vector is pending in the IRR then the interrupt is retriggered on the new target CPU by sending an IPI for the associated vector on the target CPU. This can cause spurious interrupts on both the local and the new target CPU. 1) If the new vector is not in use on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then interrupt entry code will ignore that spurious interrupt. The vector is marked so that the 'No irq handler for vector' warning is supressed once. 2) If the new vector is in use already on the local CPU then the IRR check might see an pending interrupt from the device which is using this vector. The IPI to the new target CPU will then invoke the handler of the device, which got the affinity change, even if that device did not issue an interrupt 3) If the new vector is in use already on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then the handler of the device which uses that vector on the local CPU will be invoked. expose issues in device driver interrupt handlers which are not prepared to handle a spurious interrupt correctly. This not a regression, it's just exposing something which was already broken as spurious interrupts can happen for a lot of reasons and all driver handlers need to be able to deal with them. Reported-by: Evan Green <evgreen@chromium.org> Debugged-by: Evan Green <evgreen@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Evan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:34:18 -08:00
Ronnie Sahlberg	71a47ed651	cifs: fail i/o on soft mounts if sessionsetup errors out commit `b0dd940e58` upstream. RHBZ: 1579050 If we have a soft mount we should fail commands for session-setup failures (such as the password having changed/ account being deleted/ ...) and return an error back to the application. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:34:18 -08:00
David Hildenbrand	0a69047d82	mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section [ Upstream commit `e822969cab` ] Patch series "mm: fix max_pfn not falling on section boundary", v2. Playing with different memory sizes for a x86-64 guest, I discovered that some memmaps (highest section if max_mem does not fall on the section boundary) are marked as being valid and online, but contain garbage. We have to properly initialize these memmaps. Looking at /proc/kpageflags and friends, I found some more issues, partially related to this. This patch (of 3): If max_pfn is not aligned to a section boundary, we can easily run into BUGs. This can e.g., be triggered on x86-64 under QEMU by specifying a memory size that is not a multiple of 128MB (e.g., 4097MB, but also 4160MB). I was told that on real HW, we can easily have this scenario (esp., one of the main reasons sub-section hotadd of devmem was added). The issue is, that we have a valid memmap (pfn_valid()) for the whole section, and the whole section will be marked "online". pfn_to_online_page() will succeed, but the memmap contains garbage. E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m 4160M" - (see tools/vm/page-types.c): [ 200.476376] BUG: unable to handle page fault for address: fffffffffffffffe [ 200.477500] #PF: supervisor read access in kernel mode [ 200.478334] #PF: error_code(0x0000) - not-present page [ 200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0 [ 200.479557] Oops: 0000 [#4] SMP NOPTI [ 200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G D W 5.5.0-rc1-next-20191209 #93 [ 200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4 [ 200.481648] RIP: 0010:stable_page_flags+0x4d/0x410 [ 200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f [ 200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202 [ 200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000 [ 200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246 [ 200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000 [ 200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001 [ 200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08 [ 200.487130] FS: 00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000 [ 200.487804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0 [ 200.488897] Call Trace: [ 200.489115] kpageflags_read+0xe9/0x140 [ 200.489447] proc_reg_read+0x3c/0x60 [ 200.489755] vfs_read+0xc2/0x170 [ 200.490037] ksys_pread64+0x65/0xa0 [ 200.490352] do_syscall_64+0x5c/0xa0 [ 200.490665] entry_SYSCALL_64_after_hwframe+0x49/0xbe But it can be triggered much easier via "cat /proc/kpageflags > /dev/null" after cold/hot plugging a DIMM to such a system: [root@localhost ~]# cat /proc/kpageflags > /dev/null [ 111.517275] BUG: unable to handle page fault for address: fffffffffffffffe [ 111.517907] #PF: supervisor read access in kernel mode [ 111.518333] #PF: error_code(0x0000) - not-present page [ 111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0 This patch fixes that by at least zero-ing out that memmap (so e.g., page_to_pfn() will not crash). Commit `907ec5fca3` ("mm: zero remaining unavailable struct pages") tried to fix a similar issue, but forgot to consider this special case. After this patch, there are still problems to solve. E.g., not all of these pages falling into a memory hole will actually get initialized later and set PageReserved - they are only zeroed out - but at least the immediate crashes are gone. A follow-up patch will take care of this. Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com Fixes: `f7f99100d8` ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: <stable@vger.kernel.org> [4.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:18 -08:00
Pavel Tatashin	f19a50c1e3	mm: return zero_resv_unavail optimization [ Upstream commit `ec393a0f01` ] When checking for valid pfns in zero_resv_unavail(), it is not necessary to verify that pfns within pageblock_nr_pages ranges are valid, only the first one needs to be checked. This is because memory for pages are allocated in contiguous chunks that contain pageblock_nr_pages struct pages. Link: http://lkml.kernel.org/r/20181002143821.5112-3-msys.mizuma@gmail.com Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:18 -08:00
Naoya Horiguchi	9ac5917a1d	mm: zero remaining unavailable struct pages [ Upstream commit `907ec5fca3` ] Patch series "mm: Fix for movable_node boot option", v3. This patch series contains a fix for the movable_node boot option issue which was introduced by commit `124049decb` ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"). The commit breaks the option because it changed the memory gap range to reserved memblock. So, the node is marked as Normal zone even if the SRAT has Hot pluggable affinity. First and second patch fix the original issue which the commit tried to fix, then revert the commit. This patch (of 3): There is a kernel panic that is triggered when reading /proc/kpageflags on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': BUG: unable to handle kernel paging request at fffffffffffffffe PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 RIP: 0010:stable_page_flags+0x27/0x3c0 Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202 RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0 RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001 R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0 R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10 FS: 00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0 Call Trace: kpageflags_read+0xc7/0x120 proc_reg_read+0x3c/0x60 __vfs_read+0x36/0x170 vfs_read+0x89/0x130 ksys_pread64+0x71/0x90 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7efc42e75e23 Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 According to kernel bisection, this problem became visible due to commit `f7f99100d8` which changes how struct pages are initialized. Memblock layout affects the pfn ranges covered by node/zone. Consider that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the default (no memmap= given) memblock layout is like below: MEMBLOCK configuration: memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000 memory.cnt = 0x4 memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0 memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0 memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0 memory[0x3] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0 ... If you give memmap=1G!4G (so it just covers memory[0x2]), the range [0x100000000-0x13fffffff] is gone: MEMBLOCK configuration: memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000 memory.cnt = 0x3 memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0 memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0 memory[0x2] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0 ... This causes shrinking node 0's pfn range because it is calculated by the address range of memblock.memory. So some of struct pages in the gap range are left uninitialized. We have a function zero_resv_unavail() which does zeroing the struct pages outside memblock.memory, but currently it covers only the reserved unavailable range (i.e. memblock.memory && !memblock.reserved). This patch extends it to cover all unavailable range, which fixes the reported issue. Link: http://lkml.kernel.org/r/20181002143821.5112-2-msys.mizuma@gmail.com Fixes: `f7f99100d8` ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Tested-by: Oscar Salvador <osalvador@suse.de> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:18 -08:00
Sean Christopherson	21b70d9bc1	KVM: Play nice with read-only memslots when querying host page size [ Upstream commit `42cde48b2d` ] Avoid the "writable" check in __gfn_to_hva_many(), which will always fail on read-only memslots due to gfn_to_hva() assuming writes. Functionally, this allows x86 to create large mappings for read-only memslots that are backed by HugeTLB mappings. Note, the changelog for commit `05da45583d` ("KVM: MMU: large page support") states "If the largepage contains write-protected pages, a large pte is not used.", but "write-protected" refers to pages that are temporarily read-only, e.g. read-only memslots didn't even exist at the time. Fixes: `4d8b81abc4` ("KVM: introduce readonly memslot") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Redone using kvm_vcpu_gfn_to_memslot_prot. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:17 -08:00
Sean Christopherson	dabf1a1096	KVM: Use vcpu-specific gva->hva translation when querying host page size [ Upstream commit `f9b84e1922` ] Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the correct set of memslots is used when handling x86 page faults in SMM. Fixes: `54bf36aac5` ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:17 -08:00
Miaohe Lin	eb2c9541bc	KVM: nVMX: vmread should not set rflags to specify success in case of #PF [ Upstream commit `a4d956b939` ] In case writing to vmread destination operand result in a #PF, vmread should not call nested_vmx_succeed() to set rflags to specify success. Similar to as done in VMPTRST (See handle_vmptrst()). Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:17 -08:00
Sean Christopherson	57211b7366	KVM: VMX: Add non-canonical check on writes to RTIT address MSRs [ Upstream commit `fe6ed369fc` ] Reject writes to RTIT address MSRs if the data being written is a non-canonical address as the MSRs are subject to canonical checks, e.g. KVM will trigger an unchecked #GP when loading the values to hardware during pt_guest_enter(). Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:17 -08:00
Sean Christopherson	9b376cb650	KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM [ Upstream commit `736c291c9f` ] Convert a plethora of parameters and variables in the MMU and page fault flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM. Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical addresses. When TDP is enabled, the fault address is a guest physical address and thus can be a 64-bit value, even when both KVM and its guest are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a 64-bit field, not a natural width field. Using a gva_t for the fault address means KVM will incorrectly drop the upper 32-bits of the GPA. Ditto for gva_to_gpa() when it is used to translate L2 GPAs to L1 GPAs. Opportunistically rename variables and parameters to better reflect the dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain "addr" instead of "vaddr" when the address may be either a GVA or an L2 GPA. Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid a confusing "gpa_t gva" declaration; this also sets the stage for a future patch to combing nonpaging_page_fault() and tdp_page_fault() with minimal churn. Sprinkle in a few comments to document flows where an address is known to be a GVA and thus can be safely truncated to a 32-bit value. Add WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help document such cases and detect bugs. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:17 -08:00
Sean Christopherson	c2e29d0fe6	KVM: x86/mmu: Apply max PA check for MMIO sptes to 32-bit KVM [ Upstream commit `e30a7d623d` ] Remove the bogus 64-bit only condition from the check that disables MMIO spte optimization when the system supports the max PA, i.e. doesn't have any reserved PA bits. 32-bit KVM always uses PAE paging for the shadow MMU, and per Intel's SDM: PAE paging translates 32-bit linear addresses to 52-bit physical addresses. The kernel's restrictions on max physical addresses are limits on how much memory the kernel can reasonably use, not what physical addresses are supported by hardware. Fixes: `ce88decffd` ("KVM: MMU: mmio page fault support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:17 -08:00
Josef Bacik	860473714c	btrfs: flush write bio if we loop in extent_write_cache_pages [ Upstream commit 96bf313ecb33567af4cb53928b0c951254a02759 ] There exists a deadlock with range_cyclic that has existed forever. If we loop around with a bio already built we could deadlock with a writer who has the page locked that we're attempting to write but is waiting on a page in our bio to be written out. The task traces are as follows PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 #3 [ffffc900297bb708] __lock_page at ffffffff811f145b #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: "aio-dio-invalid" #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 I used drgn to find the respective pages we were stuck on page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 As you can see the kworker is waiting for bit 0 (PG_locked) on index 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index 8148. aio-dio-invalid has 7680, and the kworker epd looks like the following crash> struct extent_page_data ffffc900297bbbb0 struct extent_page_data { bio = 0xffff889f747ed830, tree = 0xffff889eed6ba448, extent_locked = 0, sync_io = 0 } Probably worth mentioning as well that it waits for writeback of the page to complete while holding a lock on it (at prepare_pages()). Using drgn I walked the bio pages looking for page 0xffffea00fbfc7500 which is the one we're waiting for writeback on bio = Object(prog, 'struct bio', address=0xffff889f747ed830) for i in range(0, bio.bi_vcnt.value_()): bv = bio.bi_io_vec[i] if bv.bv_page.value_() == 0xffffea00fbfc7500: print("FOUND IT") which validated what I suspected. The fix for this is simple, flush the epd before we loop back around to the beginning of the file during writeout. Fixes: `b293f02e14` ("Btrfs: Add writepages support") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:16 -08:00
Wayne Lin	4ecba33ec8	drm/dp_mst: Remove VCPI while disabling topology mgr [ Upstream commit `64e62bdf04` ] [Why] This patch is trying to address the issue observed when hotplug DP daisy chain monitors. e.g. src-mstb-mstb-sst -> src (unplug) mstb-mstb-sst -> src-mstb-mstb-sst (plug in again) Once unplug a DP MST capable device, driver will call drm_dp_mst_topology_mgr_set_mst() to disable MST. In this function, it cleans data of topology manager while disabling mst_state. However, it doesn't clean up the proposed_vcpis of topology manager. If proposed_vcpi is not reset, once plug in MST daisy chain monitors later, code will fail at checking port validation while trying to allocate payloads. When MST capable device is plugged in again and try to allocate payloads by calling drm_dp_update_payload_part1(), this function will iterate over all proposed virtual channels to see if any proposed VCPI's num_slots is greater than 0. If any proposed VCPI's num_slots is greater than 0 and the port which the specific virtual channel directed to is not in the topology, code then fails at the port validation. Since there are stale VCPI allocations from the previous topology enablement in proposed_vcpi[], code will fail at port validation and reurn EINVAL. [How] Clean up the data of stale proposed_vcpi[] and reset mgr->proposed_vcpis to NULL while disabling mst in drm_dp_mst_topology_mgr_set_mst(). Changes since v1: *Add on more details in commit message to describe the issue which the patch is trying to fix Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> [added cc to stable] Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205090043.7580-1-Wayne.Lin@amd.com Cc: <stable@vger.kernel.org> # v3.17+ Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:16 -08:00
Claudiu Beznea	1f1611dc1f	drm: atmel-hlcdc: enable clock before configuring timing engine [ Upstream commit `2c1fb9d86f` ] Changing pixel clock source without having this clock source enabled will block the timing engine and the next operations after (in this case setting ATMEL_HLCDC_CFG(5) settings in atmel_hlcdc_crtc_mode_set_nofb() will fail). It is recomended (although in datasheet this is not present) to actually enabled pixel clock source before doing any changes on timing enginge (only SAM9X60 datasheet specifies that the peripheral clock and pixel clock must be enabled before using LCD controller). Fixes: `1a396789f6` ("drm: add Atmel HLCDC Display Controller support") Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: <stable@vger.kernel.org> # v4.0+ Link: https://patchwork.freedesktop.org/patch/msgid/1576672109-22707-3-git-send-email-claudiu.beznea@microchip.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:16 -08:00
Josef Bacik	159db2ae36	btrfs: free block groups after free'ing fs trees [ Upstream commit `4e19443da1` ] Sometimes when running generic/475 we would trip the WARN_ON(cache->reserved) check when free'ing the block groups on umount. This is because sometimes we don't commit the transaction because of IO errors and thus do not cleanup the tree logs until at umount time. These blocks are still reserved until they are cleaned up, but they aren't cleaned up until _after_ we do the free block groups work. Fix this by moving the free after free'ing the fs roots, that way all of the tree logs are cleaned up and we have a properly cleaned fs. A bunch of loops of generic/475 confirmed this fixes the problem. CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:16 -08:00
Anand Jain	381a16fa10	btrfs: use bool argument in free_root_pointers() [ Upstream commit `4273eaff9b` ] We don't need int argument bool shall do in free_root_pointers(). And rename the argument as it confused two people. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:16 -08:00
Eric Biggers	987bb7a3fd	ext4: fix deadlock allocating crypto bounce page from mempool [ Upstream commit `547c556f4d` ] ext4_writepages() on an encrypted file has to encrypt the data, but it can't modify the pagecache pages in-place, so it encrypts the data into bounce pages and writes those instead. All bounce pages are allocated from a mempool using GFP_NOFS. This is not correct use of a mempool, and it can deadlock. This is because GFP_NOFS includes __GFP_DIRECT_RECLAIM, which enables the "never fail" mode for mempool_alloc() where a failed allocation will fall back to waiting for one of the preallocated elements in the pool. But since this mode is used for all a bio's pages and not just the first, it can deadlock waiting for pages already in the bio to be freed. This deadlock can be reproduced by patching mempool_alloc() to pretend that pool->alloc() always fails (so that it always falls back to the preallocations), and then creating an encrypted file of size > 128 KiB. Fix it by only using GFP_NOFS for the first page in the bio. For subsequent pages just use GFP_NOWAIT, and if any of those fail, just submit the bio and start a new one. This will need to be fixed in f2fs too, but that's less straightforward. Fixes: `c9af28fdd4` ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191231181149.47619-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:34:16 -08:00

1 2 3 4 5 ...

795712 Commits