linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-10 21:07:02 +09:00

Author	SHA1	Message	Date
Tejun Heo	10f24144ca	iocost: don't reset the inuse weight of under-weighted debtors commit `8c936f9ea1` upstream. When an iocg is in debt, its inuse weight is owned by debt handling and should stay at 1. This invariant was broken when determining the amount of surpluses at the beginning of donation calculation - when an iocg's hierarchical weight is too low, the iocg is excluded from donation calculation and its inuse is reset to its active regardless of its indebtedness, triggering warnings like the following: WARNING: CPU: 5 PID: 0 at block/blk-iocost.c:1416 iocg_kick_waitq+0x392/0x3a0 ... RIP: 0010:iocg_kick_waitq+0x392/0x3a0 Code: 00 00 be ff ff ff ff 48 89 4d a8 e8 98 b2 70 00 48 8b 4d a8 85 c0 0f 85 4a fe ff ff 0f 0b e9 43 fe ff ff 0f 0b e9 4d fe ff ff <0f> 0b e9 50 fe ff ff e8 a2 ae 70 00 66 90 0f 1f 44 00 00 55 48 89 RSP: 0018:ffffc90000200d08 EFLAGS: 00010016 ... <IRQ> ioc_timer_fn+0x2e0/0x1470 call_timer_fn+0xa1/0x2c0 ... As this happens only when an iocg's hierarchical weight is negligible, its impact likely is limited to triggering the warnings. Fix it by skipping resetting inuse of under-weighted debtors. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rik van Riel <riel@surriel.com> Fixes: `c421a3eb2e` ("blk-iocost: revamp debt handling") Cc: stable@vger.kernel.org # v5.10+ Link: https://lore.kernel.org/r/YmjODd4aif9BzFuO@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:31 +02:00
Thomas Gleixner	559d4f4595	x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests commit `7e0815b3e0` upstream. When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI layer. This can lead to a situation where the PCI/MSI layer masks an MSI[-X] interrupt and the hypervisor grants the write despite the fact that it already requested the interrupt. As a consequence interrupt delivery on the affected device is not happening ever. Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests already. Fixes: `809f9267bb` ("xen: map MSIs into pirqs") Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: Dusty Mabe <dustymabe@redhat.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Noah Meyerhans <noahm@debian.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:31 +02:00
Guo Ren	95ad6bef5b	riscv: patch_text: Fixup last cpu should be master commit `8ec1442953` upstream. These patch_text implementations are using stop_machine_cpuslocked infrastructure with atomic cpu_count. The original idea: When the master CPU patch_text, the others should wait for it. But current implementation is using the first CPU as master, which couldn't guarantee the remaining CPUs are waiting. This patch changes the last CPU as the master to solve the potential risk. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: `043cb41a85` ("riscv: introduce interfaces to patch kernel code") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:31 +02:00
Mikulas Patocka	3437091fcc	hex2bin: fix access beyond string end commit `e4d8a29997` upstream. If we pass a too-short string to "hex2bin" (and the string size without the terminating NUL character is even), "hex2bin" reads one byte after the terminating NUL character. This patch fixes it. Note that hex_to_bin returns -1 on error and hex2bin returns -EINVAL on error - so we can't just return the variable "hi" or "lo" on error. This inconsistency may be fixed in the next merge window, but for the purpose of fixing this bug, we just preserve the existing behavior and return -1 and -EINVAL. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Fixes: `b78049831f` ("lib: add error checking to hex2bin") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Mikulas Patocka	4541645b58	hex2bin: make the function hex_to_bin constant-time commit `e5be15767e` upstream. The function hex2bin is used to load cryptographic keys into device mapper targets dm-crypt and dm-integrity. It should take constant time independent of the processed data, so that concurrently running unprivileged code can't infer any information about the keys via microarchitectural covert channels. This patch changes the function hex_to_bin so that it contains no branches and no memory accesses. Note that this shouldn't cause performance degradation because the size of the new function is the same as the size of the old function (on x86-64) - and the new function causes no branch misprediction penalties. I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64 i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32 sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64 powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no branches in the generated code. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Krzysztof Kozlowski	41dee18326	pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config commit `ac875df4d8` upstream. The Samsung pinctrl drivers depend on OF_GPIO, which is part of GPIOLIB. ARMv7 Exynos platform selects GPIOLIB and Samsung pinctrl drivers. ARMv8 Exynos selects only the latter leading to possible wrong configuration on ARMv8 build: WARNING: unmet direct dependencies detected for PINCTRL_EXYNOS Depends on [n]: PINCTRL [=y] && OF_GPIO [=n] && (ARCH_EXYNOS [=y] || ARCH_S5PV210 || COMPILE_TEST [=y]) Selected by [y]: - ARCH_EXYNOS [=y] Always select the GPIOLIB from the Samsung pinctrl drivers to fix the issue. This requires removing of OF_GPIO dependency (to avoid recursive dependency), so add dependency on OF for COMPILE_TEST cases. Reported-by: Necip Fazil Yildiran <fazilyildiran@gmail.com> Fixes: `eed6b3eb20` ("arm64: Split out platform options to separate Kconfig") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20220420141407.470955-1-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Johan Hovold	8f2dac58b3	arm64: dts: imx8mm-venice: fix spi2 pin configuration commit `dc90043133` upstream. Due to what looks like a copy-paste error, the ECSPI2_MISO pad is not muxed for SPI mode and causes reads from a slave-device connected to the SPI header to always return zero. Configure the ECSPI2_MISO pad for SPI mode on the gw71xx, gw72xx and gw73xx families of boards that got this wrong. Fixes: `6f30b27c5e` ("arm64: dts: imx8mm: Add Gateworks i.MX 8M Mini Development Kits") Cc: stable@vger.kernel.org # 5.12 Cc: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Manivannan Sadhasivam	34b5d2aa35	bus: mhi: host: pci_generic: Flush recovery worker during freeze commit `c38f83bae4` upstream. It is possible that the recovery work might be running while the freeze gets executed (during hibernation etc.,). Currently, we don't powerdown the stack if it is not up but if the recovery work completes after freeze, then the device will be up afterwards. This will not be a sane situation. So let's flush the recovery worker before trying to powerdown the device. Cc: stable@vger.kernel.org Fixes: `5f0c2ee1fe` ("bus: mhi: pci-generic: Fix hibernation") Reported-by: Bhaumik Vasav Bhatt <quic_bbhatt@quicinc.com> Reviewed-by: Bhaumik Vasav Bhatt <quic_bbhatt@quicinc.com> Link: https://lore.kernel.org/r/20220408150039.17297-1-manivannan.sadhasivam@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Manivannan Sadhasivam	b3d21deabc	bus: mhi: host: pci_generic: Add missing poweroff() PM callback commit `e64d5fa504` upstream. During hibernation process, once thaw() stage completes, the MHI endpoint devices will be in M0 state post recovery. After that, the devices will be powered down so that the system can enter the target sleep state. During this stage, the PCI core will put the devices in D3hot. But this transition is not allowed by the MHI spec. The devices can only enter D3hot when they are in M3 state. So for fixing this issue, let's add the poweroff() callback that will get executed before putting the system in target sleep state during hibernation. This callback will power down the device properly so that it could be restored during restore() or thaw() stage. Cc: stable@vger.kernel.org Fixes: `5f0c2ee1fe` ("bus: mhi: pci-generic: Fix hibernation") Reported-by: Hemant Kumar <quic_hemantk@quicinc.com> Suggested-by: Hemant Kumar <quic_hemantk@quicinc.com> Link: https://lore.kernel.org/r/20220405125907.5644-1-manivannan.sadhasivam@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Xiubo Li	732f861dd4	ceph: fix possible NULL pointer dereference for req->r_session commit `7acae6183c` upstream. The request will be inserted into the ci->i_unsafe_dirops before assigning the req->r_session, so it's possible that we will hit NULL pointer dereference bug here. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55327 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Wang Qing	2b008197a0	arch_topology: Do not set llc_sibling if llc_id is invalid commit `1dc9f1a66e` upstream. When ACPI is not enabled, cpuid_topo->llc_id = cpu_topo->llc_id = -1, which will set llc_sibling 0xff(...), this is misleading. Don't set llc_sibling(default 0) if we don't know the cache topology. Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Wang Qing <wangqing@vivo.com> Fixes: `37c3ec2d81` ("arm64: topology: divorce MC scheduling domain from core_siblings") Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1649644580-54626-1-git-send-email-wangqing@vivo.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Maciej W. Rozycki	03cab849da	serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device commit `637674fa40` upstream. The EndRun PTP/1588 dual serial port device is based on the Oxford Semiconductor OXPCIe952 UART device with the PCI vendor:device ID set for EndRun Technologies and is therefore driven by a fixed 62.5MHz clock input derived from the 100MHz PCI Express clock. The clock rate is divided by the oversampling rate of 16 as it is supplied to the baud rate generator, yielding the baud base of 3906250. Replace the incorrect baud base of 4000000 with the right value of 3906250 then, complementing commit `6cbe45d8ac` ("serial: 8250: Correct the clock for OxSemi PCIe devices"). Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable <stable@kernel.org> Fixes: `1bc8cde46a` ("8250_pci: Added driver for Endrun Technologies PTP PCIe card.") Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2204181515270.9383@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Maciej W. Rozycki	9445505273	serial: 8250: Also set sticky MCR bits in console restoration commit `6e6eebdf5e` upstream. Sticky MCR bits are lost in console restoration if console suspending has been disabled. This currently affects the AFE bit, which works in combination with RTS which we set, so we want to make sure the UART retains control of its FIFO where previously requested. Also specific drivers may need other bits in the future. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: `4516d50aab` ("serial: 8250: Use canary to restart console after suspend") Cc: stable@vger.kernel.org # v4.0+ Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2204181518490.9383@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Lino Sanfilippo	ac55cac5dc	serial: amba-pl011: do not time out prematurely when draining tx fifo commit `0e4deb56b0` upstream. The current timeout for draining the tx fifo in RS485 mode is calculated by multiplying the time it takes to transmit one character (with the given baud rate) with the maximal number of characters in the tx queue. This timeout is too short for two reasons: First when calculating the time to transmit one character integer division is used which may round down the result in case of a remainder of the division. Fix this by rounding up the division result. Second the hardware may need additional time (e.g for first putting the characters from the fifo into the shift register) before the characters are actually put onto the wire. To be on the safe side double the current maximum number of iterations that are used to wait for the queue draining. Fixes: `8d47923772` ("serial: amba-pl011: add RS485 support") Cc: stable@vger.kernel.org Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Link: https://lore.kernel.org/r/20220408233503.7251-1-LinoSanfilippo@gmx.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Johan Hovold	858d93280e	serial: imx: fix overrun interrupts in DMA mode commit `3ee82c6e41` upstream. Commit `76821e222c` ("serial: imx: ensure that RX irqs are off if RX is off") accidentally enabled overrun interrupts unconditionally when deferring DMA enable until after the receiver has been enabled during startup. Fix this by using the DMA-initialised instead of DMA-enabled flag to determine whether overrun interrupts should be enabled. Note that overrun interrupts are already accounted for in imx_uart_clear_rx_errors() when using DMA since commit `41d98b5da9` ("serial: imx-serial - update RX error counters when DMA is used"). Fixes: `76821e222c` ("serial: imx: ensure that RX irqs are off if RX is off") Cc: stable@vger.kernel.org # 4.17 Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20220411081957.7846-1-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Sean Anderson	c0ea202fbc	usb: phy: generic: Get the vbus supply commit `03e607cbb2` upstream. While support for working with a vbus was added, the regulator was never actually gotten (despite what was documented). Fix this by actually getting the supply from the device tree. Fixes: `7acc9973e3` ("usb: phy: generic: add vbus support") Cc: stable <stable@kernel.org> Signed-off-by: Sean Anderson <sean.anderson@seco.com> Link: https://lore.kernel.org/r/20220425171412.1188485-3-sean.anderson@seco.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Pawel Laszczak	dd2af3ad96	usb: cdns3: Fix issue for clear halt endpoint commit `b3fa25de31` upstream. This patch fixes a bug which occurs when resetting an endpoint in the __cdns3_gadget_ep_clear_halt function. While resetting the endpoint, the controller will change the HW/DMA owned TRB. It sets the Abort flag in trb->control and changes the trb->length field. If the driver wants to use the aborted TRB, it must update the changed field in the TRB. Fixes: `7733f6c32e` ("usb: cdns3: Add Cadence USB3 DRD Driver") cc: <stable@vger.kernel.org> Acked-by: Peter Chen <peter.chen@kernel.org> Signed-off-by: Pawel Laszczak <pawell@cadence.com> Link: https://lore.kernel.org/r/20220329084605.4022-1-pawell@cadence.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Heikki Krogerus	455285db87	usb: dwc3: pci: add support for the Intel Meteor Lake-P commit `973e0f7a84` upstream. This patch adds the necessary PCI IDs for Intel Meteor Lake-P devices. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20220425103518.44028-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Thinh Nguyen	0d1c407b1a	usb: dwc3: gadget: Return proper request status commit `c7428dbddc` upstream. If the user sets the usb_request's no_interrupt, then there will be no completion event for the request. Currently the driver incorrectly uses the event status of a different request to report the status for a request with no_interrupt. The dwc3 driver needs to check the TRB status associated with the request when reporting its status. Note: this is only applicable to missed_isoc TRB completion status, but the other status are also listed for completeness/documentation. Fixes: `6d8a019614` ("usb: dwc3: gadget: check for Missed Isoc from event status") Cc: <stable@vger.kernel.org> Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/db2c80108286cfd108adb05bad52138b78d7c3a7.1650673655.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Thinh Nguyen	7d14c96bff	usb: dwc3: core: Only handle soft-reset in DCTL commit `f4fd84ae07` upstream. Make sure not to set run_stop bit or link state change request while initiating soft-reset. Register read-modify-write operation may unintentionally start the controller before the initialization completes with its previous DCTL value, which can cause initialization failure. Fixes: `f59dcab176` ("usb: dwc3: core: improve reset sequence") Cc: <stable@vger.kernel.org> Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/6aecbd78328f102003d40ccf18ceeebd411d3703.1650594792.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Thinh Nguyen	5d8299ead7	usb: dwc3: core: Fix tx/rx threshold settings commit `f28ad90693` upstream. The current driver logic checks against 0 to determine whether the periodic tx/rx threshold settings are set, but we may get bogus values from uninitialized variables if no device property is set. Properly default these variables to 0. Fixes: `938a5ad1d3` ("usb: dwc3: Check for ESS TX/RX threshold config") Cc: <stable@vger.kernel.org> Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/cccfce990b11b730b0dae42f9d217dc6fb988c90.1649727139.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Sven Peter	b81be940ea	usb: dwc3: Try usb-role-switch first in dwc3_drd_init commit `ab7aa2866d` upstream. If the PHY controller node has a "port" dwc3 tries to find an extcon device even when "usb-role-switch" is present. This happens because dwc3_get_extcon() sees that "port" node and then calls extcon_find_edev_by_node() which will always return EPROBE_DEFER in that case. On the other hand, even if an extcon was present and dwc3_get_extcon() was successful it would still be ignored in favor of "usb-role-switch". Let's just first check if "usb-role-switch" is configured in the device tree and directly use it instead and only try to look for an extcon device otherwise. Fixes: `8a0a137997` ("usb: dwc3: Registering a role switch in the DRD code.") Cc: stable <stable@kernel.org> Signed-off-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20220411155300.9766-1-sven@svenpeter.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Vijayavardhan Vennapusa	9f051e510c	usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind() commit `bf95c4d463` upstream. If any function like UVC is deactivating gadget as part of composition switch which results in not calling pullup enablement, it is not getting enabled after switch to new composition due to this deactivation flag not cleared. This results in USB enumeration not happening after switch to new USB composition. Hence clear deactivation flag inside gadget structure in configfs_composite_unbind() before switch to new USB composition. Signed-off-by: Vijayavardhan Vennapusa <vvreddy@codeaurora.org> Signed-off-by: Dan Vacura <w36195@motorola.com> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20220413211038.72797-1-w36195@motorola.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:29 +02:00
Dan Vacura	f9b2660d9c	usb: gadget: uvc: Fix crash when encoding data for usb request commit `71d471e3fa` upstream. During the uvcg_video_pump() process, if an error occurs and uvcg_queue_cancel() is called, the buffer queue will be cleared out, but the current marker (queue->buf_used) of the active buffer (no longer active) is not reset. On the next iteration of uvcg_video_pump() the stale buf_used count will be used and the logic of min((unsigned int)len, buf->bytesused - queue->buf_used) may incorrectly calculate a nbytes size, causing an invalid memory access. [80802.185460][ T315] configfs-gadget gadget: uvc: VS request completed with status -18. [80802.185519][ T315] configfs-gadget gadget: uvc: VS request completed with status -18. ... uvcg_queue_cancel() is called and the queue is cleared out, but the marker queue->buf_used is not reset. ... [80802.262328][ T8682] Unable to handle kernel paging request at virtual address ffffffc03af9f000 ... ... [80802.263138][ T8682] Call trace: [80802.263146][ T8682] __memcpy+0x12c/0x180 [80802.263155][ T8682] uvcg_video_pump+0xcc/0x1e0 [80802.263165][ T8682] process_one_work+0x2cc/0x568 [80802.263173][ T8682] worker_thread+0x28c/0x518 [80802.263181][ T8682] kthread+0x160/0x170 [80802.263188][ T8682] ret_from_fork+0x10/0x18 [80802.263198][ T8682] Code: a8c12829 a88130cb a8c130 Fixes: `d692522577` ("usb: gadget/uvc: Port UVC webcam gadget to use videobuf2 framework") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Vacura <w36195@motorola.com> Link: https://lore.kernel.org/r/20220331184024.23918-1-w36195@motorola.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Heikki Krogerus	e5e7d6c4b3	usb: typec: ucsi: Fix role swapping commit `eb5d7ff3cf` upstream. All attempts to swap the roles timed out because the completion was done without releasing the port lock. Fixing that by releasing the lock before starting to wait for the completion. Link: https://lore.kernel.org/linux-usb/037de7ac-e210-bdf5-ec7a-8c0c88a0be20@gmail.com/ Fixes: `ad74b8649b` ("usb: typec: ucsi: Preliminary support for alternate modes") Cc: stable@vger.kernel.org Reported-and-tested-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20220405134824.68067-3-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Heikki Krogerus	3e5dd4cf30	usb: typec: ucsi: Fix reuse of completion structure commit `e25adcca91` upstream. The role swapping completion variable is reused, so it needs to be reinitialised every time. Otherwise it will be marked as done after the first time it's used and completing immediately. Link: https://lore.kernel.org/linux-usb/20220325203959.GA19752@jackp-linux.qualcomm.com/ Fixes: `6df475f804` ("usb: typec: ucsi: Start using struct typec_operations") Cc: stable@vger.kernel.org Reported-and-suggested-by: Jack Pham <quic_jackp@quicinc.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20220405134824.68067-2-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Tasos Sahanidis	6bf55f6b0e	usb: core: Don't hold the device lock while sleeping in do_proc_control() commit `0543e4e885` upstream. Since commit `ae8709b296` ("USB: core: Make do_proc_control() and do_proc_bulk() killable") if a device has the USB_QUIRK_DELAY_CTRL_MSG quirk set, it will temporarily block all other URBs (e.g. interrupts) while sleeping due to a control. This results in noticeable delays when, for example, a userspace usbfs application is sending URB interrupts at a high rate to a keyboard and simultaneously updates the lock indicators using controls. Interrupts with direction set to IN are also affected by this, meaning that delivery of HID reports (containing scancodes) to the usbfs application is delayed as well. This patch fixes the regression by calling msleep() while the device mutex is unlocked, as was the case originally with usb_control_msg(). Fixes: `ae8709b296` ("USB: core: Make do_proc_control() and do_proc_bulk() killable") Cc: stable <stable@kernel.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Link: https://lore.kernel.org/r/3e299e2a-13b9-ddff-7fee-6845e868bc06@tasossah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Hangyu Hua	949d422949	usb: misc: fix improper handling of refcount in uss720_probe() commit `0a96fa640d` upstream. usb_put_dev shouldn't be called when uss720_probe succeeds because of priv->usbdev. At the same time, priv->usbdev shouldn't be set to NULL before destroy_priv in uss720_disconnect because usb_put_dev is in destroy_priv. Fix this by moving priv->usbdev = NULL after usb_put_dev. Fixes: `dcb4b8ad6a` ("misc/uss720: fix memory leak in uss720_probe") Cc: stable <stable@kernel.org> Reviewed-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220407024001.11761-1-hbh25y@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Fawzi Khaber	a590353a95	iio: imu: inv_icm42600: Fix I2C init possible nack commit `b5d6ba09b1` upstream. This register write to REG_INTF_CONFIG6 enables a spike filter that is impacting the line and can prevent the I2C ACK to be seen by the controller. So we don't test the return value. Fixes: `7297ef1e26` ("iio: imu: inv_icm42600: add I2C driver") Signed-off-by: Fawzi Khaber <fawzi.khaber@tdk.com> Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com> Link: https://lore.kernel.org/r/20220411111533.5826-1-jmaneyrol@invensense.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Zheyu Ma	7619f3c498	iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on() commit `3a26787dac` upstream. When the driver fails to enable the regulator 'vid', we will get the following splat: [ 79.955610] WARNING: CPU: 5 PID: 441 at drivers/regulator/core.c:2257 _regulator_put+0x3ec/0x4e0 [ 79.959641] RIP: 0010:_regulator_put+0x3ec/0x4e0 [ 79.967570] Call Trace: [ 79.967773] <TASK> [ 79.967951] regulator_put+0x1f/0x30 [ 79.968254] devres_release_group+0x319/0x3d0 [ 79.968608] i2c_device_probe+0x766/0x940 Fix this by disabling the 'vdd' regulator when failing to enable 'vid' regulator. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Link: https://lore.kernel.org/r/20220409034849.3717231-2-zheyuma97@gmail.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Michael Hennerich	28e1f974e3	iio: dac: ad5446: Fix read_raw not returning set value commit `89a01cd688` upstream. read_raw should return the un-scaled value. Fixes: `5e06bdfb46` ("staging:iio:dac:ad5446: Return cached value for 'raw' attribute") Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20220406105620.1171340-1-michael.hennerich@analog.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Zizhuang Deng	cd266c38aa	iio: dac: ad5592r: Fix the missing return value. commit `b55b38f7cc` upstream. The third call to `fwnode_property_read_u32` did not record the return value, resulting in `channel_offstate` possibly being assigned the wrong value. Fixes: `56ca9db862` ("iio: dac: Add support for the AD5592R/AD5593R ADCs/DACs") Signed-off-by: Zizhuang Deng <sunsetdzz@gmail.com> Link: https://lore.kernel.org/r/20220310125450.4164164-1-sunsetdzz@gmail.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Mathias Nyman	48bc03979f	xhci: increase usb U3 -> U0 link resume timeout from 100ms to 500ms commit `33597f0c48` upstream. The first U3 wake signal by the host may be lost if the USB 3 connection is tunneled over USB4, with a runtime suspended USB4 host, and firmware implemented connection manager. Specs state the host must wait 100ms (tU3WakeupRetryDelay) before resending a U3 wake signal if device doesn't respond, leading to U3 -> U0 link transition times around 270ms in the tunneled case. Fixes: `0200b9f790` ("xhci: Wait until link state trainsits to U0 after setting USB_SS_PORT_LS_U0") Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20220408134823.2527272-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Henry Lin	9faa311f65	xhci: stop polling roothubs after shutdown commit `dc92944a01` upstream. While rebooting, XHCI controller and its bus device will be shut down in order by .shutdown callback. Stopping roothubs polling in xhci_shutdown() can prevent XHCI driver from accessing port status after its bus device shutdown. Take PCIe XHCI controller as example, if XHCI driver doesn't stop roothubs polling, XHCI driver may access PCIe BAR register for port status after parent PCIe root port driver is shutdown and cause PCIe bus error. [check shared hcd exist before stopping its roothub polling -Mathias] Cc: stable@vger.kernel.org Signed-off-by: Henry Lin <henryl@nvidia.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20220408134823.2527272-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:28 +02:00
Evan Green	10e0d30f99	xhci: Enable runtime PM on second Alderlake controller commit `d8bfe5091d` upstream. Alderlake has two XHCI controllers with PCI IDs 0x461e and 0x51ed. We had previously added the quirk to default enable runtime PM for 0x461e, now add it for 0x51ed as well. Signed-off-by: Evan Green <evgreen@chromium.org> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20220408114225.1.Ibcff6b86ed4eacfe4c4bc89c90e18416f3900a3e@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
zhangqilong	576b40690e	usb: xhci: tegra:Fix PM usage reference leak of tegra_xusb_unpowergate_partitions commit `8771039482` upstream. pm_runtime_get_sync() increments the PM usage counter even when it fails. Forgetting the matching put operation results in a reference leak here. Fix it by replacing the call with pm_runtime_resume_and_get(), which keeps the usage counter balanced. Fixes: `41a7426d25` ("usb: xhci: tegra: Unlink power domain devices") Cc: stable <stable@vger.kernel.org> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20220319023822.145641-1-zhangqilong3@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
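The counter imbalance described in the commit above can be illustrated with a small userspace model. This is only a sketch: `struct toy_dev` and the `toy_*` helpers are hypothetical stand-ins invented for illustration, not kernel APIs, and they model just the usage-counter behaviour the message attributes to pm_runtime_get_sync() and pm_runtime_resume_and_get().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-in for a device's runtime-PM state. */
struct toy_dev {
	int usage_count;   /* models the PM usage counter */
	int resume_error;  /* 0 on success, negative errno to simulate a failed resume */
};

/* Models pm_runtime_get_sync(): the counter is bumped even if resume fails. */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	return dev->resume_error;
}

/* Models a "put" that only drops the usage count. */
static void toy_put_noidle(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Models pm_runtime_resume_and_get(): on failure the counter is restored. */
static int toy_resume_and_get(struct toy_dev *dev)
{
	int ret = toy_get_sync(dev);

	if (ret < 0) {
		toy_put_noidle(dev);  /* undo the increment so nothing leaks */
		return ret;
	}
	return 0;
}
```

In this model, a caller that returns early when `toy_get_sync()` fails leaves `usage_count` at 1, whereas `toy_resume_and_get()` leaves it at 0 on the same failure, so the error path needs no extra cleanup.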
Daniele Palmas	4ebf2982db	USB: serial: option: add Telit 0x1057, 0x1058, 0x1075 compositions commit `f32c5a0423` upstream. Add support for the following Telit FN980 and FN990 compositions:
0x1057: tty, adb, rmnet, tty, tty, tty, tty, tty
0x1058: tty, adb, tty, tty, tty, tty, tty
0x1075: adb, tty
Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Link: https://lore.kernel.org/r/20220406141408.580669-1-dnlplm@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
Slark Xiao	be94d697ba	USB: serial: option: add support for Cinterion MV32-WA/MV32-WB commit `b4a64ed6e7` upstream. Add support for Cinterion device MV32-WA/MV32-WB. MV32-WA PID is 0x00F1, and MV32-WB PID is 0x00F2. Test evidence as below:
T: Bus=04 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 4 Spd=5000 MxCh= 0
D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1
P: Vendor=1e2d ProdID=00f1 Rev=05.04
S: Manufacturer=Cinterion
S: Product=Cinterion PID 0x00F1 USB Mobile Broadband
S: SerialNumber=78ada8c4
C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA
I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
I: If#=0x1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I: If#=0x3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I: If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
I: If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
T: Bus=04 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 3 Spd=5000 MxCh= 0
D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1
P: Vendor=1e2d ProdID=00f2 Rev=05.04
S: Manufacturer=Cinterion
S: Product=Cinterion PID 0x00F2 USB Mobile Broadband
S: SerialNumber=cdd06a78
C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA
I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
I: If#=0x1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I: If#=0x3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I: If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
I: If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
Interface 0&1: MBIM, 2: Modem, 3: GNSS, 4: NMEA, 5: Diag
The GNSS port doesn't use the serial driver.
Signed-off-by: Slark Xiao <slark_xiao@163.com> Link: https://lore.kernel.org/r/20220414074434.5699-1-slark_xiao@163.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
Bruno Thomsen	bb73ae98f0	USB: serial: cp210x: add PIDs for Kamstrup USB Meter Reader commit `35a923a0b3` upstream. Wireless reading of water and heat meters using 868 MHz wM-Bus mode C1. The two different product IDs allow detection of dongle antenna solution:
- Internal antenna
- External antenna using SMA connector
https://www.kamstrup.com/en-en/water-solutions/water-meter-reading/usb-meter-reader
Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com> Link: https://lore.kernel.org/r/20220414081202.5591-1-bruno.thomsen@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
Kees Cook	f183708e8e	USB: serial: whiteheat: fix heap overflow in WHITEHEAT_GET_DTR_RTS commit `e23e50e7ac` upstream. The sizeof(struct whiteheat_dr_info) can be 4 bytes under CONFIG_AEABI=n due to "-mabi=apcs-gnu", even though it has a single u8:
whiteheat_private {
    __u8 mcr; /* 0 1 */

    /* size: 4, cachelines: 1, members: 1 */
    /* padding: 3 */
    /* last cacheline: 4 bytes */
};
The result is technically harmless, as both the source and the destinations are currently the same allocation size (4 bytes) and don't use their padding, but if anything were to ever be added after the "mcr" member in "struct whiteheat_private", it would be overwritten. The structs both have a single u8 "mcr" member, but are 4 bytes in padded size. The memcpy() destination was explicitly targeting the u8 member (size 1) with the length of the whole structure (size 4), triggering the memcpy buffer overflow warning:
In file included from include/linux/string.h:253,
    from include/linux/bitmap.h:11,
    from include/linux/cpumask.h:12,
    from include/linux/smp.h:13,
    from include/linux/lockdep.h:14,
    from include/linux/spinlock.h:62,
    from include/linux/mmzone.h:8,
    from include/linux/gfp.h:6,
    from include/linux/slab.h:15,
    from drivers/usb/serial/whiteheat.c:17:
In function 'fortify_memcpy_chk',
    inlined from 'firm_send_command' at drivers/usb/serial/whiteheat.c:587:4:
include/linux/fortify-string.h:328:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
  328 | __write_overflow_field(p_size_field, size);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Instead, just assign the one byte directly.
Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202204142318.vDqjjSFn-lkp@intel.com Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220421001234.2421107-1-keescook@chromium.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
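The "assign the one byte directly" fix from the commit above can be sketched in plain userspace C. The struct and function names below are hypothetical stand-ins, not the driver's own definitions, and the `flags` member is invented purely to show what a whole-struct copy into the single-byte field could clobber.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the single-byte reply structure. */
struct toy_dr_info {
	uint8_t mcr;
};

/* Hypothetical private state; 'flags' is an invented neighbouring member. */
struct toy_private {
	uint8_t mcr;
	uint8_t flags;
};

/*
 * Copy the modem control register value by assigning the one byte.
 * The pattern being avoided is
 *     memcpy(&priv->mcr, info, sizeof(*info));
 * which writes sizeof(*info) bytes into a 1-byte field and can spill
 * into whatever follows it if the source struct is padded.
 */
static void toy_store_mcr(struct toy_private *priv,
			  const struct toy_dr_info *info)
{
	assert(priv && info);
	priv->mcr = info->mcr;
}
```

With a direct assignment the write size is fixed by the member's type, so it stays correct regardless of how the ABI pads either structure.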
Oliver Neukum	7f8fc60689	USB: quirks: add STRING quirk for VCOM device commit `ec547af8a9` upstream. This has been reported to stall if queried. Cc: stable <stable@vger.kernel.org> Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20220414123152.1700-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
Oliver Neukum	96a5999e1f	USB: quirks: add a Realtek card reader commit `2a7ccf6bb6` upstream. This device is reported to stall when enumerated. Cc: stable <stable@vger.kernel.org> Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20220414110209.30924-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
Macpaul Lin	0da0ac8941	usb: mtu3: fix USB 3.0 dual-role-switch from device to host commit `456244aeec` upstream. Issue description: When an OTG port has been switched to device role and is then switched back to host role, the USB 3.0 Host (XHCI) is not able to detect the plug-in event of connected USB 2.0/1.0 (Highspeed and Fullspeed) devices until the system reboots. Root cause and Solution: There is a condition checking the flag "ssusb->otg_switch.is_u3_drd" in toggle_opstate(). At the end of the role switch procedure, toggle_opstate() is called to set the DC_SESSION and SOFT_CONN bits. If "is_u3_drd" was set and the role was switched to USB 3.0 host, setting DC_SESSION and SOFT_CONN is skipped, so the port cannot detect connected USB 2.0 (Highspeed and Fullspeed) devices. Simply remove the condition check to solve this issue. Fixes: `d0ed062a8b` ("usb: mtu3: dual-role mode support") Cc: stable@vger.kernel.org Tested-by: Fabien Parent <fparent@baylibre.com> Reviewed-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com> Signed-off-by: Tainping Fang <tianping.fang@mediatek.com> Link: https://lore.kernel.org/r/20220419081245.21015-1-macpaul.lin@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:27 +02:00
Greg Kroah-Hartman	4bf7f350c1	Linux 5.15.37 Link: https://lore.kernel.org/r/20220429104052.345760505@linuxfoundation.org Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:35 +02:00
Kumar Kartikeya Dwivedi	f59e6886ca	selftests/bpf: Add test for reg2btf_ids out of bounds access commit `13c6a37d40` upstream. This test tries to pass a PTR_TO_BTF_ID_OR_NULL to the release function, which would trigger an out-of-bounds access without the fix in commit `45ce4b4f90` ("bpf: Fix crash due to out of bounds access into reg2btf_ids."). After the fix, it should only index using base_type(reg->type), which should be less than __BPF_REG_TYPE_MAX, and also not permit any type flags to be set for the reg->type. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220220023138.2224652-1-memxor@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:34 +02:00
Linus Torvalds	dcecd95a13	mm: gup: make fault_in_safe_writeable() use fixup_user_fault() commit `fe673d3f5b` upstream Instead of using GUP, make fault_in_safe_writeable() actually force a 'handle_mm_fault()' using the same fixup_user_fault() machinery that futexes already use. Using the GUP machinery meant that fault_in_safe_writeable() did not do everything that a real fault would do, ranging from not auto-expanding the stack segment, to not updating accessed or dirty flags in the page tables (GUP sets those flags on the pages themselves). The latter causes problems on architectures (like s390) that do accessed bit handling in software, which meant that fault_in_safe_writeable() didn't actually do all the fault handling it needed to, and trying to access the user address afterwards would still cause faults. Reported-and-tested-by: Andreas Gruenbacher <agruenba@redhat.com> Fixes: `cdd591fc86` ("iov_iter: Introduce fault_in_iov_iter_writeable") Link: https://lore.kernel.org/all/CAHc6FU5nP+nziNGG0JAF1FUx-GV7kKFvM7aZuU_XD2_1v4vnvg@mail.gmail.com/ Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:34 +02:00
Filipe Manana	4a0123bdb0	btrfs: fallback to blocking mode when doing async dio over multiple extents commit `ca93e44bfb` upstream Some users recently reported that MariaDB was getting a read corruption when using io_uring on top of btrfs. This started to happen in 5.16, after commit `51bd9563b6` ("btrfs: fix deadlock due to page faults during direct IO reads and writes"). That changed btrfs to use the new iomap flag IOMAP_DIO_PARTIAL and to disable page faults before calling iomap_dio_rw(). This was necessary to fix deadlocks when the iovector corresponds to a memory mapped file region. That type of scenario is exercised by test case generic/647 from fstests. For this MariaDB scenario, we attempt to read 16K from file offset X using IOCB_NOWAIT and io_uring. In that range we have 4 extents, each with a size of 4K, and what happens is the following: 1) btrfs_direct_read() disables page faults and calls iomap_dio_rw(); 2) iomap creates a struct iomap_dio object, its reference count is initialized to 1 and its ->size field is initialized to 0; 3) iomap calls btrfs_dio_iomap_begin() with file offset X, which finds the first 4K extent, and setups an iomap for this extent consisting of a single page; 4) At iomap_dio_bio_iter(), we are able to access the first page of the buffer (struct iov_iter) with bio_iov_iter_get_pages() without triggering a page fault; 5) iomap submits a bio for this 4K extent (iomap_dio_submit_bio() -> btrfs_submit_direct()) and increments the refcount on the struct iomap_dio object to 2; The ->size field of the struct iomap_dio object is incremented to 4K; 6) iomap calls btrfs_iomap_begin() again, this time with a file offset of X + 4K. There we setup an iomap for the next extent that also has a size of 4K; 7) Then at iomap_dio_bio_iter() we call bio_iov_iter_get_pages(), which tries to access the next page (2nd page) of the buffer. 
This triggers a page fault and returns -EFAULT; 8) At __iomap_dio_rw() we see the -EFAULT, but we reset the error to 0 because we passed the flag IOMAP_DIO_PARTIAL to iomap and the struct iomap_dio object has a ->size value of 4K (we submitted a bio for an extent already). The 'wait_for_completion' variable is not set to true, because our iocb has IOCB_NOWAIT set; 9) At the bottom of __iomap_dio_rw(), we decrement the reference count of the struct iomap_dio object from 2 to 1. Because we were not the only ones holding a reference on it and 'wait_for_completion' is set to false, -EIOCBQUEUED is returned to btrfs_direct_read(), which just returns it up the callchain, up to io_uring; 10) The bio submitted for the first extent (step 5) completes and its bio endio function, iomap_dio_bio_end_io(), decrements the last reference on the struct iomap_dio object, resulting in calling iomap_dio_complete_work() -> iomap_dio_complete(). 11) At iomap_dio_complete() we adjust the iocb->ki_pos from X to X + 4K and return 4K (the amount of io done) to iomap_dio_complete_work(); 12) iomap_dio_complete_work() calls the iocb completion callback, iocb->ki_complete() with a second argument value of 4K (total io done) and the iocb with the adjust ki_pos of X + 4K. This results in completing the read request for io_uring, leaving it with a result of 4K bytes read, and only the first page of the buffer filled in, while the remaining 3 pages, corresponding to the other 3 extents, were not filled; 13) For the application, the result is unexpected because if we ask to read N bytes, it expects to get N bytes read as long as those N bytes don't cross the EOF (i_size). MariaDB reports this as an error, as it's not expecting a short read, since it knows it's asking for read operations fully within the i_size boundary. This is typical in many applications, but it may also be questionable if they should react to such short reads by issuing more read calls to get the remaining data. 
Nevertheless, the short read happened due to a change in btrfs regarding how it deals with page faults while in the middle of a read operation, and there's no reason why btrfs can't have the previous behaviour of returning the whole data that was requested by the application. The problem can also be triggered with the following simple program:

    /* Get O_DIRECT */
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <string.h>
    #include <liburing.h>

    int main(int argc, char *argv[])
    {
        char *foo_path;
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec iovec;
        int fd;
        long pagesize;
        void *write_buf;
        void *read_buf;
        ssize_t ret;
        int i;

        if (argc != 2) {
            fprintf(stderr, "Use: %s <directory>\n", argv[0]);
            return 1;
        }

        foo_path = malloc(strlen(argv[1]) + 5);
        if (!foo_path) {
            fprintf(stderr, "Failed to allocate memory for file path\n");
            return 1;
        }
        strcpy(foo_path, argv[1]);
        strcat(foo_path, "/foo");

        /*
         * Create file foo with 2 extents, each with a size matching
         * the page size. Then allocate a buffer to read both extents
         * with io_uring, using O_DIRECT and IOCB_NOWAIT. Before doing
         * the read with io_uring, access the first page of the buffer
         * to fault it in, so that during the read we only trigger a
         * page fault when accessing the second page of the buffer.
         */
        fd = open(foo_path, O_CREAT | O_TRUNC | O_WRONLY | O_DIRECT, 0666);
        if (fd == -1) {
            fprintf(stderr, "Failed to create file 'foo': %s (errno %d)",
                    strerror(errno), errno);
            return 1;
        }

        pagesize = sysconf(_SC_PAGE_SIZE);
        ret = posix_memalign(&write_buf, pagesize, 2 * pagesize);
        if (ret) {
            fprintf(stderr, "Failed to allocate write buffer\n");
            return 1;
        }

        memset(write_buf, 0xab, pagesize);
        memset(write_buf + pagesize, 0xcd, pagesize);

        /* Create 2 extents, each with a size matching page size. */
        for (i = 0; i < 2; i++) {
            ret = pwrite(fd, write_buf + i * pagesize, pagesize,
                         i * pagesize);
            if (ret != pagesize) {
                fprintf(stderr,
                        "Failed to write to file, ret = %ld errno %d (%s)\n",
                        ret, errno, strerror(errno));
                return 1;
            }
            ret = fsync(fd);
            if (ret != 0) {
                fprintf(stderr, "Failed to fsync file\n");
                return 1;
            }
        }

        close(fd);
        fd = open(foo_path, O_RDONLY | O_DIRECT);
        if (fd == -1) {
            fprintf(stderr, "Failed to open file 'foo': %s (errno %d)",
                    strerror(errno), errno);
            return 1;
        }

        ret = posix_memalign(&read_buf, pagesize, 2 * pagesize);
        if (ret) {
            fprintf(stderr, "Failed to allocate read buffer\n");
            return 1;
        }

        /*
         * Fault in only the first page of the read buffer.
         * We want to trigger a page fault for the 2nd page of the
         * read buffer during the read operation with io_uring
         * (O_DIRECT and IOCB_NOWAIT).
         */
        memset(read_buf, 0, 1);

        ret = io_uring_queue_init(1, &ring, 0);
        if (ret != 0) {
            fprintf(stderr, "Failed to create io_uring queue\n");
            return 1;
        }

        sqe = io_uring_get_sqe(&ring);
        if (!sqe) {
            fprintf(stderr, "Failed to get io_uring sqe\n");
            return 1;
        }

        iovec.iov_base = read_buf;
        iovec.iov_len = 2 * pagesize;
        io_uring_prep_readv(sqe, fd, &iovec, 1, 0);

        ret = io_uring_submit_and_wait(&ring, 1);
        if (ret != 1) {
            fprintf(stderr, "Failed at io_uring_submit_and_wait()\n");
            return 1;
        }

        ret = io_uring_wait_cqe(&ring, &cqe);
        if (ret < 0) {
            fprintf(stderr, "Failed at io_uring_wait_cqe()\n");
            return 1;
        }

        printf("io_uring read result for file foo:\n\n");
        printf(" cqe->res == %d (expected %d)\n", cqe->res, 2 * pagesize);
        printf(" memcmp(read_buf, write_buf) == %d (expected 0)\n",
               memcmp(read_buf, write_buf, 2 * pagesize));

        io_uring_cqe_seen(&ring, cqe);
        io_uring_queue_exit(&ring);

        return 0;
    }

When running it on an unpatched kernel:

    $ gcc io_uring_test.c -luring
    $ mkfs.btrfs -f /dev/sda
    $ mount /dev/sda /mnt/sda
    $ ./a.out /mnt/sda
    io_uring read result for file foo:

     cqe->res == 4096 (expected 8192)
     memcmp(read_buf, write_buf) == -205 (expected 0)

After this patch, the read always
returns 8192 bytes, with the buffer filled with the correct data. Although that reproducer always triggers the bug in my test vms, it's possible that it will not be so reliable on other environments, as that can happen if the bio for the first extent completes and decrements the reference on the struct iomap_dio object before we do the atomic_dec_and_test() on the reference at __iomap_dio_rw(). Fix this in btrfs by having btrfs_dio_iomap_begin() return -EAGAIN whenever we try to satisfy a non blocking IO request (IOMAP_NOWAIT flag set) over a range that spans multiple extents (or a mix of extents and holes). This avoids returning success to the caller when we only did partial IO, which is not optimal for writes and for reads it's actually incorrect, as the caller doesn't expect to get less bytes read than it has requested (unless EOF is crossed), as previously mentioned. This is also the type of behaviour that xfs follows (xfs_direct_write_iomap_begin()), even though it doesn't use IOMAP_DIO_PARTIAL. A test case for fstests will follow soon. Link: https://lore.kernel.org/linux-btrfs/CABVffEM0eEWho+206m470rtM0d9J8ue85TtR-A_oVTuGLWFicA@mail.gmail.com/ Link: https://lore.kernel.org/linux-btrfs/CAHF2GV6U32gmqSjLe=XKgfcZAmLCiH26cJ2OnHGp5x=VAH4OHQ@mail.gmail.com/ CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:34 +02:00
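The commit above notes that applications could arguably react to a short read by issuing further reads for the remaining data. A minimal userspace sketch of that pattern follows, using synchronous pread() rather than io_uring; `pread_full` is a hypothetical helper name, not part of any library, and this is not what btrfs or MariaDB actually do.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keep calling pread() until 'len' bytes have been read, EOF is hit,
 * or an error occurs. Returns the number of bytes read (short only at
 * EOF), or -1 with errno set by pread().
 */
static ssize_t pread_full(int fd, void *buf, size_t len, off_t offset)
{
	size_t done = 0;

	assert(buf != NULL || len == 0);

	while (done < len) {
		ssize_t n = pread(fd, (char *)buf + done, len - done,
				  offset + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;  /* interrupted by a signal, retry */
			return -1;
		}
		if (n == 0)
			break;             /* reached EOF */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A caller reading a range known to lie within i_size can then treat any return value smaller than `len` as an error, instead of having to interpret a partial result from a single call.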
Filipe Manana	c81c4f5666	btrfs: fix deadlock due to page faults during direct IO reads and writes commit `51bd9563b6` upstream If we do a direct IO read or write when the buffer given by the user is memory mapped to the file range we are going to do IO, we end up ending in a deadlock. This is triggered by the new test case generic/647 from fstests. For a direct IO read we get a trace like this: [967.872718] INFO: task mmap-rw-fault:12176 blocked for more than 120 seconds. [967.874161] Not tainted 5.14.0-rc7-btrfs-next-95 #1 [967.874909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [967.875983] task:mmap-rw-fault state:D stack: 0 pid:12176 ppid: 11884 flags:0x00000000 [967.875992] Call Trace: [967.875999] __schedule+0x3ca/0xe10 [967.876015] schedule+0x43/0xe0 [967.876020] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs] [967.876109] ? do_wait_intr_irq+0xb0/0xb0 [967.876118] lock_extent_bits+0x37/0x90 [btrfs] [967.876150] btrfs_lock_and_flush_ordered_range+0xa9/0x120 [btrfs] [967.876184] ? extent_readahead+0xa7/0x530 [btrfs] [967.876214] extent_readahead+0x32d/0x530 [btrfs] [967.876253] ? lru_cache_add+0x104/0x220 [967.876255] ? kvm_sched_clock_read+0x14/0x40 [967.876258] ? sched_clock_cpu+0xd/0x110 [967.876263] ? lock_release+0x155/0x4a0 [967.876271] read_pages+0x86/0x270 [967.876274] ? lru_cache_add+0x125/0x220 [967.876281] page_cache_ra_unbounded+0x1a3/0x220 [967.876291] filemap_fault+0x626/0xa20 [967.876303] __do_fault+0x36/0xf0 [967.876308] __handle_mm_fault+0x83f/0x15f0 [967.876322] handle_mm_fault+0x9e/0x260 [967.876327] __get_user_pages+0x204/0x620 [967.876332] ? get_user_pages_unlocked+0x69/0x340 [967.876340] get_user_pages_unlocked+0xd3/0x340 [967.876349] internal_get_user_pages_fast+0xbca/0xdc0 [967.876366] iov_iter_get_pages+0x8d/0x3a0 [967.876374] bio_iov_iter_get_pages+0x82/0x4a0 [967.876379] ? lock_release+0x155/0x4a0 [967.876387] iomap_dio_bio_actor+0x232/0x410 [967.876396] iomap_apply+0x12a/0x4a0 [967.876398] ? 
iomap_dio_rw+0x30/0x30 [967.876414] __iomap_dio_rw+0x29f/0x5e0 [967.876415] ? iomap_dio_rw+0x30/0x30 [967.876420] ? lock_acquired+0xf3/0x420 [967.876429] iomap_dio_rw+0xa/0x30 [967.876431] btrfs_file_read_iter+0x10b/0x140 [btrfs] [967.876460] new_sync_read+0x118/0x1a0 [967.876472] vfs_read+0x128/0x1b0 [967.876477] __x64_sys_pread64+0x90/0xc0 [967.876483] do_syscall_64+0x3b/0xc0 [967.876487] entry_SYSCALL_64_after_hwframe+0x44/0xae [967.876490] RIP: 0033:0x7fb6f2c038d6 [967.876493] RSP: 002b:00007fffddf586b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011 [967.876496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007fb6f2c038d6 [967.876498] RDX: 0000000000001000 RSI: 00007fb6f2c17000 RDI: 0000000000000003 [967.876499] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000000000 [967.876501] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000003 [967.876502] R13: 0000000000000000 R14: 00007fb6f2c17000 R15: 0000000000000000 This happens because at btrfs_dio_iomap_begin() we lock the extent range and return with it locked - we only unlock in the endio callback, at end_bio_extent_readpage() -> endio_readpage_release_extent(). Then, after iomap calls the btrfs_dio_iomap_begin() callback, it triggers the page faults that result in reading the pages through the readahead callback btrfs_readahead(), and from there we end up attempting to lock the same extent range again (or a subrange of what we locked before), resulting in the deadlock. For a direct IO write, the scenario is a bit different, and it results in a trace like this: [1132.442520] run fstests generic/647 at 2021-08-31 18:53:35 [1330.349355] INFO: task mmap-rw-fault:184017 blocked for more than 120 seconds. [1330.350540] Not tainted 5.14.0-rc7-btrfs-next-95 #1 [1330.351158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[1330.351900] task:mmap-rw-fault state:D stack: 0 pid:184017 ppid:183725 flags:0x00000000 [1330.351906] Call Trace: [1330.351913] __schedule+0x3ca/0xe10 [1330.351930] schedule+0x43/0xe0 [1330.351935] btrfs_start_ordered_extent+0x108/0x1c0 [btrfs] [1330.352020] ? do_wait_intr_irq+0xb0/0xb0 [1330.352028] btrfs_lock_and_flush_ordered_range+0x8c/0x120 [btrfs] [1330.352064] ? extent_readahead+0xa7/0x530 [btrfs] [1330.352094] extent_readahead+0x32d/0x530 [btrfs] [1330.352133] ? lru_cache_add+0x104/0x220 [1330.352135] ? kvm_sched_clock_read+0x14/0x40 [1330.352138] ? sched_clock_cpu+0xd/0x110 [1330.352143] ? lock_release+0x155/0x4a0 [1330.352151] read_pages+0x86/0x270 [1330.352155] ? lru_cache_add+0x125/0x220 [1330.352162] page_cache_ra_unbounded+0x1a3/0x220 [1330.352172] filemap_fault+0x626/0xa20 [1330.352176] ? filemap_map_pages+0x18b/0x660 [1330.352184] __do_fault+0x36/0xf0 [1330.352189] __handle_mm_fault+0x1253/0x15f0 [1330.352203] handle_mm_fault+0x9e/0x260 [1330.352208] __get_user_pages+0x204/0x620 [1330.352212] ? get_user_pages_unlocked+0x69/0x340 [1330.352220] get_user_pages_unlocked+0xd3/0x340 [1330.352229] internal_get_user_pages_fast+0xbca/0xdc0 [1330.352246] iov_iter_get_pages+0x8d/0x3a0 [1330.352254] bio_iov_iter_get_pages+0x82/0x4a0 [1330.352259] ? lock_release+0x155/0x4a0 [1330.352266] iomap_dio_bio_actor+0x232/0x410 [1330.352275] iomap_apply+0x12a/0x4a0 [1330.352278] ? iomap_dio_rw+0x30/0x30 [1330.352292] __iomap_dio_rw+0x29f/0x5e0 [1330.352294] ? iomap_dio_rw+0x30/0x30 [1330.352306] btrfs_file_write_iter+0x238/0x480 [btrfs] [1330.352339] new_sync_write+0x11f/0x1b0 [1330.352344] ? 
NF_HOOK_LIST.constprop.0.cold+0x31/0x3e [1330.352354] vfs_write+0x292/0x3c0 [1330.352359] __x64_sys_pwrite64+0x90/0xc0 [1330.352365] do_syscall_64+0x3b/0xc0 [1330.352369] entry_SYSCALL_64_after_hwframe+0x44/0xae [1330.352372] RIP: 0033:0x7f4b0a580986 [1330.352379] RSP: 002b:00007ffd34d75418 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 [1330.352382] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f4b0a580986 [1330.352383] RDX: 0000000000001000 RSI: 00007f4b0a3a4000 RDI: 0000000000000003 [1330.352385] RBP: 00007f4b0a3a4000 R08: 0000000000000003 R09: 0000000000000000 [1330.352386] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 [1330.352387] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Unlike for reads, at btrfs_dio_iomap_begin() we return with the extent range unlocked, but later, when the page faults are triggered and we try to read the extents, we end up at btrfs_lock_and_flush_ordered_range(), where we find the ordered extent for our write, created by the iomap callback btrfs_dio_iomap_begin(), and we wait for it to complete. This makes us deadlock, since we can't complete the ordered extent without reading the pages (the iomap code only submits the bio after the pages are faulted in). Fix this by setting the nofault attribute of the given iov_iter and retrying the direct IO read/write if we get an -EFAULT error returned from iomap. For reads, also disable page faults completely, because when we read from a hole or a prealloc extent, we can still trigger page faults due to the call to iov_iter_zero() done by iomap - at the moment, it is oblivious to the value of the ->nofault attribute of an iov_iter. We also need to keep track of the number of bytes written or read, and pass it to iomap_dio_rw(), as well as use the new flag IOMAP_DIO_PARTIAL. 
This depends on the iov_iter and iomap changes introduced in commit `c03098d4b9` ("Merge tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2"). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:33 +02:00
Andreas Gruenbacher	640a6be8e8	gfs2: Fix mmap + page fault deadlocks for direct I/O commit `b01b2d72da` upstream Also disable page faults during direct I/O requests and implement a similar kind of retry logic as in the buffered I/O case. The retry logic in the direct I/O case differs from the buffered I/O case in the following way: direct I/O doesn't provide the kinds of consistency guarantees between concurrent reads and writes that buffered I/O provides, so once we lose the inode glock while faulting in user pages, we always resume the operation. We never need to return a partial read or write. This locking problem was originally reported by Jan Kara. Linus came up with the idea of disabling page faults. Many thanks to Al Viro and Matthew Wilcox for their feedback. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:33 +02:00
Andreas Gruenbacher	f86f8d2784	iov_iter: Introduce nofault flag to disable page faults commit `3337ab08d0` upstream Introduce a new nofault flag to indicate to iov_iter_get_pages not to fault in user pages. This is implemented by passing the FOLL_NOFAULT flag to get_user_pages, which causes get_user_pages to fail when it would otherwise fault in a page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting in pages when page faults are not allowed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:33 +02:00


1051094 Commits