linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-09 12:17:12 +09:00

Author	SHA1	Message	Date
David Sterba	e00c01b74a	btrfs: clear defrag status of a root if starting transaction fails commit `6819703f5a` upstream. The defrag loop processes leaves in batches and starting transaction for each. The whole defragmentation on a given root is protected by a bit but in case the transaction fails, the bit is not cleared In case the transaction fails the bit would prevent starting defragmentation again, so make sure it's cleared. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:40 +02:00
Filipe Manana	ce3986380a	btrfs: send: fix invalid path for unlink operations after parent orphanization commit `d8ac76cdd1` upstream. During an incremental send operation, when processing the new references for the current inode, we might send an unlink operation for another inode that has a conflicting path and has more than one hard link. However this path was computed and cached before we processed previous new references for the current inode. We may have orphanized a directory of that path while processing a previous new reference, in which case the path will be invalid and cause the receiver process to fail. The following reproducer triggers the problem and explains how/why it happens in its comments: $ cat test-send-unlink.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV >/dev/null mount $DEV $MNT # Create our test files and directory. Inode 259 (file3) has two hard # links. touch $MNT/file1 touch $MNT/file2 touch $MNT/file3 mkdir $MNT/A ln $MNT/file3 $MNT/A/hard_link # Filesystem looks like: # # . (ino 256) # \|----- file1 (ino 257) # \|----- file2 (ino 258) # \|----- file3 (ino 259) # \|----- A/ (ino 260) # \|---- hard_link (ino 259) # # Now create the base snapshot, which is going to be the parent snapshot # for a later incremental send. btrfs subvolume snapshot -r $MNT $MNT/snap1 btrfs send -f /tmp/snap1.send $MNT/snap1 # Move inode 257 into directory inode 260. This results in computing the # path for inode 260 as "/A" and caching it. mv $MNT/file1 $MNT/A/file1 # Move inode 258 (file2) into directory inode 260, with a name of # "hard_link", moving first inode 259 away since it currently has that # location and name. mv $MNT/A/hard_link $MNT/tmp mv $MNT/file2 $MNT/A/hard_link # Now rename inode 260 to something else (B for example) and then create # a hard link for inode 258 that has the old name and location of inode # 260 ("/A"). mv $MNT/A $MNT/B ln $MNT/B/hard_link $MNT/A # Filesystem now looks like: # # . (ino 256) # \|----- tmp (ino 259) # \|----- file3 (ino 259) # \|----- B/ (ino 260) # \| \|---- file1 (ino 257) # \| \|---- hard_link (ino 258) # \| # \|----- A (ino 258) # Create another snapshot of our subvolume and use it for an incremental # send. btrfs subvolume snapshot -r $MNT $MNT/snap2 btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2 # Now unmount the filesystem, create a new one, mount it and try to # apply both send streams to recreate both snapshots. umount $DEV mkfs.btrfs -f $DEV >/dev/null mount $DEV $MNT # First add the first snapshot to the new filesystem by applying the # first send stream. btrfs receive -f /tmp/snap1.send $MNT # The incremental receive operation below used to fail with the # following error: # # ERROR: unlink A/hard_link failed: No such file or directory # # This is because when send is processing inode 257, it generates the # path for inode 260 as "/A", since that inode is its parent in the send # snapshot, and caches that path. # # Later when processing inode 258, it first processes its new reference # that has the path of "/A", which results in orphanizing inode 260 # because there is a a path collision. This results in issuing a rename # operation from "/A" to "/o260-6-0". # # Finally when processing the new reference "B/hard_link" for inode 258, # it notices that it collides with inode 259 (not yet processed, because # it has a higher inode number), since that inode has the name # "hard_link" under the directory inode 260. It also checks that inode # 259 has two hardlinks, so it decides to issue a unlink operation for # the name "hard_link" for inode 259. However the path passed to the # unlink operation is "/A/hard_link", which is incorrect since currently # "/A" does not exists, due to the orphanization of inode 260 mentioned # before. The path is incorrect because it was computed and cached # before the orphanization. This results in the receiver to fail with # the above error. btrfs receive -f /tmp/snap2.send $MNT umount $MNT When running the test, it fails like this: $ ./test-send-unlink.sh Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1' At subvol /mnt/sdi/snap1 Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2' At subvol /mnt/sdi/snap2 At subvol snap1 At snapshot snap2 ERROR: unlink A/hard_link failed: No such file or directory Fix this by recomputing a path before issuing an unlink operation when processing the new references for the current inode if we previously have orphanized a directory. A test case for fstests will follow soon. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:40 +02:00
Ludovic Desroches	fe0c1aa932	ARM: dts: at91: sama5d4: fix pinctrl muxing commit `253adffb0e` upstream. Fix pinctrl muxing, PD28, PD29 and PD31 can be muxed to peripheral A. It allows to use SCK0, SCK1 and SPI0_NPCS2 signals. Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Fixes: `679f8d92bb` ("ARM: at91/dt: sama5d4: add pioD pin mux mask and enable pioD") Cc: stable@vger.kernel.org # v4.4+ Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20191025084210.14726-1-ludovic.desroches@microchip.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Yang Jihong	90639a0e19	arm_pmu: Fix write counter incorrect in ARMv7 big-endian mode commit `fdbef8c4e6` upstream. Commit `3a95200d3f` ("arm_pmu: Change API to support 64bit counter values") changes the input "value" type from 32-bit to 64-bit, which introduces the following problem: ARMv7 PMU counters is 32-bit width, in big-endian mode, write counter uses high 32-bit, which writes an incorrect value. Before: Performance counter stats for 'ls': 2.22 msec task-clock # 0.675 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 49 page-faults # 0.022 M/sec 2150476593 cycles # 966.663 GHz 2148588788 instructions # 1.00 insn per cycle 2147745484 branches # 965435.074 M/sec 2147508540 branch-misses # 99.99% of all branches None of the above hw event counters are correct. Solution: "value" forcibly converted to 32-bit type before being written to PMU register. After: Performance counter stats for 'ls': 2.09 msec task-clock # 0.681 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 46 page-faults # 0.022 M/sec `2807301` cycles # 1.344 GHz 1060159 instructions # 0.38 insn per cycle 250496 branches # 119.914 M/sec 23192 branch-misses # 9.26% of all branches Fixes: `3a95200d3f` ("arm_pmu: Change API to support 64bit counter values") Cc: <stable@vger.kernel.org> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210430012659.232110-1-yangjihong1@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Alexander Larkin	b62ce8e3f7	Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl commit `f8f84af5da` upstream. Even though we validate user-provided inputs we then traverse past validated data when applying the new map. The issue was originally discovered by Murray McAllister with this simple POC (if the following is executed by an unprivileged user it will instantly panic the system): int main(void) { int fd, ret; unsigned int buffer[10000]; fd = open("/dev/input/js0", O_RDONLY); if (fd == -1) printf("Error opening file\n"); ret = ioctl(fd, JSIOCSBTNMAP & ~IOCSIZE_MASK, &buffer); printf("%d\n", ret); } The solution is to traverse internal buffer which is guaranteed to only contain valid date when constructing the map. Fixes: `182d679b22` ("Input: joydev - prevent potential read overflow in ioctl") Fixes: `999b874f4a` ("Input: joydev - validate axis/button maps before clobbering current ones") Reported-by: Murray McAllister <murray.mcallister@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Alexander Larkin <avlarkin82@gmail.com> Link: https://lore.kernel.org/r/20210620120030.1513655-1-avlarkin82@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Al Viro	faf8ab4355	iov_iter_fault_in_readable() should do nothing in xarray case commit `0e8f0d6740` upstream. ... and actually should just check it's given an iovec-backed iterator in the first place. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Desmond Cheong Zhi Xi	00f00f5db8	ntfs: fix validity check for file name attribute commit `d98e4d9541` upstream. When checking the file name attribute, we want to ensure that it fits within the bounds of ATTR_RECORD. To do this, we should check that (attr record + file name offset + file name length) < (attr record + attr record length). However, the original check did not include the file name offset in the calculation. This means that corrupted on-disk metadata might not caught by the incorrect file name check, and lead to an invalid memory access. An example can be seen in the crash report of a memory corruption error found by Syzbot: https://syzkaller.appspot.com/bug?id=a1a1e379b225812688566745c3e2f7242bffc246 Adding the file name offset to the validity check fixes this error and passes the Syzbot reproducer test. Link: https://lkml.kernel.org/r/20210614050540.289494-1-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reported-by: syzbot+213ac8bb98f7f4420840@syzkaller.appspotmail.com Tested-by: syzbot+213ac8bb98f7f4420840@syzkaller.appspotmail.com Acked-by: Anton Altaparmakov <anton@tuxera.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Zhangjiantao (Kirin, nanjing)	6f26f2e79d	xhci: solve a double free problem while doing s4 commit `b31d9d6d7a` upstream. when system is doing s4, the process of xhci_resume may be as below: 1、xhci_mem_cleanup 2、xhci_init->xhci_mem_init->xhci_mem_cleanup(when memory is not enough). xhci_mem_cleanup will be executed twice when system is out of memory. xhci->port_caps is freed in xhci_mem_cleanup,but it isn't set to NULL. It will be freed twice when xhci_mem_cleanup is called the second time. We got following bug when system resumes from s4: kernel BUG at mm/slub.c:309! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP CPU: 0 PID: 5929 Tainted: G S W 5.4.96-arm64-desktop #1 pc : __slab_free+0x5c/0x424 lr : kfree+0x30c/0x32c Call trace: __slab_free+0x5c/0x424 kfree+0x30c/0x32c xhci_mem_cleanup+0x394/0x3cc xhci_mem_init+0x9ac/0x1070 xhci_init+0x8c/0x1d0 xhci_resume+0x1cc/0x5fc xhci_plat_resume+0x64/0x70 platform_pm_thaw+0x28/0x60 dpm_run_callback+0x54/0x24c device_resume+0xd0/0x200 async_resume+0x24/0x60 async_run_entry_fn+0x44/0x110 process_one_work+0x1f0/0x490 worker_thread+0x5c/0x450 kthread+0x158/0x160 ret_from_fork+0x10/0x24 Original patch that caused this issue was backported to 4.4 stable, so this should be backported to 4.4 stabe as well. Fixes: `cf0ee7c60c` ("xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Jiantao Zhang <water.zhangjiantao@huawei.com> Signed-off-by: Tao Xue <xuetao09@huawei.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20210617150354.1512157-5-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Jing Xiangfeng	9c7de67819	usb: typec: Add the missed altmode_id_remove() in typec_register_altmode() commit `03026197bb` upstream. typec_register_altmode() misses to call altmode_id_remove() in an error path. Add the missed function call to fix it. Fixes: `8a37d87d72` ("usb: typec: Bus type for alternate modes") Cc: stable <stable@vger.kernel.org> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Link: https://lore.kernel.org/r/20210617073226.47599-1-jingxiangfeng@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Minas Harutyunyan	309970bf56	usb: dwc3: Fix debugfs creation flow commit `84524d1232` upstream. Creation EP's debugfs called earlier than debugfs folder for dwc3 device created. As result EP's debugfs are created in '/sys/kernel/debug' instead of '/sys/kernel/debug/usb/dwc3.1.auto'. Moved dwc3_debugfs_init() function call before calling dwc3_core_init_mode() to allow create dwc3 debugfs parent before creating EP's debugfs's. Fixes: `8d396bb0a5` ("usb: dwc3: debugfs: Add and remove endpoint dirs dynamically") Cc: stable <stable@vger.kernel.org> Reviewed-by: Jack Pham <jackp@codeaurora.org> Signed-off-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> Link: https://lore.kernel.org/r/01fafb5b2d8335e98e6eadbac61fc796bdf3ec1a.1623948457.git.Minas.Harutyunyan@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Hannu Hartikainen	c984eaeb01	USB: cdc-acm: blacklist Heimann USB Appset device commit `4897807753` upstream. The device (32a7:0000 Heimann Sensor GmbH USB appset demo) claims to be a CDC-ACM device in its descriptors but in fact is not. If it is run with echo disabled it returns garbled data, probably due to something that happens in the TTY layer. And when run with echo enabled (the default), it will mess up the calibration data of the sensor the first time any data is sent to the device. In short, I had a bad time after connecting the sensor and trying to get it to work. I hope blacklisting it in the cdc-acm driver will save someone else a bit of trouble. Signed-off-by: Hannu Hartikainen <hannu@hrtk.in> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210622141454.337948-1-hannu@hrtk.in Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Linyu Yuan	d654be97e1	usb: gadget: eem: fix echo command packet response issue commit `4249d6fbc1` upstream. when receive eem echo command, it will send a response, but queue this response to the usb request which allocate from gadget device endpoint zero, and transmit the request to IN endpoint of eem interface. on dwc3 gadget, it will trigger following warning in function __dwc3_gadget_ep_queue(), if (WARN(req->dep != dep, "request %pK belongs to '%s'\n", &req->request, req->dep->name)) return -EINVAL; fix it by allocating a usb request from IN endpoint of eem interface, and transmit the usb request to same IN endpoint of eem interface. Signed-off-by: Linyu Yuan <linyyuan@codeaurora.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210616115142.34075-1-linyyuan@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Pavel Skripkin	16151f468e	net: can: ems_usb: fix use-after-free in ems_usb_disconnect() commit `ab4a0b8fcb` upstream. In ems_usb_disconnect() dev pointer, which is netdev private data, is used after free_candev() call: \| if (dev) { \| unregister_netdev(dev->netdev); \| free_candev(dev->netdev); \| \| unlink_all_urbs(dev); \| \| usb_free_urb(dev->intr_urb); \| \| kfree(dev->intr_in_buffer); \| kfree(dev->tx_msg_buffer); \| } Fix it by simply moving free_candev() at the end of the block. Fail log: \| BUG: KASAN: use-after-free in ems_usb_disconnect \| Read of size 8 at addr ffff88804e041008 by task kworker/1:2/2895 \| \| CPU: 1 PID: 2895 Comm: kworker/1:2 Not tainted 5.13.0-rc5+ #164 \| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.4 \| Workqueue: usb_hub_wq hub_event \| Call Trace: \| dump_stack (lib/dump_stack.c:122) \| print_address_description.constprop.0.cold (mm/kasan/report.c:234) \| kasan_report.cold (mm/kasan/report.c:420 mm/kasan/report.c:436) \| ems_usb_disconnect (drivers/net/can/usb/ems_usb.c:683 drivers/net/can/usb/ems_usb.c:1058) Fixes: `702171adee` ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface") Link: https://lore.kernel.org/r/20210617185130.5834-1-paskripkin@gmail.com Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:39 +02:00
Johan Hovold	bfa8fce9c1	Input: usbtouchscreen - fix control-request directions commit `41e81022a0` upstream. The direction of the pipe argument must match the request-type direction bit or control requests may fail depending on the host-controller-driver implementation. Fix the four control requests which erroneously used usb_rcvctrlpipe(). Fixes: `1d3e20236d` ("[PATCH] USB: usbtouchscreen: unified USB touchscreen driver") Fixes: `24ced062a2` ("usbtouchscreen: add support for DMC TSC-10/25 devices") Fixes: `9e3b25837a` ("Input: usbtouchscreen - add support for e2i touchscreen controller") Signed-off-by: Johan Hovold <johan@kernel.org> Cc: stable@vger.kernel.org # 2.6.17 Link: https://lore.kernel.org/r/20210524092048.4443-1-johan@kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:38 +02:00
Pavel Skripkin	b39d5fee82	media: dvb-usb: fix wrong definition commit `c680ed46e4` upstream. syzbot reported WARNING in vmalloc. The problem was in zero size passed to vmalloc. The root case was in wrong cxusb_bluebird_lgz201_properties definition. adapter array has only 1 entry, but num_adapters was 2. Call Trace: __vmalloc_node mm/vmalloc.c:2963 [inline] vmalloc+0x67/0x80 mm/vmalloc.c:2996 dvb_dmx_init+0xe4/0xb90 drivers/media/dvb-core/dvb_demux.c:1251 dvb_usb_adapter_dvb_init+0x564/0x860 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:184 dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:86 [inline] dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:184 [inline] dvb_usb_device_init.cold+0xc94/0x146e drivers/media/usb/dvb-usb/dvb-usb-init.c:308 cxusb_probe+0x159/0x5e0 drivers/media/usb/dvb-usb/cxusb.c:1634 Fixes: `4d43e13f72` ("V4L/DVB (4643): Multi-input patch for DVB-USB device") Cc: stable@vger.kernel.org Reported-by: syzbot+7336195c02c1bd2f64e1@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:38 +02:00
Takashi Iwai	becc612df6	ALSA: usb-audio: Fix OOB access at proc output commit `362372ceb6` upstream. At extending the available mixer values for 32bit types, we forgot to add the corresponding entries for the format dump in the proc output. This may result in OOB access. Here adds the missing entries. Fixes: `bc18e31c30` ("ALSA: usb-audio: Fix parameter block size for UAC2 control requests") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210622090647.14021-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:38 +02:00
Daehwan Jung	007a59cb75	ALSA: usb-audio: fix rate on Ozone Z90 USB headset commit `aecc19ec40` upstream. It mislabels its 96 kHz altsetting and that's why it causes some noise Signed-off-by: Daehwan Jung <dh10.jung@samsung.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1623836097-61918-1-git-send-email-dh10.jung@samsung.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:38 +02:00
Quat Le	04e385f5eb	scsi: core: Retry I/O for Notify (Enable Spinup) Required error commit `104739aca4` upstream. If the device is power-cycled, it takes time for the initiator to transmit the periodic NOTIFY (ENABLE SPINUP) SAS primitive, and for the device to respond to the primitive to become ACTIVE. Retry the I/O request to allow the device time to become ACTIVE. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210629155826.48441-1-quat.le@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Quat Le <quat.le@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:15:38 +02:00
Greg Kroah-Hartman	fcfbdfe962	Linux 4.19.197 Link: https://lore.kernel.org/r/20210709131644.969303901@linuxfoundation.org Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Pavel Machek (CIP) <pavel@denx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:32 +02:00
Tony Lindgren	330584716d	clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 commit `25de4ce5ed` upstream. There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days. To work around the issue, we need to use timer-ti-dm percpu timers instead. Let's configure dmtimer3 and 4 as percpu timers by default, and warn about the issue if the dtb is not configured properly. For more information, please see the errata for "AM572x Sitara Processors Silicon Revisions 1.1, 2.0": https://www.ti.com/lit/er/sprz429m/sprz429m.pdf The concept is based on earlier reference patches done by Tero Kristo and Keerthy. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <kristo@kernel.org> [tony@atomide.com: backported to 4.19.y] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:32 +02:00
Tony Lindgren	1232352231	clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue commit `3efe7a878a` upstream. There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days. To work around the issue, we need to use timer-ti-dm timers instead. Let's prepare for adding support for percpu timers by adding a common dmtimer_clkevt_init_common() and call it from __omap_sync32k_timer_init(). This patch makes no intentional functional changes. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <kristo@kernel.org> [tony@atomide.com: backported to 4.19.y] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:32 +02:00
Tony Lindgren	78130e2e4e	clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support commit `52762fbd1c` upstream. We can move the TI dmtimer clockevent and clocksource to live under drivers/clocksource if we rely only on the clock framework, and handle the module configuration directly in the clocksource driver based on the device tree data. This removes the early dependency with system timers to the interconnect related code, and we can probe pretty much everything else later on at the module_init level. Let's first add a new driver for timer-ti-dm-systimer based on existing arch/arm/mach-omap2/timer.c. Then let's start moving SoCs to probe with device tree data while still keeping the old timer.c. And eventually we can just drop the old timer.c. Let's take the opportunity to switch to use readl/writel as pointed out by Daniel Lezcano <daniel.lezcano@linaro.org>. This allows further clean-up of the timer-ti-dm code the a lot of the shared helpers can just become static to the non-syster related code. Note the boards can optionally configure different timer source clocks if needed with assigned-clocks and assigned-clock-parents. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <kristo@kernel.org> [tony@atomide.com: backported to 4.19.y] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:31 +02:00
afzal mohammed	c74081df08	ARM: OMAP: replace setup_irq() by request_irq() commit `b75ca52177` upstream. request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. Per tglx[1], setup_irq() existed in olden days when allocators were not ready by the time early interrupts were initialized. Hence replace setup_irq() by request_irq(). [1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <kristo@kernel.org> Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:31 +02:00
Alper Gun	4686e3e4aa	KVM: SVM: Call SEV Guest Decommission if ASID binding fails commit `934002cd66` upstream. Send SEV_CMD_DECOMMISSION command to PSP firmware if ASID binding fails. If a failure happens after a successful LAUNCH_START command, a decommission command should be executed. Otherwise, guest context will be unfreed inside the AMD SP. After the firmware will not have memory to allocate more SEV guest context, LAUNCH_START command will begin to fail with SEV_RET_RESOURCE_LIMIT error. The existing code calls decommission inside sev_unbind_asid, but it is not called if a failure happens before guest activation succeeds. If sev_bind_asid fails, decommission is never called. PSP firmware has a limit for the number of guests. If sev_asid_binding fails many times, PSP firmware will not have resources to create another guest context. Cc: stable@vger.kernel.org Fixes: `59414c9892` ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Reported-by: Peter Gonda <pgonda@google.com> Signed-off-by: Alper Gun <alpergun@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210610174604.2554090-1-alpergun@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:31 +02:00
Juergen Gross	cda326e503	xen/events: reset active flag for lateeoi events later commit `3de218ff39` upstream. In order to avoid a race condition for user events when changing cpu affinity reset the active flag only when EOI-ing the event. This is working fine as all user events are lateeoi events. Note that lateeoi_ack_mask_dynirq() is not modified as there is no explicit call to xen_irq_lateeoi() expected later. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Fixes: `b6622798bc` ("xen/events: avoid handling the same event on two cpus at the same time") Tested-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com> Link: https://lore.kernel.org/r/20210623130913.9405-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:31 +02:00
Petr Mladek	6e2a98bc90	kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() commit `5fa54346ca` upstream. The system might hang with the following backtrace: schedule+0x80/0x100 schedule_timeout+0x48/0x138 wait_for_common+0xa4/0x134 wait_for_completion+0x1c/0x2c kthread_flush_work+0x114/0x1cc kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144 kthread_cancel_delayed_work_sync+0x18/0x2c xxxx_pm_notify+0xb0/0xd8 blocking_notifier_call_chain_robust+0x80/0x194 pm_notifier_call_chain_robust+0x28/0x4c suspend_prepare+0x40/0x260 enter_state+0x80/0x3f4 pm_suspend+0x60/0xdc state_store+0x108/0x144 kobj_attr_store+0x38/0x88 sysfs_kf_write+0x64/0xc0 kernfs_fop_write_iter+0x108/0x1d0 vfs_write+0x2f4/0x368 ksys_write+0x7c/0xec It is caused by the following race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync(): CPU0 CPU1 Context: Thread A Context: Thread B kthread_mod_delayed_work() spin_lock() __kthread_cancel_work() spin_unlock() del_timer_sync() kthread_cancel_delayed_work_sync() spin_lock() __kthread_cancel_work() spin_unlock() del_timer_sync() spin_lock() work->canceling++ spin_unlock spin_lock() queue_delayed_work() // dwork is put into the worker->delayed_work_list spin_unlock() kthread_flush_work() // flush_work is put at the tail of the dwork wait_for_completion() Context: IRQ kthread_delayed_work_timer_fn() spin_lock() list_del_init(&work->node); spin_unlock() BANG: flush_work is not longer linked and will never get proceed. The problem is that kthread_mod_delayed_work() checks work->canceling flag before canceling the timer. A simple solution is to (re)check work->canceling after __kthread_cancel_work(). But then it is not clear what should be returned when __kthread_cancel_work() removed the work from the queue (list) and it can't queue it again with the new @delay. The return value might be used for reference counting. The caller has to know whether a new work has been queued or an existing one was replaced. The proper solution is that kthread_mod_delayed_work() will remove the work from the queue (list) _only_ when work->canceling is not set. The flag must be checked after the timer is stopped and the remaining operations can be done under worker->lock. Note that kthread_mod_delayed_work() could remove the timer and then bail out. It is fine. The other canceling caller needs to cancel the timer as well. The important thing is that the queue (list) manipulation is done atomically under worker->lock. Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com Fixes: `9a6b06c8d9` ("kthread: allow to modify delayed kthread work") Signed-off-by: Petr Mladek <pmladek@suse.com> Reported-by: Martin Liu <liumartin@google.com> Cc: <jenhaochen@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:31 +02:00
Petr Mladek	13bcf5aeb3	kthread_worker: split code for canceling the delayed work timer commit `34b3d53447` upstream. Patch series "kthread_worker: Fix race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync()". This patchset fixes the race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync() including proper return value handling. This patch (of 2): Simple code refactoring as a preparation step for fixing a race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync(). It does not modify the existing behavior. Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: <jenhaochen@google.com> Cc: Martin Liu <liumartin@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:31 +02:00
Anson Huang	4ca30ef625	ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment commit `4521de30fb` upstream. The vdd3p0 LDO's input should be from external USB VBUS directly, NOT PMIC's power supply, the vdd3p0 LDO's target output voltage can be controlled by SW, and it requires input voltage to be high enough, with incorrect power supply assigned, if the power supply's voltage is lower than the LDO target output voltage, it will return fail and skip the LDO voltage adjustment, so remove the power supply assignment for vdd3p0 to avoid such scenario. Fixes: `93385546ba` ("ARM: dts: imx6qdl-sabresd: Assign corresponding power supply for LDOs") Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:30 +02:00
David Rientjes	cadf5bbcef	KVM: SVM: Periodically schedule when unregistering regions on destroy commit `7be74942f1` upstream. There may be many encrypted regions that need to be unregistered when a SEV VM is destroyed. This can lead to soft lockups. For example, on a host running 4.15: watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348] CPU: 206 PID: 194348 Comm: t_virtual_machi RIP: 0010:free_unref_page_list+0x105/0x170 ... Call Trace: [<0>] release_pages+0x159/0x3d0 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd] [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd] [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd] [<0>] kvm_arch_destroy_vm+0x47/0x200 [<0>] kvm_put_kvm+0x1a8/0x2f0 [<0>] kvm_vm_release+0x25/0x30 [<0>] do_exit+0x335/0xc10 [<0>] do_group_exit+0x3f/0xa0 [<0>] get_signal+0x1bc/0x670 [<0>] do_signal+0x31/0x130 Although the CLFLUSH is no longer issued on every encrypted region to be unregistered, there are no other changes that can prevent soft lockups for very large SEV VMs in the latest kernel. Periodically schedule if necessary. This still holds kvm->lock across the resched, but since this only happens when the VM is destroyed this is assumed to be acceptable. Signed-off-by: David Rientjes <rientjes@google.com> Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [iwamatsu: adjust filename.] Reference: CVE-2020-36311 Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:30 +02:00
Tahsin Erdogan	6a6e04ce3b	ext4: eliminate bogus error in ext4_data_block_valid_rcu() Mainline commit `ce9f24cccd` ("ext4: check journal inode extents more carefully") enabled validity checks for journal inode's data blocks. This change got ported to stable branches, but the backport for 4.19 has a bug where it will flag an error even when system block entry's inode number matches journal inode. The way error is reported is also problematic because it updates the superblock without following journaling rules. This may result in superblock checksum errors if the superblock is in the process of being committed but has a previously calculated checksum that doesn't include the bogus error update. This patch eliminates the bogus error by trying to match how other backports were implemented, which is to flag an error only when inode numbers mismatch. Fixes: commit `a75a5d1638` ("ext4: check journal inode extents more carefully") Signed-off-by: Tahsin Erdogan <trdgn@amazon.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-11 12:49:30 +02:00
Christian König	6dcca74b35	drm/nouveau: fix dma_address check for CPU/GPU sync [ Upstream commit `d330099115` ] AGP for example doesn't have a dma_address array. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210614110517.1624-1-christian.koenig@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:30 +02:00
ManYi Li	b52a404434	scsi: sr: Return appropriate error code when disk is ejected [ Upstream commit `7dd753ca59` ] Handle a reported media event code of 3. This indicates that the media has been removed from the drive and user intervention is required to proceed. Return DISK_EVENT_EJECT_REQUEST in that case. Link: https://lore.kernel.org/r/20210611094402.23884-1-limanyi@uniontech.com Signed-off-by: ManYi Li <limanyi@uniontech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:30 +02:00
Hugh Dickins	2445837e9c	mm, futex: fix shared futex pgoff on shmem huge page [ Upstream commit `fe19bd3dae` ] If more than one futex is placed on a shmem huge page, it can happen that waking the second wakes the first instead, and leaves the second waiting: the key's shared.pgoff is wrong. When 3.11 commit `13d60f4b6a` ("futex: Take hugepages into account when generating futex_key"), the only shared huge pages came from hugetlbfs, and the code added to deal with its exceptional page->index was put into hugetlb source. Then that was missed when 4.8 added shmem huge pages. page_to_pgoff() is what others use for this nowadays: except that, as currently written, it gives the right answer on hugetlbfs head, but nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific hugetlb_basepage_index() on PageHuge tails as well as on head. Yes, it's unconventional to declare hugetlb_basepage_index() there in pagemap.h, rather than in hugetlb.h; but I do not expect anything but page_to_pgoff() ever to need it. [akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope] Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com Fixes: `800d8c63b2` ("shmem: add huge pages support") Reported-by: Neel Natu <neelnatu@google.com> Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Zhang Yi <wetpzy@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: leave redundant #include <linux/hugetlb.h> in kernel/futex.c, to avoid conflict over the header files included. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:30 +02:00
Hugh Dickins	e943b4373c	mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() [ Upstream commit `a7a69d8ba8` ] Aha! Shouldn't that quick scan over pte_none()s make sure that it holds ptlock in the PVMW_SYNC case? That too might have been responsible for BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though I've never seen any. Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/ Fixes: `ace71a19ce` ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Wang Yugui <wangyugui@e16-tech.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:30 +02:00
Hugh Dickins	69784c9d5c	mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes [ Upstream commit `a9a7504d9b` ] Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory). Crash dumps showed two tail pages of a shmem huge page remained mapped by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end of a long unmapped range; and no page table had yet been allocated for the head of the huge page to be mapped into. Although designed to handle these odd misaligned huge-page-mapped-by-pte cases, page_vma_mapped_walk() falls short by returning false prematurely when !pmd_present or !pud_present or !p4d_present or !pgd_present: there are cases when a huge page may span the boundary, with ptes present in the next. Restructure page_vma_mapped_walk() as a loop to continue in these cases, while keeping its layout much as before. Add a step_forward() helper to advance pvmw->address across those boundaries: originally I tried to use mm's standard p?d_addr_end() macros, but hit the same crash 512 times less often: because of the way redundant levels are folded together, but folded differently in different configurations, it was just too difficult to use them correctly; and step_forward() is simpler anyway. Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com Fixes: `ace71a19ce` ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:29 +02:00
Hugh Dickins	b114408b1a	mm: page_vma_mapped_walk(): get vma_address_end() earlier [ Upstream commit `a765c417d8` ] page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the start, rather than later at next_pte. It's a little unnecessary overhead on the first call, but makes for a simpler loop in the following commit. Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:29 +02:00
Hugh Dickins	7d82908ba4	mm: page_vma_mapped_walk(): use goto instead of while (1) [ Upstream commit `474466301d` ] page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte, and use "goto this_pte", in place of the "while (1)" loop at the end. Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:29 +02:00
Hugh Dickins	ac0324b14d	mm: page_vma_mapped_walk(): add a level of indentation [ Upstream commit `b3807a91ac` ] page_vma_mapped_walk() cleanup: add a level of indentation to much of the body, making no functional change in this commit, but reducing the later diff when this is all converted to a loop. [hughd@google.com: : page_vma_mapped_walk(): add a level of indentation fix] Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:29 +02:00
Hugh Dickins	97a79b7896	mm: page_vma_mapped_walk(): crossing page table boundary [ Upstream commit `4482824874` ] page_vma_mapped_walk() cleanup: adjust the test for crossing page table boundary - I believe pvmw->address is always page-aligned, but nothing else here assumed that; and remember to reset pvmw->pte to NULL after unmapping the page table, though I never saw any bug from that. Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:29 +02:00
Hugh Dickins	d4b99cf445	mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block [ Upstream commit `e2e1d4076c` ] page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to follow the same "return not_found, return not_found, return true" pattern as the block above it (note: returning not_found there is never premature, since existence or prior existence of huge pmd guarantees good alignment). Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:29 +02:00
Hugh Dickins	9fbb45c5d5	mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd [ Upstream commit `3306d3119c` ] page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then use it in subsequent tests, instead of repeatedly dereferencing pointer. Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:29 +02:00
Hugh Dickins	a027b69916	mm: page_vma_mapped_walk(): settle PageHuge on entry [ Upstream commit `6d0fd59876` ] page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of the way at the start, so no need to worry about it later. Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:28 +02:00
Hugh Dickins	be9ab2d00d	mm: page_vma_mapped_walk(): use page for pvmw->page [ Upstream commit `f003c03bd2` ] Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes". I've marked all of these for stable: many are merely cleanups, but I think they are much better before the main fix than after. This patch (of 11): page_vma_mapped_walk() cleanup: sometimes the local copy of pvwm->page was used, sometimes pvmw->page itself: use the local copy "page" throughout. Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:28 +02:00
Yang Shi	e17afb6d0b	mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split [ Upstream commit `504e070dc0` ] When debugging the bug reported by Wang Yugui [1], try_to_unmap() may fail, but the first VM_BUG_ON_PAGE() just checks page_mapcount() however it may miss the failure when head page is unmapped but other subpage is mapped. Then the second DEBUG_VM BUG() that check total mapcount would catch it. This may incur some confusion. As this is not a fatal issue, so consolidate the two DEBUG_VM checks into one VM_WARN_ON_ONCE_PAGE(). [1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/ Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: fixed up variables and split_queue_lock in split_huge_page_to_list(), and conflict on ttu_flags in unmap_page(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:28 +02:00
Hugh Dickins	d5cd96a788	mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() [ Upstream commit `22061a1ffa` ] There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped. The first attempt used try_to_unmap(page, TTU_SYNC\|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but there apply it to a single locked page. try_to_unmap() looks more suitable for a single locked page. However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside. The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside. Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance of whether we had identified the problem correctly. This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to details. Then zap_pmd_range() watches for this case, and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe. Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped. Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com Fixes: `fc127da085` ("truncate: handle file thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: fixed up call to truncate_cleanup_page() in truncate_inode_pages_range(). Use hpage_nr_pages() in unmap_mapping_page(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:28 +02:00
Jue Wang	2c595b67fc	mm/thp: fix page_address_in_vma() on file THP tails [ Upstream commit `31657170de` ] Anon THP tails were already supported, but memory-failure may need to use page_address_in_vma() on file THP tails, which its page->mapping check did not permit: fix it. hughd adds: no current usage is known to hit the issue, but this does fix a subtle trap in a general helper: best fixed in stable sooner than later. Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com Fixes: `800d8c63b2` ("shmem: add huge pages support") Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:28 +02:00
Hugh Dickins	f387225573	mm/thp: fix vma_address() if virtual address below file offset [ Upstream commit `494334e43c` ] Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory). When that BUG() was changed to a WARN(), it would later crash on the VM_BUG_ON_VMA(end < vma->vm_start \|\| start >= vma->vm_end, vma) in mm/internal.h:vma_address(), used by rmap_walk_file() for try_to_unmap(). vma_address() is usually correct, but there's a wraparound case when the vm_start address is unusually low, but vm_pgoff not so low: vma_address() chooses max(start, vma->vm_start), but that decides on the wrong address, because start has become almost ULONG_MAX. Rewrite vma_address() to be more careful about vm_pgoff; move the VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can be safely used from page_mapped_in_vma() and page_address_in_vma() too. Add vma_address_end() to apply similar care to end address calculation, in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one(); though it raises a question of whether callers would do better to supply pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch. An irritation is that their apparent generality breaks down on KSM pages, which cannot be located by the page->index that page_to_pgoff() uses: as commit `4b0ece6fa0` ("mm: migrate: fix remove_migration_pte() for ksm pages") once discovered. I dithered over the best thing to do about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both vma_address() and vma_address_end(); though the only place in danger of using it on them was try_to_unmap_one(). Sidenote: vma_address() and vma_address_end() now use compound_nr() on a head page, instead of thp_size(): to make the right calculation on a hugetlbfs page, whether or not THPs are configured. try_to_unmap() is used on hugetlbfs pages, but perhaps the wrong calculation never mattered. Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com Fixes: `a8fa41ad2f` ("mm, rmap: check all VMAs that PTE-mapped THP can be part of") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: fixed up conflicts on intervening thp_size(), and mmu_notifier_range initializations; substitute for compound_nr(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:28 +02:00
Hugh Dickins	205899d6be	mm/thp: try_to_unmap() use TTU_SYNC for safe splitting [ Upstream commit `732ed55823` ] Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0. And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently. I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount. Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page(). When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated. Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com Fixes: `fec89c109f` ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: upstream TTU_SYNC 0x10 takes the value which 5.11 commit `013339df11` ("mm/rmap: always do TTU_IGNORE_ACCESS") freed. It is very tempting to backport that commit (as 5.10 already did) and make no change here; but on reflection, good as that commit is, I'm reluctant to include any possible side-effect of it in this series. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:28 +02:00
Hugh Dickins	fc1fbc5b01	mm/thp: make is_huge_zero_pmd() safe and quicker [ Upstream commit `3b77e8c8cd` ] Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry. Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time. __split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present. Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com Fixes: `e71769ae52` ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:27 +02:00
Hugh Dickins	629ee482e0	mm/thp: fix __split_huge_pmd_locked() on shmem migration entry [ Upstream commit `99fa8a4820` ] Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10. Here is v2 batch of long-standing THP bug fixes that I had not got around to sending before, but prompted now by Wang Yugui's report https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/ Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and they have done no harm, but have not fixed that issue: something more is needed and I have no idea of what. This patch (of 7): Stressing huge tmpfs page migration racing hole punch often crashed on the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y kernel; or shortly afterwards, on a bad dereference in __split_huge_pmd_locked() when DEBUG_VM=n. They forgot to allow for pmd migration entries in the non-anonymous case. Full disclosure: those particular experiments were on a kernel with more relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the vanilla kernel: it is conceivable that stricter locking happens to avoid those cases, or makes them less likely; but __split_huge_pmd_locked() already allowed for pmd migration entries when handling anonymous THPs, so this commit brings the shmem and file THP handling into line. And while there: use old_pmd rather than _pmd, as in the following blocks; and make it clearer to the eye that the !vma_is_anonymous() block is self-contained, making an early return after accounting for unmapping. Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com Fixes: `e71769ae52` ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jue Wang <juew@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Shakeel Butt <shakeelb@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: this commit made intervening cleanups in pmdp_huge_clear_flush() redundant: here it's rediffed to skip them. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-11 12:49:27 +02:00

1 2 3 4 5 ...

802987 Commits