linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-09 20:32:04 +09:00

Author	SHA1	Message	Date
Suwan Kim	e2dd254bde	usbip: Implement SG support to vhci-hcd and stub driver commit `ea44d19076` upstream. There are bugs on vhci with usb 3.0 storage device. In USB, each SG list entry buffer should be divisible by the bulk max packet size. But with native SG support, this problem doesn't matter because the SG buffer is treated as contiguous buffer. But without native SG support, USB storage driver breaks SG list into several URBs and the error occurs because of a buffer size of URB that cannot be divided by the bulk max packet size. The error situation is as follows. When USB Storage driver requests 31.5 KB data and has SG list which has 3584 bytes buffer followed by 7 4096 bytes buffer for some reason. USB Storage driver splits this SG list into several URBs because VHCI doesn't support SG and sends them separately. So the first URB buffer size is 3584 bytes. When receiving data from device, USB 3.0 device sends data packet of 1024 bytes size because the max packet size of BULK pipe is 1024 bytes. So device sends 4096 bytes. But the first URB buffer has only 3584 bytes buffer size. So host controller terminates the transfer even though there is more data to receive. So, vhci needs to support SG transfer to prevent this error. In this patch, vhci supports SG regardless of whether the server's host controller supports SG or not, because stub driver splits SG list into several URBs if the server's host controller doesn't support SG. To support SG, vhci sets URB_DMA_MAP_SG flag in urb->transfer_flags if URB has SG list and this flag will tell stub driver to use SG list. After receiving urb from stub driver, vhci clear URB_DMA_MAP_SG flag to avoid unnecessary DMA unmapping in HCD. vhci sends each SG list entry to stub driver. Then, stub driver sees the total length of the buffer and allocates SG table and pages according to the total buffer length calling sgl_alloc(). After stub driver receives completed URB, it again sends each SG list entry to vhci. If the server's host controller doesn't support SG, stub driver breaks a single SG request into several URBs and submits them to the server's host controller. When all the split URBs are completed, stub driver reassembles the URBs into a single return command and sends it to vhci. Moreover, in the situation where vhci supports SG, but stub driver does not, or vice versa, usbip works normally. Because there is no protocol modification, there is no problem in communication between server and client even if the one has a kernel without SG support. In the case of vhci supports SG and stub driver doesn't, because vhci sends only the total length of the buffer to stub driver as it did before the patch applied, stub driver only needs to allocate the required length of buffers using only kmalloc() regardless of whether vhci supports SG or not. But stub driver has to allocate buffer with kmalloc() as much as the total length of SG buffer which is quite huge when vhci sends SG request, so it has overhead in buffer allocation in this situation. If stub driver needs to send data buffer to vhci because of IN pipe, stub driver also sends only total length of buffer as metadata and then sends real data as vhci does. Then vhci receive data from stub driver and store it to the corresponding buffer of SG list entry. And for the case of stub driver supports SG and vhci doesn't, since the USB storage driver checks that vhci doesn't support SG and sends the request to stub driver by splitting the SG list into multiple URBs, stub driver allocates a buffer for each URB with kmalloc() as it did before this patch. * Test environment Test uses two difference machines and two different kernel version to make mismatch situation between the client and the server where vhci supports SG, but stub driver does not, or vice versa. All tests are conducted in both full SG support that both vhci and stub support SG and half SG support that is the mismatch situation. Test kernel version is 5.3-rc6 with commit "usb: add a HCD_DMA flag instead of guestimating DMA capabilities" to avoid unnecessary DMA mapping and unmapping. - Test kernel version - 5.3-rc6 with SG support - 5.1.20-200.fc29.x86_64 without SG support * SG support test - Test devices - Super-speed storage device - SanDisk Ultra USB 3.0 - High-speed storage device - SMI corporation USB 2.0 flash drive - Test description Test read and write operation of mass storage device that uses the BULK transfer. In test, the client reads and writes files whose size is over 1G and it works normally. * Regression test - Test devices - Super-speed device - Logitech Brio webcam - High-speed device - Logitech C920 HD Pro webcam - Full-speed device - Logitech bluetooth mouse - Britz BR-Orion speaker - Low-speed device - Logitech wired mouse - Test description Moving and click test for mouse. To test the webcam, use gnome-cheese. To test the speaker, play music and video on the client. All works normally. * VUDC compatibility test VUDC also works well with this patch. Tests are done with two USB gadget created by CONFIGFS USB gadget. Both use the BULK pipe. 1. Serial gadget 2. Mass storage gadget - Serial gadget test Serial gadget on the host sends and receives data using cat command on the /dev/ttyGS<N>. The client uses minicom to communicate with the serial gadget. - Mass storage gadget test After connecting the gadget with vhci, use "dd" to test read and write operation on the client side. Read - dd if=/dev/sd<N> iflag=direct of=/dev/null bs=1G count=1 Write - dd if=<my file path> iflag=direct of=/dev/sd<N> bs=1G count=1 Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Acked-by: Shuah khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20190828032741.12234-1-suwan.kim027@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:52 +01:00
Shuah Khan	f865ae473c	usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path commit `2c904963b1` upstream. Fix vhci_urb_enqueue() to print debug msg and return error instead of failing with BUG_ON. Signed-off-by: Shuah Khan <shuah@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:51 +01:00
Qian Cai	e9c0fc4a7c	sched/fair: Fix -Wunused-but-set-variable warnings commit `763a9ec06c` upstream. Commit: `de53fd7aed` ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices") introduced a few compilation warnings: kernel/sched/fair.c: In function '__refill_cfs_bandwidth_runtime': kernel/sched/fair.c:4365:6: warning: variable 'now' set but not used [-Wunused-but-set-variable] kernel/sched/fair.c: In function 'start_cfs_bandwidth': kernel/sched/fair.c:4992:6: warning: variable 'overrun' set but not used [-Wunused-but-set-variable] Also, __refill_cfs_bandwidth_runtime() does no longer update the expiration time, so fix the comments accordingly. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Reviewed-by: Dave Chiluk <chiluk+linux@indeed.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pauld@redhat.com Fixes: `de53fd7aed` ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices") Link: https://lkml.kernel.org/r/1566326455-8038-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:51 +01:00
Dave Chiluk	502bd15144	sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices commit `de53fd7aed` upstream. It has been observed, that highly-threaded, non-cpu-bound applications running under cpu.cfs_quota_us constraints can hit a high percentage of periods throttled while simultaneously not consuming the allocated amount of quota. This use case is typical of user-interactive non-cpu bound applications, such as those running in kubernetes or mesos when run on multiple cpu cores. This has been root caused to cpu-local run queue being allocated per cpu bandwidth slices, and then not fully using that slice within the period. At which point the slice and quota expires. This expiration of unused slice results in applications not being able to utilize the quota for which they are allocated. The non-expiration of per-cpu slices was recently fixed by 'commit `512ac999d2` ("sched/fair: Fix bandwidth timer clock drift condition")'. Prior to that it appears that this had been broken since at least 'commit `51f2176d74` ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That added the following conditional which resulted in slices never being expired. if (cfs_rq->runtime_expires != cfs_b->runtime_expires) { /* extend local deadline, drift is bounded above by 2 ticks */ cfs_rq->runtime_expires += TICK_NSEC; Because this was broken for nearly 5 years, and has recently been fixed and is now being noticed by many users running kubernetes (https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion that the mechanisms around expiring runtime should be removed altogether. This allows quota already allocated to per-cpu run-queues to live longer than the period boundary. This allows threads on runqueues that do not use much CPU to continue to use their remaining slice over a longer period of time than cpu.cfs_period_us. However, this helps prevent the above condition of hitting throttling while also not fully utilizing your cpu quota. This theoretically allows a machine to use slightly more than its allotted quota in some periods. This overflow would be bounded by the remaining quota left on each per-cpu runqueueu. This is typically no more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will change nothing, as they should theoretically fully utilize all of their quota in each period. For user-interactive tasks as described above this provides a much better user/application experience as their cpu utilization will more closely match the amount they requested when they hit throttling. This means that cpu limits no longer strictly apply per period for non-cpu bound applications, but that they are still accurate over longer timeframes. This greatly improves performance of high-thread-count, non-cpu bound applications with low cfs_quota_us allocation on high-core-count machines. In the case of an artificial testcase (10ms/100ms of quota on 80 CPU machine), this commit resulted in almost 30x performance improvement, while still maintaining correct cpu quota restrictions. That testcase is available at https://github.com/indeedeng/fibtest. Fixes: `512ac999d2` ("sched/fair: Fix bandwidth timer clock drift condition") Signed-off-by: Dave Chiluk <chiluk+linux@indeed.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Hammond <jhammond@indeed.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kyle Anderson <kwa@yelp.com> Cc: Gabriel Munos <gmunoz@netflix.com> Cc: Peter Oskolkov <posk@posk.io> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Brendan Gregg <bgregg@netflix.com> Link: https://lkml.kernel.org/r/1563900266-19734-2-git-send-email-chiluk+linux@indeed.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:50 +01:00
Takashi Iwai	4ebee4875e	ALSA: usb-audio: Fix copy&paste error in the validator commit `ba8bf0967a` upstream. The recently introduced USB-audio descriptor validator had a stupid copy&paste error that may lead to an unexpected overlook of too short descriptors for processing and extension units. It's likely the cause of the report triggered by syzkaller fuzzer. Let's fix it. Fixes: `57f8770620` ("ALSA: usb-audio: More validations of descriptor units") Reported-by: syzbot+0620f79a1978b1133fd7@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/s5hsgnkdbsl.wl-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:50 +01:00
Dan Carpenter	e005188924	ALSA: usb-audio: remove some dead code commit `b39e077fcb` upstream. We recently cleaned up the error handling in commit `52c3e317a8` ("ALSA: usb-audio: Unify the release of usb_mixer_elem_info objects") but accidentally left this stray return. Fixes: `52c3e317a8` ("ALSA: usb-audio: Unify the release of usb_mixer_elem_info objects") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:50 +01:00
Takashi Iwai	4f6c520026	ALSA: usb-audio: Fix possible NULL dereference at create_yamaha_midi_quirk() commit `60849562a5` upstream. The previous addition of descriptor validation may lead to a NULL dereference at create_yamaha_midi_quirk() when either injd or outjd is NULL. Add proper non-NULL checks. Fixes: `57f8770620` ("ALSA: usb-audio: More validations of descriptor units") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:49 +01:00
Takashi Iwai	3a0cdf210b	ALSA: usb-audio: Clean up check_input_term() commit `e0ccdef926` upstream. The primary changes in this patch are cleanups of __check_input_term() and move to a non-nested switch-case block by evaluating the pair of UAC version and the unit type, as we've done for parse_audio_unit(). Also each parser is split into the function for readability. Now, a slight behavior change by this cleanup is the handling of processing and extension units. Formerly we've dealt with them differently between UAC1/2 and UAC3; the latter returns an error if no input sources are available, while the former continues to parse. In this patch, unify the behavior in all cases: when input sources are available, it parses recursively, then override the type and the id, as well as channel information if not provided yet. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:49 +01:00
Takashi Iwai	9feeaa50e5	ALSA: usb-audio: Remove superfluous bLength checks commit `b8e4f1fdfa` upstream. Now that we got the more comprehensive validation code for USB-audio descriptors, the check of overflow in each descriptor unit parser became superfluous. Drop some of the obvious cases. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:48 +01:00
Takashi Iwai	f0e164f66e	ALSA: usb-audio: Unify the release of usb_mixer_elem_info objects commit `52c3e317a8` upstream. Instead of the direct kfree() calls, introduce a new local helper to release the usb_mixer_elem_info object. This will be extended to do more than a single kfree() in the later patches. Also, use the standard goto instead of multiple calls in parse_audio_selector_unit() error paths. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:48 +01:00
Takashi Iwai	dae4d839e5	ALSA: usb-audio: Simplify parse_audio_unit() commit `68e9fde245` upstream. Minor code refactoring by combining the UAC version and the type in the switch-case flow, so that we reduce the indentation and redundancy. One good bonus is that the duplicated definition of the same type value (e.g. UAC2_EFFECT_UNIT) can be handled more cleanly. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:48 +01:00
Takashi Iwai	17821e2fb1	ALSA: usb-audio: More validations of descriptor units commit `57f8770620` upstream. Introduce a new helper to validate each audio descriptor unit before and check the unit before actually accessing it. This should harden against the OOB access cases with malformed descriptors that have been recently frequently reported by fuzzers. The existing descriptor checks are still kept although they become superfluous after this patch. They'll be cleaned up eventually later. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:48 +01:00
Al Viro	5e36cf8edb	configfs: fix a deadlock in configfs_symlink() commit `351e5d869e` upstream. Configfs abuses symlink(2). Unlike the normal filesystems, it wants the target resolved at symlink(2) time, like link(2) would've done. The problem is that ->symlink() is called with the parent directory locked exclusive, so resolving the target inside the ->symlink() is easily deadlocked. Short of really ugly games in sys_symlink() itself, all we can do is to unlock the parent before resolving the target and relock it after. However, that invalidates the checks done by the caller of ->symlink(), so we have to * check that dentry is still where it used to be (it couldn't have been moved, but it could've been unhashed) * recheck that it's still negative (somebody else might've successfully created a symlink with the same name while we were looking the target up) * recheck the permissions on the parent directory. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:47 +01:00
Al Viro	0dfc45be87	configfs: provide exclusion between IO and removals commit `b0841eefd9` upstream. Make sure that attribute methods are not called after the item has been removed from the tree. To do so, we * at the point of no return in removals, grab ->frag_sem exclusive and mark the fragment dead. * call the methods of attributes with ->frag_sem taken shared and only after having verified that the fragment is still alive. The main benefit is for method instances - they are guaranteed that the objects they are accessing and all ancestors are still there. Another win is that we don't need to bother with extra refcount on config_item when opening a file - the item will be alive for as long as it stays in the tree, and we won't touch it/attributes/any associated data after it's been removed from the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:47 +01:00
Al Viro	25c118d8d1	configfs: new object reprsenting tree fragments commit `47320fbe11` upstream. Refcounted, hangs of configfs_dirent, created by operations that add fragments to configfs tree (mkdir and configfs_register_{subsystem,group}). Will be used in the next commit to provide exclusion between fragment removal and ->show/->store calls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:46 +01:00
Al Viro	65524d647e	configfs_register_group() shouldn't be (and isn't) called in rmdirable parts commit `f19e4ed1e1` upstream. revert `cc57c07343` "configfs: fix registered group removal" It was an attempt to handle something that fundamentally doesn't work - configfs_register_group() should never be done in a part of tree that can be rmdir'ed. And in mainline it never had been, so let's not borrow trouble; the fix was racy anyway, it would take a lot more to make that work and desired semantics is not clear. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:46 +01:00
Al Viro	2bd63490c1	configfs: stash the data we need into configfs_buffer at open time commit `ff4dd08197` upstream. simplifies the ->read()/->write()/->release() instances nicely Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:46 +01:00
Johan Hovold	a7be2debb7	can: peak_usb: fix slab info leak commit `f7a1337f0d` upstream. Fix a small slab info leak due to a failure to clear the command buffer at allocation. The first 16 bytes of the command buffer are always sent to the device in pcan_usb_send_cmd() even though only the first two may have been initialised in case no argument payload is provided (e.g. when waiting for a response). Fixes: `bb4785551f` ("can: usb: PEAK-System Technik USB adapters driver core") Cc: stable <stable@vger.kernel.org> # 3.4 Reported-by: syzbot+863724e7128e14b26732@syzkaller.appspotmail.com Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:45 +01:00
Johan Hovold	ce9b94da0e	can: mcba_usb: fix use-after-free on disconnect commit `4d6636498c` upstream. The driver was accessing its driver data after having freed it. Fixes: `51f3baad7d` ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer") Cc: stable <stable@vger.kernel.org> # 4.12 Cc: Remigiusz Kołłątaj <remigiusz.kollataj@mobica.com> Reported-by: syzbot+e29b17e5042bbc56fae9@syzkaller.appspotmail.com Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:45 +01:00
Wen Yang	5a9e37f202	can: dev: add missing of_node_put() after calling of_get_child_by_name() commit `db9ee384f6` upstream. of_node_put() needs to be called when the device node which is got from of_get_child_by_name() finished using. Fixes: `2290aefa2e` ("can: dev: Add support for limiting configured bitrate") Cc: Franklin S Cooper Jr <fcooper@ti.com> Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:45 +01:00
Navid Emamdoost	9289226f69	can: gs_usb: gs_can_open(): prevent memory leak commit `fb5be6a7b4` upstream. In gs_can_open() if usb_submit_urb() fails the allocated urb should be released. Fixes: `d08e973a77` ("can: gs_usb: Added support for the GS_USB CAN devices") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:44 +01:00
Marc Kleine-Budde	9f5c594288	can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak commit `ca913f1ac0` upstream. If the rx-offload skb_queue is full can_rx_offload_queue_sorted() will not queue the skb and return with an error. None of the callers of this function, issue a kfree_skb() to free the not queued skb. This results in a memory leak. This patch fixes the problem by freeing the skb in case of a full queue. The return value is adjusted to -ENOBUFS to better reflect the actual problem. The device stats handling is left to the callers, as this function might be used in both the rx and tx path. Fixes: `55059f2b7f` ("can: rx-offload: introduce can_rx_offload_get_echo_skb() and can_rx_offload_queue_sorted() functions") Cc: linux-stable <stable@vger.kernel.org> Cc: Martin Hundebøll <martin@geanix.com> Reported-by: Martin Hundebøll <martin@geanix.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:44 +01:00
Stephane Grosjean	ef502d5a84	can: peak_usb: fix a potential out-of-sync while decoding packets commit `de280f403f` upstream. When decoding a buffer received from PCAN-USB, the first timestamp read in a packet is a 16-bit coded time base, and the next ones are an 8-bit offset to this base, regardless of the type of packet read. This patch corrects a potential loss of synchronization by using a timestamp index read from the buffer, rather than an index of received data packets, to determine on the sizeof the timestamp to be read from the packet being decoded. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Fixes: `46be265d33` ("can: usb: PEAK-System Technik PCAN-USB specific part") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:44 +01:00
Kurt Van Dijck	7ae08111ca	can: c_can: c_can_poll(): only read status register after status IRQ commit `3cb3eaac52` upstream. When the status register is read without the status IRQ pending, the chip may not raise the interrupt line for an upcoming status interrupt and the driver may miss a status interrupt. It is critical that the BUSOFF status interrupt is forwarded to the higher layers, since no more interrupts will follow without intervention. Thanks to Wolfgang and Joe for bringing up the first idea. Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: Joe Burmeister <joe.burmeister@devtank.co.uk> Fixes: `fa39b54ccf` ("can: c_can: Get rid of pointless interrupts") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:44 +01:00
Joakim Zhang	0327c7818d	can: flexcan: disable completely the ECC mechanism commit `5e269324db` upstream. The ECC (memory error detection and correction) mechanism can be activated or not, controlled by the ECCDIS bit in CAN_MECR. When disabled, updates on indications and reporting registers are stopped. So if want to disable ECC completely, had better assert ECCDIS bit, not just mask the related interrupts. Fixes: `cdce844865` ("can: flexcan: add vf610 support for FlexCAN") Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:43 +01:00
Johan Hovold	46265660e5	can: usb_8dev: fix use-after-free on disconnect commit `3759739426` upstream. The driver was accessing its driver data after having freed it. Fixes: `0024d8ad16` ("can: usb_8dev: Add support for USB2CAN interface from 8 devices") Cc: stable <stable@vger.kernel.org> # 3.9 Cc: Bernd Krumboeck <b.krumboeck@gmail.com> Cc: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:43 +01:00
Pavel Shilovsky	d8a76e300e	SMB3: Fix persistent handles reconnect commit `d243af7ab9` upstream. When the client hits a network reconnect, it re-opens every open file with a create context to reconnect a persistent handle. All create context types should be 8-bytes aligned but the padding was missed for that one. As a result, some servers don't allow us to reconnect handles and return an error. The problem occurs when the problematic context is not at the end of the create request packet. Fix this by adding a proper padding at the end of the reconnect persistent handle context. Cc: Stable <stable@vger.kernel.org> # 4.19.x Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:42 +01:00
Jan Beulich	caddaf43b0	x86/apic/32: Avoid bogus LDR warnings commit `fe6f85ca12` upstream. The removal of the LDR initialization in the bigsmp_32 APIC code unearthed a problem in setup_local_APIC(). The code checks unconditionally for a mismatch of the logical APIC id by comparing the early APIC id which was initialized in get_smp_config() with the actual LDR value in the APIC. Due to the removal of the bogus LDR initialization the check now can trigger on bigsmp_32 APIC systems emitting a warning for every booting CPU. This is of course a false positive because the APIC is not using logical destination mode. Restrict the check and the possibly resulting fixup to systems which are actually using the APIC in logical destination mode. [ tglx: Massaged changelog and added Cc stable ] Fixes: `bae3a8d330` ("x86/apic: Do not initialize LDR and DFR for bigsmp") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:42 +01:00
Alexander Shishkin	dc1a91dc49	intel_th: pci: Add Jasper Lake PCH support commit `9d55499d8d` upstream. This adds support for Intel TH on Jasper Lake PCH. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191028070651.9770-8-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:42 +01:00
Alexander Shishkin	f9d3aea1dc	intel_th: pci: Add Comet Lake PCH support commit `3adbb5718d` upstream. This adds support for Intel TH on Comet Lake PCH. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191028070651.9770-7-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:41 +01:00
Dan Carpenter	64997ee49c	netfilter: ipset: Fix an error code in ip_set_sockfn_get() commit `30b7244d79` upstream. The copy_to_user() function returns the number of bytes remaining to be copied. In this code, that positive return is checked at the end of the function and we return zero/success. What we should do instead is return -EFAULT. Fixes: `a7b4f989a6` ("netfilter: ipset: IP set core support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:41 +01:00
Lukas Wunner	1b0e60f6a4	netfilter: nf_tables: Align nft_expr private data to 64-bit commit `250367c59e` upstream. Invoking the following commands on a 32-bit architecture with strict alignment requirements (such as an ARMv7-based Raspberry Pi) results in an alignment exception: # nft add table ip test-ip4 # nft add chain ip test-ip4 output { type filter hook output priority 0; } # nft add rule ip test-ip4 output quota 1025 bytes Alignment trap: not handling instruction e1b26f9f at [<7f4473f8>] Unhandled fault: alignment exception (0x001) at 0xb832e824 Internal error: : 1 [#1] PREEMPT SMP ARM Hardware name: BCM2835 [<7f4473fc>] (nft_quota_do_init [nft_quota]) [<7f447448>] (nft_quota_init [nft_quota]) [<7f4260d0>] (nf_tables_newrule [nf_tables]) [<7f4168dc>] (nfnetlink_rcv_batch [nfnetlink]) [<7f416bd0>] (nfnetlink_rcv [nfnetlink]) [<8078b334>] (netlink_unicast) [<8078b664>] (netlink_sendmsg) [<8071b47c>] (sock_sendmsg) [<8071bd18>] (___sys_sendmsg) [<8071ce3c>] (__sys_sendmsg) [<8071ce94>] (sys_sendmsg) The reason is that nft_quota_do_init() calls atomic64_set() on an atomic64_t which is only aligned to 32-bit, not 64-bit, because it succeeds struct nft_expr in memory which only contains a 32-bit pointer. Fix by aligning the nft_expr private data to 64-bit. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:41 +01:00
Ondrej Jirman	2dae80b5b6	ARM: sunxi: Fix CPU powerdown on A83T commit `e690053e97` upstream. PRCM_PWROFF_GATING_REG has CPU0 at bit 4 on A83T. So without this patch, instead of gating the CPU0, the whole cluster was power gated, when shutting down first CPU in the cluster. Fixes: `6961275e72` ("ARM: sun8i: smp: Add support for A83T") Signed-off-by: Ondrej Jirman <megous@megous.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Cc: stable@vger.kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:40 +01:00
Andreas Klinger	20b9e094dc	iio: srf04: fix wrong limitation in distance measuring commit `431f7667bd` upstream. The measured time value in the driver is limited to the maximum distance which can be read by the sensor. This limitation was wrong and is fixed by this patch. It also takes into account that we are supporting a variety of sensors today and that the recently added sensors have a higher maximum distance range. Changes in v2: - Added a Tested-by Suggested-by: Zbyněk Kocur <zbynek.kocur@fel.cvut.cz> Tested-by: Zbyněk Kocur <zbynek.kocur@fel.cvut.cz> Signed-off-by: Andreas Klinger <ak@it-klinger.de> Cc:<Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:40 +01:00
Alexandru Ardelean	bee45b44b1	iio: imu: adis16480: make sure provided frequency is positive commit `24e1eb5c0d` upstream. It could happen that either `val` or `val2` [provided from userspace] is negative. In that case the computed frequency could get a weird value. Fix this by checking that neither of the 2 variables is negative, and check that the computed result is not-zero. Fixes: `e4f9593901` ("iio: imu: adis16480 switch sampling frequency attr to core support") Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:40 +01:00
Fabrice Gasnier	a428996147	iio: adc: stm32-adc: fix stopping dma commit `e6afcf6c59` upstream. There maybe a race when using dmaengine_terminate_all(). The predisable routine may call iio_triggered_buffer_predisable() prior to a pending DMA callback. Adopt dmaengine_terminate_sync() to ensure there's no pending DMA request before calling iio_triggered_buffer_predisable(). Fixes: `2763ea0585` ("iio: adc: stm32: add optional dma support") Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:40 +01:00
Al Viro	78a1d6cdd3	ceph: add missing check in d_revalidate snapdir handling commit `1f08529c84` upstream. We should not play with dcache without parent locked... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:39 +01:00
Luis Henriques	6f9657793a	ceph: fix use-after-free in __ceph_remove_cap() commit `ea60ed6fcf` upstream. KASAN reports a use-after-free when running xfstest generic/531, with the following trace: [ 293.903362] kasan_report+0xe/0x20 [ 293.903365] rb_erase+0x1f/0x790 [ 293.903370] __ceph_remove_cap+0x201/0x370 [ 293.903375] __ceph_remove_caps+0x4b/0x70 [ 293.903380] ceph_evict_inode+0x4e/0x360 [ 293.903386] evict+0x169/0x290 [ 293.903390] __dentry_kill+0x16f/0x250 [ 293.903394] dput+0x1c6/0x440 [ 293.903398] __fput+0x184/0x330 [ 293.903404] task_work_run+0xb9/0xe0 [ 293.903410] exit_to_usermode_loop+0xd3/0xe0 [ 293.903413] do_syscall_64+0x1a0/0x1c0 [ 293.903417] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happens because __ceph_remove_cap() may queue a cap release (__ceph_queue_cap_release) which can be scheduled before that cap is removed from the inode list with rb_erase(&cap->ci_node, &ci->i_caps); And, when this finally happens, the use-after-free will occur. This can be fixed by removing the cap from the inode list before being removed from the session list, and thus eliminating the risk of an UAF. Cc: stable@vger.kernel.org Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:39 +01:00
Catalin Marinas	3840610d60	arm64: Do not mask out PTE_RDONLY in pte_same() commit `6767df245f` upstream. Following commit `73e86cb03c` ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"), the PTE_RDONLY bit is no longer managed by set_pte_at() but built into the PAGE_* attribute definitions. Consequently, pte_same() must include this bit when checking two PTEs for equality. Remove the arm64-specific pte_same() function, practically reverting commit `747a70e60b` ("arm64: Fix copy-on-write referencing in HugeTLB") Fixes: `73e86cb03c` ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Will Deacon <will@kernel.org> Cc: Steve Capper <steve.capper@arm.com> Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:39 +01:00
Bard Liao	56f270a1d7	soundwire: bus: set initial value to port_status commit `f1fac63af6` upstream. port_status[port_num] are assigned for each port_num in some if conditions. So some of the port_status may not be initialized. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190829181135.16049-1-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:38 +01:00
Michal Suchanek	9a06efc745	soundwire: depend on ACPI commit `52eb063d15` upstream. The device cannot be probed on !ACPI and gives this warning: drivers/soundwire/slave.c:16:12: warning: ‘sdw_slave_add’ defined but not used [-Wunused-function] static int sdw_slave_add(struct sdw_bus *bus, ^~~~~~~~~~~~~ Cc: stable@vger.kernel.org Fixes: `7c3cd189b8` ("soundwire: Add Master registration") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Link: https://lore.kernel.org/r/bd685232ea511251eeb9554172f1524eabf9a46e.1570097621.git.msuchanek@suse.de Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:38 +01:00
Jason Gerecke	a81a463745	HID: wacom: generic: Treat serial number and related fields as unsigned commit `ff479731c3` upstream. The HID descriptors for most Wacom devices oddly declare the serial number and other related fields as signed integers. When these numbers are ingested by the HID subsystem, they are automatically sign-extended into 32-bit integers. We treat the fields as unsigned elsewhere in the kernel and userspace, however, so this sign-extension causes problems. In particular, the sign-extended tool ID sent to userspace as ABS_MISC does not properly match unsigned IDs used by xf86-input-wacom and libwacom. We introduce a function 'wacom_s32tou' that can undo the automatic sign extension performed by 'hid_snto32'. We call this function when processing the serial number and related fields to ensure that we are dealing with and reporting the unsigned form. We opt to use this method rather than adding a descriptor fixup in 'wacom_hid_usage_quirk' since it should be more robust in the face of future devices. Ref: https://github.com/linuxwacom/input-wacom/issues/134 Fixes: `f85c9dc678` ("HID: wacom: generic: Support tool ID and additional tool types") CC: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Reviewed-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:38 +01:00
Alex Deucher	e3fdd0c1a3	drm/radeon: fix si_enable_smc_cac() failed issue commit `2c409ba81b` upstream. Need to set the dte flag on this asic. Port the fix from amdgpu: `5cb818b861` ("drm/amd/amdgpu: fix si_enable_smc_cac() failed issue") Reviewed-by: Yong Zhao <yong.zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:38 +01:00
Jiri Olsa	f39fbd05f9	perf tools: Fix time sorting commit `722ddfde36` upstream. The final sort might get confused when the comparison is done over bigger numbers than int like for -s time. Check the following report for longer workloads: $ perf report -s time -F time,overhead --stdio Fix hist_entry__sort() to properly return int64_t and not possible cut int. Fixes: `043ca389a3` ("perf tools: Use hpp formats to sort final output") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org # v3.16+ Link: http://lore.kernel.org/lkml/20191104232711.16055-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:37 +01:00
Shuah Khan	66d53cd683	tools: gpio: Use !building_out_of_srctree to determine srctree commit `4a6a6f5c4a` upstream. make TARGETS=gpio kselftest fails with: Makefile:23: tools/build/Makefile.include: No such file or directory When the gpio tool make is invoked from tools Makefile, srctree is cleared and the current logic check for srctree equals to empty string to determine srctree location from CURDIR. When the build in invoked from selftests/gpio Makefile, the srctree is set to "." and the same logic used for srctree equals to empty is needed to determine srctree. Check building_out_of_srctree undefined as the condition for both cases to fix "make TARGETS=gpio kselftest" build failure. Cc: stable@vger.kernel.org Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:37 +01:00
Kevin Hao	8e358a0276	dump_stack: avoid the livelock of the dump_lock commit `5cbf2fff3b` upstream. In the current code, we use the atomic_cmpxchg() to serialize the output of the dump_stack(), but this implementation suffers the thundering herd problem. We have observed such kind of livelock on a Marvell cn96xx board(24 cpus) when heavily using the dump_stack() in a kprobe handler. Actually we can let the competitors to wait for the releasing of the lock before jumping to atomic_cmpxchg(). This will definitely mitigate the thundering herd problem. Thanks Linus for the suggestion. [akpm@linux-foundation.org: fix comment] Link: http://lkml.kernel.org/r/20191030031637.6025-1-haokexin@gmail.com Fixes: `b58d977432` ("dump_stack: serialize the output from dump_stack()") Signed-off-by: Kevin Hao <haokexin@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:37 +01:00
Michal Hocko	6c944fc51f	mm, vmstat: hide /proc/pagetypeinfo from normal users commit `abaed0112c` upstream. /proc/pagetypeinfo is a debugging tool to examine internal page allocator state wrt to fragmentation. It is not very useful for any other use so normal users really do not need to read this file. Waiman Long has noticed that reading this file can have negative side effects because zone->lock is necessary for gathering data and that a) interferes with the page allocator and its users and b) can lead to hard lockups on large machines which have very long free_list. Reduce both issues by simply not exporting the file to regular users. Link: http://lkml.kernel.org/r/20191025072610.18526-2-mhocko@kernel.org Fixes: `467c996c1e` ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Waiman Long <longman@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Waiman Long <longman@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Jann Horn <jannh@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:36 +01:00
Yang Shi	2686f71fdc	mm: thp: handle page cache THP correctly in PageTransCompoundMap commit `169226f7e0` upstream. We have a usecase to use tmpfs as QEMU memory backend and we would like to take the advantage of THP as well. But, our test shows the EPT is not PMD mapped even though the underlying THP are PMD mapped on host. The number showed by /sys/kernel/debug/kvm/largepage is much less than the number of PMD mapped shmem pages as the below: 7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 579584 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 12 And some benchmarks do worse than with anonymous THPs. By digging into the code we figured out that commit `127393fbe5` ("mm: thp: kvm: fix memory corruption in KVM with THP enabled") checks if there is a single PTE mapping on the page for anonymous THP when setting up EPT map. But the _mapcount < 0 check doesn't work for page cache THP since every subpage of page cache THP would get _mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns false for page cache THP. This would prevent KVM from setting up PMD mapped EPT entry. So we need handle page cache THP correctly. However, when page cache THP's PMD gets split, kernel just remove the map instead of setting up PTE map like what anonymous THP does. Before KVM calls get_user_pages() the subpages may get PTE mapped even though it is still a THP since the page cache THP may be mapped by other processes at the mean time. Checking its _mapcount and whether the THP has PTE mapped or not. Although this may report some false negative cases (PTE mapped by other processes), it looks not trivial to make this accurate. With this fix /sys/kernel/debug/kvm/largepage would show reasonable pages are PMD mapped by EPT as the below: 7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 557056 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 271 And the benchmarks are as same as anonymous THPs. [yang.shi@linux.alibaba.com: v4] Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com Fixes: `dd78fedde4` ("rmap: support file thp") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reported-by: Gang Deng <gavin.dg@linux.alibaba.com> Tested-by: Gang Deng <gavin.dg@linux.alibaba.com> Suggested-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:36 +01:00
Mel Gorman	7dfa51beac	mm, meminit: recalculate pcpu batch and high limits after init completes commit `3e8fc0075e` upstream. Deferred memory initialisation updates zone->managed_pages during the initialisation phase but before that finishes, the per-cpu page allocator (pcpu) calculates the number of pages allocated/freed in batches as well as the maximum number of pages allowed on a per-cpu list. As zone->managed_pages is not up to date yet, the pcpu initialisation calculates inappropriately low batch and high values. This increases zone lock contention quite severely in some cases with the degree of severity depending on how many CPUs share a local zone and the size of the zone. A private report indicated that kernel build times were excessive with extremely high system CPU usage. A perf profile indicated that a large chunk of time was lost on zone->lock contention. This patch recalculates the pcpu batch and high values after deferred initialisation completes for every populated zone in the system. It was tested on a 2-socket AMD EPYC 2 machine using a kernel compilation workload -- allmodconfig and all available CPUs. mmtests configuration: config-workload-kernbench-max Configuration was modified to build on a fresh XFS partition. kernbench 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%* Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%* Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%* Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%) Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%) Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%) 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Duration User 39766.24 49221.79 Duration System 44298.10 13361.67 Duration Elapsed 519.11 388.87 The patch reduces system CPU usage by 69.86% and total build time by 26.65%. The variance of system CPU usage is also much reduced. Before, this was the breakdown of batch and high values over all zones was: 256 batch: 1 256 batch: 63 512 batch: 7 256 high: 0 256 high: 378 512 high: 42 512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After the patch: 256 batch: 1 768 batch: 63 256 high: 0 768 high: 378 [mgorman@techsingularity.net: fix merge/linkage snafu] Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> [4.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:35 +01:00
Johannes Weiner	8e6bf4bc3a	mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges commit `869712fd3d` upstream. While upgrading from 4.16 to 5.2, we noticed these allocation errors in the log of the new kernel: SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0 node 0: slabs: 5, objs: 170, free: 0 slab_out_of_memory+1 ___slab_alloc+969 __slab_alloc+14 kmem_cache_alloc+346 inet_twsk_alloc+60 tcp_time_wait+46 tcp_fin+206 tcp_data_queue+2034 tcp_rcv_state_process+784 tcp_v6_do_rcv+405 __release_sock+118 tcp_close+385 inet_release+46 __sock_release+55 sock_close+17 __fput+170 task_work_run+127 exit_to_usermode_loop+191 do_syscall_64+212 entry_SYSCALL_64_after_hwframe+68 accompanied by an increase in machines going completely radio silent under memory pressure. One thing that changed since 4.16 is `e699e2c6a6` ("net, mm: account sock objects to kmemcg"), which made these slab caches subject to cgroup memory accounting and control. The problem with that is that cgroups, unlike the page allocator, do not maintain dedicated atomic reserves. As a cgroup's usage hovers at its limit, atomic allocations - such as done during network rx - can fail consistently for extended periods of time. The kernel is not able to operate under these conditions. We don't want to revert the culprit patch, because it indeed tracks a potentially substantial amount of memory used by a cgroup. We also don't want to implement dedicated atomic reserves for cgroups. There is no point in keeping a fixed margin of unused bytes in the cgroup's memory budget to accomodate a consumer that is impossible to predict - we'd be wasting memory and get into configuration headaches, not unlike what we have going with min_free_kbytes. We do this for physical mem because we have to, but cgroups are an accounting game. Instead, account these privileged allocations to the cgroup, but let them bypass the configured limit if they have to. This way, we get the benefits of accounting the consumed memory and have it exert pressure on the rest of the cgroup, but like with the page allocator, we shift the burden of reclaimining on behalf of atomic allocations onto the regular allocations that can block. Link: http://lkml.kernel.org/r/20191022233708.365764-1-hannes@cmpxchg.org Fixes: `e699e2c6a6` ("net, mm: account sock objects to kmemcg") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> [4.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-12 19:20:35 +01:00

1 2 3 4 5 ...

792039 Commits