linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 19:30:30 +09:00

Author	SHA1	Message	Date
Huazhong Tan	e334510840	net: hns3: bugfix for reporting unknown vector0 interrupt repeatly problem [ Upstream commit `0d4411408a` ] The current driver supports handling two vector0 interrupts, reset and mailbox. When the hardware reports an interrupt of another type of interrupt source, if the driver does not process the interrupt, but enables the interrupt, the hardware will repeatedly report the unknown interrupt. Therefore, the driver enables the vector0 interrupt after clearing the known type of interrupt source. Other conditions are not enabled. Fixes: `cd8c5c269b` ("net: hns3: Fix for hclge_reset running repeatly problem") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:12 +01:00
Huazhong Tan	b0465d06d4	net: hns3: bugfix for buffer not free problem during resetting [ Upstream commit `73b907a083` ] When hns3_get_ring_config()/hns3_queue_to_ring()/ hns3_get_vector_ring_chain() failed during resetting, the allocated memory has not been freed before these three functions return. So this patch adds error handler in these functions to fix it. Fixes: `76ad4f0ee7` ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:12 +01:00
Jacob Keller	3d9bc014c5	fm10k: ensure completer aborts are marked as non-fatal after a resume [ Upstream commit `e330af7889` ] VF drivers can trigger PCIe completer aborts any time they read a queue that they don't own. Even in nominal circumstances, it is not possible to prevent the VF driver from reading queues it doesn't own. VF drivers may attempt to read queues it previously owned, but which it no longer does due to a PF reset. Normally these completer aborts aren't an issue. However, on some platforms these trigger machine check errors. This is true even if we lower their severity from fatal to non-fatal. Indeed, we already have code for lowering the severity. We could attempt to mask these errors conditionally around resets, which is the most common time they would occur. However this would essentially be a race between the PF and VF drivers, and we may still occasionally see machine check exceptions on these strictly configured platforms. Instead, mask the errors entirely any time we resume VFs. By doing so, we prevent the completer aborts from being sent to the parent PCIe device, and thus these strict platforms will not upgrade them into machine check errors. Additionally, we don't lose any information by masking these errors, because we'll still report VFs which attempt to access queues via the FUM_BAD_VF_QACCESS errors. Without this change, on platforms where completer aborts cause machine check exceptions, the VF reading queues it doesn't own could crash the host system. Masking the completer abort prevents this, so we should mask it for good, and not just around a PCIe reset. Otherwise malicious or misconfigured VFs could cause the host system to crash. Because we are masking the error entirely, there is little reason to also keep setting the severity bit, so that code is also removed. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:11 +01:00
Miroslav Lichvar	2fed73906e	igb: shorten maximum PHC timecounter update interval [ Upstream commit `094bf4d0e9` ] The timecounter needs to be updated at least once per ~550 seconds in order to avoid a 40-bit SYSTIM timestamp to be misinterpreted as an old timestamp. Since commit `500462a9d` ("timers: Switch to a non-cascading wheel"), scheduling of delayed work seems to be less accurate and a requested delay of 540 seconds may actually be longer than 550 seconds. Shorten the delay to 480 seconds to be sure the timecounter is updated in time. This fixes an issue with HW timestamps on 82580/I350/I354 being off by ~1100 seconds for few seconds every ~9 minutes. Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:11 +01:00
David Hildenbrand	3081ae5e37	powerpc/powernv: hold device_hotplug_lock when calling device_online() [ Upstream commit `cec1680591` ] device_online() should be called with device_hotplug_lock() held. Link: http://lkml.kernel.org/r/20180925091457.28651-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:11 +01:00
David Hildenbrand	17523d7a1c	mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock [ Upstream commit `381eab4a6e` ] There seem to be some problems as result of `30467e0b3b` ("mm, hotplug: fix concurrent memory hot-add deadlock"), which tried to fix a possible lock inversion reported and discussed in [1] due to the two locks a) device_lock() b) mem_hotplug_lock While add_memory() first takes b), followed by a) during bus_probe_device(), onlining of memory from user space first took a), followed by b), exposing a possible deadlock. In [1], and it was decided to not make use of device_hotplug_lock, but rather to enforce a locking order. The problems I spotted related to this: 1. Memory block device attributes: While .state first calls mem_hotplug_begin() and the calls device_online() - which takes device_lock() - .online does no longer call mem_hotplug_begin(), so effectively calls online_pages() without mem_hotplug_lock. 2. device_online() should be called under device_hotplug_lock, however onlining memory during add_memory() does not take care of that. In addition, I think there is also something wrong about the locking in 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() without locks. This was introduced after `30467e0b3b`. And skimming over the code, I assume it could need some more care in regards to locking (e.g. device_online() called without device_hotplug_lock. This will be addressed in the following patches. Now that we hold the device_hotplug_lock when - adding memory (e.g. via add_memory()/add_memory_resource()) - removing memory (e.g. via remove_memory()) - device_online()/device_offline() We can move mem_hotplug_lock usage back into online_pages()/offline_pages(). Why is mem_hotplug_lock still needed? Essentially to make get_online_mems()/put_online_mems() be very fast (relying on device_hotplug_lock would be very slow), and to serialize against addition of memory that does not create memory block devices (hmm). [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ 2015-February/065324.html This patch is partly based on a patch by Vitaly Kuznetsov. Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:10 +01:00
David Hildenbrand	02735d5987	mm/memory_hotplug: make add_memory() take the device_hotplug_lock [ Upstream commit `8df1d0e4a2` ] add_memory() currently does not take the device_hotplug_lock, however is aleady called under the lock from arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c to synchronize against CPU hot-remove and similar. In general, we should hold the device_hotplug_lock when adding memory to synchronize against online/offline request (e.g. from user space) - which already resulted in lock inversions due to device_lock() and mem_hotplug_lock - see `30467e0b3b` ("mm, hotplug: fix concurrent memory hot-add deadlock"). add_memory()/add_memory_resource() will create memory block devices, so this really feels like the right thing to do. Holding the device_hotplug_lock makes sure that a memory block device can really only be accessed (e.g. via .online/.state) from user space, once the memory has been fully added to the system. The lock is not held yet in drivers/xen/balloon.c arch/powerpc/platforms/powernv/memtrace.c drivers/s390/char/sclp_cmd.c drivers/hv/hv_balloon.c So, let's either use the locked variants or take the lock. Don't export add_memory_resource(), as it once was exported to be used by XEN, which is never built as a module. If somebody requires it, we also have to export a locked variant (as device_hotplug_lock is never exported). Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mathieu Malaterre <malat@debian.org> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:10 +01:00
Borislav Petkov	023c071f10	kernel/panic.c: do not append newline to the stack protector panic string [ Upstream commit `95c4fb78fb` ] ... because panic() itself already does this. Otherwise you have line-broken trailer: [ 1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0 [ 1.836965] ]--- Link: http://lkml.kernel.org/r/20181008202901.7894-1-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:10 +01:00
Colin Ian King	1914e5edd8	fs/hfs/extent.c: fix array out of bounds read of array extent [ Upstream commit `6c9a3f843a` ] Currently extent and index i are both being incremented causing an array out of bounds read on extent[i]. Fix this by removing the extraneous increment of extent. Ernesto said: : This is only triggered when deleting a file with a resource fork. I : may be wrong because the documentation isn't clear, but I don't think : you can create those under linux. So I guess nobody was testing them. : : > A disk space leak, perhaps? : : That's what it looks like in general. hfs_free_extents() won't do : anything if the block count doesn't add up, and the error will be : ignored. Now, if the block count randomly does add up, we could see : some corruption. Detected by CoverityScan, CID#711541 ("Out of bounds read") Link: http://lkml.kernel.org/r/20180831140538.31566-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ernesto A. Fernndez <ernesto.mnd.fernandez@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:10 +01:00
Ernesto A. Fernández	a9f38975eb	hfs: update timestamp on truncate() [ Upstream commit `8cd3cb5061` ] The vfs takes care of updating mtime on ftruncate(), but on truncate() it must be done by the module. Link: http://lkml.kernel.org/r/e1611eda2985b672ed2d8677350b4ad8c2d07e8a.1539316825.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:09 +01:00
Ernesto A. Fernández	0013adceb5	hfsplus: update timestamps on truncate() [ Upstream commit `dc8844aada` ] The vfs takes care of updating ctime and mtime on ftruncate(), but on truncate() it must be done by the module. This patch can be tested with xfstests generic/313. Link: http://lkml.kernel.org/r/9beb0913eea37288599e8e1b7cec8768fb52d1b8.1539316825.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:09 +01:00
Ernesto A. Fernández	38e7b916da	hfs: fix return value of hfs_get_block() [ Upstream commit `1267a07be5` ] Direct writes to empty inodes fail with EIO. The generic direct-io code is in part to blame (a patch has been submitted as "direct-io: allow direct writes to empty inodes"), but hfs is worse affected than the other filesystems because the fallback to buffered I/O doesn't happen. The problem is the return value of hfs_get_block() when called with !create. Change it to be more consistent with the other modules. Link: http://lkml.kernel.org/r/4538ab8c35ea37338490525f0f24cbc37227528c.1539195310.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:09 +01:00
Ernesto A. Fernández	550da9ee35	hfsplus: fix return value of hfsplus_get_block() [ Upstream commit `839c3a6a5e` ] Direct writes to empty inodes fail with EIO. The generic direct-io code is in part to blame (a patch has been submitted as "direct-io: allow direct writes to empty inodes"), but hfsplus is worse affected than the other filesystems because the fallback to buffered I/O doesn't happen. The problem is the return value of hfsplus_get_block() when called with !create. Change it to be more consistent with the other modules. Link: http://lkml.kernel.org/r/2cd1301404ec7cf1e39c8f11a01a4302f1460ad6.1539195310.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:09 +01:00
Ernesto A. Fernández	8687d57d24	hfs: prevent btree data loss on ENOSPC [ Upstream commit `54640c7502` ] Inserting a new record in a btree may require splitting several of its nodes. If we hit ENOSPC halfway through, the new nodes will be left orphaned and their records will be lost. This could mean lost inodes or extents. Henceforth, check the available disk space before making any changes. This still leaves the potential problem of corruption on ENOMEM. There is no need to reserve space before deleting a catalog record, as we do for hfsplus. This difference is because hfs index nodes have fixed length keys. Link: http://lkml.kernel.org/r/ab5fc8a7d5ffccfd5f27b1cf2cb4ceb6c110da74.1536269131.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:09 +01:00
Ernesto A. Fernández	0b54b59d85	hfsplus: prevent btree data loss on ENOSPC [ Upstream commit `d92915c35b` ] Inserting or deleting a record in a btree may require splitting several of its nodes. If we hit ENOSPC halfway through, the new nodes will be left orphaned and their records will be lost. This could mean lost inodes, extents or xattrs. Henceforth, check the available disk space before making any changes. This still leaves the potential problem of corruption on ENOMEM. The patch can be tested with xfstests generic/027. Link: http://lkml.kernel.org/r/4596eef22fbda137b4ffa0272d92f0da15364421.1536269129.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:08 +01:00
Ernesto A. Fernández	7beaf6105e	hfs: fix BUG on bnode parent update [ Upstream commit `ef75bcc576` ] hfs_brec_update_parent() may hit BUG_ON() if the first record of both a leaf node and its parent are changed, and if this forces the parent to be split. It is not possible for this to happen on a valid hfs filesystem because the index nodes have fixed length keys. For reasons I ignore, the hfs module does have support for a number of hfsplus features. A corrupt btree header may report variable length keys and trigger this BUG, so it's better to fix it. Link: http://lkml.kernel.org/r/cf9b02d57f806217a2b1bf5db8c3e39730d8f603.1535682463.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:08 +01:00
Ernesto A. Fernández	1df96949eb	hfsplus: fix BUG on bnode parent update [ Upstream commit `19a9d0f1ac` ] Creating, renaming or deleting a file may hit BUG_ON() if the first record of both a leaf node and its parent are changed, and if this forces the parent to be split. This bug is triggered by xfstests generic/027, somewhat rarely; here is a more reliable reproducer: truncate -s 50M fs.iso mkfs.hfsplus fs.iso mount fs.iso /mnt i=1000 while [ $i -le 2400 ]; do touch /mnt/$i &>/dev/null ((++i)) done i=2400 while [ $i -ge 1000 ]; do mv /mnt/$i /mnt/$(perl -e "print $i x61") &>/dev/null ((--i)) done The issue is that a newly created bnode is being put twice. Reset new_node to NULL in hfs_brec_update_parent() before reaching goto again. Link: http://lkml.kernel.org/r/5ee1db09b60373a15890f6a7c835d00e76bf601d.1535682461.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:08 +01:00
Rasmus Villemoes	08751e477f	lib/bitmap.c: fix remaining space computation in bitmap_print_to_pagebuf [ Upstream commit `ce1091d471` ] For various alignments of buf, the current expression computes 4096 ok 4095 ok 8190 8189 ... 4097 i.e., if the caller has already written two bytes into the page buffer, len is 8190 rather than 4094, because PTR_ALIGN aligns up to the next boundary. So if the printed version of the bitmap is huge, scnprintf() ends up writing beyond the page boundary. I don't think any current callers actually write anything before bitmap_print_to_pagebuf, but the API seems to be designed to allow it. [akpm@linux-foundation.org: use offset_in_page(), per Andy] [akpm@linux-foundation.org: include mm.h for offset_in_page()] Link: http://lkml.kernel.org/r/20180818131623.8755-7-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:08 +01:00
Rasmus Villemoes	1d58349459	linux/bitmap.h: fix type of nbits in bitmap_shift_right() [ Upstream commit `d9873969fa` ] Most other bitmap API, including the OOL version __bitmap_shift_right, take unsigned nbits. This was accidentally left out from `2fbad29917`. Link: http://lkml.kernel.org/r/20180818131623.8755-5-linux@rasmusvillemoes.dk Fixes: `2fbad29917` ("lib: bitmap: change bitmap_shift_right to take unsigned parameters") Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reported-by: Yury Norov <ynorov@caviumnetworks.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:07 +01:00
Rasmus Villemoes	8deaaf77ce	linux/bitmap.h: handle constant zero-size bitmaps correctly [ Upstream commit `7275b09785` ] The static inlines in bitmap.h do not handle a compile-time constant nbits==0 correctly (they dereference the passed src or dst pointers, despite only 0 words being valid to access). I had the 0-day buildbot chew on a patch [1] that would cause build failures for such cases without complaining, suggesting that we don't have any such users currently, at least for the 70 .config/arch combinations that was built. Should any turn up, make sure they use the out-of-line versions, which do handle nbits==0 correctly. This is of course not the most efficient, but it's much less churn than teaching all the static inlines an "if (zero_const_nbits())", and since we don't have any current instances, this doesn't affect existing code at all. [1] lkml.kernel.org/r/20180815085539.27485-1-linux@rasmusvillemoes.dk Link: http://lkml.kernel.org/r/20180818131623.8755-3-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:07 +01:00
Dan Carpenter	30598425ae	mm/gup_benchmark.c: prevent integer overflow in ioctl [ Upstream commit `4b408c74ee` ] The concern here is that "gup->size" is a u64 and "nr_pages" is unsigned long. On 32 bit systems we could trick the kernel into allocating fewer pages than expected. Link: http://lkml.kernel.org/r/20181025061546.hnhkv33diogf2uis@kili.mountain Fixes: `64c349f4ae` ("mm: add infrastructure for get_user_pages_fast() benchmarking") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Keith Busch <keith.busch@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:07 +01:00
Ming Lei	9663d294ae	block: call rq_qos_exit() after queue is frozen [ Upstream commit `c57cdf7a9e` ] rq_qos_exit() removes the current q->rq_qos, this action has to be done after queue is frozen, otherwise the IO queue path may never be waken up, then IO hang is caused. So fixes this issue by moving rq_qos_exit() after queue is frozen. Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:06 +01:00
Michael Ellerman	a125df22d1	selftests/powerpc/cache_shape: Fix out-of-tree build [ Upstream commit `69f8117f17` ] Use TEST_GEN_PROGS and don't redefine all, this makes the out-of-tree build work. We need to move the extra dependencies below the include of lib.mk, because it adds the $(OUTPUT) prefix if it's defined. We can also drop the clean rule, lib.mk does it for us. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:06 +01:00
Michael Ellerman	024cd793bb	selftests/powerpc/switch_endian: Fix out-of-tree build [ Upstream commit `266bac361d` ] For the out-of-tree build to work we need to tell switch_endian_test to look for check-reversed.S in $(OUTPUT). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:06 +01:00
Joel Stanley	a4a660f7ab	selftests/powerpc/signal: Fix out-of-tree build [ Upstream commit `27825349d7` ] We should use TEST_GEN_PROGS, not TEST_PROGS. That tells the selftests makefile (lib.mk) that those tests are generated (built), and so it adds the $(OUTPUT) prefix for us, making the out-of-tree build work correctly. It also means we don't need our own clean rule, lib.mk does it. We also have to update the signal_tm rule to use $(OUTPUT). Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:05 +01:00
Joel Stanley	f74f406bbd	selftests/powerpc/ptrace: Fix out-of-tree build [ Upstream commit `c39b79082a` ] We should use TEST_GEN_PROGS, not TEST_PROGS. That tells the selftests makefile (lib.mk) that those tests are generated (built), and so it adds the $(OUTPUT) prefix for us, making the out-of-tree build work correctly. It also means we don't need our own clean rule, lib.mk does it. We also have to update the ptrace-pkey and core-pkey rules to use $(OUTPUT). Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:05 +01:00
Joel Stanley	57aab8f0a3	powerpc/xmon: Relax frame size for clang [ Upstream commit `9c87156cce` ] When building with clang (8 trunk, 7.0 release) the frame size limit is hit: arch/powerpc/xmon/xmon.c:452:12: warning: stack frame size of 2576 bytes in function 'xmon_core' [-Wframe-larger-than=] Some investigation by Naveen indicates this is due to clang saving the addresses to printf format strings on the stack. While this issue is investigated, bump up the frame size limit for xmon when building with clang. Link: https://github.com/ClangBuiltLinux/linux/issues/252 Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:05 +01:00
Hangbin Liu	32d7474b7a	ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12 [ Upstream commit `966c37f2d7` ] Similiar with ipv6 mcast commit `89225d1ce6` ("net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.") i) RFC3376 8.12. Older Version Querier Present Timeout says: The Older Version Querier Interval is the time-out for transitioning a host back to IGMPv3 mode once an older version query is heard. When an older version query is received, hosts set their Older Version Querier Present Timer to Older Version Querier Interval. This value MUST be ((the Robustness Variable) times (the Query Interval in the last Query received)) plus (one Query Response Interval). Currently we only use a hardcode value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT. Fix it by adding two new items mr_qi(Query Interval) and mr_qri(Query Response Interval) in struct in_device. Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri. We need update these values when receive IGMPv3 queries. Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:05 +01:00
Darrick J. Wong	691bd94c15	vfs: avoid problematic remapping requests into partial EOF block [ Upstream commit `07d19dc9fb` ] A deduplication data corruption is exposed in XFS and btrfs. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only made to the end of the source file. This corrupts the destination file when the source extent is shared with it. The VFS remapping prep functions only support whole block dedupe, but we still need to appear to support whole file dedupe correctly. Hence if the dedupe request includes the last block of the souce file, don't include it in the actual dedupe operation. If the rest of the range dedupes successfully, then reject the entire request. A subsequent patch will enable us to shorten dedupe requests correctly. When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. If the reflink request includes the last block of the souce file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. A subsequent patch will enable us to shorten reflink requests correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:04 +01:00
Anton Ivanov	cdc45f2047	um: Make line/tty semantics use true write IRQ [ Upstream commit `917e2fd2c5` ] This fixes a long standing bug where large amounts of output could freeze the tty (most commonly seen on stdio console). While the bug has always been there it became more pronounced after moving to the new interrupt controller. The line semantics are now changed to have true IRQ write semantics which should further improve the tty/line subsystem stability and performance Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:04 +01:00
Masahiro Yamada	a17e3bbfb9	i2c: uniphier-f: fix race condition when IRQ is cleared [ Upstream commit `eaba68785c` ] The current IRQ handler clears all the IRQ status bits when it bails out. This is dangerous because it might clear away the status bits that have just been set while processing the current handler. If this happens, the IRQ event for the latest transfer is lost forever. The IRQ status bits must be cleared before the next transfer is kicked. Fixes: `6a62974b66` ("i2c: uniphier_f: add UniPhier FIFO-builtin I2C driver") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:04 +01:00
Masahiro Yamada	a118403a5e	i2c: uniphier-f: fix occasional timeout error [ Upstream commit `39226aaa85` ] Currently, a timeout error could happen at a repeated START condition. For a (non-repeated) START condition, the controller starts sending data when the UNIPHIER_FI2C_CR_STA bit is set. However, for a repeated START condition, the hardware starts running when the slave address is written to the TX FIFO - the write to the UNIPHIER_FI2C_CR register is actually unneeded. Because the hardware is already running before the IRQ is enabled for a repeated START, the driver may miss the IRQ event. In most cases, this problem does not show up since modern CPUs are much faster than the I2C transfer. However, it is still possible that a context switch happens after the controller starts, but before the IRQ register is set up. To fix this, - Do not write UNIPHIER_FI2C_CR for repeated START conditions. - Enable IRQ before writing the slave address to the TX FIFO. - Disable IRQ for the current CPU while queuing up the TX FIFO; If the CPU is interrupted by some task, the interrupt handler might be invoked due to the empty TX FIFO before completing the setup. Fixes: `6a62974b66` ("i2c: uniphier_f: add UniPhier FIFO-builtin I2C driver") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:04 +01:00
Masahiro Yamada	1466eae37a	i2c: uniphier-f: make driver robust against concurrency [ Upstream commit `f1fdcbbdf4` ] This is unlikely to happen, but it is possible for a CPU to enter the interrupt handler just after wait_for_completion_timeout() has expired. If this happens, the hardware is accessed from multiple contexts concurrently. Disable the IRQ after wait_for_completion_timeout(), and do nothing from the handler when the IRQ is disabled. Fixes: `6a62974b66` ("i2c: uniphier_f: add UniPhier FIFO-builtin I2C driver") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:03 +01:00
Jianchao Wang	10807b3746	block: fix the DISCARD request merge [ Upstream commit `6984046608` ] There are two cases when handle DISCARD merge. If max_discard_segments == 1, the bios/requests need to be contiguous to merge. If max_discard_segments > 1, it takes every bio as a range and different range needn't to be contiguous. But now, attempt_merge screws this up. It always consider contiguity for DISCARD for the case max_discard_segments > 1 and cannot merge contiguous DISCARD for the case max_discard_segments == 1, because rq_attempt_discard_merge always returns false in this case. This patch fixes both of the two cases above. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:03 +01:00
Sabrina Dubroca	b948d56951	macsec: let the administrator set UP state even if lowerdev is down [ Upstream commit `07bddef983` ] Currently, the kernel doesn't let the administrator set a macsec device up unless its lower device is currently up. This is inconsistent, as a macsec device that is up won't automatically go down when its lower device goes down. Now that linkstate propagation works, there's really no reason for this limitation, so let's remove it. Fixes: `c09440f7dc` ("macsec: introduce IEEE 802.1AE driver") Reported-by: Radu Rendec <radu.rendec@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:03 +01:00
Sabrina Dubroca	f5bdad7106	macsec: update operstate when lower device changes [ Upstream commit `e6ac075882` ] Like all other virtual devices (macvlan, vlan), the operstate of a macsec device should match the state of its lower device. This is done by calling netif_stacked_transfer_operstate from its netdevice notifier. We also need to call netif_stacked_transfer_operstate when a new macsec device is created, so that its operstate is set properly. This is only relevant when we try to bring the device up directly when we create it. Radu Rendec proposed a similar patch, inspired from the 802.1q driver, that included changing the administrative state of the macsec device, instead of just the operstate. This version is similar to what the macvlan driver does, and updates only the operstate. Fixes: `c09440f7dc` ("macsec: introduce IEEE 802.1AE driver") Reported-by: Radu Rendec <radu.rendec@gmail.com> Reported-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:03 +01:00
Andrea Arcangeli	4291e97c69	mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition [ Upstream commit `d7c3393413` ] Patch series "migrate_misplaced_transhuge_page race conditions". Aaron found a new instance of the THP MADV_DONTNEED race against pmdp_clear_flush* variants, that was apparently left unfixed. While looking into the race found by Aaron, I may have found two more issues in migrate_misplaced_transhuge_page. These race conditions would not cause kernel instability, but they'd corrupt userland data or leave data non zero after MADV_DONTNEED. I did only minor testing, and I don't expect to be able to reproduce this (especially the lack of ->invalidate_range before migrate_page_copy, requires the latest iommu hardware or infiniband to reproduce). The last patch is noop for x86 and it needs further review from maintainers of archs that implement flush_cache_range() (not in CC yet). To avoid confusion, it's not the first patch that introduces the bug fixed in the second patch, even before removing the pmdp_huge_clear_flush_notify, that _notify suffix was called after migrate_page_copy already run. This patch (of 3): This is a corollary of `ced108037c` ("thp: fix MADV_DONTNEED vs. numa balancing race"), `58ceeb6bec` ("thp: fix MADV_DONTNEED vs. MADV_FREE race") and `5b7abeae3a` ("thp: fix MADV_DONTNEED vs clear soft dirty race). When the above three fixes where posted Dave asked https://lkml.kernel.org/r/929b3844-aec2-0111-fef7-8002f9d4e2b9@intel.com but apparently this was missed. The pmdp_clear_flush* in migrate_misplaced_transhuge_page() was introduced in `a54a407fbf` ("mm: Close races between THP migration and PMD numa clearing"). The important part of such commit is only the part where the page lock is not released until the first do_huge_pmd_numa_page() finished disarming the pagenuma/protnone. The addition of pmdp_clear_flush() wasn't beneficial to such commit and there's no commentary about such an addition either. I guess the pmdp_clear_flush() in such commit was added just in case for safety, but it ended up introducing the MADV_DONTNEED race condition found by Aaron. At that point in time nobody thought of such kind of MADV_DONTNEED race conditions yet (they were fixed later) so the code may have looked more robust by adding the pmdp_clear_flush(). This specific race condition won't destabilize the kernel, but it can confuse userland because after MADV_DONTNEED the memory won't be zeroed out. This also optimizes the code and removes a superfluous TLB flush. [akpm@linux-foundation.org: reflow comment to 80 cols, fix grammar and typo (beacuse)] Link: http://lkml.kernel.org/r/20181013002430.698-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:02 +01:00
Keith Busch	ac1cad79bc	tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage [ Upstream commit `319e0bec1a` ] If the '-w' parameter was provided, the benchmark would exit due to a mssing 'break'. Link: http://lkml.kernel.org/r/20181010195605.10689-3-keith.busch@intel.com Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:02 +01:00
Dave Chinner	2d9d6c099e	mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock [ Upstream commit `64081362e8` ] We've recently seen a workload on XFS filesystems with a repeatable deadlock between background writeback and a multi-process application doing concurrent writes and fsyncs to a small range of a file. range_cyclic writeback Process 1 Process 2 xfs_vm_writepages write_cache_pages writeback_index = 2 cycled = 0 .... find page 2 dirty lock Page 2 ->writepage page 2 writeback page 2 clean page 2 added to bio no more pages write() locks page 1 dirties page 1 locks page 2 dirties page 1 fsync() .... xfs_vm_writepages write_cache_pages start index 0 find page 1 towrite lock Page 1 ->writepage page 1 writeback page 1 clean page 1 added to bio find page 2 towrite lock Page 2 page 2 is writeback <blocks> write() locks page 1 dirties page 1 fsync() .... xfs_vm_writepages write_cache_pages start index 0 !done && !cycled sets index to 0, restarts lookup find page 1 dirty find page 1 towrite lock Page 1 page 1 is writeback <blocks> lock Page 1 <blocks> DEADLOCK because: - process 1 needs page 2 writeback to complete to make enough progress to issue IO pending for page 1 - writeback needs page 1 writeback to complete so process 2 can progress and unlock the page it is blocked on, then it can issue the IO pending for page 2 - process 2 can't make progress until process 1 issues IO for page 1 The underlying cause of the problem here is that range_cyclic writeback is processing pages in descending index order as we hold higher index pages in a structure controlled from above write_cache_pages(). The write_cache_pages() caller needs to be able to submit these pages for IO before write_cache_pages restarts writeback at mapping index 0 to avoid wcp inverting the page lock/writeback wait order. generic_writepages() is not susceptible to this bug as it has no private context held across write_cache_pages() - filesystems using this infrastructure always submit pages in ->writepage immediately and so there is no problem with range_cyclic going back to mapping index 0. However: mpage_writepages() has a private bio context, exofs_writepages() has page_collect fuse_writepages() has fuse_fill_wb_data nfs_writepages() has nfs_pageio_descriptor xfs_vm_writepages() has xfs_writepage_ctx All of these ->writepages implementations can hold pages under writeback in their private structures until write_cache_pages() returns, and hence they are all susceptible to this deadlock. Also worth noting is that ext4 has it's own bastardised version of write_cache_pages() and so it /may/ have an equivalent deadlock. I looked at the code long enough to understand that it has a similar retry loop for range_cyclic writeback reaching the end of the file and then promptly ran away before my eyes bled too much. I'll leave it for the ext4 developers to determine if their code is actually has this deadlock and how to fix it if it has. There's a few ways I can see avoid this deadlock. There's probably more, but these are the first I've though of: 1. get rid of range_cyclic altogether 2. range_cyclic always stops at EOF, and we start again from writeback index 0 on the next call into write_cache_pages() 2a. wcp also returns EAGAIN to ->writepages implementations to indicate range cyclic has hit EOF. writepages implementations can then flush the current context and call wpc again to continue. i.e. lift the retry into the ->writepages implementation 3. range_cyclic uses trylock_page() rather than lock_page(), and it skips pages it can't lock without blocking. It will already do this for pages under writeback, so this seems like a no-brainer 3a. all non-WB_SYNC_ALL writeback uses trylock_page() to avoid blocking as per pages under writeback. I don't think #1 is an option - range_cyclic prevents frequently dirtied lower file offset from starving background writeback of rarely touched higher file offsets. #2 is simple, and I don't think it will have any impact on performance as going back to the start of the file implies an immediate seek. We'll have exactly the same number of seeks if we switch writeback to another inode, and then come back to this one later and restart from index 0. #2a is pretty much "status quo without the deadlock". Moving the retry loop up into the wcp caller means we can issue IO on the pending pages before calling wcp again, and so avoid locking or waiting on pages in the wrong order. I'm not convinced we need to do this given that we get the same thing from #2 on the next writeback call from the writeback infrastructure. #3 is really just a band-aid - it doesn't fix the access/wait inversion problem, just prevents it from becoming a deadlock situation. I'd prefer we fix the inversion, not sweep it under the carpet like this. #3a is really an optimisation that just so happens to include the band-aid fix of #3. So it seems that the simplest way to fix this issue is to implement solution #2 Link: http://lkml.kernel.org/r/20181005054526.21507-1-david@fromorbit.com Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.de> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:02 +01:00
Jia-Ju Bai	bcba80f38a	fs/ocfs2/dlm/dlmdebug.c: fix a sleep-in-atomic-context bug in dlm_print_one_mle() [ Upstream commit `999865764f` ] The kernel module may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16 are: [FUNC] get_zeroed_page(GFP_NOFS) fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 255: __dlm_put_mle in dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 254: spin_lock in dlm_put_ml [FUNC] get_zeroed_page(GFP_NOFS) fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 222: __dlm_put_mle in dlm_put_mle_inuse fs/ocfs2/dlm/dlmmaster.c, 219: spin_lock in dlm_put_mle_inuse To fix this bug, GFP_NOFS is replaced with GFP_ATOMIC. This bug is found by my static analysis tool DSAC. Link: http://lkml.kernel.org/r/20180901112528.27025-1-baijiaju1990@gmail.com Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:02 +01:00
Andrey Ryabinin	99b3146b79	arm64: lib: use C string functions with KASAN enabled [ Upstream commit `19a2ca0fb5` ] ARM64 has asm implementation of memchr(), memcmp(), str[r]chr(), str[n]cmp(), str[n]len(). KASAN don't see memory accesses in asm code, thus it can potentially miss many bugs. Ifdef out __HAVE_ARCH_* defines of these functions when KASAN is enabled, so the generic implementations from lib/string.c will be used. We can't just remove the asm functions because efistub uses them. And we can't have two non-weak functions either, so declare the asm functions as weak. Link: http://lkml.kernel.org/r/20180920135631.23833-2-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Kyeongdon Kim <kyeongdon.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:01 +01:00
David S. Miller	b84e965c7e	sparc64: Rework xchg() definition to avoid warnings. [ Upstream commit `6c2fc9cddc` ] Such as: fs/ocfs2/file.c: In function ‘ocfs2_file_write_iter’: ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value] #define xchg(ptr,x) ((__typeof__((ptr)))__xchg((unsigned long)(x),(ptr),sizeof((ptr)))) and drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function ‘ixgbevf_xdp_setup’: ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value] #define xchg(ptr,x) ((__typeof__((ptr)))__xchg((unsigned long)(x),(ptr),sizeof((ptr)))) Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:01 +01:00
Felipe Rechia	4e4cad4365	powerpc/process: Fix flush_all_to_thread for SPE [ Upstream commit `e901378578` ] Fix a bug introduced by the creation of flush_all_to_thread() for processors that have SPE (Signal Processing Engine) and use it to compute floating-point operations. >From userspace perspective, the problem was seen in attempts of computing floating-point operations which should generate exceptions. For example: fork(); float x = 0.0 / 0.0; isnan(x); // forked process returns False (should be True) The operation above also should always cause the SPEFSCR FINV bit to be set. However, the SPE floating-point exceptions were turned off after a fork(). Kernel versions prior to the bug used flush_spe_to_thread(), which first saves SPEFSCR register values in tsk->thread and then calls giveup_spe(tsk). After commit `579e633e76`, the save_all() function was called first to giveup_spe(), and then the SPEFSCR register values were saved in tsk->thread. This would save the SPEFSCR register values after disabling SPE for that thread, causing the bug described above. Fixes `579e633e76` ("powerpc: create flush_all_to_thread()") Signed-off-by: Felipe Rechia <felipe.rechia@datacom.com.br> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:01 +01:00
Martin Lau	54299e1cf3	bpf, btf: fix a missing check bug in btf_parse [ Upstream commit `4a6998aff8` ] Wenwen Wang reported: In btf_parse(), the header of the user-space btf data 'btf_data' is firstly parsed and verified through btf_parse_hdr(). In btf_parse_hdr(), the header is copied from user-space 'btf_data' to kernel-space 'btf->hdr' and then verified. If no error happens during the verification process, the whole data of 'btf_data', including the header, is then copied to 'data' in btf_parse(). It is obvious that the header is copied twice here. More importantly, no check is enforced after the second copy to make sure the headers obtained in these two copies are same. Given that 'btf_data' resides in the user space, a malicious user can race to modify the header between these two copies. By doing so, the user can inject inconsistent data, which can cause undefined behavior of the kernel and introduce potential security risk. This issue is similar to the one fixed in commit `8af03d1ae2` ("bpf: btf: Fix a missing check bug"). To fix it, this patch copies the user 'btf_data' before parsing / verifying the BTF header. Fixes: `69b693f0ae` ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Co-developed-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:01 +01:00
Taehee Yoo	8044e741ee	bpf: devmap: fix wrong interface selection in notifier_call [ Upstream commit `f592f80483` ] The dev_map_notification() removes interface in devmap if unregistering interface's ifindex is same. But only checking ifindex is not enough because other netns can have same ifindex. so that wrong interface selection could occurred. Hence netdev pointer comparison code is added. v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann) v1: Initial patch Fixes: `2ddf71e23c` ("net: add notifier hooks for devmap bpf map") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:01 +01:00
Tristram Ha	7b557dbdc5	net: ethernet: cadence: fix socket buffer corruption problem [ Upstream commit `899ecaedd1` ] Socket buffer is not re-created when headroom is 2 and tailroom is 1. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:00 +01:00
Geert Uytterhoeven	3681b901e9	thermal: rcar_thermal: Prevent hardware access during system suspend [ Upstream commit `3a31386217` ] On r8a7791/koelsch, sometimes the following message is printed during system suspend: rcar_thermal e61f0000.thermal: thermal sensor was broken This happens if the workqueue runs while the device is already suspended. Fix this by using the freezable system workqueue instead, cfr. commit `51e20d0e3a` ("thermal: Prevent polling from happening during system suspend"). Fixes: `e0a5172e9e` ("thermal: rcar: add interrupt support") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:00 +01:00
Sergei Shtylyov	436e610bf1	thermal: rcar_thermal: fix duplicate IRQ request [ Upstream commit `df016bbba6` ] The driver on R8A77995 requests the same IRQ twice since platform_get_resource() is always called for the 1st IRQ resource. Fixes: `1969d9dc20` ("thermal: rcar_thermal: add r8a77995 support") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:00 +01:00
Peng Hao	51aa1a10fb	selftests: fix warning: "_GNU_SOURCE" redefined [ Upstream commit `0387662d1b` ] Makefile contains -D_GNU_SOURCE. remove define "_GNU_SOURCE" in c files. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:00 +01:00
Andrea Parri	c62be41088	selftests: kvm: Fix -Wformat warnings [ Upstream commit `fb363e2d20` ] Fixes the following warnings: dirty_log_test.c: In function ‘help’: dirty_log_test.c:216:9: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘int’ [-Wformat=] printf(" -i: specify iteration counts (default: %"PRIu64")\n", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/test_util.h:18:0, from dirty_log_test.c:16: /usr/include/inttypes.h:105:34: note: format string is defined here # define PRIu64 __PRI64_PREFIX "u" dirty_log_test.c:218:9: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘int’ [-Wformat=] printf(" -I: specify interval in ms (default: %"PRIu64" ms)\n", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/test_util.h:18:0, from dirty_log_test.c:16: /usr/include/inttypes.h:105:34: note: format string is defined here # define PRIu64 __PRI64_PREFIX "u" Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:16:59 +01:00

1 2 3 4 5 ...

792968 Commits