linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-05 02:21:52 +09:00

Author	SHA1	Message	Date
Paolo Bonzini	bb98bd9733	KVM: x86: inject exceptions produced by x86_decode_insn commit `6ea6e84309` upstream. Sometimes, a processor might execute an instruction while another processor is updating the page tables for that instruction's code page, but before the TLB shootdown completes. The interesting case happens if the page is in the TLB. In general, the processor will succeed in executing the instruction and nothing bad happens. However, what if the instruction is an MMIO access? If that happens, KVM invokes the emulator, and the emulator gets the updated page tables. If the update side had marked the code page as non present, the page table walk then will fail and so will x86_decode_insn. Unfortunately, even though kvm_fetch_guest_virt is correctly returning X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as a fatal error if the instruction cannot simply be reexecuted (as is the case for MMIO). And this in fact happened sometimes when rebooting Windows 2012r2 guests. Just checking ctxt->have_exception and injecting the exception if true is enough to fix the case. Thanks to Eduardo Habkost for helping in the debugging of this issue. Reported-by: Yanan Fu <yfu@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:32 +01:00
Liran Alon	15cc35bc76	KVM: x86: Exit to user-mode on #UD intercept when emulator requires commit `61cb57c9ed` upstream. Instruction emulation after trapping a #UD exception can result in an MMIO access, for example when emulating a MOVBE on a processor that doesn't support the instruction. In this case, the #UD vmexit handler must exit to user mode, but there wasn't any code to do so. Add it for both VMX and SVM. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:32 +01:00
Liran Alon	6c4eaffbad	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk commit `51c4b8bba6` upstream. When guest passes KVM it's pvclock-page GPA via WRMSR to MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM don't initialize pvclock-page to some start-values. It just requests a clock-update which will happen before entering to guest. The clock-update logic will call kvm_setup_pvclock_page() to update the pvclock-page with info. However, kvm_setup_pvclock_page() wrongly assumes that the version-field is initialized to an even number. This is wrong because at first-time write, field could be any-value. Fix simply makes sure that if first-time version-field is odd, increment it once more to make it even and only then start standard logic. This follows same logic as done in other pvclock shared-pages (See kvm_write_wall_clock() and record_steal_time()). Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Michael Ellerman	c12e358867	powerpc/kexec: Fix kexec/kdump in P9 guest kernels commit `2621e945fb` upstream. The code that cleans up the IAMR/AMOR before kexec'ing failed to remember that when we're running as a guest AMOR is not writable, it's hypervisor privileged. They symptom is that the kexec stops before entering purgatory and nothing else is seen on the console. If you examine the state of the system all threads will be in the 0x700 program check handler. Fix it by making the write to AMOR dependent on HV mode. Fixes: `1e2a516e89` ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR") Reported-by: Yilin Zhang <yilzhang@redhat.com> Debugged-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Mahesh Salgaonkar	3bfbf2b1bc	powerpc/powernv: Fix kexec crashes caused by tlbie tracing commit `a3961f824c` upstream. Rebooting into a new kernel with kexec fails in trace_tlbie() which is called from native_hpte_clear(). This happens if the running kernel has CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always execute few RCU checks regardless of whether tracing is on or off. We are already in the last phase of kexec sequence in real mode with HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and causes kexec to fail. Fix this by not calling trace_tlbie() from native_hpte_clear(). mpe: It's not safe to call trace points at this point in the kexec path, even if we could avoid the RCU checks/warnings. The only solution is to not call them. Fixes: `0428491cba` ("powerpc/mm: Trace tlbie(l) instructions") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Ard Biesheuvel	e9b2a6680f	arm64: ftrace: emit ftrace-mod.o contents through code commit `be0f272bfc` upstream. When building the arm64 kernel with both CONFIG_ARM64_MODULE_PLTS and CONFIG_DYNAMIC_FTRACE enabled, the ftrace-mod.o object file is built with the kernel and contains a trampoline that is linked into each module, so that modules can be loaded far away from the kernel and still reach the ftrace entry point in the core kernel with an ordinary relative branch, as is emitted by the compiler instrumentation code dynamic ftrace relies on. In order to be able to build out of tree modules, this object file needs to be included into the linux-headers or linux-devel packages, which is undesirable, as it makes arm64 a special case (although a precedent does exist for 32-bit PPC). Given that the trampoline essentially consists of a PLT entry, let's not bother with a source or object file for it, and simply patch it in whenever the trampoline is being populated, using the existing PLT support routines. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Ard Biesheuvel	e1b4e001b2	arm64: module-plts: factor out PLT generation code for ftrace commit `7e8b9c1d2e` upstream. To allow the ftrace trampoline code to reuse the PLT entry routines, factor it out and move it into asm/module.h. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
John Johansen	69af22696b	apparmor: fix oops in audit_signal_cb hook commit `b12cbb2158` upstream. The apparmor_audit_data struct ordering got messed up during a merge conflict, resulting in the signal integer and peer pointer being in a union instead of a struct. For most of the 4.13 and 4.14 life cycle, this was hidden by commit `651e28c553` ("apparmor: add base infastructure for socket mediation") which fixed the apparmor_audit_data struct when its data was added. When that commit was reverted in -rc7 the signal audit bug was exposed, and unfortunately it never showed up in any of the testing until after 4.14 was released. Shaun Khan, Zephaniah E. Loss-Cutler-Hull filed nearly simultaneous bug reports (with different oopes, the smaller of which is included below). Full credit goes to Tetsuo Handa for jumping on this as well and noticing the audit data struct problem and reporting it. [ 76.178568] BUG: unable to handle kernel paging request at ffffffff0eee3bc0 [ 76.178579] IP: audit_signal_cb+0x6c/0xe0 [ 76.178581] PGD 1a640a067 P4D 1a640a067 PUD 0 [ 76.178586] Oops: 0000 [#1] PREEMPT SMP [ 76.178589] Modules linked in: fuse rfcomm bnep usblp uvcvideo btusb btrtl btbcm btintel bluetooth ecdh_generic ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables intel_rapl joydev wmi_bmof serio_raw iwldvm iwlwifi shpchp kvm_intel kvm irqbypass autofs4 algif_skcipher nls_iso8859_1 nls_cp437 crc32_pclmul ghash_clmulni_intel [ 76.178620] CPU: 0 PID: 10675 Comm: pidgin Not tainted 4.14.0-f1-dirty #135 [ 76.178623] Hardware name: Hewlett-Packard HP EliteBook Folio 9470m/18DF, BIOS 68IBD Ver. F.62 10/22/2015 [ 76.178625] task: ffff9c7a94c31dc0 task.stack: ffffa09b02a4c000 [ 76.178628] RIP: 0010:audit_signal_cb+0x6c/0xe0 [ 76.178631] RSP: 0018:ffffa09b02a4fc08 EFLAGS: 00010292 [ 76.178634] RAX: ffffa09b02a4fd60 RBX: ffff9c7aee0741f8 RCX: 0000000000000000 [ 76.178636] RDX: ffffffffee012290 RSI: 0000000000000006 RDI: ffff9c7a9493d800 [ 76.178638] RBP: ffffa09b02a4fd40 R08: 000000000000004d R09: ffffa09b02a4fc46 [ 76.178641] R10: ffffa09b02a4fcb8 R11: ffff9c7ab44f5072 R12: ffffa09b02a4fd40 [ 76.178643] R13: ffffffff9e447be0 R14: ffff9c7a94c31dc0 R15: 0000000000000001 [ 76.178646] FS: 00007f8b11ba2a80(0000) GS:ffff9c7afea00000(0000) knlGS:0000000000000000 [ 76.178648] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 76.178650] CR2: ffffffff0eee3bc0 CR3: 00000003d5209002 CR4: 00000000001606f0 [ 76.178652] Call Trace: [ 76.178660] common_lsm_audit+0x1da/0x780 [ 76.178665] ? d_absolute_path+0x60/0x90 [ 76.178669] ? aa_check_perms+0xcd/0xe0 [ 76.178672] aa_check_perms+0xcd/0xe0 [ 76.178675] profile_signal_perm.part.0+0x90/0xa0 [ 76.178679] aa_may_signal+0x16e/0x1b0 [ 76.178686] apparmor_task_kill+0x51/0x120 [ 76.178690] security_task_kill+0x44/0x60 [ 76.178695] group_send_sig_info+0x25/0x60 [ 76.178699] kill_pid_info+0x36/0x60 [ 76.178703] SYSC_kill+0xdb/0x180 [ 76.178707] ? preempt_count_sub+0x92/0xd0 [ 76.178712] ? _raw_write_unlock_irq+0x13/0x30 [ 76.178716] ? task_work_run+0x6a/0x90 [ 76.178720] ? exit_to_usermode_loop+0x80/0xa0 [ 76.178723] entry_SYSCALL_64_fastpath+0x13/0x94 [ 76.178727] RIP: 0033:0x7f8b0e58b767 [ 76.178729] RSP: 002b:00007fff19efd4d8 EFLAGS: 00000206 ORIG_RAX: 000000000000003e [ 76.178732] RAX: ffffffffffffffda RBX: 0000557f3e3c2050 RCX: 00007f8b0e58b767 [ 76.178735] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000263b [ 76.178737] RBP: 0000000000000000 R08: 0000557f3e3c2270 R09: 0000000000000001 [ 76.178739] R10: 000000000000022d R11: 0000000000000206 R12: 0000000000000000 [ 76.178741] R13: 0000000000000001 R14: 0000557f3e3c13c0 R15: 0000000000000000 [ 76.178745] Code: 48 8b 55 18 48 89 df 41 b8 20 00 08 01 5b 5d 48 8b 42 10 48 8b 52 30 48 63 48 4c 48 8b 44 c8 48 31 c9 48 8b 70 38 e9 f4 fd 00 00 <48> 8b 14 d5 40 27 e5 9e 48 c7 c6 7d 07 19 9f 48 89 df e8 fd 35 [ 76.178794] RIP: audit_signal_cb+0x6c/0xe0 RSP: ffffa09b02a4fc08 [ 76.178796] CR2: ffffffff0eee3bc0 [ 76.178799] ---[ end trace 514af9529297f1a3 ]--- Fixes: `cd1dbf76b2` ("apparmor: add the ability to mediate signals") Reported-by: Zephaniah E. Loss-Cutler-Hull <warp-spam_kernel@aehallh.com> Reported-by: Shuah Khan <shuahkh@osg.samsung.com> Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Tested-by: Ivan Kozik <ivan@ludios.org> Tested-by: Zephaniah E. Loss-Cutler-Hull <warp-spam_kernel@aehallh.com> Tested-by: Christian Boltz <apparmor@cboltz.de> Tested-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Peter Ujfalusi	a3ed24cd28	omapdrm: hdmi4: Correct the SoC revision matching commit `23970e150a` upstream. I believe the intention of the commit `2c9fc9bf45` ("drm: omapdrm: Move FEAT_HDMI_* features to hdmi4 driver") was to identify omap4430 ES1.x, omap4430 ES2.x and other OMAP4 revisions, like omap4460. By using family=OMAP4 in the match the code will treat omap4460 ES1.x in a same way as it would treat omap4430 ES1.x This breaks HDMI audio on OMAP4460 devices (PandaES for example). Correct the match rule so we are not going to get false positive match. Fixes: `2c9fc9bf45` ("drm: omapdrm: Move FEAT_HDMI_* features to hdmi4 driver") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Laurent Pinchart	4479ade38b	drm: omapdrm: Fix DPI on platforms using the DSI VDDS commit `bf25dac38f` upstream. Commit `d178e034d5` ("drm: omapdrm: Move FEAT_DPI_USES_VDDS_DSI feature to dpi code") replaced usage of platform data version with SoC matching to configure DPI VDDS. The SoC match entries were incorrect, they should have matched on the machine name instead of the SoC family. Fix it. The result was observed on OpenPandora with OMAP3530 where the panel only had the Blue channel and Red&Green were missing. It was not observed on GTA04 with DM3730. Fixes: `d178e034d5` ("drm: omapdrm: Move FEAT_DPI_USES_VDDS_DSI feature to dpi code") Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reported-by: H. Nikolaus Schaller <hns@goldelico.com> Tested-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Martin Schwidefsky	b367ea9453	s390: revert ELF_ET_DYN_BASE base changes commit `345f8f34bb` upstream. This reverts commit `a73dc5370e`. Reducing the base address for 31-bit PIE executables from (STACK_TOP/3)*2 to 4MB broke several compat programs which use -fpie to move the executable out of the lower 16MB. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:31 +01:00
Vasily Averin	84779085fa	lockd: lost rollback of set_grace_period() in lockd_down_net() commit `3a2b19d1ee` upstream. Commit `efda760fe9` ("lockd: fix lockd shutdown race") is incorrect, it removes lockd_manager and disarm grace_period_end for init_net only. If nfsd was started from another net namespace lockd_up_net() calls set_grace_period() that adds lockd_manager into per-netns list and queues grace_period_end delayed work. These action should be reverted in lockd_down_net(). Otherwise it can lead to double list_add on after restart nfsd in netns, and to use-after-free if non-disarmed delayed work will be executed after netns destroy. Fixes: `efda760fe9` ("lockd: fix lockd shutdown race") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Ondrej Mosnáček	861eae231b	crypto: skcipher - Fix skcipher_walk_aead_common commit `c14ca83865` upstream. The skcipher_walk_aead_common function calls scatterwalk_copychunks on the input and output walks to skip the associated data. If the AD end at an SG list entry boundary, then after these calls the walks will still be pointing to the end of the skipped region. These offsets are later checked for alignment in skcipher_walk_next, so the skcipher_walk may detect the alignment incorrectly. This patch fixes it by calling scatterwalk_done after the copychunks calls to ensure that the offsets refer to the right SG list entry. Fixes: `b286d8b1a6` ("crypto: skcipher - Add skcipher walk interface") Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Stephan Mueller	7f21961a2d	crypto: af_alg - remove locking in async callback commit `7d2c3f54e6` upstream. The code paths protected by the socket-lock do not use or modify the socket in a non-atomic fashion. The actions pertaining the socket do not even need to be handled as an atomic operation. Thus, the socket-lock can be safely ignored. This fixes a bug regarding scheduling in atomic as the callback function may be invoked in interrupt context. In addition, the sock_hold is moved before the AIO encrypt/decrypt operation to ensure that the socket is always present. This avoids a tiny race window where the socket is unprotected and yet used by the AIO operation. Finally, the release of resources for a crypto operation is moved into a common function of af_alg_free_resources. Fixes: `e870456d8e` ("crypto: algif_skcipher - overhaul memory management") Fixes: `d887c52d6a` ("crypto: algif_aead - overhaul memory management") Reported-by: Romain Izard <romain.izard.pro@gmail.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Tested-by: Romain Izard <romain.izard.pro@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Stephan Mueller	721872a199	crypto: algif_aead - skip SGL entries with NULL page commit `8e1fa89aa8` upstream. The TX SGL may contain SGL entries that are assigned a NULL page. This may happen if a multi-stage AIO operation is performed where the data for each stage is pointed to by one SGL entry. Upon completion of that stage, af_alg_pull_tsgl will assign NULL to the SGL entry. The NULL cipher used to copy the AAD from TX SGL to the destination buffer, however, cannot handle the case where the SGL starts with an SGL entry having a NULL page. Thus, the code needs to advance the start pointer into the SGL to the first non-NULL entry. This fixes a crash visible on Intel x86 32 bit using the libkcapi test suite. Fixes: `72548b093e` ("crypto: algif_aead - copy AAD from src to dst") Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Naofumi Honda	b78da96d28	nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat commit `64ebe12494` upstream. From kernel 4.9, my two nfsv4 servers sometimes suffer from "panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat(). These panics diseappear if we revert the commit "nfsd: add a LRU list for blocked locks". The cause appears to be a typo in nfs4_laundromat(), which is also present in nfs4_state_shutdown_net(). Fixes: `7919d0a27f` "nfsd: add a LRU list for blocked locks" Cc: jlayton@redhat.com Reveiwed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Trond Myklebust	73cfeab675	nfsd: Fix another OPEN stateid race commit `d8a1a00055` upstream. If nfsd4_process_open2() is initialising a new stateid, and yet the call to nfs4_get_vfs_file() fails for some reason, then we must declare the stateid closed, and unhash it before dropping the mutex. Right now, we unhash the stateid after dropping the mutex, and without changing the stateid type, meaning that another OPEN could theoretically look it up and attempt to use it. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Trond Myklebust	db77ab54a5	nfsd: Fix stateid races between OPEN and CLOSE commit `15ca08d329` upstream. Open file stateids can linger on the nfs4_file list of stateids even after they have been closed. In order to avoid reusing such a stateid, and confusing the client, we need to recheck the nfs4_stid's type after taking the mutex. Otherwise, we risk reusing an old stateid that was already closed, which will confuse clients that expect new stateids to conform to RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Josef Bacik	28580e7573	btrfs: clear space cache inode generation always commit `8e138e0d92` upstream. We discovered a box that had double allocations, and suspected the space cache may be to blame. While auditing the write out path I noticed that if we've already setup the space cache we will just carry on. This means that any error we hit after cache_save_setup before we go to actually write the cache out we won't reset the inode generation, so whatever was already written will be considered correct, except it'll be stale. Fix this by _always_ resetting the generation on the block group inode, this way we only ever have valid or invalid cache. With this patch I was no longer able to reproduce cache corruption with dm-log-writes and my bpf error injection tool. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Kirill A. Shutemov	be75ad849b	mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine commit `f4f0a3d85b` upstream. I made a mistake during converting hugetlb code to 5-level paging: in huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset(). Otherwise it leads to crash -- NULL-pointer dereference in pud_alloc() if p4d table is not yet allocated. It only can happen in 5-level paging mode. In 4-level paging mode p4d_offset() always returns pgd, so we are fine. Link: http://lkml.kernel.org/r/20171122121921.64822-1-kirill.shutemov@linux.intel.com Fixes: `c2febafc67` ("mm: convert generic code to 5-level paging") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:30 +01:00
Ian Kent	f354f4ffe6	autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored" commit `5d38f049ce` upstream. Commit `42f4614821` ("autofs: fix AT_NO_AUTOMOUNT not being honored") allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT flag but introduced a semantic change. In order to honor AT_NO_AUTOMOUNT a semantic change was made to the negative dentry case for stat family system calls in follow_automount(). This changed the unconditional triggering of an automount in this case to no longer be done and an error returned instead. This has caused more problems than I expected so reverting the change is needed. In a discussion with Neil Brown it was concluded that the automount(8) daemon can implement this change without kernel modifications. So that will be done instead and the autofs module documentation updated with a description of the problem and what needs to be done by module users for this specific case. Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net Fixes: `42f4614821` ("autofs: fix AT_NO_AUTOMOUNT not being honored") Signed-off-by: Ian Kent <raven@themaw.net> Cc: Neil Brown <neilb@suse.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Colin Walters <walters@redhat.com> Cc: Ondrej Holy <oholy@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
Ian Kent	793a07bef9	autofs: revert "autofs: take more care to not update last_used on path walk" commit `43694d4bf8` upstream. While commit `092a53452b` ("autofs: take more care to not update last_used on path walk") helped (partially) resolve a problem where automounts were not expiring due to aggressive accesses from user space it has a side effect for very large environments. This change helps with the expire problem by making the expire more aggressive but, for very large environments, that means more mount requests from clients. When there are a lot of clients that can mean fairly significant server load increases. It turns out I put the last_used in this position to solve this very problem and failed to update my own thinking of the autofs expire policy. So the patch being reverted introduces a regression which should be fixed. Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.themaw.net Fixes: `092a53452b` ("autofs: take more care to not update last_used on path walk") Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: NeilBrown <neilb@suse.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Colin Walters <walters@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Ondrej Holy <oholy@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
OGAWA Hirofumi	fa2a989473	fs/fat/inode.c: fix sb_rdonly() change commit `b6e8e12c0a` upstream. Commit `bc98a42c1f` ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)") converted fat_remount():new_rdonly from a bool to an int. However fat_remount() depends upon the compiler's conversion of a non-zero integer into boolean `true'. Fix it by switching `new_rdonly' back into a bool. Link: http://lkml.kernel.org/r/87mv3d5x51.fsf@mail.parknet.co.jp Fixes: `bc98a42c1f` ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)") Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Joe Perches <joe@perches.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
Shakeel Butt	d983b6251c	mm, memcg: fix mem_cgroup_swapout() for THPs commit `d08afa149a` upstream. Commit `d6810d7300` ("memcg, THP, swap: make mem_cgroup_swapout() support THP") changed mem_cgroup_swapout() to support transparent huge page (THP). However the patch missed one location which should be changed for correctly handling THPs. The resulting bug will cause the memory cgroups whose THPs were swapped out to become zombies on deletion. Link: http://lkml.kernel.org/r/20171128161941.20931-1-shakeelb@google.com Fixes: `d6810d7300` ("memcg, THP, swap: make mem_cgroup_swapout() support THP") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-12-05 11:26:29 +01:00
Zi Yan	13167cf417	mm: migrate: fix an incorrect call of prep_transhuge_page() commit `40a899ed16` upstream. In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during memory hotplug/hot remove prep_transhuge_page() is called incorrectly on non-THP pages for migration, when THP is on but THP migration is not enabled. This leads to a bad state of target pages for migration. By inspecting the code, if called on a non-THP, prep_transhuge_page() will 1) change the value of the mapping of (page + 2), since it is used for THP deferred list; 2) change the lru value of (page + 1), since it is used for THP's dtor. Both can lead to data corruption of these two pages. Andrea said: "Pragmatically and from the point of view of the memory_hotplug subsys, the effect is a kernel crash when pages are being migrated during a memory hot remove offline and migration target pages are found in a bad state" This patch fixes it by only calling prep_transhuge_page() when we are certain that the target page is THP. Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com Fixes: `8135d8926c` ("mm: memory_hotplug: memory hotremove supports thp migration") Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu> Reported-by: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
chenjie	8a0bb9ebaa	mm/madvise.c: fix madvise() infinite loop under special circumstances commit `6ea8d958a2` upstream. MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings. Unfortunately madvise_willneed() doesn't communicate this information properly to the generic madvise syscall implementation. The calling convention is quite subtle there. madvise_vma() is supposed to either return an error or update &prev otherwise the main loop will never advance to the next vma and it will keep looping for ever without a way to get out of the kernel. It seems this has been broken since introduction. Nobody has noticed because nobody seems to be using MADVISE_WILLNEED on these DAX mappings. [mhocko@suse.com: rewrite changelog] Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com Fixes: `fe77ba6f4f` ("[PATCH] xip: madvice/fadvice: execute in place") Signed-off-by: chenjie <chenjie6@huawei.com> Signed-off-by: guoxuenan <guoxuenan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: zhangyi (F) <yi.zhang@huawei.com> Cc: Miao Xie <miaoxie@huawei.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Shaohua Li <shli@fb.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
Kees Cook	c16a65582a	exec: avoid RLIMIT_STACK races with prlimit() commit `04e35f4495` upstream. While the defense-in-depth RLIMIT_STACK limit on setuid processes was protected against races from other threads calling setrlimit(), I missed protecting it against races from external processes calling prlimit(). This adds locking around the change and makes sure that rlim_max is set too. Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast Fixes: `64701dee41` ("exec: Use sane stack rlimit under secureexec") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Reported-by: Brad Spengler <spender@grsecurity.net> Acked-by: Serge Hallyn <serge@hallyn.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
Dan Williams	aab681eb8f	IB/core: disable memory registration of filesystem-dax vmas commit `5f1d43de54` upstream. Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow RDMA to create long standing memory registrations against filesytem-dax vmas. Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `3565fce3a6` ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Jan Kara <jack@suse.cz> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
Dan Williams	e13ee3368e	v4l2: disable filesystem-dax mapping support commit `b70131de64` upstream. V4L2 memory registrations are incompatible with filesystem-dax that needs the ability to revoke dma access to a mapping at will, or otherwise allow the kernel to wait for completion of DMA. The filesystem-dax implementation breaks the traditional solution of truncate of active file backed mappings since there is no page-cache page we can orphan to sustain ongoing DMA. If v4l2 wants to support long lived DMA mappings it needs to arrange to hold a file lease or use some other mechanism so that the kernel can coordinate revoking DMA access when the filesystem needs to truncate mappings. Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `3565fce3a6` ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:29 +01:00
Dan Williams	daac045736	mm: fail get_vaddr_frames() for filesystem-dax mappings commit `b7f0554a56` upstream. Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow V4L2, Exynos, and other frame vector users to create long standing / irrevocable memory registrations against filesytem-dax vmas. [dan.j.williams@intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan] Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `3565fce3a6` ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Inki Dae <inki.dae@samsung.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Dan Williams	40aa9d2998	mm: introduce get_user_pages_longterm commit `2bb6d28370` upstream. Patch series "introduce get_user_pages_longterm()", v2. Here is a new get_user_pages api for cases where a driver intends to keep an elevated page count indefinitely. This is distinct from usages like iov_iter_get_pages where the elevated page counts are transient. The iov_iter_get_pages cases immediately turn around and submit the pages to a device driver which will put_page when the i/o operation completes (under kernel control). In the longterm case userspace is responsible for dropping the page reference at some undefined point in the future. This is untenable for filesystem-dax case where the filesystem is in control of the lifetime of the block / page and needs reasonable limits on how long it can wait for pages in a mapping to become idle. Fixing filesystems to actually wait for dax pages to be idle before blocks from a truncate/hole-punch operation are repurposed is saved for a later patch series. Also, allowing longterm registration of dax mappings is a future patch series that introduces a "map with lease" semantic where the kernel can revoke a lease and force userspace to drop its page references. I have also tagged these for -stable to purposely break cases that might assume that longterm memory registrations for filesystem-dax mappings were supported by the kernel. The behavior regression this policy change implies is one of the reasons we maintain the "dax enabled. Warning: EXPERIMENTAL, use at your own risk" notification when mounting a filesystem in dax mode. It is worth noting the device-dax interface does not suffer the same constraints since it does not support file space management operations like hole-punch. This patch (of 4): Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented for the affected sub-systems (RDMA and V4L2). [akpm@linux-foundation.org: use kcalloc()] Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `3565fce3a6` ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Dan Williams	61586a2286	device-dax: implement ->split() to catch invalid munmap attempts commit `9702cffdbf` upstream. Similar to how device-dax enforces that the 'address', 'offset', and 'len' parameters to mmap() be aligned to the device's fundamental alignment, the same constraints apply to munmap(). Implement ->split() to fail munmap calls that violate the alignment constraint. Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path with crash signatures of the form: vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000 next (null) prev (null) mm ffff8800b61150c0 prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240 pgoff 0 file ffff8800b638ef80 private_data (null) flags: 0x380000fb(read\|write\|shared\|mayread\|maywrite\|mayexec\|mayshare\|softdirty\|mixedmap\|hugepage) ------------[ cut here ]------------ kernel BUG at mm/huge_memory.c:2014! [..] RIP: 0010:__split_huge_pud+0x12a/0x180 [..] Call Trace: unmap_page_range+0x245/0xa40 ? __vma_adjust+0x301/0x990 unmap_vmas+0x4c/0xa0 unmap_region+0xae/0x120 ? __vma_rb_erase+0x11a/0x230 do_munmap+0x276/0x410 vm_munmap+0x6a/0xa0 SyS_munmap+0x1d/0x30 Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `dee4107924` ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Dan Williams	c6c78a1d45	mm, hugetlbfs: introduce ->split() to vm_operations_struct commit `31383c6865` upstream. Patch series "device-dax: fix unaligned munmap handling" When device-dax is operating in huge-page mode we want it to behave like hugetlbfs and fail attempts to split vmas into unaligned ranges. It would be messy to teach the munmap path about device-dax alignment constraints in the same (hstate) way that hugetlbfs communicates this constraint. Instead, these patches introduce a new ->split() vm operation. This patch (of 2): The device-dax interface has similar constraints as hugetlbfs in that it requires the munmap path to unmap in huge page aligned units. Rather than add more custom vma handling code in __split_vma() introduce a new vm operation to perform this vma specific check. Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `dee4107924` ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Dan Williams	bf55918cb4	mm: fix device-dax pud write-faults triggered by get_user_pages() commit `1501899a89` upstream. Currently only get_user_pages_fast() can safely handle the writable gup case due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted() and to date it has been unimplemented, just calls BUG_ON(). kernel BUG at ./include/linux/hugetlb.h:244! [..] RIP: 0010:follow_devmap_pud+0x482/0x490 [..] Call Trace: follow_page_mask+0x28c/0x6e0 __get_user_pages+0xe4/0x6c0 get_user_pages_unlocked+0x130/0x1b0 get_user_pages_fast+0x89/0xb0 iov_iter_get_pages_alloc+0x114/0x4a0 nfs_direct_read_schedule_iovec+0xd2/0x350 ? nfs_start_io_direct+0x63/0x70 nfs_file_direct_read+0x1e0/0x250 nfs_file_read+0x90/0xc0 For now this just implements a simple check for the _PAGE_RW bit similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it. Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s). [dan.j.williams@intel.com: fix powerpc compile error] Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `a00cc7d9dd` ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Mike Kravetz	b4c8fce668	mm/cma: fix alloc_contig_range ret code/potential leak commit `63cd448908` upstream. If the call __alloc_contig_migrate_range() in alloc_contig_range returns -EBUSY, processing continues so that test_pages_isolated() is called where there is a tracepoint to identify the busy pages. However, it is possible for busy pages to become available between the calls to these two routines. In this case, the range of pages may be allocated. Unfortunately, the original return code (ret == -EBUSY) is still set and returned to the caller. Therefore, the caller believes the pages were not allocated and they are leaked. Update the comment to indicate that allocation is still possible even if __alloc_contig_migrate_range returns -EBUSY. Also, clear return code in this case so that it is not accidentally used or returned to caller. Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com Fixes: `8ef5849fa8` ("mm/cma: always check which page caused allocation failure") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Kirill A. Shutemov	01ca972745	mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d() commit `a8f9736645` upstream. Currently, we unconditionally make page table dirty in touch_pmd(). It may result in false-positive can_follow_write_pmd(). We may avoid the situation, if we would only make the page table entry dirty if caller asks for write access -- FOLL_WRITE. The patch also changes touch_pud() in the same way. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Wang Nan	786b924d39	mm, oom_reaper: gather each vma to prevent leaking TLB entry commit `687cb0884a` upstream. tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory space. In this case, tlb->fullmm is true. Some archs like arm64 doesn't flush TLB when tlb->fullmm is true: commit `5a7862e830` ("arm64: tlbflush: avoid flushing when fullmm == 1"). Which causes leaking of tlb entries. Will clarifies his patch: "Basically, we tag each address space with an ASID (PCID on x86) which is resident in the TLB. This means we can elide TLB invalidation when pulling down a full mm because we won't ever assign that ASID to another mm without doing TLB invalidation elsewhere (which actually just nukes the whole TLB). I think that means that we could potentially not fault on a kernel uaccess, because we could hit in the TLB" There could be a window between complete_signal() sending IPI to other cores and all threads sharing this mm are really kicked off from cores. In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to flush TLB then frees pages. However, due to the above problem, the TLB entries are not really flushed on arm64. Other threads are possible to access these pages through TLB entries. Moreover, a copy_to_user() can also write to these pages without generating page fault, causes use-after-free bugs. This patch gathers each vma instead of gathering full vm space. In this case tlb->fullmm is not true. The behavior of oom reaper become similar to munmapping before do_exit, which should be safe for all archs. Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com Fixes: `aac4536355` ("mm, oom: introduce oom reaper") Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Bob Liu <liubo95@huawei.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:28 +01:00
Michal Hocko	b8d0c953d9	mm, memory_hotplug: do not back off draining pcp free pages from kworker context commit `4b81cb2ff6` upstream. drain_all_pages backs off when called from a kworker context since commit `0ccce3b924` ("mm, page_alloc: drain per-cpu pages from workqueue context") because the original IPI based pcp draining has been replaced by a WQ based one and the check wanted to prevent from recursion and inter workers dependencies. This has made some sense at the time because the system WQ has been used and one worker holding the lock could be blocked while waiting for new workers to emerge which can be a problem under OOM conditions. Since then commit `ce612879dd` ("mm: move pcp and lru-pcp draining into single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a rescuer so we shouldn't depend on any other WQ activity to make a forward progress so calling drain_all_pages from a worker context is safe as long as this doesn't happen from mm_percpu_wq itself which is not the case because all workers are required to _not_ depend on any MM locks. Why is this a problem in the first place? ACPI driven memory hot-remove (acpi_device_hotplug) is executed from the worker context. We end up calling __offline_pages to free all the pages and that requires both lru_add_drain_all_cpuslocked and drain_all_pages to do their job otherwise we can have dangling pages on pcp lists and fail the offline operation (__test_page_isolated_in_pageblock would see a page with 0 ref count but without PageBuddy set). Fix the issue by removing the worker check in drain_all_pages. lru_add_drain_all_cpuslocked doesn't have this restriction so it works as expected. Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org Fixes: `0ccce3b924` ("mm, page_alloc: drain per-cpu pages from workqueue context") Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:27 +01:00
Stefan Brüns	2cbd866dd8	platform/x86: hp-wmi: Fix tablet mode detection for convertibles commit `9968e12a29` upstream. Commit `f9cf3b2880` ("platform/x86: hp-wmi: Refactor dock and tablet state fetchers") consolidated the methods for docking and laptop mode detection, but omitted to apply the correct mask for the laptop mode (it always uses the constant for docking). Fixes: `f9cf3b2880` ("platform/x86: hp-wmi: Refactor dock and tablet state fetchers") Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de> Cc: Michel Dänzer <michel@daenzer.net> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-05 11:26:27 +01:00
Greg Kroah-Hartman	191314edb3	Linux 4.14.3	2017-11-30 08:41:00 +00:00
Sasha Neftin	0f478f25d5	e1000e: fix buffer overrun while the I219 is processing DMA transactions commit `b10effb92e` upstream. Intel® 100/200 Series Chipset platforms reduced the round-trip latency for the LAN Controller DMA accesses, causing in some high performance cases a buffer overrun while the I219 LAN Connected Device is processing the DMA transactions. I219LM and I219V devices can fall into unrecovered Tx hang under very stressfully UDP traffic and multiple reconnection of Ethernet cable. This Tx hang of the LAN Controller is only recovered if the system is rebooted. Slightly slow down DMA access by reducing the number of outstanding requests. This workaround could have an impact on TCP traffic performance on the platform. Disabling TSO eliminates performance loss for TCP traffic without a noticeable impact on CPU performance. Please, refer to I218/I219 specification update: https://www.intel.com/content/www/us/en/embedded/products/networking/ ethernet-connection-i218-family-documentation.html Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Benjamin Poirier	10d0fd2931	e1000e: Avoid receiver overrun interrupt bursts commit `4aea7a5c5e` upstream. When e1000e_poll() is not fast enough to keep up with incoming traffic, the adapter (when operating in msix mode) raises the Other interrupt to signal Receiver Overrun. This is a double problem because 1) at the moment e1000_msix_other() assumes that it is only called in case of Link Status Change and 2) if the condition persists, the interrupt is repeatedly raised again in quick succession. Ideally we would configure the Other interrupt to not be raised in case of receiver overrun but this doesn't seem possible on this adapter. Instead, we handle the first part of the problem by reverting to the practice of reading ICR in the other interrupt handler, like before commit `16ecba59bc` ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit `0a8047ac68` ("e1000e: Fix msi-x interrupt automask") which cleared IAME from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts anymore. We handle the second part of the problem by not re-enabling the Other interrupt right away when there is overrun. Instead, we wait until traffic subsides, napi polling mode is exited and interrupts are re-enabled. Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca> Fixes: `16ecba59bc` ("e1000e: Do not read ICR in Other interrupt") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Benjamin Poirier	830466993d	e1000e: Separate signaling for link check/link up commit `19110cfbb3` upstream. Lennart reported the following race condition: \ e1000_watchdog_task \ e1000e_has_link \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link /* link is up / mac->get_link_status = false; / interrupt / \ e1000_msix_other hw->mac.get_link_status = true; link_active = !hw->mac.get_link_status / link_active is false, wrongly */ This problem arises because the single flag get_link_status is used to signal two different states: link status needs checking and link status is down. Avoid the problem by using the return value of .check_for_link to signal the link status to e1000e_has_link(). Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Benjamin Poirier	2b91745f8a	e1000e: Fix return value test commit `d3509f8bc7` upstream. All the helpers return -E1000_ERR_PHY. Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Benjamin Poirier	8606bf0219	e1000e: Fix error path in link detection commit `c4c40e51f9` upstream. In case of error from e1e_rphy(), the loop will exit early and "success" will be set to true erroneously. Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Luca Coelho	391243ca6d	iwlwifi: mvm: support version 7 of the SCAN_REQ_UMAC FW command commit `dac4df1c5f` upstream. Newer firmware versions (such as iwlwifi-8000C-34.ucode) have introduced an API change in the SCAN_REQ_UMAC command that is not backwards compatible. The driver needs to detect and use the new API format when the firmware reports it, otherwise the scan command will not work properly, causing a command timeout. Fix this by adding a TLV that tells the driver that the new API is in use and use the correct structures for it. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=197591 Fixes: `d7a5b3e9e4` ("iwlwifi: mvm: bump API to 34 for 8000 and up") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Cc: Thomas Backlund <tmb@mageia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Luca Coelho	5f24172d80	iwlwifi: fix PCI IDs and configuration mapping for 9000 series commit `dbc89253a7` upstream. A lot of PCI IDs were missing and there were some problems with the configuration and firmware selection for devices on the 9000 series. Fix the firmware selection by adding files for the B-steps; add configuration for some integrated devices; and add a bunch of PCI IDs (mostly for integrated devices) that were missing from the driver's list. Without this patch, a lot of devices will not be recognized or will try to load the wrong firmware file. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Cc: Thomas Backlund <tmb@mageia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Ihab Zhaika	9f4482f1a3	iwlwifi: add new cards for 8260 series commit `d669fc2d42` upstream. add three new PCI ID'S for 8260 series Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Cc: Thomas Backlund <tmb@mageia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Ihab Zhaika	2b2c1ae5b3	iwlwifi: add new cards for 8265 series commit `7cddbef445` upstream. add two new PCI ID'S for 8265 series Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Cc: Thomas Backlund <tmb@mageia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:59 +00:00
Ihab Zhaika	3e85efe021	iwlwifi: add new cards for a000 series commit `57b36f7fcb` upstream. add four new PCI ID'S for a000 series Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Cc: Thomas Backlund <tmb@mageia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:58 +00:00

1 2 3 4 5 ...

708516 Commits