Commit Graph

79759 Commits

Author SHA1 Message Date
Thiago Rafael Becker
58330ec2fe kernel: make groups_sort calling a responsibility group_info allocators
commit bdcf0a423e upstream.

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00
David Howells
d1698dc8a5 fscache: Fix the default for fscache_maybe_release_page()
commit 9880150655 upstream.

Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached.  It mustn't return false as that indicates
the page cannot yet be freed.

The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.

This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress.  This might be an issue for
9P, however.

Note that the pages aren't entirely stuck.  Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.

Fixes: 201a15428b ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00
Kees Cook
3e1457d6bf KPTI: Rename to PAGE_TABLE_ISOLATION
This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Hugh Dickins
3e3d38fd98 kaiser: vmstat show NR_KAISERTABLE as nr_overhead
The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
c52e55a2a8 kaiser: cleanups while trying for gold link
While trying to get our gold link to work, four cleanups:
matched the gdt_page declaration to its definition;
in fiddling unsuccessfully with PERCPU_INPUT(), lined up backslashes;
lined up the backslashes according to convention in percpu-defs.h;
deleted the unused irq_stack_pointer addition to irq_stack_union.

Sad to report that aligning backslashes does not appear to help gold
align to 8192: but while these did not help, they are worth keeping.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
003e476716 kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Dave Hansen
bed9bb7f3e kaiser: merged update
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Richard Fellner
8a43ddfb93 KAISER: Kernel Address Isolation
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Andy Lutomirski
5d67dbef74 mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
commit 5dd0b16cda upstream.

This fixes CONFIG_SMP=n, CONFIG_DEBUG_TLBFLUSH=y without introducing
further #ifdef soup.  Caught by a Kbuild bot randconfig build.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ce4a4e565f ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")
Link: http://lkml.kernel.org/r/76da9a3cc4415996f2ad2c905b93414add322021.1496673616.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Shaohua Li
e0bdd21a86 net: reevalulate autoflowlabel setting after sysctl setting
[ Upstream commit 513674b5a2 ]

sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.

The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.

To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.

Note, this changes behavior a little bit. Before commit 42240901f7
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Eric Dumazet
3d406a18bd ipv4: igmp: guard against silly MTU values
[ Upstream commit b5476022bb ]

IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.

Fix this by reading device mtu once and checking IPv4 minimal MTU.

This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Sebastian Andrzej Siewior
2e234e707f crypto: mcryptd - protect the per-CPU queue with a lock
commit 9abffc6f2e upstream.

mcryptd_enqueue_request() grabs the per-CPU queue struct and protects
access to it with disabled preemption. Then it schedules a worker on the
same CPU. The worker in mcryptd_queue_worker() guards access to the same
per-CPU variable with disabled preemption.

If we take CPU-hotplug into account then it is possible that between
queue_work_on() and the actual invocation of the worker the CPU goes
down and the worker will be scheduled on _another_ CPU. And here the
preempt_disable() protection does not work anymore. The easiest thing is
to add a spin_lock() to guard access to the list.

Another detail: mcryptd_queue_worker() is not processing more than
MCRYPTD_BATCH invocation in a row. If there are still items left, then
it will invoke queue_work() to proceed with more later. *I* would
suggest to simply drop that check because it does not use a system
workqueue and the workqueue is already marked as "CPU_INTENSIVE". And if
preemption is required then the scheduler should do it.
However if queue_work() is used then the work item is marked as CPU
unbound. That means it will try to run on the local CPU but it may run
on another CPU as well. Especially with CONFIG_DEBUG_WQ_FORCE_RR_CPU=y.
Again, the preempt_disable() won't work here but lock which was
introduced will help.
In order to keep work-item on the local CPU (and avoid RR) I changed it
to queue_work_on().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:19 +01:00
Andy Lutomirski
425f13a366 sched/core: Add switch_mm_irqs_off() and use it in the scheduler
commit f98db6013c upstream.

By default, this is the same thing as switch_mm().

x86 will override it as an optimization.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:09 +01:00
Jan Kara
b59614cfd2 mm: Handle 0 flags in _calc_vm_trans() macro
[ Upstream commit 592e254502 ]

_calc_vm_trans() does not handle the situation when some of the passed
flags are 0 (which can happen if these VM flags do not make sense for
the architecture). Improve the _calc_vm_trans() macro to return 0 in
such situation. Since all passed flags are constant, this does not add
any runtime overhead.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:59 +01:00
Mike Christie
dfd6deed83 target: fix ALUA transition timeout handling
[ Upstream commit d7175373f2 ]

The implicit transition time tells initiators the min time
to wait before timing out a transition. We currently schedule
the transition to occur in tg_pt_gp_implicit_trans_secs
seconds so there is no room for delays. If
core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata
needs to write out info to a remote file, then the initiator can
easily time out the operation.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:57 +01:00
Jack Morgenstein
acc7d1bd90 net/mlx4_core: Avoid delays during VF driver device shutdown
[ Upstream commit 4cbe4dac82 ]

Some Hypervisors detach VFs from VMs by instantly causing an FLR event
to be generated for a VF.

In the mlx4 case, this will cause that VF's comm channel to be disabled
before the VM has an opportunity to invoke the VF device's "shutdown"
method.

For such Hypervisors, there is a race condition between the VF's
shutdown method and its internal-error detection/reset thread.

The internal-error detection/reset thread (which runs every 5 seconds) also
detects a disabled comm channel. If the internal-error detection/reset
flow wins the race, we still get delays (while that flow tries repeatedly
to detect comm-channel recovery).

The cited commit fixed the command timeout problem when the
internal-error detection/reset flow loses the race.

This commit avoids the unneeded delays when the internal-error
detection/reset flow wins.

Fixes: d585df1c5c ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:56 +01:00
Eric Biggers
43cd7f3861 crypto: hmac - require that the underlying hash algorithm is unkeyed
commit af3ff8045b upstream.

Because the HMAC template didn't check that its underlying hash
algorithm is unkeyed, trying to use "hmac(hmac(sha3-512-generic))"
through AF_ALG or through KEYCTL_DH_COMPUTE resulted in the inner HMAC
being used without having been keyed, resulting in sha3_update() being
called without sha3_init(), causing a stack buffer overflow.

This is a very old bug, but it seems to have only started causing real
problems when SHA-3 support was added (requires CONFIG_CRYPTO_SHA3)
because the innermost hash's state is ->import()ed from a zeroed buffer,
and it just so happens that other hash algorithms are fine with that,
but SHA-3 is not.  However, there could be arch or hardware-dependent
hash algorithms also affected; I couldn't test everything.

Fix the bug by introducing a function crypto_shash_alg_has_setkey()
which tests whether a shash algorithm is keyed.  Then update the HMAC
template to require that its underlying hash algorithm is unkeyed.

Here is a reproducer:

    #include <linux/if_alg.h>
    #include <sys/socket.h>

    int main()
    {
        int algfd;
        struct sockaddr_alg addr = {
            .salg_type = "hash",
            .salg_name = "hmac(hmac(sha3-512-generic))",
        };
        char key[4096] = { 0 };

        algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(algfd, (const struct sockaddr *)&addr, sizeof(addr));
        setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
    }

Here was the KASAN report from syzbot:

    BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:341  [inline]
    BUG: KASAN: stack-out-of-bounds in sha3_update+0xdf/0x2e0  crypto/sha3_generic.c:161
    Write of size 4096 at addr ffff8801cca07c40 by task syzkaller076574/3044

    CPU: 1 PID: 3044 Comm: syzkaller076574 Not tainted 4.14.0-mm1+ #25
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:17 [inline]
      dump_stack+0x194/0x257 lib/dump_stack.c:53
      print_address_description+0x73/0x250 mm/kasan/report.c:252
      kasan_report_error mm/kasan/report.c:351 [inline]
      kasan_report+0x25b/0x340 mm/kasan/report.c:409
      check_memory_region_inline mm/kasan/kasan.c:260 [inline]
      check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
      memcpy+0x37/0x50 mm/kasan/kasan.c:303
      memcpy include/linux/string.h:341 [inline]
      sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
      crypto_shash_update+0xcb/0x220 crypto/shash.c:109
      shash_finup_unaligned+0x2a/0x60 crypto/shash.c:151
      crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
      hmac_finup+0x182/0x330 crypto/hmac.c:152
      crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
      shash_digest_unaligned+0x9e/0xd0 crypto/shash.c:172
      crypto_shash_digest+0xc4/0x120 crypto/shash.c:186
      hmac_setkey+0x36a/0x690 crypto/hmac.c:66
      crypto_shash_setkey+0xad/0x190 crypto/shash.c:64
      shash_async_setkey+0x47/0x60 crypto/shash.c:207
      crypto_ahash_setkey+0xaf/0x180 crypto/ahash.c:200
      hash_setkey+0x40/0x90 crypto/algif_hash.c:446
      alg_setkey crypto/af_alg.c:221 [inline]
      alg_setsockopt+0x2a1/0x350 crypto/af_alg.c:254
      SYSC_setsockopt net/socket.c:1851 [inline]
      SyS_setsockopt+0x189/0x360 net/socket.c:1830
      entry_SYSCALL_64_fastpath+0x1f/0x96

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:51 +01:00
Stephen Bates
24c98ec494 lib/genalloc.c: make the avail variable an atomic_long_t
[ Upstream commit 36a3d1dd4e ]

If the amount of resources allocated to a gen_pool exceeds 2^32 then the
avail atomic overflows and this causes problems when clients try and
borrow resources from the pool.  This is only expected to be an issue on
64 bit systems.

Add the <linux/atomic.h> header to pull in atomic_long* operations.  So
that 32 bit systems continue to use atomic32_t but 64 bit systems can
use atomic64_t.

Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.com
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Daniel Mentz <danielmentz@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:54 +01:00
Ladislav Michl
0500c6d352 ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
[ Upstream commit 7807e086a2 ]

gpmc_probe_onenand_child returns success even on gpmc_onenand_init
failure. Fix that.

Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:51 +01:00
Kirill A. Shutemov
5327f9bada mm: drop unused pmdp_huge_get_and_clear_notify()
commit c0c379e293 upstream.

Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the
last pmdp_huge_get_and_clear_notify() user is gone.

Let's drop the helper.

Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:50 +01:00
Dave Gordon
74b6030355 drm: extra printk() wrapper macros
commit 30b0da8d55 upstream.

We had only DRM_INFO() and DRM_ERROR(), whereas the underlying printk()
provides several other useful intermediate levels such as NOTICE and
WARNING. So this patch fills out the set by providing both regular and
once-only macros for each of the levels INFO, NOTICE, and WARNING, using
a common underlying macro that does all the token-pasting.

DRM_ERROR is unchanged, as it's not just a printk wrapper.

v2:
    Fix whitespace, missing ## (Eric Engestrom)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: dri-devel@lists.freedesktop.org
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:49 +01:00
Greg Kroah-Hartman
481efb4c72 efi: Move some sysfs files to be read-only by root
commit af97a77bc0 upstream.

Thanks to the scripts/leaking_addresses.pl script, it was found that
some EFI values should not be readable by non-root users.

So make them root-only, and to do that, add a __ATTR_RO_MODE() macro to
make this easier, and use it in other places at the same time.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20171206095010.24170-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:48 +01:00
Huacai Chen
7e0091c295 scsi: libsas: align sata_device's rps_resp on a cacheline
commit c2e8fbf908 upstream.

The rps_resp buffer in ata_device is a DMA target, but it isn't
explicitly cacheline aligned. Due to this, adjacent fields can be
overwritten with stale data from memory on non-coherent architectures.
As a result, the kernel is sometimes unable to communicate with an SATA
device behind a SAS expander.

Fix this by ensuring that the rps_resp buffer is cacheline aligned.

This issue is similar to that fixed by Commit 84bda12af3 ("libata:
align ap->sector_buf") and Commit 4ee34ea3a1 ("libata: Align
ata_device's id on a cacheline").

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:48 +01:00
Masakazu Mokuno
9d812f7bef USB: core: Add type-specific length check of BOS descriptors
commit 81cf4a4536 upstream.

As most of BOS descriptors are longer in length than their header
'struct usb_dev_cap_header', comparing solely with it is not sufficient
to avoid out-of-bounds access to BOS descriptors.

This patch adds descriptor type specific length check in
usb_get_bos_descriptor() to fix the issue.

Signed-off-by: Masakazu Mokuno <masakazu.mokuno@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:44 +01:00
John Youn
f044af8c1d usb: ch9: Add size macro for SSP dev cap descriptor
commit 446fa3a95d upstream.

The SuperspeedPlus Device Capability Descriptor has a variable size
depending on the number of sublink speed attributes.

This patch adds a macro to calculate that size. The macro takes one
argument, the Sublink Speed Attribute Count (SSAC) as reported by the
descriptor in bmAttributes[4:0].

See USB 3.1 9.6.2.5, Table 9-19.

Signed-off-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:44 +01:00
Mathias Nyman
ddaa1ae2eb usb: Add USB 3.1 Precision time measurement capability descriptor support
commit faee822c5a upstream.

USB 3.1 devices that support precision time measurement have an
additional PTM cabaility descriptor as part of the full BOS descriptor

Look for this descriptor while parsing the BOS descriptor, and store it in
struct usb_hub_bos if it exists.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:43 +01:00
Jan Kara
cc4add3897 mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
[ Upstream commit 0911d0041c ]

Some ->page_mkwrite handlers may return VM_FAULT_RETRY as its return
code (GFS2 or Lustre can definitely do this).  However VM_FAULT_RETRY
from ->page_mkwrite is completely unhandled by the mm code and results
in locking and writeably mapping the page which definitely is not what
the caller wanted.

Fix Lustre and block_page_mkwrite_ret() used by other filesystems
(notably GFS2) to return VM_FAULT_NOPAGE instead which results in
bailing out from the fault code, the CPU then retries the access, and we
fault again effectively doing what the handler wanted.

Link: http://lkml.kernel.org/r/20170203150729.15863-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:43 +01:00
Huacai Chen
5a7391b6d8 bcache: Fix building error on MIPS
commit cf33c1ee52 upstream.

This patch try to fix the building error on MIPS. The reason is MIPS
has already defined the PTR macro, which conflicts with the PTR macro
in include/uapi/linux/bcache.h.

[fixed by mlyle: corrected a line-length issue]

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:51 +01:00
Tom Herbert
27248d2fa7 netlink: add a start callback for starting a netlink dump
commit fc9e50f5a5 upstream.

The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:49 +01:00
Trond Myklebust
36d2f19430 SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
commit e9d4bf219c upstream.

There is no guarantee that either the request or the svc_xprt exist
by the time we get round to printing the trace message.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:37:23 +00:00
Yang Shi
e34e744f70 mm: check the return value of lookup_page_ext for all call sites
commit f86e427197 upstream.

Per the discussion with Joonsoo Kim [1], we need check the return value
of lookup_page_ext() for all call sites since it might return NULL in
some cases, although it is unlikely, i.e.  memory hotplug.

Tested with ltp with "page_owner=0".

[1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE

[akpm@linux-foundation.org: fix build-breaking typos]
[arnd@arndb.de: fix build problems from lookup_page_ext]
  Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Pavel Tatashin
c1b3703b64 mm/page_alloc.c: broken deferred calculation
commit d135e57502 upstream.

In reset_deferred_meminit() we determine number of pages that must not
be deferred.  We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.

The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes.  However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.

The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.

Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393d ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Ye Yin
001e9cbe1d netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
[ Upstream commit 2b5ec1a5f9 ]

When run ipvs in two different network namespace at the same host, and one
ipvs transport network traffic to the other network namespace ipvs.
'ipvs_property' flag will make the second ipvs take no effect. So we should
clear 'ipvs_property' when SKB network namespace changed.

Fixes: 621e84d6f3 ("dev: introduce skb_scrub_packet()")
Signed-off-by: Ye Yin <hustcat@gmail.com>
Signed-off-by: Wei Zhou <chouryzhou@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:24 +01:00
Dmitry V. Levin
834a5d880a uapi: fix linux/rds.h userspace compilation errors
[ Upstream commit feb0869d90 ]

Consistently use types from linux/types.h to fix the following
linux/rds.h userspace compilation errors:

/usr/include/linux/rds.h:106:2: error: unknown type name 'uint8_t'
  uint8_t name[32];
/usr/include/linux/rds.h:107:2: error: unknown type name 'uint64_t'
  uint64_t value;
/usr/include/linux/rds.h:117:2: error: unknown type name 'uint64_t'
  uint64_t next_tx_seq;
/usr/include/linux/rds.h:118:2: error: unknown type name 'uint64_t'
  uint64_t next_rx_seq;
/usr/include/linux/rds.h:121:2: error: unknown type name 'uint8_t'
  uint8_t transport[TRANSNAMSIZ];  /* null term ascii */
/usr/include/linux/rds.h:122:2: error: unknown type name 'uint8_t'
  uint8_t flags;
/usr/include/linux/rds.h:129:2: error: unknown type name 'uint64_t'
  uint64_t seq;
/usr/include/linux/rds.h:130:2: error: unknown type name 'uint32_t'
  uint32_t len;
/usr/include/linux/rds.h:135:2: error: unknown type name 'uint8_t'
  uint8_t flags;
/usr/include/linux/rds.h:139:2: error: unknown type name 'uint32_t'
  uint32_t sndbuf;
/usr/include/linux/rds.h:144:2: error: unknown type name 'uint32_t'
  uint32_t rcvbuf;
/usr/include/linux/rds.h:145:2: error: unknown type name 'uint64_t'
  uint64_t inum;
/usr/include/linux/rds.h:153:2: error: unknown type name 'uint64_t'
  uint64_t       hdr_rem;
/usr/include/linux/rds.h:154:2: error: unknown type name 'uint64_t'
  uint64_t       data_rem;
/usr/include/linux/rds.h:155:2: error: unknown type name 'uint32_t'
  uint32_t       last_sent_nxt;
/usr/include/linux/rds.h:156:2: error: unknown type name 'uint32_t'
  uint32_t       last_expected_una;
/usr/include/linux/rds.h:157:2: error: unknown type name 'uint32_t'
  uint32_t       last_seen_una;
/usr/include/linux/rds.h:164:2: error: unknown type name 'uint8_t'
  uint8_t  src_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:165:2: error: unknown type name 'uint8_t'
  uint8_t  dst_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:167:2: error: unknown type name 'uint32_t'
  uint32_t max_send_wr;
/usr/include/linux/rds.h:168:2: error: unknown type name 'uint32_t'
  uint32_t max_recv_wr;
/usr/include/linux/rds.h:169:2: error: unknown type name 'uint32_t'
  uint32_t max_send_sge;
/usr/include/linux/rds.h:170:2: error: unknown type name 'uint32_t'
  uint32_t rdma_mr_max;
/usr/include/linux/rds.h:171:2: error: unknown type name 'uint32_t'
  uint32_t rdma_mr_size;
/usr/include/linux/rds.h:212:9: error: unknown type name 'uint64_t'
 typedef uint64_t rds_rdma_cookie_t;
/usr/include/linux/rds.h:215:2: error: unknown type name 'uint64_t'
  uint64_t addr;
/usr/include/linux/rds.h:216:2: error: unknown type name 'uint64_t'
  uint64_t bytes;
/usr/include/linux/rds.h:221:2: error: unknown type name 'uint64_t'
  uint64_t cookie_addr;
/usr/include/linux/rds.h:222:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:228:2: error: unknown type name 'uint64_t'
  uint64_t  cookie_addr;
/usr/include/linux/rds.h:229:2: error: unknown type name 'uint64_t'
  uint64_t  flags;
/usr/include/linux/rds.h:234:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:240:2: error: unknown type name 'uint64_t'
  uint64_t local_vec_addr;
/usr/include/linux/rds.h:241:2: error: unknown type name 'uint64_t'
  uint64_t nr_local;
/usr/include/linux/rds.h:242:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:243:2: error: unknown type name 'uint64_t'
  uint64_t user_token;
/usr/include/linux/rds.h:248:2: error: unknown type name 'uint64_t'
  uint64_t  local_addr;
/usr/include/linux/rds.h:249:2: error: unknown type name 'uint64_t'
  uint64_t  remote_addr;
/usr/include/linux/rds.h:252:4: error: unknown type name 'uint64_t'
    uint64_t compare;
/usr/include/linux/rds.h:253:4: error: unknown type name 'uint64_t'
    uint64_t swap;
/usr/include/linux/rds.h:256:4: error: unknown type name 'uint64_t'
    uint64_t add;
/usr/include/linux/rds.h:259:4: error: unknown type name 'uint64_t'
    uint64_t compare;
/usr/include/linux/rds.h:260:4: error: unknown type name 'uint64_t'
    uint64_t swap;
/usr/include/linux/rds.h:261:4: error: unknown type name 'uint64_t'
    uint64_t compare_mask;
/usr/include/linux/rds.h:262:4: error: unknown type name 'uint64_t'
    uint64_t swap_mask;
/usr/include/linux/rds.h:265:4: error: unknown type name 'uint64_t'
    uint64_t add;
/usr/include/linux/rds.h:266:4: error: unknown type name 'uint64_t'
    uint64_t nocarry_mask;
/usr/include/linux/rds.h:269:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:270:2: error: unknown type name 'uint64_t'
  uint64_t user_token;
/usr/include/linux/rds.h:274:2: error: unknown type name 'uint64_t'
  uint64_t user_token;
/usr/include/linux/rds.h:275:2: error: unknown type name 'int32_t'
  int32_t  status;

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-21 09:21:23 +01:00
Dmitry V. Levin
16e7973f8e uapi: fix linux/rds.h userspace compilation error
[ Upstream commit 1786dbf370 ]

On the kernel side, sockaddr_storage is #define'd to
__kernel_sockaddr_storage.  Replacing struct sockaddr_storage with
struct __kernel_sockaddr_storage defined by <linux/socket.h> fixes
the following linux/rds.h userspace compilation error:

/usr/include/linux/rds.h:226:26: error: field 'dest_addr' has incomplete type
  struct sockaddr_storage dest_addr;

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-21 09:21:23 +01:00
Sasha Levin
f5e303d54a Revert "uapi: fix linux/rds.h userspace compilation errors"
This reverts commit ad50561ba7.

There was a mixup with the commit message for two upstream commit
that have the same subject line.

This revert will be followed by the two commits with proper commit
messages.

Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-21 09:21:23 +01:00
Chen Gang
a904ebe92a mm: add PHYS_PFN, use it in __phys_to_pfn()
commit 8f235d1a3e upstream.

__phys_to_pfn and __pfn_to_phys are symmetric, PHYS_PFN and PFN_PHYS are
semmetric:

 - y = (phys_addr_t)x << PAGE_SHIFT

 - y >> PAGE_SHIFT = (phys_add_t)x

 - (unsigned long)(y >> PAGE_SHIFT) = x

[akpm@linux-foundation.org: use macro arg name `x']
[arnd@arndb.de: include linux/pfn.h for PHYS_PFN definition]
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcin Nowakowski <marcin.nowakowski@mips.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-21 09:21:22 +01:00
Tony Lindgren
4fd669feac ARM: dts: Fix omap3 off mode pull defines
[ Upstream commit d97556c801 ]

We need to also have OFFPULLUDENABLE bit set to use the off mode pull values.
Otherwise the line is pulled down internally if no external pull exists.

This is has some documentation at:

http://processors.wiki.ti.com/index.php/Optimizing_OMAP35x_and_AM/DM37x_OFF_mode_PAD_configuration

Note that the value is still glitchy during off mode transitions as documented
in spz319f.pdf "Advisory 1.45". It's best to use external pulls instead of
relying on the internal ones for off mode and even then anything pulled up
will get driven down momentarily on off mode restore for GPIO banks other
than bank1.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-21 09:21:19 +01:00
Bart Van Assche
4063c20933 target/iscsi: Fix iSCSI task reassignment handling
commit 59b6986dbf upstream.

Allocate a task management request structure for all task management
requests, including task reassignment. This change avoids that the
se_tmr->response assignment dereferences an uninitialized se_tmr
pointer.

Reported-by: Moshe David <mdavid@infinidat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Moshe David <mdavid@infinidat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:07 +01:00
Eric Dumazet
11fa335320 tcp/dccp: fix other lockdep splats accessing ireq_opt
[ Upstream commit 06f877d613 ]

In my first attempt to fix the lockdep splat, I forgot we could
enter inet_csk_route_req() with a freshly allocated request socket,
for which refcount has not yet been elevated, due to complex
SLAB_TYPESAFE_BY_RCU rules.

We either are in rcu_read_lock() section _or_ we own a refcount on the
request.

Correct RCU verb to use here is rcu_dereference_check(), although it is
not possible to prove we actually own a reference on a shared
refcount :/

In v2, I added ireq_opt_deref() helper and use in three places, to fix other
possible splats.

[   49.844590]  lockdep_rcu_suspicious+0xea/0xf3
[   49.846487]  inet_csk_route_req+0x53/0x14d
[   49.848334]  tcp_v4_route_req+0xe/0x10
[   49.850174]  tcp_conn_request+0x31c/0x6a0
[   49.851992]  ? __lock_acquire+0x614/0x822
[   49.854015]  tcp_v4_conn_request+0x5a/0x79
[   49.855957]  ? tcp_v4_conn_request+0x5a/0x79
[   49.858052]  tcp_rcv_state_process+0x98/0xdcc
[   49.859990]  ? sk_filter_trim_cap+0x2f6/0x307
[   49.862085]  tcp_v4_do_rcv+0xfc/0x145
[   49.864055]  ? tcp_v4_do_rcv+0xfc/0x145
[   49.866173]  tcp_v4_rcv+0x5ab/0xaf9
[   49.868029]  ip_local_deliver_finish+0x1af/0x2e7
[   49.870064]  ip_local_deliver+0x1b2/0x1c5
[   49.871775]  ? inet_del_offload+0x45/0x45
[   49.873916]  ip_rcv_finish+0x3f7/0x471
[   49.875476]  ip_rcv+0x3f1/0x42f
[   49.876991]  ? ip_local_deliver_finish+0x2e7/0x2e7
[   49.878791]  __netif_receive_skb_core+0x6d3/0x950
[   49.880701]  ? process_backlog+0x7e/0x216
[   49.882589]  __netif_receive_skb+0x1d/0x5e
[   49.884122]  process_backlog+0x10c/0x216
[   49.885812]  net_rx_action+0x147/0x3df

Fixes: a6ca7abe53 ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
Fixes: c92e8c02fe ("tcp/dccp: fix ireq->opt races")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:07 +01:00
Eric Dumazet
13eddc6756 tcp/dccp: fix ireq->opt races
[ Upstream commit c92e8c02fe ]

syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce1050089c ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

Note the opt field is renamed to ireq_opt to ease grep games.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
 tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000

Allocated by task 3295:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 __do_kmalloc mm/slab.c:3725 [inline]
 __kmalloc+0x162/0x760 mm/slab.c:3734
 kmalloc include/linux/slab.h:498 [inline]
 tcp_v4_save_options include/net/tcp.h:1962 [inline]
 tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3306:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xca/0x250 mm/slab.c:3820
 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
 __sk_destruct+0xfd/0x910 net/core/sock.c:1560
 sk_destruct+0x47/0x80 net/core/sock.c:1595
 __sk_free+0x57/0x230 net/core/sock.c:1603
 sk_free+0x2a/0x40 net/core/sock.c:1614
 sock_put include/net/sock.h:1652 [inline]
 inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
 tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
 tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:06 +01:00
Cong Wang
4b27fe34a2 tun: call dev_get_valid_name() before register_netdevice()
[ Upstream commit 0ad646c81b ]

register_netdevice() could fail early when we have an invalid
dev name, in which case ->ndo_uninit() is not called. For tun
device, this is a problem because a timer etc. are already
initialized and it expects ->ndo_uninit() to clean them up.

We could move these initializations into a ->ndo_init() so
that register_netdevice() knows better, however this is still
complicated due to the logic in tun_detach().

Therefore, I choose to just call dev_get_valid_name() before
register_netdevice(), which is quicker and much easier to audit.
And for this specific case, it is already enough.

Fixes: 96442e4242 ("tuntap: choose the txq based on rxq")
Reported-by: Dmitry Alexeev <avekceeb@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:05 +01:00
Eric Dumazet
71c4a0fc35 tcp: fix tcp_mtu_probe() vs highest_sack
[ Upstream commit 2b7cda9c35 ]

Based on SNMP values provided by Roman, Yuchung made the observation
that some crashes in tcp_sacktag_walk() might be caused by MTU probing.

Looking at tcp_mtu_probe(), I found that when a new skb was placed
in front of the write queue, we were not updating tcp highest sack.

If one skb is freed because all its content was copied to the new skb
(for MTU probing), then tp->highest_sack could point to a now freed skb.

Bad things would then happen, including infinite loops.

This patch renames tcp_highest_sack_combine() and uses it
from tcp_mtu_probe() to fix the bug.

Note that I also removed one test against tp->sacked_out,
since we want to replace tp->highest_sack regardless of whatever
condition, since keeping a stale pointer to freed skb is a recipe
for disaster.

Fixes: a47e5a988a ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Roman Gushchin <guro@fb.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:05 +01:00
Takashi Iwai
1e54b5f721 ALSA: seq: Avoid invalid lockdep class warning
commit 3510c7aa06 upstream.

The recent fix for adding rwsem nesting annotation was using the given
"hop" argument as the lock subclass key.  Although the idea itself
works, it may trigger a kernel warning like:
  BUG: looking up invalid subclass: 8
  ....
since the lockdep has a smaller number of subclasses (8) than we
currently allow for the hops there (10).

The current definition is merely a sanity check for avoiding the too
deep delivery paths, and the 8 hops are already enough.  So, as a
quick fix, just follow the max hops as same as the max lockdep
subclasses.

Fixes: 1f20f9ff57 ("ALSA: seq: Fix nested rwsem annotation for lockdep splat")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15 17:13:12 +01:00
Peter Zijlstra
ad8c619750 x86/uaccess, sched/preempt: Verify access_ok() context
commit 7c4788950b upstream.

I recently encountered wreckage because access_ok() was used where it
should not be, add an explicit WARN when access_ok() is used wrongly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[add include/preempt.h to fix build error - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15 17:13:11 +01:00
Enrico Mioso
75f82a703b cdc_ncm: Set NTB format again after altsetting switch for Huawei devices
commit 2b02c20ce0 upstream.

Some firmwares in Huawei E3372H devices have been observed to switch back
to NTB 32-bit format after altsetting switch.
This patch implements a driver flag to check for the device settings and
set NTB format to 16-bit again if needed.
The flag has been activated for devices controlled by the huawei_cdc_ncm.c
driver.

V1->V2:
- fixed broken error checks
- some corrections to the commit message
V2->V3:
- variable name changes, to clarify what's happening
- check (and possibly set) the NTB format later in the common bind code path

Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Reported-and-tested-by: Christian Panton <christian@panton.org>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
CC: Bjørn Mork <bjorn@mork.no>
CC: Christian Panton <christian@panton.org>
CC: linux-usb@vger.kernel.org
CC: netdev@vger.kernel.org
CC: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Porto Rio <porto.rio@gmx.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15 17:13:11 +01:00
Volodymyr Bendiuga
ca2090aa58 phy: increase size of MII_BUS_ID_SIZE and bus_id
[ Upstream commit 4567d686f5 ]

Some bus names are pretty long and do not fit into
17 chars. Increase therefore MII_BUS_ID_SIZE and
phy_fixup.bus_id to larger number. Now mii_bus.id
can host larger name.

Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com>
Signed-off-by: Magnus Öberg <magnus.oberg@westermo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15 17:13:10 +01:00
Baruch Siach
7d74eecca9 spi: uapi: spidev: add missing ioctl header
commit a2b4a79b88 upstream.

The SPI_IOC_MESSAGE() macro references _IOC_SIZEBITS. Add linux/ioctl.h
to make sure this macro is defined. This fixes the following build
failure of lcdproc with the musl libc:

In file included from .../sysroot/usr/include/sys/ioctl.h:7:0,
                 from hd44780-spi.c:31:
hd44780-spi.c: In function 'spi_transfer':
hd44780-spi.c:89:24: error: '_IOC_SIZEBITS' undeclared (first use in this function)
  status = ioctl(p->fd, SPI_IOC_MESSAGE(1), &xfer);
                        ^

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 09:40:49 +01:00
David Howells
8a004caec1 KEYS: Fix race between updating and finding a negative key
commit 363b02dab0 upstream.

Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection
error into one field such that:

 (1) The instantiation state can be modified/read atomically.

 (2) The error can be accessed atomically with the state.

 (3) The error isn't stored unioned with the payload pointers.

This deals with the problem that the state is spread over three different
objects (two bits and a separate variable) and reading or updating them
atomically isn't practical, given that not only can uninstantiated keys
change into instantiated or rejected keys, but rejected keys can also turn
into instantiated keys - and someone accessing the key might not be using
any locking.

The main side effect of this problem is that what was held in the payload
may change, depending on the state.  For instance, you might observe the
key to be in the rejected state.  You then read the cached error, but if
the key semaphore wasn't locked, the key might've become instantiated
between the two reads - and you might now have something in hand that isn't
actually an error code.

The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error
code if the key is negatively instantiated.  The key_is_instantiated()
function is replaced with key_is_positive() to avoid confusion as negative
keys are also 'instantiated'.

Additionally, barriering is included:

 (1) Order payload-set before state-set during instantiation.

 (2) Order state-read before payload-read when using the key.

Further separate barriering is necessary if RCU is being used to access the
payload content after reading the payload pointers.

Fixes: 146aa8b145 ("KEYS: Merge the type-specific data with the payload data")
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27 10:23:18 +02:00
Jan Luebbe
b178c94efd bus: mbus: fix window size calculation for 4GB windows
commit 2bbbd96357 upstream.

At least the Armada XP SoC supports 4GB on a single DRAM window. Because
the size register values contain the actual size - 1, the MSB is set in
that case. For example, the SDRAM window's control register's value is
0xffffffe1 for 4GB (bits 31 to 24 contain the size).

The MBUS driver reads back each window's size from registers and
calculates the actual size as (control_reg | ~DDR_SIZE_MASK) + 1, which
overflows for 32 bit values, resulting in other miscalculations further
on (a bad RAM window for the CESA crypto engine calculated by
mvebu_mbus_setup_cpu_target_nooverlap() in my case).

This patch changes the type in 'struct mbus_dram_window' from u32 to
u64, which allows us to keep using the same register calculation code in
most MBUS-using drivers (which calculate ->size - 1 again).

Fixes: fddddb52a6 ("bus: introduce an Marvell EBU MBus driver")
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27 10:23:17 +02:00