Commit Graph

277 Commits

Author SHA1 Message Date
Daniel Borkmann
ecb6af1fce bpf: Add kconfig knob for disabling unpriv bpf by default
commit 08389d8882 upstream.

Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.

This still allows a transition of 2 -> {0,1} through an admin. Similarly,
this also still keeps 1 -> {1} behavior intact, so that once set to permanently
disabled, it cannot be undone aside from a reboot.

We've also added extra2 with max of 2 for the procfs handler, so that an admin
still has a chance to toggle between 0 <-> 2.

Either way, as an additional alternative, applications can make use of CAP_BPF
that we added a while ago.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
[fllinden@amazon.com: backported to 4.9]
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-16 12:42:13 +09:00
Lorenz Bauer
bf4659d9f7 bpf: Prevent increasing bpf_jit_limit above max
[ Upstream commit fadb7ff1a6 ]

Restrict bpf_jit_limit to the maximum supported by the arch's JIT.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211014142554.53120-4-lmb@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-16 12:12:54 +09:00
Tatsuhiko Yasumatsu
358abf34da bpf: Fix integer overflow in prealloc_elems_and_freelist()
[ Upstream commit 30e29a9a2b ]

In prealloc_elems_and_freelist(), the multiplication to calculate the
size passed to bpf_map_area_alloc() could lead to an integer overflow.
As a result, out-of-bounds write could occur in pcpu_freelist_populate()
as reported by KASAN:

[...]
[   16.968613] BUG: KASAN: slab-out-of-bounds in pcpu_freelist_populate+0xd9/0x100
[   16.969408] Write of size 8 at addr ffff888104fc6ea0 by task crash/78
[   16.970038]
[   16.970195] CPU: 0 PID: 78 Comm: crash Not tainted 5.15.0-rc2+ #1
[   16.970878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   16.972026] Call Trace:
[   16.972306]  dump_stack_lvl+0x34/0x44
[   16.972687]  print_address_description.constprop.0+0x21/0x140
[   16.973297]  ? pcpu_freelist_populate+0xd9/0x100
[   16.973777]  ? pcpu_freelist_populate+0xd9/0x100
[   16.974257]  kasan_report.cold+0x7f/0x11b
[   16.974681]  ? pcpu_freelist_populate+0xd9/0x100
[   16.975190]  pcpu_freelist_populate+0xd9/0x100
[   16.975669]  stack_map_alloc+0x209/0x2a0
[   16.976106]  __sys_bpf+0xd83/0x2ce0
[...]

The possibility of this overflow was originally discussed in [0], but
was overlooked.

Fix the integer overflow by changing elem_size to u64 from u32.

  [0] https://lore.kernel.org/bpf/728b238e-a481-eb50-98e9-b0f430ab01e7@gmail.com/

Fixes: 557c0c6e7d ("bpf: convert stackmap to pre-allocation")
Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210930135545.173698-1-th.yasumatsu@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-16 12:00:21 +09:00
Bui Quang Minh
762d7f193e bpf: Check for integer overflow when using roundup_pow_of_two()
[ Upstream commit 6183f4d3a0 ]

On 32-bit architecture, roundup_pow_of_two() can return 0 when the argument
has upper most bit set due to resulting 1UL << 32. Add a check for this case.

Fixes: d5a3b1f691 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210127063653.3576-1-minhquangbui99@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-16 10:35:58 +09:00
Daniel Borkmann
45578d884c bpf: Fix buggy rsh min/max bounds tracking
[ no upstream commit ]

Fix incorrect bounds tracking for RSH opcode. Commit f23cc643f9 ("bpf: fix
range arithmetic for bpf map access") had a wrong assumption about min/max
bounds. The new dst_reg->min_value needs to be derived by right shifting the
max_val bounds, not min_val, and likewise new dst_reg->max_value needs to be
derived by right shifting the min_val bounds, not max_val. Later stable kernels
than 4.9 are not affected since bounds tracking was overall reworked and they
already track this similarly as in the fix.

Fixes: f23cc643f9 ("bpf: fix range arithmetic for bpf map access")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-16 10:32:46 +09:00
Thomas Gleixner
f0a49cf5e1 bpf: Remove recursion prevention from rcu free callback
[ Upstream commit 8a37963c7a ]

If an element is freed via RCU then recursion into BPF instrumentation
functions is not a concern. The element is already detached from the map
and the RCU callback does not hold any locks on which a kprobe, perf event
or tracepoint attached BPF program could deadlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.259118710@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-16 09:15:40 +09:00
Greg Kroah-Hartman
d2f06c7f50 bpf: Explicitly memset the bpf_attr structure
commit 8096f22942 upstream.

For the bpf syscall, we are relying on the compiler to properly zero out
the bpf_attr union that we copy userspace data into. Unfortunately that
doesn't always work properly, padding and other oddities might not be
correctly zeroed, and in some tests odd things have been found when the
stack is pre-initialized to other values.

Fix this by explicitly memsetting the structure to 0 before using it.

Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://android-review.googlesource.com/c/kernel/common/+/1235490
Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-15 17:12:39 +09:00
Daniel Borkmann
d0e1753e08 bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
[ Upstream commit fdadd04931 ]

Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
  the compile-time default at boot-time, which is 0x9c400000 when
  using 64K page size. This overflows the signed 32-bit bpf_jit_limit
  value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

  and can cause various unexpected failures throughout the network
  stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
             16) = -1 ENOTSUPP (Unknown error 524)

  and similar failures can be seen with tools like tcpdump. This doesn't
  always reproduce however, and I'm not sure why. The more consistent
  failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
  host would time out on systemd/netplan configuring a virtio-net NIC
  with no noticeable errors in the logs.

Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().

Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.

Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.

Fixes: ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-15 14:20:03 +09:00
Daniel Borkmann
7bb4d804f2 bpf: add bpf_jit_limit knob to restrict unpriv allocations
commit ede95a63b5 upstream.

Rick reported that the BPF JIT could potentially fill the entire module
space with BPF programs from unprivileged users which would prevent later
attempts to load normal kernel modules or privileged BPF programs, for
example. If JIT was enabled but unsuccessful to generate the image, then
before commit 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
we would always fall back to the BPF interpreter. Nowadays in the case
where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort
with a failure since the BPF interpreter was compiled out.

Add a global limit and enforce it for unprivileged users such that in case
of BPF interpreter compiled out we fail once the limit has been reached
or we fall back to BPF interpreter earlier w/o using module mem if latter
was compiled in. In a next step, fair share among unprivileged users can
be resolved in particular for the case where we would fail hard once limit
is reached.

Fixes: 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
Fixes: 0a14842f5a ("net: filter: Just In Time compiler for x86-64")
Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-15 14:18:02 +09:00
Daniel Borkmann
eaf25147be bpf: get rid of pure_initcall dependency to enable jits
commit fa9dd599b4 upstream.

Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[bwh: Backported to 4.9 as dependency of commit 2e4a30983b
 "bpf: restrict access to core bpf sysctls":
 - Drop change in arch/mips/net/ebpf_jit.c
 - Drop change to bpf_jit_kallsyms
 - Adjust filenames, context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-15 14:17:59 +09:00
Valdis Klētnieks
b2492a4754 bpf: silence warning messages in core
[ Upstream commit aee450cbe4 ]

Compiling kernel/bpf/core.c with W=1 causes a flood of warnings:

kernel/bpf/core.c:1198:65: warning: initialized field overwritten [-Woverride-init]
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~
kernel/bpf/core.c:1198:65: note: (near initialization for 'public_insntable[12]')
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~

98 copies of the above.

The attached patch silences the warnings, because we *know* we're overwriting
the default initializer. That leaves bpf/core.c with only 6 other warnings,
which become more visible in comparison.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-15 14:00:42 +09:00
Alexei Starovoitov
c9eae09f78 bpf: convert htab map to hlist_nulls
commit 4fe8435909 upstream.

when all map elements are pre-allocated one cpu can delete and reuse htab_elem
while another cpu is still walking the hlist. In such case the lookup may
miss the element. Convert hlist to hlist_nulls to avoid such scenario.
When bucket lock is taken there is no need to take such precautions,
so only convert map_lookup and map_get_next to nulls.
The race window is extremely small and only reproducible with explicit
udelay() inside lookup_nulls_elem_raw()

Similar to hlist add hlist_nulls_for_each_entry_safe() and
hlist_nulls_entry_safe() helpers.

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Reported-by: Jonathan Perry <jonperry@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-15 12:49:02 +09:00
Alexei Starovoitov
7c405304bf bpf: fix struct htab_elem layout
commit 9f691549f7 upstream.

when htab_elem is removed from the bucket list the htab_elem.hash_node.next
field should not be overridden too early otherwise we have a tiny race window
between lookup and delete.
The bug was discovered by manual code analysis and reproducible
only with explicit udelay() in lookup_elem_raw().

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Reported-by: Jonathan Perry <jonperry@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-15 12:49:00 +09:00
Alexei Starovoitov
59a5ce2c52 bpf: check pending signals while verifying programs
[ Upstream commit c3494801cd ]

Malicious user space may try to force the verifier to use as much cpu
time and memory as possible. Hence check for pending signals
while verifying the program.
Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN,
since the kernel has to release the resources used for program verification.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-15 10:39:21 +09:00
Alexei Starovoitov
25aa9d5450 bpf: Prevent memory disambiguation attack
commit af86ca4e30 upstream.

Detect code patterns where malicious 'speculative store bypass' can be used
and sanitize such patterns.

 39: (bf) r3 = r10
 40: (07) r3 += -216
 41: (79) r8 = *(u64 *)(r7 +0)   // slow read
 42: (7a) *(u64 *)(r10 -72) = 0  // verifier inserts this instruction
 43: (7b) *(u64 *)(r8 +0) = r3   // this store becomes slow due to r8
 44: (79) r1 = *(u64 *)(r6 +0)   // cpu speculatively executes this load
 45: (71) r2 = *(u8 *)(r1 +0)    // speculatively arbitrary 'load byte'
                                 // is now sanitized

Above code after x86 JIT becomes:
 e5: mov    %rbp,%rdx
 e8: add    $0xffffffffffffff28,%rdx
 ef: mov    0x0(%r13),%r14
 f3: movq   $0x0,-0x48(%rbp)
 fb: mov    %rdx,0x0(%r14)
 ff: mov    0x0(%rbx),%rdi
103: movzbq 0x0(%rdi),%rsi

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 4.9:
 - Add bpf_verifier_env parameter to check_stack_write()
 - Look up stack slot_types with state->stack_slot_type[] rather than
   state->stack[].slot_type[]
 - Drop bpf_verifier_env argument to verbose()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-15 10:06:36 +09:00
Ben Hutchings
d9c24fd382 bpf/verifier: Pass instruction index to check_mem_access() and check_xadd()
Extracted from commit 31fd85816d "bpf: permits narrower load from
bpf program context fields".

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-15 10:06:35 +09:00
Ben Hutchings
de188dc042 bpf/verifier: Add spi variable to check_stack_write()
Extracted from commit dc503a8ad9 "bpf/verifier: track liveness for
pruning".

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-15 10:06:34 +09:00
Jakub Kicinski
cd057e2336 bpf: fix references to free_bpf_prog_info() in comments
[ Upstream commit ab7f5bf092 ]

Comments in the verifier refer to free_bpf_prog_info() which
seems to have never existed in tree.  Replace it with
free_used_maps().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-12 16:43:17 +09:00
Greg Kroah-Hartman
c462abbf77 Merge 4.9.99 into android-4.9
Changes in 4.9.99
	perf/core: Fix the perf_cpu_time_max_percent check
	percpu: include linux/sched.h for cond_resched()
	bpf: map_get_next_key to return first key on NULL
	arm/arm64: KVM: Add PSCI version selection API
	crypto: talitos - fix IPsec cipher in length
	serial: imx: ensure UCR3 and UFCR are setup correctly
	USB: serial: option: Add support for Quectel EP06
	ALSA: pcm: Check PCM state at xfern compat ioctl
	ALSA: seq: Fix races at MIDI encoding in snd_virmidi_output_trigger()
	ALSA: aloop: Mark paused device as inactive
	ALSA: aloop: Add missing cable lock to ctl API callbacks
	tracepoint: Do not warn on ENOMEM
	Input: leds - fix out of bound access
	Input: atmel_mxt_ts - add touchpad button mapping for Samsung Chromebook Pro
	xfs: prevent creating negative-sized file via INSERT_RANGE
	RDMA/cxgb4: release hw resources on device removal
	RDMA/ucma: Allow resolving address w/o specifying source address
	RDMA/mlx5: Protect from shift operand overflow
	NET: usb: qmi_wwan: add support for ublox R410M PID 0x90b2
	IB/mlx5: Use unlimited rate when static rate is not supported
	IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
	drm/vmwgfx: Fix a buffer object leak
	drm/bridge: vga-dac: Fix edid memory leak
	test_firmware: fix setting old custom fw path back on exit, second try
	USB: serial: visor: handle potential invalid device configuration
	USB: Accept bulk endpoints with 1024-byte maxpacket
	USB: serial: option: reimplement interface masking
	USB: serial: option: adding support for ublox R410M
	usb: musb: host: fix potential NULL pointer dereference
	usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
	platform/x86: asus-wireless: Fix NULL pointer dereference
	s390/facilites: use stfle_fac_list array size for MAX_FACILITY_BIT
	Linux 4.9.99

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-09 12:28:25 +02:00
Teng Qin
fcbc8d0e7d bpf: map_get_next_key to return first key on NULL
commit 8fe4592438 upstream.

When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chenbo Feng <fengc@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-09 09:50:19 +02:00
Greg Kroah-Hartman
bb94f9d8f5 Merge 4.9.91 into android-4.9
Changes in 4.9.91
	MIPS: ralink: Remove ralink_halt()
	iio: st_pressure: st_accel: pass correct platform data to init
	ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit
	ALSA: aloop: Sync stale timer before release
	ALSA: aloop: Fix access to not-yet-ready substream via cable
	ALSA: hda/realtek - Always immediately update mute LED with pin VREF
	mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
	PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L
	ahci: Add PCI-id for the Highpoint Rocketraid 644L card
	clk: bcm2835: Fix ana->maskX definitions
	clk: bcm2835: Protect sections updating shared registers
	clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops
	Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174
	libata: fix length validation of ATAPI-relayed SCSI commands
	libata: remove WARN() for DMA or PIO command without data
	libata: don't try to pass through NCQ commands to non-NCQ devices
	libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
	libata: disable LPM for Crucial BX100 SSD 500GB drive
	libata: Enable queued TRIM for Samsung SSD 860
	libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
	libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
	libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
	nfsd: remove blocked locks on client teardown
	mm/vmalloc: add interfaces to free unmapped page table
	x86/mm: implement free pmd/pte page interfaces
	mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
	mm/thp: do not wait for lock_page() in deferred_split_scan()
	mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
	drm/vmwgfx: Fix a destoy-while-held mutex problem.
	drm/radeon: Don't turn off DP sink when disconnected
	drm: udl: Properly check framebuffer mmap offsets
	acpi, numa: fix pxm to online numa node associations
	ACPI / watchdog: Fix off-by-one error at resource assignment
	libnvdimm, {btt, blk}: do integrity setup before add_disk()
	brcmfmac: fix P2P_DEVICE ethernet address generation
	rtlwifi: rtl8723be: Fix loss of signal
	tracing: probeevent: Fix to support minus offset from symbol
	mtdchar: fix usage of mtd_ooblayout_ecc()
	mtd: nand: fsl_ifc: Fix nand waitfunc return value
	mtd: nand: fsl_ifc: Fix eccstat array overflow for IFC ver >= 2.0.0
	mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0
	staging: ncpfs: memory corruption in ncp_read_kernel()
	can: ifi: Repair the error handling
	can: ifi: Check core revision upon probe
	can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack
	can: cc770: Fix queue stall & dropped RTR reply
	can: cc770: Fix use after free in cc770_tx_interrupt()
	tty: vt: fix up tabstops properly
	selftests/x86/ptrace_syscall: Fix for yet more glibc interference
	kvm/x86: fix icebp instruction handling
	x86/build/64: Force the linker to use 2MB page size
	x86/boot/64: Verify alignment of the LOAD segment
	x86/entry/64: Don't use IST entry for #BP stack
	perf/x86/intel/uncore: Fix Skylake UPI event format
	perf stat: Fix CVS output format for non-supported counters
	perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()
	perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake servers
	iio: ABI: Fix name of timestamp sysfs file
	staging: lustre: ptlrpc: kfree used instead of kvfree
	selftests, x86, protection_keys: fix wrong offset in siginfo
	selftests/x86/protection_keys: Fix syscall NR redefinition warnings
	signal/testing: Don't look for __SI_FAULT in userspace
	x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
	selftests: x86: sysret_ss_attrs doesn't build on a PIE build
	kbuild: disable clang's default use of -fmerge-all-constants
	bpf: skip unnecessary capability check
	bpf, x64: increase number of passes
	Linux 4.9.91

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-29 11:32:39 +02:00
Chenbo Feng
3eb88807b2 bpf: skip unnecessary capability check
commit 0fa4fe85f4 upstream.

The current check statement in BPF syscall will do a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path will trigger unnecessary security hooks on capability checking
and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
access. This can be resolved by simply switch the order of the statement
and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
allowed.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:26 +02:00
Al Viro
7dc12f7d2f BACKPORT: fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
Descriptor table is a shared object; it's not a place where you can
stick temporary references to files, especially when we don't need
an opened file at all.

Cc: stable@vger.kernel.org # v4.14
Fixes: 98589a0998 ("netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chenbo Feng <fengc@google.com>

Removed the code related to function bpf_prog_get_ok() since it is not
exsit in current android tree.
(cherry picked from commit 040ee69226)

Change-Id: If7a602128cdea4b4b50c8effb215c9bca7449515
2018-03-14 11:39:19 -07:00
Shmulik Ladkani
7e3c72f4c7 UPSTREAM: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
Commit 2c16d60332 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2

Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Chenbo Feng <fengc@google.com>
(cherry picked from commit 98589a0998)

Change-Id: Ia0d15a76823cca3afb38786a3d2c25c13ccf941d
2018-03-14 11:39:19 -07:00
Greg Kroah-Hartman
a2904940bd Merge 4.9.87 into android-4.9
Changes in 4.9.87
	tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus
	tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus
	tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus
	tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
	tpm: constify transmit data pointers
	tpm_tis_spi: Use DMA-safe memory for SPI transfers
	tpm-dev-common: Reject too short writes
	ALSA: usb-audio: Add a quirck for B&W PX headphones
	ALSA: hda: Add a power_save blacklist
	ALSA: hda - Fix pincfg at resume on Lenovo T470 dock
	timers: Forward timer base before migrating timers
	parisc: Fix ordering of cache and TLB flushes
	cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
	dax: fix vma_is_fsdax() helper
	x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
	x86/platform/intel-mid: Handle Intel Edison reboot correctly
	media: m88ds3103: don't call a non-initalized function
	nospec: Allow index argument to have const-qualified type
	ARM: mvebu: Fix broken PL310_ERRATA_753970 selects
	ARM: kvm: fix building with gcc-8
	KVM: mmu: Fix overlap between public and private memslots
	KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
	KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
	PCI/ASPM: Deal with missing root ports in link state handling
	dm io: fix duplicate bio completion due to missing ref count
	ARM: dts: LogicPD SOM-LV: Fix I2C1 pinmux
	ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
	x86/mm: Give each mm TLB flush generation a unique ID
	x86/speculation: Use Indirect Branch Prediction Barrier in context switch
	md: only allow remove_and_add_spares when no sync_thread running.
	netlink: put module reference if dump start fails
	x86/apic/vector: Handle legacy irq data correctly
	bridge: check brport attr show in brport_show
	fib_semantics: Don't match route with mismatching tclassid
	hdlc_ppp: carrier detect ok, don't turn off negotiation
	ipv6 sit: work around bogus gcc-8 -Wrestrict warning
	net: fix race on decreasing number of TX queues
	net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68
	netlink: ensure to loop over all netns in genlmsg_multicast_allns()
	ppp: prevent unregistered channels from connecting to PPP units
	udplite: fix partial checksum initialization
	sctp: fix dst refcnt leak in sctp_v4_get_dst
	mlxsw: spectrum_switchdev: Check success of FDB add operation
	net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT
	tcp: Honor the eor bit in tcp_mtu_probe
	rxrpc: Fix send in rxrpc_send_data_packet()
	tcp_bbr: better deal with suboptimal GSO
	sctp: fix dst refcnt leak in sctp_v6_get_dst()
	s390/qeth: fix underestimated count of buffer elements
	s390/qeth: fix SETIP command handling
	s390/qeth: fix overestimated count of buffer elements
	s390/qeth: fix IP removal on offline cards
	s390/qeth: fix double-free on IP add/remove race
	s390/qeth: fix IP address lookup for L3 devices
	s390/qeth: fix IPA command submission race
	sctp: verify size of a new chunk in _sctp_make_chunk()
	net: mpls: Pull common label check into helper
	mpls, nospec: Sanitize array index in mpls_label_ok()
	bpf: fix wrong exposure of map_flags into fdinfo for lpm
	bpf: fix mlock precharge on arraymaps
	bpf, x64: implement retpoline for tail call
	bpf, arm64: fix out of bounds access in tail call
	bpf: add schedule points in percpu arrays management
	bpf, ppc64: fix out of bounds access in tail call
	btrfs: preserve i_mode if __btrfs_set_acl() fails
	Linux 4.9.87

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-11 17:38:31 +01:00
Eric Dumazet
2a8bc5316a bpf: add schedule points in percpu arrays management
[ upstream commit 32fff239de ]

syszbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()

It takes time to allocate a huge percpu map, but even more time to free
it.

Since we run in process context, use cond_resched() to yield cpu if
needed.

Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:21:35 +01:00
Daniel Borkmann
422baf61d4 bpf: fix mlock precharge on arraymaps
[ upstream commit 9c2d63b843 ]

syzkaller recently triggered OOM during percpu map allocation;
while there is work in progress by Dennis Zhou to add __GFP_NORETRY
semantics for percpu allocator under pressure, there seems also a
missing bpf_map_precharge_memlock() check in array map allocation.

Given today the actual bpf_map_charge_memlock() happens after the
find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock()
is there to bail out early before we go and do the map setup work
when we find that we hit the limits anyway. Therefore add this for
array map as well.

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:21:34 +01:00
Daniel Borkmann
816cfeb77c bpf: fix wrong exposure of map_flags into fdinfo for lpm
[ upstream commit a316338cb7 ]

trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:21:34 +01:00
Sami Tolvanen
417637a2d9 bpf: fix function type for __bpf_prog_run
Bug: 67506682
Change-Id: I096a470c65a2a1867c51da9a33843ae23bf5e547
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2018-02-28 15:09:58 -08:00
Greg Kroah-Hartman
71f1469722 Merge 4.9.79 into android-4.9
Changes in 4.9.79
	x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
	orangefs: use list_for_each_entry_safe in purge_waiting_ops
	orangefs: initialize op on loop restart in orangefs_devreq_read
	usbip: prevent vhci_hcd driver from leaking a socket pointer address
	usbip: Fix implicit fallthrough warning
	usbip: Fix potential format overflow in userspace tools
	can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
	can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
	KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
	Prevent timer value 0 for MWAITX
	drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
	drivers: base: cacheinfo: fix boot error message when acpi is enabled
	mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
	hwpoison, memcg: forcibly uncharge LRU pages
	cma: fix calculation of aligned offset
	mm, page_alloc: fix potential false positive in __zone_watermark_ok
	ipc: msg, make msgrcv work with LONG_MIN
	ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
	ACPICA: Namespace: fix operand cache leak
	netfilter: nfnetlink_cthelper: Add missing permission checks
	netfilter: xt_osf: Add missing permission checks
	reiserfs: fix race in prealloc discard
	reiserfs: don't preallocate blocks for extended attributes
	fs/fcntl: f_setown, avoid undefined behaviour
	scsi: libiscsi: fix shifting of DID_REQUEUE host byte
	Revert "module: Add retpoline tag to VERMAGIC"
	mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
	Input: trackpoint - force 3 buttons if 0 button is reported
	orangefs: fix deadlock; do not write i_size in read_iter
	um: link vmlinux with -no-pie
	vsyscall: Fix permissions for emulate mode with KAISER/PTI
	eventpoll.h: add missing epoll event masks
	dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
	ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
	ipv6: fix udpv6 sendmsg crash caused by too small MTU
	ipv6: ip6_make_skb() needs to clear cork.base.dst
	lan78xx: Fix failure in USB Full Speed
	net: igmp: fix source address check for IGMPv3 reports
	net: qdisc_pkt_len_init() should be more robust
	net: tcp: close sock if net namespace is exiting
	pppoe: take ->needed_headroom of lower device into account on xmit
	r8169: fix memory corruption on retrieval of hardware statistics.
	sctp: do not allow the v4 socket to bind a v4mapped v6 address
	sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
	tipc: fix a memory leak in tipc_nl_node_get_link()
	vmxnet3: repair memory leak
	net: Allow neigh contructor functions ability to modify the primary_key
	ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
	ppp: unlock all_ppp_mutex before registering device
	be2net: restore properly promisc mode after queues reconfiguration
	ip6_gre: init dev->mtu and dev->hard_header_len correctly
	gso: validate gso_type in GSO handlers
	mlxsw: spectrum_router: Don't log an error on missing neighbor
	tun: fix a memory leak for tfile->tx_array
	flow_dissector: properly cap thoff field
	perf/x86/amd/power: Do not load AMD power module on !AMD platforms
	x86/microcode/intel: Extend BDW late-loading further with LLC size check
	hrtimer: Reset hrtimer cpu base proper on CPU hotplug
	x86: bpf_jit: small optimization in emit_bpf_tail_call()
	bpf: fix bpf_tail_call() x64 JIT
	bpf: introduce BPF_JIT_ALWAYS_ON config
	bpf: arsh is not supported in 32 bit alu thus reject it
	bpf: avoid false sharing of map refcount with max_entries
	bpf: fix divides by zero
	bpf: fix 32-bit divide by zero
	bpf: reject stores into ctx via st and xadd
	nfsd: auth: Fix gid sorting when rootsquash enabled
	Linux 4.9.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-31 14:13:00 +01:00
Daniel Borkmann
f531fbb06a bpf: reject stores into ctx via st and xadd
[ upstream commit f37a8cb84c ]

Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.

The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.

Fixes: d691f9e8d4 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Alexei Starovoitov
265d7657c9 bpf: fix 32-bit divide by zero
[ upstream commit 68fda450a7 ]

due to some JITs doing if (src_reg == 0) check in 64-bit mode
for div/mod operations mask upper 32-bits of src register
before doing the check

Fixes: 622582786c ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Eric Dumazet
4606077802 bpf: fix divides by zero
[ upstream commit c366287ebd ]

Divides by zero are not nice, lets avoid them if possible.

Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Daniel Borkmann
fcabc6d008 bpf: arsh is not supported in 32 bit alu thus reject it
[ upstream commit 7891a87efc ]

The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:

  0: (18) r0 = 0x0
  2: (7b) *(u64 *)(r10 -16) = r0
  3: (cc) (u32) r0 s>>= (u32) r0
  4: (95) exit

Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Alexei Starovoitov
a3d6dd6a66 bpf: introduce BPF_JIT_ALWAYS_ON config
[ upstream commit 290af86629 ]

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:56 +01:00
Alexei Starovoitov
5226bb3b95 bpf: fix bpf_tail_call() x64 JIT
[ upstream commit 90caccdd8c ]

- bpf prog_array just like all other types of bpf array accepts 32-bit index.
  Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent

The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40 in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:56 +01:00
Greg Kroah-Hartman
033d019ce2 Merge 4.9.77 into android-4.9
Changes in 4.9.77
	dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
	mac80211: Add RX flag to indicate ICV stripped
	ath10k: rebuild crypto header in rx data frames
	KVM: Fix stack-out-of-bounds read in write_mmio
	can: gs_usb: fix return value of the "set_bittiming" callback
	IB/srpt: Disable RDMA access by the initiator
	MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
	MIPS: Factor out NT_PRFPREG regset access helpers
	MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
	MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
	MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
	MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
	MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
	kvm: vmx: Scrub hardware GPRs at VM-exit
	platform/x86: wmi: Call acpi_wmi_init() later
	x86/acpi: Handle SCI interrupts above legacy space gracefully
	ALSA: pcm: Remove incorrect snd_BUG_ON() usages
	ALSA: pcm: Add missing error checks in OSS emulation plugin builder
	ALSA: pcm: Abort properly at pending signal in OSS read/write loops
	ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
	ALSA: aloop: Release cable upon open error path
	ALSA: aloop: Fix inconsistent format due to incomplete rule
	ALSA: aloop: Fix racy hw constraints adjustment
	x86/acpi: Reduce code duplication in mp_override_legacy_irq()
	zswap: don't param_set_charp while holding spinlock
	lan78xx: use skb_cow_head() to deal with cloned skbs
	sr9700: use skb_cow_head() to deal with cloned skbs
	smsc75xx: use skb_cow_head() to deal with cloned skbs
	cx82310_eth: use skb_cow_head() to deal with cloned skbs
	xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
	8021q: fix a memory leak for VLAN 0 device
	ip6_tunnel: disable dst caching if tunnel is dual-stack
	net: core: fix module type in sock_diag_bind
	RDS: Heap OOB write in rds_message_alloc_sgs()
	RDS: null pointer dereference in rds_atomic_free_op
	sh_eth: fix TSU resource handling
	sh_eth: fix SH7757 GEther initialization
	net: stmmac: enable EEE in MII, GMII or RGMII only
	ipv6: fix possible mem leaks in ipv6_make_skb()
	ethtool: do not print warning for applications using legacy API
	mlxsw: spectrum_router: Fix NULL pointer deref
	net/sched: Fix update of lastuse in act modules implementing stats_update
	crypto: algapi - fix NULL dereference in crypto_remove_spawns()
	rbd: set max_segments to USHRT_MAX
	x86/microcode/intel: Extend BDW late-loading with a revision check
	KVM: x86: Add memory barrier on vmcs field lookup
	drm/vmwgfx: Potential off by one in vmw_view_add()
	kaiser: Set _PAGE_NX only if supported
	iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
	target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
	bpf: move fixup_bpf_calls() function
	bpf: refactor fixup_bpf_calls()
	bpf: prevent out-of-bounds speculation
	bpf, array: fix overflow in max_entries and undefined behavior in index_mask
	USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
	USB: serial: cp210x: add new device ID ELV ALC 8xxx
	usb: misc: usb3503: make sure reset is low for at least 100us
	USB: fix usbmon BUG trigger
	usbip: remove kernel addresses from usb device and urb debug msgs
	usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
	staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
	Bluetooth: Prevent stack info leak from the EFS element.
	uas: ignore UAS for Norelsys NS1068(X) chips
	e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
	x86/Documentation: Add PTI description
	x86/cpu: Factor out application of forced CPU caps
	x86/cpufeatures: Make CPU bugs sticky
	x86/cpufeatures: Add X86_BUG_CPU_INSECURE
	x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
	x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
	x86/cpu: Merge bugs.c and bugs_64.c
	sysfs/cpu: Add vulnerability folder
	x86/cpu: Implement CPU vulnerabilites sysfs functions
	x86/cpu/AMD: Make LFENCE a serializing instruction
	x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
	sysfs/cpu: Fix typos in vulnerability documentation
	x86/alternatives: Fix optimize_nops() checking
	x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
	x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
	objtool, modules: Discard objtool annotation sections for modules
	objtool: Detect jumps to retpoline thunks
	objtool: Allow alternatives to be ignored
	x86/asm: Use register variable to get stack pointer value
	x86/retpoline: Add initial retpoline support
	x86/spectre: Add boot time option to select Spectre v2 mitigation
	x86/retpoline/crypto: Convert crypto assembler indirect jumps
	x86/retpoline/entry: Convert entry assembler indirect jumps
	x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
	x86/retpoline/hyperv: Convert assembler indirect jumps
	x86/retpoline/xen: Convert Xen hypercall indirect jumps
	x86/retpoline/checksum32: Convert assembler indirect jumps
	x86/retpoline/irq32: Convert assembler indirect jumps
	x86/retpoline: Fill return stack buffer on vmexit
	selftests/x86: Add test_vsyscall
	x86/retpoline: Remove compile time warning
	objtool: Fix retpoline support for pre-ORC objtool
	x86/pti/efi: broken conversion from efi to kernel page table
	Linux 4.9.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-17 10:29:45 +01:00
Daniel Borkmann
820ef2a0e5 bpf, array: fix overflow in max_entries and undefined behavior in index_mask
commit bbeb6e4323 upstream.

syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.

Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
a9bfac14cd bpf: prevent out-of-bounds speculation
commit b2157399cc upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[ Backported to 4.9 - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
f55093dccd bpf: refactor fixup_bpf_calls()
commit 79741b3bde upstream.

reduce indent and make it iterate over instructions similar to
convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[Backported to 4.9.y - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
28035366af bpf: move fixup_bpf_calls() function
commit e245c5c6a5 upstream.

no functional change.
move fixup_bpf_calls() to verifier.c
it's being refactored in the next patch

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[backported to 4.9 - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Greg Kroah-Hartman
f3f3457d45 Merge 4.9.73 into android-4.9
Changes in 4.9.73
	ACPI: APEI / ERST: Fix missing error handling in erst_reader()
	acpi, nfit: fix health event notification
	crypto: mcryptd - protect the per-CPU queue with a lock
	mfd: cros ec: spi: Don't send first message too soon
	mfd: twl4030-audio: Fix sibling-node lookup
	mfd: twl6040: Fix child-node lookup
	ALSA: rawmidi: Avoid racy info ioctl via ctl device
	ALSA: usb-audio: Add native DSD support for Esoteric D-05X
	ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
	PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
	parisc: Hide Diva-built-in serial aux and graphics card
	spi: xilinx: Detect stall with Unknown commands
	pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
	KVM: X86: Fix load RFLAGS w/o the fixed bit
	kvm: x86: fix RSM when PCID is non-zero
	clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
	powerpc/perf: Dereference BHRB entries safely
	libnvdimm, pfn: fix start_pad handling for aligned namespaces
	net: mvneta: clear interface link status on port disable
	net: mvneta: use proper rxq_number in loop on rx queues
	net: mvneta: eliminate wrong call to handle rx descriptor error
	bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
	Linux 4.9.73

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-29 18:18:15 +01:00
Ben Hutchings
37435f7e80 bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
An UNKNOWN_VALUE is not supposed to be derived from a pointer, unless
pointer leaks are allowed.  Therefore, states_equal() must not treat
a state with a pointer in a register as "equal" to a state with an
UNKNOWN_VALUE in that register.

This was fixed differently upstream, but the code around here was
largely rewritten in 4.14 by commit f1174f77b5 "bpf/verifier: rework
value tracking".  The bug can be detected by the bpf/verifier sub-test
"pointer/scalar confusion in state equality check (way 1)".

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Jann Horn <jannh@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
2017-12-29 17:43:00 +01:00
Greg Kroah-Hartman
cb7518e616 Merge 4.9.72 into android-4.9
Changes in 4.9.72
	cxl: Check if vphb exists before iterating over AFU devices
	arm64: Initialise high_memory global variable earlier
	ALSA: hda - add support for docking station for HP 820 G2
	ALSA: hda - add support for docking station for HP 840 G3
	kvm: fix usage of uninit spinlock in avic_vm_destroy()
	HID: corsair: support for K65-K70 Rapidfire and Scimitar Pro RGB
	HID: corsair: Add driver Scimitar Pro RGB gaming mouse 1b1c:1b3e support to hid-corsair
	arm: kprobes: Fix the return address of multiple kretprobes
	arm: kprobes: Align stack to 8-bytes in test code
	nvme-loop: handle cpu unplug when re-establishing the controller
	cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
	r8152: fix the list rx_done may be used without initialization
	crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
	vsock: track pkt owner vsock
	vhost-vsock: add pkt cancel capability
	vsock: cancel packets when failing to connect
	sch_dsmark: fix invalid skb_cow() usage
	bna: integer overflow bug in debugfs
	sctp: out_qlen should be updated when pruning unsent queue
	net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4
	hwmon: (max31790) Set correct PWM value
	usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
	usb: gadget: udc: remove pointer dereference after free
	netfilter: nfnl_cthelper: fix runtime expectation policy updates
	netfilter: nfnl_cthelper: Fix memory leak
	iommu/exynos: Workaround FLPD cache flush issues for SYSMMU v5
	r8152: fix the rx early size of RTL8153
	tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe
	inet: frag: release spinlock before calling icmp_send()
	pinctrl: st: add irq_request/release_resources callbacks
	scsi: lpfc: Fix PT2PT PRLI reject
	kvm: vmx: Flush TLB when the APIC-access address changes
	KVM: x86: correct async page present tracepoint
	KVM: VMX: Fix enable VPID conditions
	ARM: dts: ti: fix PCI bus dtc warnings
	hwmon: (asus_atk0110) fix uninitialized data access
	HID: xinmo: fix for out of range for THT 2P arcade controller.
	ASoC: STI: Fix reader substream pointer set
	r8152: prevent the driver from transmitting packets with carrier off
	s390/qeth: size calculation outbound buffers
	s390/qeth: no ETH header for outbound AF_IUCV
	bna: avoid writing uninitialized data into hw registers
	i40iw: Receive netdev events post INET_NOTIFIER state
	IB/core: Protect against self-requeue of a cq work item
	infiniband: Fix alignment of mmap cookies to support VIPT caching
	nbd: set queue timeout properly
	net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
	IB/rxe: double free on error
	IB/rxe: increment msn only when completing a request
	i40e: Do not enable NAPI on q_vectors that have no rings
	RDMA/iser: Fix possible mr leak on device removal event
	irda: vlsi_ir: fix check for DMA mapping errors
	netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
	netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
	ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
	cpufreq: Fix creation of symbolic links to policy directories
	net: ipconfig: fix ic_close_devs() use-after-free
	KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
	virtio-balloon: use actual number of stats for stats queue buffers
	virtio_balloon: prevent uninitialized variable use
	isdn: kcapi: avoid uninitialized data
	net: moxa: fix TX overrun memory leak
	xhci: plat: Register shutdown for xhci_plat
	netfilter: nfnetlink_queue: fix secctx memory leak
	Btrfs: fix an integer overflow check
	ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
	cpuidle: powernv: Pass correct drv->cpumask for registration
	bnxt_en: Fix NULL pointer dereference in reopen failure path
	backlight: pwm_bl: Fix overflow condition
	crypto: crypto4xx - increase context and scatter ring buffer elements
	rtc: pl031: make interrupt optional
	kvm, mm: account kvm related kmem slabs to kmemcg
	net: phy: at803x: Change error to EINVAL for invalid MAC
	PCI: Avoid bus reset if bridge itself is broken
	scsi: cxgb4i: fix Tx skb leak
	scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
	PCI: Create SR-IOV virtfn/physfn links before attaching driver
	PM / OPP: Move error message to debug level
	igb: check memory allocation failure
	ixgbe: fix use of uninitialized padding
	IB/rxe: check for allocation failure on elem
	PCI/AER: Report non-fatal errors only to the affected endpoint
	tracing: Exclude 'generic fields' from histograms
	ASoC: img-parallel-out: Add pm_runtime_get/put to set_fmt callback
	fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
	scsi: lpfc: Fix secure firmware updates
	scsi: lpfc: PLOGI failures during NPIV testing
	vfio/pci: Virtualize Maximum Payload Size
	fm10k: ensure we process SM mbx when processing VF mbx
	net: ipv6: send NS for DAD when link operationally up
	staging: greybus: light: Release memory obtained by kasprintf
	clk: sunxi-ng: sun6i: Rename HDMI DDC clock to avoid name collision
	tcp: fix under-evaluated ssthresh in TCP Vegas
	rtc: set the alarm to the next expiring timer
	cpuidle: fix broadcast control when broadcast can not be entered
	thermal: hisilicon: Handle return value of clk_prepare_enable
	thermal/drivers/hisi: Fix missing interrupt enablement
	thermal/drivers/hisi: Fix kernel panic on alarm interrupt
	thermal/drivers/hisi: Simplify the temperature/step computation
	thermal/drivers/hisi: Fix multiple alarm interrupts firing
	MIPS: math-emu: Fix final emulation phase for certain instructions
	platform/x86: asus-wireless: send an EV_SYN/SYN_REPORT between state changes
	Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"
	bpf: adjust insn_aux_data when patching insns
	bpf: fix branch pruning logic
	bpf: reject out-of-bounds stack pointer calculation
	bpf: fix incorrect sign extension in check_alu_op()
	sparc32: Export vac_cache_size to fix build error
	Linux 4.9.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-27 13:38:06 +01:00
Daniel Borkmann
3695b3b185 bpf: fix incorrect sign extension in check_alu_op()
From: Jann Horn <jannh@google.com>

[ Upstream commit 95a762e2c8 ]

Distinguish between
BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
only perform sign extension in the first case.

Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.

Debian assigned CVE-2017-16995 for this issue.

v3:
 - add CVE number (Ben Hutchings)

Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Daniel Borkmann
d75d3ee237 bpf: reject out-of-bounds stack pointer calculation
From: Jann Horn <jannh@google.com>

Reject programs that compute wildly out-of-bounds stack pointers.
Otherwise, pointers can be computed with an offset that doesn't fit into an
`int`, causing security issues in the stack memory access check (as well as
signed integer overflow during offset addition).

This is a fix specifically for the v4.9 stable tree because the mainline
code looks very different at this point.

Fixes: 7bca0a9702 ("bpf: enhance verifier to understand stack pointer arithmetic")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Daniel Borkmann
7b5b73ea87 bpf: fix branch pruning logic
From: Alexei Starovoitov <ast@fb.com>

[ Upstream commit c131187db2 ]

when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Daniel Borkmann
565f012f5a bpf: adjust insn_aux_data when patching insns
From: Alexei Starovoitov <ast@fb.com>

[ Upstream commit 8041902dae ]

convert_ctx_accesses() replaces single bpf instruction with a set of
instructions. Adjust corresponding insn_aux_data while patching.
It's needed to make sure subsequent 'for(all insn)' loops
have matching insn and insn_aux_data.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Greg Kroah-Hartman
3f1d77ca5f Merge 4.9.69 into android-4.9
Changes in 4.9.69
	usb: gadget: udc: renesas_usb3: fix number of the pipes
	can: ti_hecc: Fix napi poll return value for repoll
	can: kvaser_usb: free buf in error paths
	can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()
	can: kvaser_usb: ratelimit errors if incomplete messages are received
	can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
	can: ems_usb: cancel urb on -EPIPE and -EPROTO
	can: esd_usb2: cancel urb on -EPIPE and -EPROTO
	can: usb_8dev: cancel urb on -EPIPE and -EPROTO
	virtio: release virtio index when fail to device_register
	hv: kvp: Avoid reading past allocated blocks from KVP file
	isa: Prevent NULL dereference in isa_bus driver callbacks
	scsi: dma-mapping: always provide dma_get_cache_alignment
	scsi: use dma_get_cache_alignment() as minimum DMA alignment
	scsi: libsas: align sata_device's rps_resp on a cacheline
	efi: Move some sysfs files to be read-only by root
	efi/esrt: Use memunmap() instead of kfree() to free the remapping
	ASN.1: fix out-of-bounds read when parsing indefinite length item
	ASN.1: check for error from ASN1_OP_END__ACT actions
	KEYS: add missing permission check for request_key() destination
	X.509: reject invalid BIT STRING for subjectPublicKey
	X.509: fix comparisons of ->pkey_algo
	x86/PCI: Make broadcom_postcore_init() check acpi_disabled
	KVM: x86: fix APIC page invalidation
	btrfs: fix missing error return in btrfs_drop_snapshot
	ALSA: pcm: prevent UAF in snd_pcm_info
	ALSA: seq: Remove spurious WARN_ON() at timer check
	ALSA: usb-audio: Fix out-of-bound error
	ALSA: usb-audio: Add check return value for usb_string()
	iommu/vt-d: Fix scatterlist offset handling
	smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
	s390: fix compat system call table
	KVM: s390: Fix skey emulation permission check
	powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
	brcmfmac: change driver unbind order of the sdio function devices
	kdb: Fix handling of kallsyms_symbol_next() return value
	drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
	media: dvb: i2c transfers over usb cannot be done from stack
	arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one
	arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
	KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
	KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion
	KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocation
	KVM: arm/arm64: vgic-its: Check result of allocation before use
	arm64: fpsimd: Prevent registers leaking from dead tasks
	bus: arm-cci: Fix use of smp_processor_id() in preemptible context
	bus: arm-ccn: Check memory allocation failure
	bus: arm-ccn: Fix use of smp_processor_id() in preemptible context
	bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left.
	crypto: talitos - fix AEAD test failures
	crypto: talitos - fix memory corruption on SEC2
	crypto: talitos - fix setkey to check key weakness
	crypto: talitos - fix AEAD for sha224 on non sha224 capable chips
	crypto: talitos - fix use of sg_link_tbl_len
	crypto: talitos - fix ctr-aes-talitos
	usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT
	ARM: BUG if jumping to usermode address in kernel mode
	ARM: avoid faulting on qemu
	thp: reduce indentation level in change_huge_pmd()
	thp: fix MADV_DONTNEED vs. numa balancing race
	mm: drop unused pmdp_huge_get_and_clear_notify()
	Revert "drm/armada: Fix compile fail"
	Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA"
	ARM: 8657/1: uaccess: consistently check object sizes
	vti6: Don't report path MTU below IPV6_MIN_MTU.
	ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
	x86/selftests: Add clobbers for int80 on x86_64
	x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
	sched/fair: Make select_idle_cpu() more aggressive
	x86/hpet: Prevent might sleep splat on resume
	powerpc/64: Invalidate process table caching after setting process table
	selftest/powerpc: Fix false failures for skipped tests
	powerpc: Fix compiling a BE kernel with a powerpc64le toolchain
	lirc: fix dead lock between open and wakeup_filter
	module: set __jump_table alignment to 8
	powerpc/64: Fix checksum folding in csum_add()
	ARM: OMAP2+: Fix device node reference counts
	ARM: OMAP2+: Release device node after it is no longer needed.
	ASoC: rcar: avoid SSI_MODEx settings for SSI8
	gpio: altera: Use handle_level_irq when configured as a level_high
	HID: chicony: Add support for another ASUS Zen AiO keyboard
	usb: gadget: configs: plug memory leak
	USB: gadgetfs: Fix a potential memory leak in 'dev_config()'
	usb: dwc3: gadget: Fix system suspend/resume on TI platforms
	usb: gadget: pxa27x: Test for a valid argument pointer
	usb: gadget: udc: net2280: Fix tmp reusage in net2280 driver
	kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
	libata: drop WARN from protocol error in ata_sff_qc_issue()
	workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
	scsi: qla2xxx: Fix ql_dump_buffer
	scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
	irqchip/crossbar: Fix incorrect type of register size
	KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
	arm: KVM: Survive unknown traps from guests
	arm64: KVM: Survive unknown traps from guests
	KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled
	spi_ks8995: fix "BUG: key accdaa28 not in .data!"
	spi_ks8995: regs_size incorrect for some devices
	bnx2x: prevent crash when accessing PTP with interface down
	bnx2x: fix possible overrun of VFPF multicast addresses array
	bnx2x: fix detection of VLAN filtering feature for VF
	bnx2x: do not rollback VF MAC/VLAN filters we did not configure
	rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races
	ibmvnic: Fix overflowing firmware/hardware TX queue
	ibmvnic: Allocate number of rx/tx buffers agreed on by firmware
	ipv6: reorder icmpv6_init() and ip6_mr_init()
	crypto: s5p-sss - Fix completing crypto request in IRQ handler
	i2c: riic: fix restart condition
	blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
	zram: set physical queue limits to avoid array out of bounds accesses
	netfilter: don't track fragmented packets
	axonram: Fix gendisk handling
	drm/amd/amdgpu: fix console deadlock if late init failed
	powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
	EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
	EDAC, i5000, i5400: Fix definition of NRECMEMB register
	kbuild: pkg: use --transform option to prefix paths in tar
	coccinelle: fix parallel build with CHECK=scripts/coccicheck
	x86/mpx/selftests: Fix up weird arrays
	mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl()
	gre6: use log_ecn_error module parameter in ip6_tnl_rcv()
	route: also update fnhe_genid when updating a route cache
	route: update fnhe_expires for redirect when the fnhe exists
	drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()'
	lib/genalloc.c: make the avail variable an atomic_long_t
	dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0
	NFS: Fix a typo in nfs_rename()
	sunrpc: Fix rpc_task_begin trace point
	xfs: fix forgotten rcu read unlock when skipping inode reclaim
	dt-bindings: usb: fix reg-property port-number range
	block: wake up all tasks blocked in get_request()
	sparc64/mm: set fields in deferred pages
	zsmalloc: calling zs_map_object() from irq is a bug
	sctp: do not free asoc when it is already dead in sctp_sendmsg
	sctp: use the right sk after waking up from wait_buf sleep
	bpf: fix lockdep splat
	clk: uniphier: fix DAPLL2 clock rate of Pro5
	atm: horizon: Fix irq release error
	jump_label: Invoke jump_label_test() via early_initcall()
	xfrm: Copy policy family in clone_policy
	IB/mlx4: Increase maximal message size under UD QP
	IB/mlx5: Assign send CQ and recv CQ of UMR QP
	afs: Connect up the CB.ProbeUuid
	Linux 4.9.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-14 09:58:43 +01:00
Eric Dumazet
f45f4f8a7c bpf: fix lockdep splat
[ Upstream commit 89ad2fa3f0 ]

pcpu_freelist_pop() needs the same lockdep awareness than
pcpu_freelist_populate() to avoid a false positive.

 [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]

 switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
  (&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300

 and this task is already holding:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0
x868/0x1240
 which would create a new lock dependency:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}

 but this new dependency connects a SOFTIRQ-irq-safe lock:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
 ... which became SOFTIRQ-irq-safe at:
   [<ffffffff9db5931b>] __lock_acquire+0x42b/0x1f10
   [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
   [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
   [<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
   [<ffffffff9e136240>] dev_queue_xmit+0x10/0x20
   [<ffffffff9e1965d9>] ip_finish_output2+0x439/0x590
   [<ffffffff9e197410>] ip_finish_output+0x150/0x2f0
   [<ffffffff9e19886d>] ip_output+0x7d/0x260
   [<ffffffff9e19789e>] ip_local_out+0x5e/0xe0
   [<ffffffff9e197b25>] ip_queue_xmit+0x205/0x620
   [<ffffffff9e1b8398>] tcp_transmit_skb+0x5a8/0xcb0
   [<ffffffff9e1ba152>] tcp_write_xmit+0x242/0x1070
   [<ffffffff9e1baffc>] __tcp_push_pending_frames+0x3c/0xf0
   [<ffffffff9e1b3472>] tcp_rcv_established+0x312/0x700
   [<ffffffff9e1c1acc>] tcp_v4_do_rcv+0x11c/0x200
   [<ffffffff9e1c3dc2>] tcp_v4_rcv+0xaa2/0xc30
   [<ffffffff9e191107>] ip_local_deliver_finish+0xa7/0x240
   [<ffffffff9e191a36>] ip_local_deliver+0x66/0x200
   [<ffffffff9e19137d>] ip_rcv_finish+0xdd/0x560
   [<ffffffff9e191e65>] ip_rcv+0x295/0x510
   [<ffffffff9e12ff88>] __netif_receive_skb_core+0x988/0x1020
   [<ffffffff9e130641>] __netif_receive_skb+0x21/0x70
   [<ffffffff9e1306ff>] process_backlog+0x6f/0x230
   [<ffffffff9e132129>] net_rx_action+0x229/0x420
   [<ffffffff9da07ee8>] __do_softirq+0xd8/0x43d
   [<ffffffff9e282bcc>] do_softirq_own_stack+0x1c/0x30
   [<ffffffff9dafc2f5>] do_softirq+0x55/0x60
   [<ffffffff9dafc3a8>] __local_bh_enable_ip+0xa8/0xb0
   [<ffffffff9db4c727>] cpu_startup_entry+0x1c7/0x500
   [<ffffffff9daab333>] start_secondary+0x113/0x140

 to a SOFTIRQ-irq-unsafe lock:
  (&head->lock){+.+...}
 ... which became SOFTIRQ-irq-unsafe at:
 ...  [<ffffffff9db5971f>] __lock_acquire+0x82f/0x1f10
   [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
   [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
   [<ffffffff9dc0b7fa>] pcpu_freelist_pop+0x7a/0xb0
   [<ffffffff9dc08b2c>] htab_map_alloc+0x50c/0x5f0
   [<ffffffff9dc00dc5>] SyS_bpf+0x265/0x1200
   [<ffffffff9e28195f>] entry_SYSCALL_64_fastpath+0x12/0x17

 other info that might help us debug this:

 Chain exists of:
   dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&head->lock);
                                local_irq_disable();
                                lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
                                lock(&htab->buckets[i].lock);
   <Interrupt>
     lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);

  *** DEADLOCK ***

Fixes: e19494edab ("bpf: introduce percpu_freelist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:23 +01:00