284 Commits

Author SHA1 Message Date
Mauro (mdrjr) Ribeiro
82c8dea85f Merge tag 'v4.9.258' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.258 stable release

Change-Id: Ibf7f2a97a8bad31d60fa7727fa94e3f5535bfec3
2021-07-30 20:21:00 -03:00
Mauro (mdrjr) Ribeiro
1c38096ec6 Merge tag 'v4.9.254' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.254 stable release

Change-Id: I595b8604e6a9ad4554c8bce112929257aaa9d336
2021-07-30 20:16:52 -03:00
Bui Quang Minh
253150830a bpf: Check for integer overflow when using roundup_pow_of_two()
[ Upstream commit 6183f4d3a0 ]

On 32-bit architectures, roundup_pow_of_two() can return 0 when the argument
has its uppermost bit set, because the result is 1UL << 32. Add a check for
this case.
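
A minimal sketch of the overflow described above, assuming a 32-bit model of the helper. The names roundup_pow2_u32() and checked_roundup_pow2_u32() are hypothetical and are not the kernel's roundup_pow_of_two(); the truncation to 32 bits stands in for the 1UL << 32 case, which C itself leaves undefined.

```c
#include <stdint.h>

/* Hypothetical 32-bit model: compute the next power of two in 64 bits,
 * then truncate to 32 bits the way a 32-bit unsigned long would. */
static uint32_t roundup_pow2_u32(uint32_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return (uint32_t)p;	/* 2^32 truncates to 0 */
}

/* Illustrative check: returns 0 and stores the result, or -1 on overflow. */
static int checked_roundup_pow2_u32(uint32_t n, uint32_t *out)
{
	uint32_t r = roundup_pow2_u32(n);

	if (r == 0)		/* wrapped past the 32-bit range */
		return -1;
	*out = r;
	return 0;
}
```

Any argument above 0x80000000 wraps to 0 in this model, which is the case a caller has to reject before using the value as a size.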

Fixes: d5a3b1f691 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210127063653.3576-1-minhquangbui99@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23 13:59:16 +01:00
Daniel Borkmann
b984811672 bpf: Fix buggy rsh min/max bounds tracking
[ no upstream commit ]

Fix incorrect bounds tracking for RSH opcode. Commit f23cc643f9 ("bpf: fix
range arithmetic for bpf map access") had a wrong assumption about min/max
bounds. The new dst_reg->min_value needs to be derived by right shifting the
max_val bounds, not min_val, and likewise new dst_reg->max_value needs to be
derived by right shifting the min_val bounds, not max_val. Later stable kernels
than 4.9 are not affected since bounds tracking was overall reworked and they
already track this similarly as in the fix.
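
The rule above can be sketched for unsigned values as follows. This is an illustration only: struct bounds and rsh_bounds() are hypothetical names, and the real 4.9 verifier tracks signed bounds and handles unknown values, which this sketch ignores.

```c
#include <stdint.h>

/* Hypothetical inclusive range [min, max] of an unsigned value. */
struct bounds {
	uint64_t min;
	uint64_t max;
};

/* Bounds of (dst >> shift) when dst and shift each lie in a known range.
 * Assumes shift.max < 64. The smallest result comes from the largest
 * shift amount, and the largest result from the smallest shift amount. */
static struct bounds rsh_bounds(struct bounds dst, struct bounds shift)
{
	struct bounds r;

	r.min = dst.min >> shift.max;
	r.max = dst.max >> shift.min;
	return r;
}
```

For example, a value in [16, 64] shifted right by an amount in [1, 3] ends up in [2, 32]; pairing min with min and max with max would wrongly give [8, 8].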

Fixes: f23cc643f9 ("bpf: fix range arithmetic for bpf map access")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30 13:27:15 +01:00
Mauro (mdrjr) Ribeiro
de59193db1 Merge tag 'v4.9.238' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.238 stable release

Change-Id: I5aad49a29352f44772f23d33945a369ddfab49bf
2020-12-22 09:19:23 -03:00
Thomas Gleixner
d59ef3125c bpf: Remove recursion prevention from rcu free callback
[ Upstream commit 8a37963c7a ]

If an element is freed via RCU then recursion into BPF instrumentation
functions is not a concern. The element is already detached from the map
and the RCU callback does not hold any locks on which a kprobe, perf event
or tracepoint attached BPF program could deadlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.259118710@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 20:40:08 +02:00
Mauro (mdrjr) Ribeiro
965041309b Merge tag 'v4.9.218' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.218 stable release
2020-04-07 21:33:19 -03:00
Mauro (mdrjr) Ribeiro
09019bb1f7 Merge tag 'v4.9.190' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.190 stable release
2020-04-07 20:17:16 -03:00
Mauro (mdrjr) Ribeiro
33417c4f2f Merge tag 'v4.9.187' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.187 stable release
2020-04-07 20:10:04 -03:00
Mauro (mdrjr) Ribeiro
cf3b5c0a88 Merge tag 'v4.9.177' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.177 stable release
2020-04-07 15:14:07 -03:00
Mauro (mdrjr) Ribeiro
9972746c92 Merge tag 'v4.9.147' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.147 stable release
2020-04-07 14:40:34 -03:00
Mauro (mdrjr) Ribeiro
60bf0e0c88 Merge tag 'v4.9.144' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.144 stable release
2020-04-07 14:38:25 -03:00
Mauro (mdrjr) Ribeiro
8cc5b2adad Merge tag 'v4.9.117' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.117 stable release
2020-04-06 22:43:27 -03:00
Greg Kroah-Hartman
ce7656ea22 bpf: Explicitly memset the bpf_attr structure
commit 8096f22942 upstream.

For the bpf syscall, we are relying on the compiler to properly zero out
the bpf_attr union that we copy userspace data into. Unfortunately that
doesn't always work properly, padding and other oddities might not be
correctly zeroed, and in some tests odd things have been found when the
stack is pre-initialized to other values.

Fix this by explicitly memsetting the structure to 0 before using it.
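
A minimal userspace sketch of the idea, assuming a made-up union in place of the real bpf_attr. union demo_attr and both helpers are hypothetical; the point is only that memset() clears every byte, including padding.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for union bpf_attr. Member 'a' typically has
 * padding bytes between 'kind' and 'value'. */
union demo_attr {
	struct {
		uint8_t kind;
		uint64_t value;
	} a;
	struct {
		uint32_t x;
		uint32_t y;
	} b;
};

/* Clear every byte of the union, padding included. */
static void demo_attr_init(union demo_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
}

/* Returns 1 if every byte of the union is zero, 0 otherwise. */
static int demo_attr_all_zero(const union demo_attr *attr)
{
	const unsigned char *p = (const unsigned char *)attr;
	size_t i;

	for (i = 0; i < sizeof(*attr); i++)
		if (p[i])
			return 0;
	return 1;
}
```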

Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://android-review.googlesource.com/c/kernel/common/+/1235490
Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02 17:20:40 +02:00
Daniel Borkmann
6c1dc8f96b bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
[ Upstream commit fdadd04931 ]

Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
  the compile-time default at boot-time, which is 0x9c400000 when
  using 64K page size. This overflows the signed 32-bit bpf_jit_limit
  value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

  and can cause various unexpected failures throughout the network
  stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
             16) = -1 ENOTSUPP (Unknown error 524)

  and similar failures can be seen with tools like tcpdump. This doesn't
  always reproduce however, and I'm not sure why. The more consistent
  failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
  host would time out on systemd/netplan configuring a virtio-net NIC
  with no noticeable errors in the logs.
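
The arithmetic in the report can be checked with a short sketch. The helper names jit_limit_default() and as_signed_32() are hypothetical; only the PAGE_SIZE * 40000 default and the two quoted values come from the report.

```c
#include <stdint.h>

/* Compile-time default from the report: PAGE_SIZE * 40000. */
static uint64_t jit_limit_default(uint64_t page_size)
{
	return page_size * 40000u;
}

/* Value seen when the low 32 bits are read as a signed
 * two's-complement 32-bit integer. */
static int64_t as_signed_32(uint64_t v)
{
	uint32_t low = (uint32_t)v;

	if (low > INT32_MAX)
		return (int64_t)low - 4294967296LL;
	return (int64_t)low;
}
```

With a 64K page size the default is 65536 * 40000 = 2621440000 (0x9c400000), which exceeds INT32_MAX and reads back as -1673527296. With 4K pages the default is 163840000 and fits.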

Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().

Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.

Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.

Fixes: ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-25 10:51:50 +02:00
Daniel Borkmann
c98446e1ba bpf: add bpf_jit_limit knob to restrict unpriv allocations
commit ede95a63b5 upstream.

Rick reported that the BPF JIT could potentially fill the entire module
space with BPF programs from unprivileged users which would prevent later
attempts to load normal kernel modules or privileged BPF programs, for
example. If JIT was enabled but unsuccessful to generate the image, then
before commit 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
we would always fall back to the BPF interpreter. Nowadays in the case
where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort
with a failure since the BPF interpreter was compiled out.

Add a global limit and enforce it for unprivileged users such that in case
of BPF interpreter compiled out we fail once the limit has been reached
or we fall back to BPF interpreter earlier w/o using module mem if latter
was compiled in. In a next step, fair share among unprivileged users can
be resolved in particular for the case where we would fail hard once limit
is reached.

Fixes: 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
Fixes: 0a14842f5a ("net: filter: Just In Time compiler for x86-64")
Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:51:41 +02:00
Daniel Borkmann
5124abda30 bpf: get rid of pure_initcall dependency to enable jits
commit fa9dd599b4 upstream.

Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[bwh: Backported to 4.9 as dependency of commit 2e4a30983b
 "bpf: restrict access to core bpf sysctls":
 - Drop change in arch/mips/net/ebpf_jit.c
 - Drop change to bpf_jit_kallsyms
 - Adjust filenames, context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:51:40 +02:00
Valdis Klētnieks
2b23f7074a bpf: silence warning messages in core
[ Upstream commit aee450cbe4 ]

Compiling kernel/bpf/core.c with W=1 causes a flood of warnings:

kernel/bpf/core.c:1198:65: warning: initialized field overwritten [-Woverride-init]
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~
kernel/bpf/core.c:1198:65: note: (near initialization for 'public_insntable[12]')
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~

98 copies of the above.

The attached patch silences the warnings, because we *know* we're overwriting
the default initializer. That leaves bpf/core.c with only 6 other warnings,
which become more visible in comparison.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-04 09:33:19 +02:00
Alexei Starovoitov
82303dd64a bpf: convert htab map to hlist_nulls
commit 4fe8435909 upstream.

When all map elements are pre-allocated, one CPU can delete and reuse an
htab_elem while another CPU is still walking the hlist. In that case the
lookup may miss the element. Convert hlist to hlist_nulls to avoid this
scenario. When the bucket lock is taken there is no need for such precautions,
so only convert map_lookup and map_get_next to nulls.
The race window is extremely small and only reproducible with an explicit
udelay() inside lookup_nulls_elem_raw().

Similar to hlist add hlist_nulls_for_each_entry_safe() and
hlist_nulls_entry_safe() helpers.

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Reported-by: Jonathan Perry <jonperry@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16 19:43:40 +02:00
Alexei Starovoitov
aad9db666c bpf: fix struct htab_elem layout
commit 9f691549f7 upstream.

when htab_elem is removed from the bucket list the htab_elem.hash_node.next
field should not be overridden too early otherwise we have a tiny race window
between lookup and delete.
The bug was discovered by manual code analysis and reproducible
only with explicit udelay() in lookup_elem_raw().

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Reported-by: Jonathan Perry <jonperry@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16 19:43:40 +02:00
Alexei Starovoitov
ae30c98dcf bpf: check pending signals while verifying programs
[ Upstream commit c3494801cd ]

Malicious user space may try to force the verifier to use as much cpu
time and memory as possible. Hence check for pending signals
while verifying the program.
Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN,
since the kernel has to release the resources used for program verification.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-21 14:11:38 +01:00
Alexei Starovoitov
def8c1d045 bpf: Prevent memory disambiguation attack
commit af86ca4e30 upstream.

Detect code patterns where malicious 'speculative store bypass' can be used
and sanitize such patterns.

 39: (bf) r3 = r10
 40: (07) r3 += -216
 41: (79) r8 = *(u64 *)(r7 +0)   // slow read
 42: (7a) *(u64 *)(r10 -72) = 0  // verifier inserts this instruction
 43: (7b) *(u64 *)(r8 +0) = r3   // this store becomes slow due to r8
 44: (79) r1 = *(u64 *)(r6 +0)   // cpu speculatively executes this load
 45: (71) r2 = *(u8 *)(r1 +0)    // speculatively arbitrary 'load byte'
                                 // is now sanitized

Above code after x86 JIT becomes:
 e5: mov    %rbp,%rdx
 e8: add    $0xffffffffffffff28,%rdx
 ef: mov    0x0(%r13),%r14
 f3: movq   $0x0,-0x48(%rbp)
 fb: mov    %rdx,0x0(%r14)
 ff: mov    0x0(%rbx),%rdi
103: movzbq 0x0(%rdi),%rsi

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 4.9:
 - Add bpf_verifier_env parameter to check_stack_write()
 - Look up stack slot_types with state->stack_slot_type[] rather than
   state->stack[].slot_type[]
 - Drop bpf_verifier_env argument to verbose()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:10 +01:00
Ben Hutchings
62e0865f20 bpf/verifier: Pass instruction index to check_mem_access() and check_xadd()
Extracted from commit 31fd85816d "bpf: permits narrower load from
bpf program context fields".

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:10 +01:00
Ben Hutchings
9c33b84ba0 bpf/verifier: Add spi variable to check_stack_write()
Extracted from commit dc503a8ad9 "bpf/verifier: track liveness for
pruning".

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:10 +01:00
Jakub Kicinski
e31a06ec82 bpf: fix references to free_bpf_prog_info() in comments
[ Upstream commit ab7f5bf092 ]

Comments in the verifier refer to free_bpf_prog_info() which
seems to have never existed in tree.  Replace it with
free_used_maps().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 07:55:23 +02:00
Greg Kroah-Hartman
c462abbf77 Merge 4.9.99 into android-4.9
Changes in 4.9.99
	perf/core: Fix the perf_cpu_time_max_percent check
	percpu: include linux/sched.h for cond_resched()
	bpf: map_get_next_key to return first key on NULL
	arm/arm64: KVM: Add PSCI version selection API
	crypto: talitos - fix IPsec cipher in length
	serial: imx: ensure UCR3 and UFCR are setup correctly
	USB: serial: option: Add support for Quectel EP06
	ALSA: pcm: Check PCM state at xfern compat ioctl
	ALSA: seq: Fix races at MIDI encoding in snd_virmidi_output_trigger()
	ALSA: aloop: Mark paused device as inactive
	ALSA: aloop: Add missing cable lock to ctl API callbacks
	tracepoint: Do not warn on ENOMEM
	Input: leds - fix out of bound access
	Input: atmel_mxt_ts - add touchpad button mapping for Samsung Chromebook Pro
	xfs: prevent creating negative-sized file via INSERT_RANGE
	RDMA/cxgb4: release hw resources on device removal
	RDMA/ucma: Allow resolving address w/o specifying source address
	RDMA/mlx5: Protect from shift operand overflow
	NET: usb: qmi_wwan: add support for ublox R410M PID 0x90b2
	IB/mlx5: Use unlimited rate when static rate is not supported
	IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
	drm/vmwgfx: Fix a buffer object leak
	drm/bridge: vga-dac: Fix edid memory leak
	test_firmware: fix setting old custom fw path back on exit, second try
	USB: serial: visor: handle potential invalid device configuration
	USB: Accept bulk endpoints with 1024-byte maxpacket
	USB: serial: option: reimplement interface masking
	USB: serial: option: adding support for ublox R410M
	usb: musb: host: fix potential NULL pointer dereference
	usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
	platform/x86: asus-wireless: Fix NULL pointer dereference
	s390/facilites: use stfle_fac_list array size for MAX_FACILITY_BIT
	Linux 4.9.99

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-09 12:28:25 +02:00
Teng Qin
fcbc8d0e7d bpf: map_get_next_key to return first key on NULL
commit 8fe4592438 upstream.

When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.
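
The iteration pattern this enables can be sketched with a toy in-memory map. Everything here is hypothetical (toy_keys, toy_get_next_key(), toy_sum_all_keys()); it only models the semantics described above and does not call the bpf syscall.

```c
#include <stddef.h>

#define TOY_NKEYS 3

/* Hypothetical map contents, in iteration order. */
static const int toy_keys[TOY_NKEYS] = { 10, 20, 30 };

/* Toy model of map_get_next_key: a NULL key yields the first key.
 * Returns 0 and stores the next key, or -1 when there are no more. */
static int toy_get_next_key(const int *key, int *next)
{
	size_t i;

	if (key == NULL) {
		*next = toy_keys[0];
		return 0;
	}
	for (i = 0; i < TOY_NKEYS; i++) {
		if (toy_keys[i] != *key)
			continue;
		if (i + 1 == TOY_NKEYS)
			return -1;	/* *key was the last key */
		*next = toy_keys[i + 1];
		return 0;
	}
	/* Unknown key: restart from the first key. */
	*next = toy_keys[0];
	return 0;
}

/* Walk the whole map starting from NULL, with no guessed key. */
static int toy_sum_all_keys(void)
{
	int key, next, sum = 0;
	const int *cur = NULL;

	while (toy_get_next_key(cur, &next) == 0) {
		sum += next;
		key = next;
		cur = &key;
	}
	return sum;
}
```

Starting from NULL removes the need to guess a key that is absent from the map just to begin the walk.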

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chenbo Feng <fengc@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-09 09:50:19 +02:00
Greg Kroah-Hartman
bb94f9d8f5 Merge 4.9.91 into android-4.9
Changes in 4.9.91
	MIPS: ralink: Remove ralink_halt()
	iio: st_pressure: st_accel: pass correct platform data to init
	ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit
	ALSA: aloop: Sync stale timer before release
	ALSA: aloop: Fix access to not-yet-ready substream via cable
	ALSA: hda/realtek - Always immediately update mute LED with pin VREF
	mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
	PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L
	ahci: Add PCI-id for the Highpoint Rocketraid 644L card
	clk: bcm2835: Fix ana->maskX definitions
	clk: bcm2835: Protect sections updating shared registers
	clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops
	Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174
	libata: fix length validation of ATAPI-relayed SCSI commands
	libata: remove WARN() for DMA or PIO command without data
	libata: don't try to pass through NCQ commands to non-NCQ devices
	libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
	libata: disable LPM for Crucial BX100 SSD 500GB drive
	libata: Enable queued TRIM for Samsung SSD 860
	libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
	libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
	libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
	nfsd: remove blocked locks on client teardown
	mm/vmalloc: add interfaces to free unmapped page table
	x86/mm: implement free pmd/pte page interfaces
	mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
	mm/thp: do not wait for lock_page() in deferred_split_scan()
	mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
	drm/vmwgfx: Fix a destoy-while-held mutex problem.
	drm/radeon: Don't turn off DP sink when disconnected
	drm: udl: Properly check framebuffer mmap offsets
	acpi, numa: fix pxm to online numa node associations
	ACPI / watchdog: Fix off-by-one error at resource assignment
	libnvdimm, {btt, blk}: do integrity setup before add_disk()
	brcmfmac: fix P2P_DEVICE ethernet address generation
	rtlwifi: rtl8723be: Fix loss of signal
	tracing: probeevent: Fix to support minus offset from symbol
	mtdchar: fix usage of mtd_ooblayout_ecc()
	mtd: nand: fsl_ifc: Fix nand waitfunc return value
	mtd: nand: fsl_ifc: Fix eccstat array overflow for IFC ver >= 2.0.0
	mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0
	staging: ncpfs: memory corruption in ncp_read_kernel()
	can: ifi: Repair the error handling
	can: ifi: Check core revision upon probe
	can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack
	can: cc770: Fix queue stall & dropped RTR reply
	can: cc770: Fix use after free in cc770_tx_interrupt()
	tty: vt: fix up tabstops properly
	selftests/x86/ptrace_syscall: Fix for yet more glibc interference
	kvm/x86: fix icebp instruction handling
	x86/build/64: Force the linker to use 2MB page size
	x86/boot/64: Verify alignment of the LOAD segment
	x86/entry/64: Don't use IST entry for #BP stack
	perf/x86/intel/uncore: Fix Skylake UPI event format
	perf stat: Fix CVS output format for non-supported counters
	perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()
	perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake servers
	iio: ABI: Fix name of timestamp sysfs file
	staging: lustre: ptlrpc: kfree used instead of kvfree
	selftests, x86, protection_keys: fix wrong offset in siginfo
	selftests/x86/protection_keys: Fix syscall NR redefinition warnings
	signal/testing: Don't look for __SI_FAULT in userspace
	x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
	selftests: x86: sysret_ss_attrs doesn't build on a PIE build
	kbuild: disable clang's default use of -fmerge-all-constants
	bpf: skip unnecessary capability check
	bpf, x64: increase number of passes
	Linux 4.9.91

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-29 11:32:39 +02:00
Chenbo Feng
3eb88807b2 bpf: skip unnecessary capability check
commit 0fa4fe85f4 upstream.

The current check statement in the BPF syscall does a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path triggers unnecessary security hooks on capability checking
and causes false alarms about unprivileged processes trying to get
CAP_SYS_ADMIN access. This can be resolved by simply switching the order
of the checks, since CAP_SYS_ADMIN is not required anyway if the
unprivileged bpf syscall is allowed.
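
The effect of the reordering can be sketched with short-circuit evaluation. The stub toy_capable() and its call counter are hypothetical stand-ins for capable(CAP_SYS_ADMIN) and the security hooks it triggers.

```c
#include <stdbool.h>

/* Counts how often the (stubbed) capability check runs. */
static int capable_calls;

/* Hypothetical stub for capable(CAP_SYS_ADMIN): always unprivileged. */
static bool toy_capable(void)
{
	capable_calls++;
	return false;
}

/* Old order: the capability check always runs first. */
static bool denied_old(bool unpriv_disabled)
{
	return !toy_capable() && unpriv_disabled;
}

/* New order: the capability check runs only when unprivileged
 * bpf is disabled, thanks to short-circuit evaluation of &&. */
static bool denied_new(bool unpriv_disabled)
{
	return unpriv_disabled && !toy_capable();
}
```

Both orders give the same verdict; only the new one avoids calling the capability check when unprivileged bpf is allowed.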

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:26 +02:00
Al Viro
7dc12f7d2f BACKPORT: fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
Descriptor table is a shared object; it's not a place where you can
stick temporary references to files, especially when we don't need
an opened file at all.

Cc: stable@vger.kernel.org # v4.14
Fixes: 98589a0998 ("netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chenbo Feng <fengc@google.com>

Removed the code related to function bpf_prog_get_ok() since it does not
exist in the current Android tree.
(cherry picked from commit 040ee69226)

Change-Id: If7a602128cdea4b4b50c8effb215c9bca7449515
2018-03-14 11:39:19 -07:00
Shmulik Ladkani
7e3c72f4c7 UPSTREAM: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
Commit 2c16d60332 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation), so when the 2nd invocation
occurs, userspace passes a bogus fd number, which causes
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace to perform an
"entries fixup" immediately after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd for every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested
deprecating the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2

Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Chenbo Feng <fengc@google.com>
(cherry picked from commit 98589a0998)

Change-Id: Ia0d15a76823cca3afb38786a3d2c25c13ccf941d
2018-03-14 11:39:19 -07:00
Greg Kroah-Hartman
a2904940bd Merge 4.9.87 into android-4.9
Changes in 4.9.87
	tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus
	tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus
	tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus
	tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
	tpm: constify transmit data pointers
	tpm_tis_spi: Use DMA-safe memory for SPI transfers
	tpm-dev-common: Reject too short writes
	ALSA: usb-audio: Add a quirck for B&W PX headphones
	ALSA: hda: Add a power_save blacklist
	ALSA: hda - Fix pincfg at resume on Lenovo T470 dock
	timers: Forward timer base before migrating timers
	parisc: Fix ordering of cache and TLB flushes
	cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
	dax: fix vma_is_fsdax() helper
	x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
	x86/platform/intel-mid: Handle Intel Edison reboot correctly
	media: m88ds3103: don't call a non-initalized function
	nospec: Allow index argument to have const-qualified type
	ARM: mvebu: Fix broken PL310_ERRATA_753970 selects
	ARM: kvm: fix building with gcc-8
	KVM: mmu: Fix overlap between public and private memslots
	KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
	KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
	PCI/ASPM: Deal with missing root ports in link state handling
	dm io: fix duplicate bio completion due to missing ref count
	ARM: dts: LogicPD SOM-LV: Fix I2C1 pinmux
	ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
	x86/mm: Give each mm TLB flush generation a unique ID
	x86/speculation: Use Indirect Branch Prediction Barrier in context switch
	md: only allow remove_and_add_spares when no sync_thread running.
	netlink: put module reference if dump start fails
	x86/apic/vector: Handle legacy irq data correctly
	bridge: check brport attr show in brport_show
	fib_semantics: Don't match route with mismatching tclassid
	hdlc_ppp: carrier detect ok, don't turn off negotiation
	ipv6 sit: work around bogus gcc-8 -Wrestrict warning
	net: fix race on decreasing number of TX queues
	net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68
	netlink: ensure to loop over all netns in genlmsg_multicast_allns()
	ppp: prevent unregistered channels from connecting to PPP units
	udplite: fix partial checksum initialization
	sctp: fix dst refcnt leak in sctp_v4_get_dst
	mlxsw: spectrum_switchdev: Check success of FDB add operation
	net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT
	tcp: Honor the eor bit in tcp_mtu_probe
	rxrpc: Fix send in rxrpc_send_data_packet()
	tcp_bbr: better deal with suboptimal GSO
	sctp: fix dst refcnt leak in sctp_v6_get_dst()
	s390/qeth: fix underestimated count of buffer elements
	s390/qeth: fix SETIP command handling
	s390/qeth: fix overestimated count of buffer elements
	s390/qeth: fix IP removal on offline cards
	s390/qeth: fix double-free on IP add/remove race
	s390/qeth: fix IP address lookup for L3 devices
	s390/qeth: fix IPA command submission race
	sctp: verify size of a new chunk in _sctp_make_chunk()
	net: mpls: Pull common label check into helper
	mpls, nospec: Sanitize array index in mpls_label_ok()
	bpf: fix wrong exposure of map_flags into fdinfo for lpm
	bpf: fix mlock precharge on arraymaps
	bpf, x64: implement retpoline for tail call
	bpf, arm64: fix out of bounds access in tail call
	bpf: add schedule points in percpu arrays management
	bpf, ppc64: fix out of bounds access in tail call
	btrfs: preserve i_mode if __btrfs_set_acl() fails
	Linux 4.9.87

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-11 17:38:31 +01:00
Eric Dumazet
2a8bc5316a bpf: add schedule points in percpu arrays management
[ upstream commit 32fff239de ]

syzbot managed to trigger RCU-detected stalls in
bpf_array_free_percpu().

It takes time to allocate a huge percpu map, but even more time to free
it.

Since we run in process context, use cond_resched() to yield cpu if
needed.

Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:21:35 +01:00
Daniel Borkmann
422baf61d4 bpf: fix mlock precharge on arraymaps
[ upstream commit 9c2d63b843 ]

syzkaller recently triggered OOM during percpu map allocation;
while there is work in progress by Dennis Zhou to add __GFP_NORETRY
semantics for percpu allocator under pressure, there also seems to be
a missing bpf_map_precharge_memlock() check in array map allocation.

Given today the actual bpf_map_charge_memlock() happens after the
find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock()
is there to bail out early before we go and do the map setup work
when we find that we hit the limits anyway. Therefore add this for
array map as well.

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:21:34 +01:00
Daniel Borkmann
816cfeb77c bpf: fix wrong exposure of map_flags into fdinfo for lpm
[ upstream commit a316338cb7 ]

trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. The latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and, if not, bails out, which is currently the case for lpm
since it always exposes 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure not to
forget to copy them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:21:34 +01:00
Sami Tolvanen
417637a2d9 bpf: fix function type for __bpf_prog_run
Bug: 67506682
Change-Id: I096a470c65a2a1867c51da9a33843ae23bf5e547
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2018-02-28 15:09:58 -08:00
Greg Kroah-Hartman
71f1469722 Merge 4.9.79 into android-4.9
Changes in 4.9.79
	x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
	orangefs: use list_for_each_entry_safe in purge_waiting_ops
	orangefs: initialize op on loop restart in orangefs_devreq_read
	usbip: prevent vhci_hcd driver from leaking a socket pointer address
	usbip: Fix implicit fallthrough warning
	usbip: Fix potential format overflow in userspace tools
	can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
	can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
	KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
	Prevent timer value 0 for MWAITX
	drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
	drivers: base: cacheinfo: fix boot error message when acpi is enabled
	mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
	hwpoison, memcg: forcibly uncharge LRU pages
	cma: fix calculation of aligned offset
	mm, page_alloc: fix potential false positive in __zone_watermark_ok
	ipc: msg, make msgrcv work with LONG_MIN
	ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
	ACPICA: Namespace: fix operand cache leak
	netfilter: nfnetlink_cthelper: Add missing permission checks
	netfilter: xt_osf: Add missing permission checks
	reiserfs: fix race in prealloc discard
	reiserfs: don't preallocate blocks for extended attributes
	fs/fcntl: f_setown, avoid undefined behaviour
	scsi: libiscsi: fix shifting of DID_REQUEUE host byte
	Revert "module: Add retpoline tag to VERMAGIC"
	mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
	Input: trackpoint - force 3 buttons if 0 button is reported
	orangefs: fix deadlock; do not write i_size in read_iter
	um: link vmlinux with -no-pie
	vsyscall: Fix permissions for emulate mode with KAISER/PTI
	eventpoll.h: add missing epoll event masks
	dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
	ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
	ipv6: fix udpv6 sendmsg crash caused by too small MTU
	ipv6: ip6_make_skb() needs to clear cork.base.dst
	lan78xx: Fix failure in USB Full Speed
	net: igmp: fix source address check for IGMPv3 reports
	net: qdisc_pkt_len_init() should be more robust
	net: tcp: close sock if net namespace is exiting
	pppoe: take ->needed_headroom of lower device into account on xmit
	r8169: fix memory corruption on retrieval of hardware statistics.
	sctp: do not allow the v4 socket to bind a v4mapped v6 address
	sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
	tipc: fix a memory leak in tipc_nl_node_get_link()
	vmxnet3: repair memory leak
	net: Allow neigh contructor functions ability to modify the primary_key
	ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
	ppp: unlock all_ppp_mutex before registering device
	be2net: restore properly promisc mode after queues reconfiguration
	ip6_gre: init dev->mtu and dev->hard_header_len correctly
	gso: validate gso_type in GSO handlers
	mlxsw: spectrum_router: Don't log an error on missing neighbor
	tun: fix a memory leak for tfile->tx_array
	flow_dissector: properly cap thoff field
	perf/x86/amd/power: Do not load AMD power module on !AMD platforms
	x86/microcode/intel: Extend BDW late-loading further with LLC size check
	hrtimer: Reset hrtimer cpu base proper on CPU hotplug
	x86: bpf_jit: small optimization in emit_bpf_tail_call()
	bpf: fix bpf_tail_call() x64 JIT
	bpf: introduce BPF_JIT_ALWAYS_ON config
	bpf: arsh is not supported in 32 bit alu thus reject it
	bpf: avoid false sharing of map refcount with max_entries
	bpf: fix divides by zero
	bpf: fix 32-bit divide by zero
	bpf: reject stores into ctx via st and xadd
	nfsd: auth: Fix gid sorting when rootsquash enabled
	Linux 4.9.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-31 14:13:00 +01:00
Daniel Borkmann
f531fbb06a bpf: reject stores into ctx via st and xadd
[ upstream commit f37a8cb84c ]

Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.

The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.
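
As a rough illustration of the rule described above, the sketch below
accepts a context access only when the opcode is an LDX or STX in MEM
mode. The helper name ctx_access_insn_ok() is hypothetical and this is
not the verifier's actual code; the opcode field values are the ones
from the BPF uapi headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode field values as defined in the BPF uapi headers. */
#define BPF_LDX		0x01
#define BPF_ST		0x02
#define BPF_STX		0x03
#define BPF_MEM		0x60
#define BPF_XADD	0xc0

#define BPF_CLASS(code)	((code) & 0x07)
#define BPF_MODE(code)	((code) & 0xe0)

/*
 * Hypothetical helper: returns true only for the two instruction forms
 * the context rewriter is said to handle (BPF_LDX_MEM and BPF_STX_MEM).
 * BPF_ST (store immediate) and BPF_STX | BPF_XADD are both refused.
 */
static bool ctx_access_insn_ok(uint8_t code)
{
	uint8_t cls = BPF_CLASS(code);

	if (cls != BPF_LDX && cls != BPF_STX)
		return false;
	return BPF_MODE(code) == BPF_MEM;
}
```

With 64-bit operand size, 0x79 (LDX MEM) and 0x7b (STX MEM) pass,
while 0x7a (ST MEM) and 0xdb (STX XADD) are refused.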

Fixes: d691f9e8d4 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Alexei Starovoitov
265d7657c9 bpf: fix 32-bit divide by zero
[ upstream commit 68fda450a7 ]

Some JITs do the if (src_reg == 0) check in 64-bit mode for div/mod
operations, so mask the upper 32 bits of the src register before doing
the check.
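
A minimal userspace sketch of the problem, assuming a 32-bit div whose
source register holds a value with a zero low half and non-zero upper
bits. Both function names are made up for illustration and neither is
taken from any JIT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy guard: tests the full 64-bit register, as some JITs did. */
static bool div_guard_64bit(uint64_t src)
{
	return src != 0;
}

/*
 * Fixed 32-bit division: truncate the divisor to 32 bits first, then
 * test it. Returns false instead of dividing when the 32-bit divisor
 * is zero.
 */
static bool alu32_div(uint64_t dst, uint64_t src, uint32_t *result)
{
	uint32_t divisor = (uint32_t)src;	/* mask upper 32 bits */

	if (divisor == 0)
		return false;
	*result = (uint32_t)dst / divisor;
	return true;
}
```

For src = 0x100000000 the 64-bit guard passes even though the 32-bit
divisor is zero; alu32_div() refuses that input.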

Fixes: 622582786c ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Eric Dumazet
4606077802 bpf: fix divides by zero
[ upstream commit c366287ebd ]

Divides by zero are not nice, let's avoid them if possible.

Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Daniel Borkmann
fcabc6d008 bpf: arsh is not supported in 32 bit alu thus reject it
[ upstream commit 7891a87efc ]

The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:

  0: (18) r0 = 0x0
  2: (7b) *(u64 *)(r10 -16) = r0
  3: (cc) (u32) r0 s>>= (u32) r0
  4: (95) exit

Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do, and neither does the interpreter. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.
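
The rejection can be pictured as an opcode filter like the one below.
This is only a sketch: alu_opcode_supported() is a hypothetical name,
and the real check sits inside the verifier's ALU handling. The opcode
field values come from the BPF uapi headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode field values as defined in the BPF uapi headers. */
#define BPF_ALU		0x04
#define BPF_ALU64	0x07
#define BPF_ARSH	0xc0

#define BPF_CLASS(code)	((code) & 0x07)
#define BPF_OP(code)	((code) & 0xf0)

/*
 * Hypothetical filter: arithmetic right shift is refused in the
 * 32-bit ALU class and left alone in the 64-bit class.
 */
static bool alu_opcode_supported(uint8_t code)
{
	if (BPF_CLASS(code) == BPF_ALU && BPF_OP(code) == BPF_ARSH)
		return false;
	return true;
}
```

Opcode 0xcc from the snippet above (BPF_ALU | BPF_ARSH | BPF_X) is
refused, while its 64-bit counterpart 0xcf is still accepted.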

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Alexei Starovoitov
a3d6dd6a66 bpf: introduce BPF_JIT_ALWAYS_ON config
[ upstream commit 290af86629 ]

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from the Google Project Zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
option that removes the interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:56 +01:00
Alexei Starovoitov
5226bb3b95 bpf: fix bpf_tail_call() x64 JIT
[ upstream commit 90caccdd8c ]

- bpf prog_array just like all other types of bpf array accepts 32-bit index.
  Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent

The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40 in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.
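
To see why the 8-byte load only misbehaves once map_flags can be
non-zero, consider the toy layout below, where a 32-bit max_entries is
immediately followed by a 32-bit map_flags. The struct and both loaders
are illustrative assumptions, not the kernel's struct bpf_map or the
JIT's emitted code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the two adjacent 32-bit fields; not struct bpf_map. */
struct toy_map_hdr {
	uint32_t max_entries;
	uint32_t map_flags;
};

/* What the buggy JIT effectively did: read 8 bytes at max_entries. */
static uint64_t load_bound_8(const struct toy_map_hdr *m)
{
	uint64_t v;

	memcpy(&v, (const char *)m + offsetof(struct toy_map_hdr, max_entries),
	       sizeof(v));
	return v;
}

/* The corrected load: read only the 4 bytes of max_entries. */
static uint32_t load_bound_4(const struct toy_map_hdr *m)
{
	uint32_t v;

	memcpy(&v, (const char *)m + offsetof(struct toy_map_hdr, max_entries),
	       sizeof(v));
	return v;
}
```

With map_flags set to any non-zero value, load_bound_8() no longer
returns max_entries, so a bounds check against it is no longer the
intended one; load_bound_4() is unaffected.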

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:56 +01:00
Greg Kroah-Hartman
033d019ce2 Merge 4.9.77 into android-4.9
Changes in 4.9.77
	dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
	mac80211: Add RX flag to indicate ICV stripped
	ath10k: rebuild crypto header in rx data frames
	KVM: Fix stack-out-of-bounds read in write_mmio
	can: gs_usb: fix return value of the "set_bittiming" callback
	IB/srpt: Disable RDMA access by the initiator
	MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
	MIPS: Factor out NT_PRFPREG regset access helpers
	MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
	MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
	MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
	MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
	MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
	kvm: vmx: Scrub hardware GPRs at VM-exit
	platform/x86: wmi: Call acpi_wmi_init() later
	x86/acpi: Handle SCI interrupts above legacy space gracefully
	ALSA: pcm: Remove incorrect snd_BUG_ON() usages
	ALSA: pcm: Add missing error checks in OSS emulation plugin builder
	ALSA: pcm: Abort properly at pending signal in OSS read/write loops
	ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
	ALSA: aloop: Release cable upon open error path
	ALSA: aloop: Fix inconsistent format due to incomplete rule
	ALSA: aloop: Fix racy hw constraints adjustment
	x86/acpi: Reduce code duplication in mp_override_legacy_irq()
	zswap: don't param_set_charp while holding spinlock
	lan78xx: use skb_cow_head() to deal with cloned skbs
	sr9700: use skb_cow_head() to deal with cloned skbs
	smsc75xx: use skb_cow_head() to deal with cloned skbs
	cx82310_eth: use skb_cow_head() to deal with cloned skbs
	xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
	8021q: fix a memory leak for VLAN 0 device
	ip6_tunnel: disable dst caching if tunnel is dual-stack
	net: core: fix module type in sock_diag_bind
	RDS: Heap OOB write in rds_message_alloc_sgs()
	RDS: null pointer dereference in rds_atomic_free_op
	sh_eth: fix TSU resource handling
	sh_eth: fix SH7757 GEther initialization
	net: stmmac: enable EEE in MII, GMII or RGMII only
	ipv6: fix possible mem leaks in ipv6_make_skb()
	ethtool: do not print warning for applications using legacy API
	mlxsw: spectrum_router: Fix NULL pointer deref
	net/sched: Fix update of lastuse in act modules implementing stats_update
	crypto: algapi - fix NULL dereference in crypto_remove_spawns()
	rbd: set max_segments to USHRT_MAX
	x86/microcode/intel: Extend BDW late-loading with a revision check
	KVM: x86: Add memory barrier on vmcs field lookup
	drm/vmwgfx: Potential off by one in vmw_view_add()
	kaiser: Set _PAGE_NX only if supported
	iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
	target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
	bpf: move fixup_bpf_calls() function
	bpf: refactor fixup_bpf_calls()
	bpf: prevent out-of-bounds speculation
	bpf, array: fix overflow in max_entries and undefined behavior in index_mask
	USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
	USB: serial: cp210x: add new device ID ELV ALC 8xxx
	usb: misc: usb3503: make sure reset is low for at least 100us
	USB: fix usbmon BUG trigger
	usbip: remove kernel addresses from usb device and urb debug msgs
	usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
	staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
	Bluetooth: Prevent stack info leak from the EFS element.
	uas: ignore UAS for Norelsys NS1068(X) chips
	e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
	x86/Documentation: Add PTI description
	x86/cpu: Factor out application of forced CPU caps
	x86/cpufeatures: Make CPU bugs sticky
	x86/cpufeatures: Add X86_BUG_CPU_INSECURE
	x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
	x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
	x86/cpu: Merge bugs.c and bugs_64.c
	sysfs/cpu: Add vulnerability folder
	x86/cpu: Implement CPU vulnerabilites sysfs functions
	x86/cpu/AMD: Make LFENCE a serializing instruction
	x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
	sysfs/cpu: Fix typos in vulnerability documentation
	x86/alternatives: Fix optimize_nops() checking
	x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
	x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
	objtool, modules: Discard objtool annotation sections for modules
	objtool: Detect jumps to retpoline thunks
	objtool: Allow alternatives to be ignored
	x86/asm: Use register variable to get stack pointer value
	x86/retpoline: Add initial retpoline support
	x86/spectre: Add boot time option to select Spectre v2 mitigation
	x86/retpoline/crypto: Convert crypto assembler indirect jumps
	x86/retpoline/entry: Convert entry assembler indirect jumps
	x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
	x86/retpoline/hyperv: Convert assembler indirect jumps
	x86/retpoline/xen: Convert Xen hypercall indirect jumps
	x86/retpoline/checksum32: Convert assembler indirect jumps
	x86/retpoline/irq32: Convert assembler indirect jumps
	x86/retpoline: Fill return stack buffer on vmexit
	selftests/x86: Add test_vsyscall
	x86/retpoline: Remove compile time warning
	objtool: Fix retpoline support for pre-ORC objtool
	x86/pti/efi: broken conversion from efi to kernel page table
	Linux 4.9.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-17 10:29:45 +01:00
Daniel Borkmann
820ef2a0e5 bpf, array: fix overflow in max_entries and undefined behavior in index_mask
commit bbeb6e4323 upstream.

syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.

Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.
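
A minimal sketch of doing the round-up "by hand" in a 64-bit variable,
including the overflow bail-out. The function name and signature are
hypothetical; the actual change lives in the array map allocation path.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive index_mask and the rounded-up entry
 * count from max_entries. Returns false when max_entries is zero or
 * when rounding up would wrap a 32-bit max_entries around to 0.
 */
static bool toy_round_up_entries(uint32_t max_entries, uint32_t *index_mask,
				 uint32_t *rounded)
{
	uint32_t v, next;
	unsigned int bits = 0;
	uint64_t mask64;

	if (max_entries == 0)
		return false;

	/* Equivalent of fls_long(max_entries - 1): bits needed for v. */
	for (v = max_entries - 1; v; v >>= 1)
		bits++;

	/* bits can be 32 here, so the shift must be done in 64 bits. */
	mask64 = (1ULL << bits) - 1;
	*index_mask = (uint32_t)mask64;

	/* index_mask + 1 wraps to 0 when index_mask is 0xffffffff. */
	next = *index_mask + 1;
	if (next < max_entries)
		return false;

	*rounded = next;
	return true;
}
```

For max_entries = 0xfffffffd the mask becomes 0xffffffff and the helper
reports overflow; for max_entries = 5 it yields a mask of 7 and a
rounded size of 8.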

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
a9bfac14cd bpf: prevent out-of-bounds speculation
commit b2157399cc upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data, round up array-based maps and mask the index
after the bounds check, so a speculated load with an out-of-bounds index will
load either a valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.
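
The lookup pattern can be sketched in plain C as follows. The struct
and function are toy stand-ins with assumed names, and the storage is
padded to a power of two as described above.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8	/* storage rounded up to a power of two */

/* Toy array map; not the kernel's struct bpf_array. */
struct toy_array {
	uint32_t max_entries;	/* logical size requested by the user */
	uint32_t index_mask;	/* TOY_SLOTS - 1 */
	uint64_t value[TOY_SLOTS];
};

/*
 * Bounds check first, then mask. Architecturally the mask changes
 * nothing for an in-bounds index; it only constrains the address a
 * CPU could use if it speculates past the bounds check.
 */
static uint64_t *toy_array_lookup(struct toy_array *a, uint32_t index)
{
	if (index >= a->max_entries)
		return NULL;
	return &a->value[index & a->index_mask];
}
```

With max_entries = 5 and index_mask = 7, a mispredicted lookup can at
worst touch slots 5..7 of the zero-filled padding rather than memory
beyond the array.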

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index < max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[ Backported to 4.9 - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
f55093dccd bpf: refactor fixup_bpf_calls()
commit 79741b3bde upstream.

reduce indent and make it iterate over instructions similar to
convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[Backported to 4.9.y - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
28035366af bpf: move fixup_bpf_calls() function
commit e245c5c6a5 upstream.

no functional change.
move fixup_bpf_calls() to verifier.c
it's being refactored in the next patch

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[backported to 4.9 - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Greg Kroah-Hartman
f3f3457d45 Merge 4.9.73 into android-4.9
Changes in 4.9.73
	ACPI: APEI / ERST: Fix missing error handling in erst_reader()
	acpi, nfit: fix health event notification
	crypto: mcryptd - protect the per-CPU queue with a lock
	mfd: cros ec: spi: Don't send first message too soon
	mfd: twl4030-audio: Fix sibling-node lookup
	mfd: twl6040: Fix child-node lookup
	ALSA: rawmidi: Avoid racy info ioctl via ctl device
	ALSA: usb-audio: Add native DSD support for Esoteric D-05X
	ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
	PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
	parisc: Hide Diva-built-in serial aux and graphics card
	spi: xilinx: Detect stall with Unknown commands
	pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
	KVM: X86: Fix load RFLAGS w/o the fixed bit
	kvm: x86: fix RSM when PCID is non-zero
	clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
	powerpc/perf: Dereference BHRB entries safely
	libnvdimm, pfn: fix start_pad handling for aligned namespaces
	net: mvneta: clear interface link status on port disable
	net: mvneta: use proper rxq_number in loop on rx queues
	net: mvneta: eliminate wrong call to handle rx descriptor error
	bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
	Linux 4.9.73

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-29 18:18:15 +01:00
Ben Hutchings
37435f7e80 bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
An UNKNOWN_VALUE is not supposed to be derived from a pointer, unless
pointer leaks are allowed.  Therefore, states_equal() must not treat
a state with a pointer in a register as "equal" to a state with an
UNKNOWN_VALUE in that register.

This was fixed differently upstream, but the code around here was
largely rewritten in 4.14 by commit f1174f77b5 "bpf/verifier: rework
value tracking".  The bug can be detected by the bpf/verifier sub-test
"pointer/scalar confusion in state equality check (way 1)".

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Jann Horn <jannh@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
2017-12-29 17:43:00 +01:00