Commit Graph

104400 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
18ba00a34e Merge 4.19.19 into android-4.19
Changes in 4.19.19
	amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
	net: bridge: Fix ethernet header pointer before check skb forwardable
	net: Fix usage of pskb_trim_rcsum
	net: phy: marvell: Errata for mv88e6390 internal PHYs
	net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
	net/sched: act_tunnel_key: fix memory leak in case of action replace
	net_sched: refetch skb protocol for each filter
	openvswitch: Avoid OOB read when parsing flow nlattrs
	vhost: log dirty page correctly
	mlxsw: pci: Increase PCI SW reset timeout
	net: ipv4: Fix memory leak in network namespace dismantle
	mlxsw: spectrum_fid: Update dummy FID index
	mlxsw: pci: Ring CQ's doorbell before RDQ's
	net/sched: cls_flower: allocate mask dynamically in fl_change()
	udp: with udp_segment release on error path
	ip6_gre: fix tunnel list corruption for x-netns
	erspan: build the header with the right proto according to erspan_ver
	net: phy: marvell: Fix deadlock from wrong locking
	ip6_gre: update version related info when changing link
	tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state
	mei: me: mark LBG devices as having dma support
	mei: me: add denverton innovation engine device IDs
	USB: leds: fix regression in usbport led trigger
	USB: serial: simple: add Motorola Tetra TPG2200 device id
	USB: serial: pl2303: add new PID to support PL2303TB
	ceph: clear inode pointer when snap realm gets dropped by its inode
	ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
	ASoC: rt5514-spi: Fix potential NULL pointer dereference
	ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
	clk: socfpga: stratix10: fix rate calculation for pll clocks
	clk: socfpga: stratix10: fix naming convention for the fixed-clocks
	inotify: Fix fd refcount leak in inotify_add_watch().
	ALSA: hda/realtek - Fix typo for ALC225 model
	ALSA: hda - Add mute LED support for HP ProBook 470 G5
	ARCv2: lib: memeset: fix doing prefetchw outside of buffer
	ARC: adjust memblock_reserve of kernel memory
	ARC: perf: map generic branches to correct hardware condition
	s390/mm: always force a load of the primary ASCE on context switch
	s390/early: improve machine detection
	s390/smp: fix CPU hotplug deadlock with CPU rescan
	misc: ibmvsm: Fix potential NULL pointer dereference
	char/mwave: fix potential Spectre v1 vulnerability
	mmc: dw_mmc-bluefield: : Fix the license information
	mmc: meson-gx: Free irq in release() callback
	staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
	tty: Handle problem if line discipline does not have receive_buf
	uart: Fix crash in uart_write and uart_put_char
	tty/n_hdlc: fix __might_sleep warning
	hv_balloon: avoid touching uninitialized struct page during tail onlining
	Drivers: hv: vmbus: Check for ring when getting debug info
	vgacon: unconfuse vc_origin when using soft scrollback
	CIFS: Fix possible hang during async MTU reads and writes
	CIFS: Fix credits calculations for reads with errors
	CIFS: Fix credit calculation for encrypted reads with errors
	CIFS: Do not reconnect TCP session in add_credits()
	smb3: add credits we receive from oplock/break PDUs
	Input: xpad - add support for SteelSeries Stratus Duo
	Input: input_event - provide override for sparc64
	Input: uinput - fix undefined behavior in uinput_validate_absinfo()
	acpi/nfit: Block function zero DSMs
	acpi/nfit: Fix command-supported detection
	scsi: ufs: Use explicit access size in ufshcd_dump_regs
	dm thin: fix passdown_double_checking_shared_status()
	dm crypt: fix parsing of extended IV arguments
	drm/amdgpu: Add APTX quirk for Lenovo laptop
	KVM: x86: Fix single-step debugging
	KVM: x86: Fix PV IPIs for 32-bit KVM host
	KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
	kvm: x86/vmx: Use kzalloc for cached_vmcs12
	KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
	x86/pkeys: Properly copy pkey state at fork()
	x86/selftests/pkeys: Fork() to check for state being preserved
	x86/kaslr: Fix incorrect i8254 outb() parameters
	x86/entry/64/compat: Fix stack switching for XEN PV
	posix-cpu-timers: Unbreak timer rearming
	net: sun: cassini: Cleanup license conflict
	irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
	can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
	can: bcm: check timer values before ktime conversion
	can: flexcan: fix NULL pointer exception during bringup
	vt: make vt_console_print() compatible with the unicode screen buffer
	vt: always call notifier with the console lock held
	vt: invoke notifier on screen size change
	drm/meson: Fix atomic mode switching regression
	bpf: improve verifier branch analysis
	bpf: add per-insn complexity limit
	bpf: move {prev_,}insn_idx into verifier env
	bpf: move tmp variable into ax register in interpreter
	bpf: enable access to ax register also from verifier rewrite
	bpf: restrict map value pointer arithmetic for unprivileged
	bpf: restrict stack pointer arithmetic for unprivileged
	bpf: restrict unknown scalars of mixed signed bounds for unprivileged
	bpf: fix check_map_access smin_value test when pointer contains offset
	bpf: prevent out of bounds speculation on pointer arithmetic
	bpf: fix sanitation of alu op with pointer / scalar type from different paths
	bpf: fix inner map masking to prevent oob under speculation
	s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
	nvmet-rdma: Add unlikely for response allocated check
	nvmet-rdma: fix null dereference under heavy load
	Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
	usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
	ide: fix a typo in the settings proc file name
	Input: input_event - fix the CONFIG_SPARC64 mixup
	Linux 4.19.19

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-31 08:29:40 +01:00
Deepa Dinamani
3a3b6a6b15 Input: input_event - fix the CONFIG_SPARC64 mixup
commit 141e5dcaa7 upstream.

Arnd Bergmann pointed out that CONFIG_* cannot be used in a uapi header.
Override with an equivalent conditional.

Fixes: 2e746942eb ("Input: input_event - provide override for sparc64")
Fixes: 152194fe9c ("Input: extend usable life of event timestamps to 2106 on 32 bit systems")
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:14:42 +01:00
Daniel Borkmann
eed84f94ff bpf: fix sanitation of alu op with pointer / scalar type from different paths
[ commit d3bd7413e0 upstream ]

While 979d63d50c ("bpf: prevent out of bounds speculation on pointer
arithmetic") took care of rejecting alu op on pointer when e.g. pointer
came from two different map values with different map properties such as
value size, Jann reported that a case was not covered yet when a given
alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from
different branches where we would incorrectly try to sanitize based
on the pointer's limit. Catch this corner case and reject the program
instead.

Fixes: 979d63d50c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
f92a819b4c bpf: prevent out of bounds speculation on pointer arithmetic
[ commit 979d63d50c upstream ]

Jann reported that the original commit back in b2157399cc
("bpf: prevent out-of-bounds speculation") was not sufficient
to stop CPU from speculating out of bounds memory access:
While b2157399cc only focussed on masking array map access
for unprivileged users for tail calls and data access such
that the user provided index gets sanitized from BPF program
and syscall side, there is still a more generic form affected
from BPF programs that applies to most maps that hold user
data in relation to dynamic map access when dealing with
unknown scalars or "slow" known scalars as access offset, for
example:

  - Load a map value pointer into R6
  - Load an index into R7
  - Do a slow computation (e.g. with a memory dependency) that
    loads a limit into R8 (e.g. load the limit from a map for
    high latency, then mask it to make the verifier happy)
  - Exit if R7 >= R8 (mispredicted branch)
  - Load R0 = R6[R7]
  - Load R0 = R6[R0]

For unknown scalars there are two options in the BPF verifier
where we could derive knowledge from in order to guarantee
safe access to the memory: i) While </>/<=/>= variants won't
allow to derive any lower or upper bounds from the unknown
scalar where it would be safe to add it to the map value
pointer, it is possible through ==/!= test however. ii) another
option is to transform the unknown scalar into a known scalar,
for example, through ALU ops combination such as R &= <imm>
followed by R |= <imm> or any similar combination where the
original information from the unknown scalar would be destroyed
entirely leaving R with a constant. The initial slow load still
precedes the latter ALU ops on that register, so the CPU
executes speculatively from that point. Once we have the known
scalar, any compare operation would work then. A third option
only involving registers with known scalars could be crafted
as described in [0] where a CPU port (e.g. Slow Int unit)
would be filled with many dependent computations such that
the subsequent condition depending on its outcome has to wait
for evaluation on its execution port and thereby executing
speculatively if the speculated code can be scheduled on a
different execution port, or any other form of mistraining
as described in [1], for example. Given this is not limited
to only unknown scalars, not only map but also stack access
is affected since both is accessible for unprivileged users
and could potentially be used for out of bounds access under
speculation.

In order to prevent any of these cases, the verifier is now
sanitizing pointer arithmetic on the offset such that any
out of bounds speculation would be masked in a way where the
pointer arithmetic result in the destination register will
stay unchanged, meaning offset masked into zero similar as
in array_index_nospec() case. With regards to implementation,
there are three options that were considered: i) new insn
for sanitation, ii) push/pop insn and sanitation as inlined
BPF, iii) reuse of ax register and sanitation as inlined BPF.

Option i) has the downside that we end up using from reserved
bits in the opcode space, but also that we would require
each JIT to emit masking as native arch opcodes meaning
mitigation would have slow adoption till everyone implements
it eventually which is counter-productive. Option ii) and iii)
have both in common that a temporary register is needed in
order to implement the sanitation as inlined BPF since we
are not allowed to modify the source register. While a push /
pop insn in ii) would be useful to have in any case, it
requires once again that every JIT needs to implement it
first. While possible, amount of changes needed would also
be unsuitable for a -stable patch. Therefore, the path which
has fewer changes, less BPF instructions for the mitigation
and does not require anything to be changed in the JITs is
option iii) which this work is pursuing. The ax register is
already mapped to a register in all JITs (modulo arm32 where
it's mapped to stack as various other BPF registers there)
and used in constant blinding for JITs-only so far. It can
be reused for verifier rewrites under certain constraints.
The interpreter's tmp "register" has therefore been remapped
into extending the register set with hidden ax register and
reusing that for a number of instructions that needed the
prior temporary variable internally (e.g. div, mod). This
allows for zero increase in stack space usage in the interpreter,
and enables (restricted) generic use in rewrites otherwise as
long as such a patchlet does not make use of these instructions.
The sanitation mask is dynamic and relative to the offset the
map value or stack pointer currently holds.

There are various cases that need to be taken under consideration
for the masking, e.g. such operation could look as follows:
ptr += val or val += ptr or ptr -= val. Thus, the value to be
sanitized could reside either in source or in destination
register, and the limit is different depending on whether
the ALU op is addition or subtraction and depending on the
current known and bounded offset. The limit is derived as
follows: limit := max_value_size - (smin_value + off). For
subtraction: limit := umax_value + off. This holds because
we do not allow any pointer arithmetic that would
temporarily go out of bounds or would have an unknown
value with mixed signed bounds where it is unclear at
verification time whether the actual runtime value would
be either negative or positive. For example, we have a
derived map pointer value with constant offset and bounded
one, so limit based on smin_value works because the verifier
requires that statically analyzed arithmetic on the pointer
must be in bounds, and thus it checks if resulting
smin_value + off and umax_value + off is still within map
value bounds at time of arithmetic in addition to time of
access. Similarly, for the case of stack access we derive
the limit as follows: MAX_BPF_STACK + off for subtraction
and -off for the case of addition where off := ptr_reg->off +
ptr_reg->var_off.value. Subtraction is a special case for
the masking which can be in form of ptr += -val, ptr -= -val,
or ptr -= val. In the first two cases where we know that
the value is negative, we need to temporarily negate the
value in order to do the sanitation on a positive value
where we later swap the ALU op, and restore original source
register if the value was in source.

The sanitation of pointer arithmetic alone is still not fully
sufficient as is, since a scenario like the following could
happen ...

  PTR += 0x1000 (e.g. K-based imm)
  PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
  PTR += 0x1000
  PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
  [...]

... which under speculation could end up as ...

  PTR += 0x1000
  PTR -= 0 [ truncated by mitigation ]
  PTR += 0x1000
  PTR -= 0 [ truncated by mitigation ]
  [...]

... and therefore still access out of bounds. To prevent such
case, the verifier is also analyzing safety for potential out
of bounds access under speculative execution. Meaning, it is
also simulating pointer access under truncation. We therefore
"branch off" and push the current verification state after the
ALU operation with known 0 to the verification stack for later
analysis. Given the current path analysis succeeded it is
likely that the one under speculation can be pruned. In any
case, it is also subject to existing complexity limits and
therefore anything beyond this point will be rejected. In
terms of pruning, it needs to be ensured that the verification
state from speculative execution simulation must never prune
a non-speculative execution path, therefore, we mark verifier
state accordingly at the time of push_stack(). If verifier
detects out of bounds access under speculative execution from
one of the possible paths that includes a truncation, it will
reject such program.

Given we mask every reg-based pointer arithmetic for
unprivileged programs, we've been looking into how it could
affect real-world programs in terms of size increase. As the
majority of programs are targeted for privileged-only use
case, we've unconditionally enabled masking (with its alu
restrictions on top of it) for privileged programs for the
sake of testing in order to check i) whether they get rejected
in its current form, and ii) by how much the number of
instructions and size will increase. We've tested this by
using Katran, Cilium and test_l4lb from the kernel selftests.
For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
we've used test_l4lb.o as well as test_l4lb_noinline.o. We
found that none of the programs got rejected by the verifier
with this change, and that impact is rather minimal to none.
balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
7,797 bytes JITed before and after the change. Most complex
program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
and 18,538 bytes JITed before and after and none of the other
tail call programs in bpf_lxc.o had any changes either. For
the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
after the change. Other programs from that object file had
similar small increase. Both test_l4lb.o had no change and
remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
JITed and for test_l4lb_noinline.o constant at 5,080 bytes
(634 insns) xlated and 3,313 bytes JITed. This can be explained
in that LLVM typically optimizes stack based pointer arithmetic
by using K-based operations and that use of dynamic map access
is not overly frequent. However, in future we may decide to
optimize the algorithm further under known guarantees from
branch and value speculation. Latter seems also unclear in
terms of prediction heuristics that today's CPUs apply as well
as whether there could be collisions in e.g. the predictor's
Value History/Pattern Table for triggering out of bounds access,
thus masking is performed unconditionally at this point but could
be subject to relaxation later on. We were generally also
brainstorming various other approaches for mitigation, but the
blocker was always lack of available registers at runtime and/or
overhead for runtime tracking of limits belonging to a specific
pointer. Thus, we found this to be minimally intrusive under
given constraints.

With that in place, a simple example with sanitized access on
unprivileged load at post-verification time looks as follows:

  # bpftool prog dump xlated id 282
  [...]
  28: (79) r1 = *(u64 *)(r7 +0)
  29: (79) r2 = *(u64 *)(r7 +8)
  30: (57) r1 &= 15
  31: (79) r3 = *(u64 *)(r0 +4608)
  32: (57) r3 &= 1
  33: (47) r3 |= 1
  34: (2d) if r2 > r3 goto pc+19
  35: (b4) (u32) r11 = (u32) 20479  |
  36: (1f) r11 -= r2                | Dynamic sanitation for pointer
  37: (4f) r11 |= r2                | arithmetic with registers
  38: (87) r11 = -r11               | containing bounded or known
  39: (c7) r11 s>>= 63              | scalars in order to prevent
  40: (5f) r11 &= r2                | out of bounds speculation.
  41: (0f) r4 += r11                |
  42: (71) r4 = *(u8 *)(r4 +0)
  43: (6f) r4 <<= r1
  [...]

For the case where the scalar sits in the destination register
as opposed to the source register, the following code is emitted
for the above example:

  [...]
  16: (b4) (u32) r11 = (u32) 20479
  17: (1f) r11 -= r2
  18: (4f) r11 |= r2
  19: (87) r11 = -r11
  20: (c7) r11 s>>= 63
  21: (5f) r2 &= r11
  22: (0f) r2 += r0
  23: (61) r0 = *(u32 *)(r2 +0)
  [...]

JIT blinding example with non-conflicting use of r10:

  [...]
   d5:	je     0x0000000000000106    _
   d7:	mov    0x0(%rax),%edi       |
   da:	mov    $0xf153246,%r10d     | Index load from map value and
   e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
   e7:	and    %r10,%rdi            |_
   ea:	mov    $0x2f,%r10d          |
   f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
   f3:	or     %rdi,%r10            | but do not interfere with each
   f6:	neg    %r10                 | other. (Neither do these instructions
   f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
   fd:	and    %r10,%rdi            | in interpreter.)
  100:	add    %rax,%rdi            |_
  103:	mov    0x0(%rdi),%eax
 [...]

Tested that it fixes Jann's reproducer, and also checked that test_verifier
and test_progs suite with interpreter, JIT and JIT with hardening enabled
on x86-64 and arm64 runs successfully.

  [0] Speculose: Analyzing the Security Implications of Speculative
      Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
      https://arxiv.org/pdf/1801.04084.pdf

  [1] A Systematic Evaluation of Transient Execution Attacks and
      Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
      Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
      Dmitry Evtyushkin, Daniel Gruss,
      https://arxiv.org/pdf/1811.05441.pdf

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
232ac70dd3 bpf: enable access to ax register also from verifier rewrite
[ commit 9b73bfdd08 upstream ]

Right now we are using BPF ax register in JIT for constant blinding as
well as in interpreter as temporary variable. Verifier will not be able
to use it simply because its use will get overridden from the former in
bpf_jit_blind_insn(). However, it can be made to work in that blinding
will be skipped if there is prior use in either source or destination
register on the instruction. Taking constraints of ax into account, the
verifier is then open to use it in rewrites under some constraints. Note,
ax register already has mappings in every eBPF JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Daniel Borkmann
b855e31037 bpf: move tmp variable into ax register in interpreter
[ commit 144cd91c4c upstream ]

This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run()
into the hidden ax register. The latter is currently only used in JITs
for constant blinding as a temporary scratch register, meaning the BPF
interpreter will never see the use of ax. Therefore it is safe to use
it for the cases where tmp has been used earlier. This is needed to later
on allow restricted hidden use of ax in both interpreter and JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Daniel Borkmann
333a31c89a bpf: move {prev_,}insn_idx into verifier env
[ commit c08435ec7f upstream ]

Move prev_insn_idx and insn_idx from the do_check() function into
the verifier environment, so they can be read inside the various
helper functions for handling the instructions. It's easier to put
this into the environment rather than changing all call-sites only
to pass it along. insn_idx is useful in particular since this later
on allows to hold state in env->insn_aux_data[env->insn_idx].

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Deepa Dinamani
71b1af8774 Input: input_event - provide override for sparc64
commit 2e746942eb upstream.

The usec part of the timeval is defined as
__kernel_suseconds_t	tv_usec; /* microseconds */

Arnd noticed that sparc64 is the only architecture that defines
__kernel_suseconds_t as int rather than long.

This breaks the current y2038 fix for kernel as we only access and define
the timeval struct for non-kernel use cases.  But, this was hidden by an
another typo in the use of __KERNEL__ qualifier.

Fix the typo, and provide an override for sparc64.

Fixes: 152194fe9c ("Input: extend usable life of event timestamps to 2106 on 32 bit systems")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:14:37 +01:00
Dexuan Cui
a912e16fae Drivers: hv: vmbus: Check for ring when getting debug info
commit ba50bf1ce9 upstream.

fc96df16a1 is good and can already fix the "return stack garbage" issue,
but let's also improve hv_ringbuffer_get_debuginfo(), which would silently
return stack garbage, if people forget to check channel->state or
ring_info->ring_buffer, when using the function in the future.

Having an error check in the function would eliminate the potential risk.

Add a Fixes tag to indicate the patch depdendency.

Fixes: fc96df16a1 ("Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels")
Cc: stable@vger.kernel.org
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:14:36 +01:00
Ido Schimmel
adbf7e5809 net: ipv4: Fix memory leak in network namespace dismantle
[ Upstream commit f97f4dd8b3 ]

IPv4 routing tables are flushed in two cases:

1. In response to events in the netdev and inetaddr notification chains
2. When a network namespace is being dismantled

In both cases only routes associated with a dead nexthop group are
flushed. However, a nexthop group will only be marked as dead in case it
is populated with actual nexthops using a nexthop device. This is not
the case when the route in question is an error route (e.g.,
'blackhole', 'unreachable').

Therefore, when a network namespace is being dismantled such routes are
not flushed and leaked [1].

To reproduce:
# ip netns add blue
# ip -n blue route add unreachable 192.0.2.0/24
# ip netns del blue

Fix this by not skipping error routes that are not marked with
RTNH_F_DEAD when flushing the routing tables.

To prevent the flushing of such routes in case #1, add a parameter to
fib_table_flush() that indicates if the table is flushed as part of
namespace dismantle or not.

Note that this problem does not exist in IPv6 since error routes are
associated with the loopback device.

[1]
unreferenced object 0xffff888066650338 (size 56):
  comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
    e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
  backtrace:
    [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
    [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
    [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
    [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
    [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
    [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
    [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
    [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
    [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
    [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000003a8b605b>] 0xffffffffffffffff
unreferenced object 0xffff888061621c88 (size 48):
  comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
  hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&_....
  backtrace:
    [<00000000733609e3>] fib_table_insert+0x978/0x1500
    [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
    [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
    [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
    [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
    [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
    [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
    [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
    [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
    [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
    [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000003a8b605b>] 0xffffffffffffffff

Fixes: 8cced9eff1 ("[NETNS]: Enable routing configuration in non-initial namespace.")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:14:32 +01:00
Ross Lagerwall
40f2f08030 net: Fix usage of pskb_trim_rcsum
[ Upstream commit 6c57f04580 ]

In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:14:31 +01:00
Greg Kroah-Hartman
26bf816608 Merge 4.19.18 into android-4.19
Changes in 4.19.18
	ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
	mlxsw: spectrum: Disable lag port TX before removing it
	mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
	net: dsa: mv88x6xxx: mv88e6390 errata
	net, skbuff: do not prefer skb allocation fails early
	qmi_wwan: add MTU default to qmap network interface
	r8169: Add support for new Realtek Ethernet
	ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
	net: clear skb->tstamp in bridge forwarding path
	netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets
	gpio: pl061: Move irq_chip definition inside struct pl061
	drm/amd/display: Guard against null stream_state in set_crc_source
	drm/amdkfd: fix interrupt spin lock
	ixgbe: allow IPsec Tx offload in VEPA mode
	platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey
	e1000e: allow non-monotonic SYSTIM readings
	usb: typec: tcpm: Do not disconnect link for self powered devices
	selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
	of: overlay: add missing of_node_put() after add new node to changeset
	writeback: don't decrement wb->refcnt if !wb->bdi
	serial: set suppress_bind_attrs flag only if builtin
	bpf: Allow narrow loads with offset > 0
	ALSA: oxfw: add support for APOGEE duet FireWire
	x86/mce: Fix -Wmissing-prototypes warnings
	MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
	crypto: ecc - regularize scalar for scalar multiplication
	arm64: perf: set suppress_bind_attrs flag to true
	drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
	clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
	samples: bpf: fix: error handling regarding kprobe_events
	usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device
	fpga: altera-cvp: fix probing for multiple FPGAs on the bus
	selinux: always allow mounting submounts
	ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
	scsi: qedi: Check for session online before getting iSCSI TLV data.
	drm/amdgpu: Reorder uvd ring init before uvd resume
	rxe: IB_WR_REG_MR does not capture MR's iova field
	efi/libstub: Disable some warnings for x86{,_64}
	jffs2: Fix use of uninitialized delayed_work, lockdep breakage
	clk: imx: make mux parent strings const
	pstore/ram: Do not treat empty buffers as valid
	media: uvcvideo: Refactor teardown of uvc on USB disconnect
	powerpc/xmon: Fix invocation inside lock region
	powerpc/pseries/cpuidle: Fix preempt warning
	media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
	ASoC: use dma_ops of parent device for acp_audio_dma
	media: venus: core: Set dma maximum segment size
	staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
	net: call sk_dst_reset when set SO_DONTROUTE
	scsi: target: use consistent left-aligned ASCII INQUIRY data
	scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough
	selftests: do not macro-expand failed assertion expressions
	arm64: kasan: Increase stack size for KASAN_EXTRA
	clk: imx6q: reset exclusive gates on init
	arm64: Fix minor issues with the dcache_by_line_op macro
	bpf: relax verifier restriction on BPF_MOV | BPF_ALU
	kconfig: fix file name and line number of warn_ignored_character()
	kconfig: fix memory leak when EOF is encountered in quotation
	mmc: atmel-mci: do not assume idle after atmci_request_end
	btrfs: volumes: Make sure there is no overlap of dev extents at mount time
	btrfs: alloc_chunk: fix more DUP stripe size handling
	btrfs: fix use-after-free due to race between replace start and cancel
	btrfs: improve error handling of btrfs_add_link
	tty/serial: do not free trasnmit buffer page under port lock
	perf intel-pt: Fix error with config term "pt=0"
	perf tests ARM: Disable breakpoint tests 32-bit
	perf svghelper: Fix unchecked usage of strncpy()
	perf parse-events: Fix unchecked usage of strncpy()
	perf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX
	netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set
	netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine
	netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
	x86/topology: Use total_cpus for max logical packages calculation
	dm crypt: use u64 instead of sector_t to store iv_offset
	dm kcopyd: Fix bug causing workqueue stalls
	perf stat: Avoid segfaults caused by negated options
	tools lib subcmd: Don't add the kernel sources to the include path
	dm snapshot: Fix excessive memory usage and workqueue stalls
	perf cs-etm: Correct packets swapping in cs_etm__flush()
	perf tools: Add missing sigqueue() prototype for systems lacking it
	perf tools: Add missing open_memstream() prototype for systems lacking it
	quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
	clocksource/drivers/integrator-ap: Add missing of_node_put()
	dm: Check for device sector overflow if CONFIG_LBDAF is not set
	Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029
	ALSA: bebob: fix model-id of unit for Apogee Ensemble
	sysfs: Disable lockdep for driver bind/unbind files
	IB/usnic: Fix potential deadlock
	scsi: mpt3sas: fix memory ordering on 64bit writes
	scsi: smartpqi: correct lun reset issues
	ath10k: fix peer stats null pointer dereference
	scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
	scsi: megaraid: fix out-of-bound array accesses
	iomap: don't search past page end in iomap_is_partially_uptodate
	ocfs2: fix panic due to unrecovered local alloc
	mm/page-writeback.c: don't break integrity writeback on ->writepage() error
	mm/swap: use nr_node_ids for avail_lists in swap_info_struct
	userfaultfd: clear flag if remap event not enabled
	mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
	iwlwifi: mvm: Send LQ command as async when necessary
	Bluetooth: Fix unnecessary error message for HCI request completion
	ipmi: fix use-after-free of user->release_barrier.rda
	ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
	ipmi: Prevent use-after-free in deliver_response
	ipmi:ssif: Fix handling of multi-part return messages
	ipmi: Don't initialize anything in the core until something uses it
	Linux 4.19.18

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-26 11:58:37 +01:00
Aaron Lu
b0cd52e644 mm/swap: use nr_node_ids for avail_lists in swap_info_struct
[ Upstream commit 66f71da9dd ]

Since a2468cc9bf ("swap: choose swap device according to numa node"),
avail_lists field of swap_info_struct is changed to an array with
MAX_NUMNODES elements.  This made swap_info_struct size increased to 40KiB
and needs an order-4 page to hold it.

This is not optimal in that:
1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
  is a waste of memory;
2 It could cause swapon failure if the swap device is swapped on
  after system has been running for a while, due to no order-4
  page is available as pointed out by Vasily Averin.

Solve the above two issues by using nr_node_ids(which is the actual
possible node number the running system has) for avail_lists instead of
MAX_NUMNODES.

nr_node_ids is unknown at compile time so can't be directly used when
declaring this array.  What I did here is to declare avail_lists as zero
element array and allocate space for it when allocating space for
swap_info_struct.  The reason why keep using array but not pointer is
plist_for_each_entry needs the field to be part of the struct, so pointer
will not work.

This patch is on top of Vasily Averin's fix commit.  I think the use of
kvzalloc for swap_info_struct is still needed in case nr_node_ids is
really big on some systems.

Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:43 +01:00
Bart Van Assche
3ad8148ce0 scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough
[ Upstream commit ad669505c4 ]

A session must only be released after all code that accesses the session
structure has finished. Make sure that this is the case by introducing a
new command counter per session that is only decremented after the
.release_cmd() callback has finished. This patch fixes the following crash:

BUG: KASAN: use-after-free in do_raw_spin_lock+0x1c/0x130
Read of size 4 at addr ffff8801534b16e4 by task rmdir/14805
CPU: 16 PID: 14805 Comm: rmdir Not tainted 4.18.0-rc2-dbg+ #5
Call Trace:
dump_stack+0xa4/0xf5
print_address_description+0x6f/0x270
kasan_report+0x241/0x360
__asan_load4+0x78/0x80
do_raw_spin_lock+0x1c/0x130
_raw_spin_lock_irqsave+0x52/0x60
srpt_set_ch_state+0x27/0x70 [ib_srpt]
srpt_disconnect_ch+0x1b/0xc0 [ib_srpt]
srpt_close_session+0xa8/0x260 [ib_srpt]
target_shutdown_sessions+0x170/0x180 [target_core_mod]
core_tpg_del_initiator_node_acl+0xf3/0x200 [target_core_mod]
target_fabric_nacl_base_release+0x25/0x30 [target_core_mod]
config_item_release+0x9c/0x110 [configfs]
config_item_put+0x26/0x30 [configfs]
configfs_rmdir+0x3b8/0x510 [configfs]
vfs_rmdir+0xb3/0x1e0
do_rmdir+0x262/0x2c0
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Disseldorp <ddiss@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:38 +01:00
Andrey Ignatov
464b01e440 bpf: Allow narrow loads with offset > 0
[ Upstream commit 46f53a65d2 ]

Currently BPF verifier allows narrow loads for a context field only with
offset zero. E.g. if there is a __u32 field then only the following
loads are permitted:
  * off=0, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=0, size=4 (full).

On the other hand LLVM can generate a load with offset different than
zero that make sense from program logic point of view, but verifier
doesn't accept it.

E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:

  #define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
  ...
  	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&

where ctx is struct bpf_sock_addr.

Some versions of LLVM can produce the following byte code for it:

       8:       71 12 07 00 00 00 00 00         r2 = *(u8 *)(r1 + 7)
       9:       67 02 00 00 18 00 00 00         r2 <<= 24
      10:       18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00         r3 = 4261412864 ll
      12:       5d 32 07 00 00 00 00 00         if r2 != r3 goto +7 <LBB0_6>

where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1
and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
rejected by verifier.

Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok()
what means any is_valid_access implementation, that uses the function,
works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
sock_addr_is_valid_access() for bpf_sock_addr.

The patch makes such loads supported. Offset can be in [0; size_default)
but has to be multiple of load size. E.g. for __u32 field the following
loads are supported now:
  * off=0, size=1 (narrow);
  * off=1, size=1 (narrow);
  * off=2, size=1 (narrow);
  * off=3, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=2, size=2 (narrow);
  * off=0, size=4 (full).

Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:34 +01:00
Anders Roxell
e7a5f00735 writeback: don't decrement wb->refcnt if !wb->bdi
[ Upstream commit 347a28b586 ]

This happened while running in qemu-system-aarch64, the AMBA PL011 UART
driver when enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE.
arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init),
devtmpfs' handle_remove() crashes because the reference count is a NULL
pointer only because wb->bdi hasn't been initialized yet.

Rework so that wb_put have an extra check if wb->bdi before decrement
wb->refcnt and also add a WARN_ON_ONCE to get a warning if it happens again
in other drivers.

Fixes: 52ebea749a ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:34 +01:00
Badhri Jagan Sridharan
579f3fc1f4 usb: typec: tcpm: Do not disconnect link for self powered devices
[ Upstream commit 23b5f73266 ]

During HARD_RESET the data link is disconnected.
For self powered device, the spec is advising against doing that.

>From USB_PD_R3_0
7.1.5 Response to Hard Resets
Device operation during and after a Hard Reset is defined as follows:
Self-powered devices Should Not disconnect from USB during a Hard Reset
(see Section 9.1.2).
Bus powered devices will disconnect from USB during a Hard Reset due to the
loss of their power source.

Tackle this by letting TCPM know whether the device is self or bus powered.

This overcomes unnecessary port disconnections from hard reset.
Also, speeds up the enumeration time when connected to Type-A ports.

Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
---------
Version history:
V3:
Rebase on top of usb-next

V2:
Based on feedback from heikki.krogerus@linux.intel.com
- self_powered added to the struct tcpm_port which is populated from
  a. "connector" node of the device tree in tcpm_fw_get_caps()
  b. "self_powered" node of the tcpc_config in tcpm_copy_caps

Based on feedbase from linux@roeck-us.net
- Code was refactored
- SRC_HARD_RESET_VBUS_OFF sets the link state to false based
  on self_powered flag

V1 located here:
https://lkml.org/lkml/2018/9/13/94
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:34 +01:00
Greg Kroah-Hartman
73dc755ee0 Merge 4.19.17 into android-4.19
Changes in 4.19.17
	tty/ldsem: Wake up readers after timed out down_write()
	tty: Hold tty_ldisc_lock() during tty_reopen()
	tty: Simplify tty->count math in tty_reopen()
	tty: Don't hold ldisc lock in tty_reopen() if ldisc present
	can: gw: ensure DLC boundaries after CAN frame modification
	netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS
	netfilter: nf_conncount: don't skip eviction when age is negative
	netfilter: nf_conncount: split gc in two phases
	netfilter: nf_conncount: restart search when nodes have been erased
	netfilter: nf_conncount: merge lookup and add functions
	netfilter: nf_conncount: move all list iterations under spinlock
	netfilter: nf_conncount: speculative garbage collection on empty lists
	netfilter: nf_conncount: fix argument order to find_next_bit
	mmc: sdhci-msm: Disable CDR function on TX
	Revert "scsi: target: iscsi: cxgbit: fix csk leak"
	scsi: target: iscsi: cxgbit: fix csk leak
	scsi: target: iscsi: cxgbit: fix csk leak
	arm64/kvm: consistently handle host HCR_EL2 flags
	arm64: Don't trap host pointer auth use to EL2
	ipv6: fix kernel-infoleak in ipv6_local_error()
	net: bridge: fix a bug on using a neighbour cache entry without checking its state
	packet: Do not leak dev refcounts on error exit
	tcp: change txhash on SYN-data timeout
	tun: publish tfile after it's fully initialized
	lan743x: Remove phy_read from link status change function
	smc: move unhash as early as possible in smc_release()
	r8169: don't try to read counters if chip is in a PCI power-save state
	bonding: update nest level on unlink
	ip: on queued skb use skb_header_pointer instead of pskb_may_pull
	r8169: load Realtek PHY driver module before r8169
	crypto: sm3 - fix undefined shift by >= width of value
	crypto: caam - fix zero-length buffer DMA mapping
	crypto: authencesn - Avoid twice completion call in decrypt path
	crypto: ccree - convert to use crypto_authenc_extractkeys()
	crypto: bcm - convert to use crypto_authenc_extractkeys()
	crypto: authenc - fix parsing key with misaligned rta_len
	crypto: talitos - reorder code in talitos_edesc_alloc()
	crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK
	xen: Fix x86 sched_clock() interface for xen
	Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
	btrfs: wait on ordered extents on abort cleanup
	Yama: Check for pid death before checking ancestry
	scsi: core: Synchronize request queue PM status only on successful resume
	scsi: sd: Fix cache_type_store()
	mips: fix n32 compat_ipc_parse_version
	MIPS: BCM47XX: Setup struct device for the SoC
	MIPS: lantiq: Fix IPI interrupt handling
	drm/i915/gvt: Fix mmap range check
	OF: properties: add missing of_node_put
	mfd: tps6586x: Handle interrupts on suspend
	media: v4l: ioctl: Validate num_planes for debug messages
	RDMA/nldev: Don't expose unsafe global rkey to regular user
	RDMA/vmw_pvrdma: Return the correct opcode when creating WR
	kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7
	net: dsa: realtek-smi: fix OF child-node lookup
	pstore/ram: Avoid allocation and leak of platform data
	arm64: kaslr: ensure randomized quantities are clean to the PoC
	arm64: dts: marvell: armada-ap806: reserve PSCI area
	Disable MSI also when pcie-octeon.pcie_disable on
	fix int_sqrt64() for very large numbers
	omap2fb: Fix stack memory disclosure
	media: vivid: fix error handling of kthread_run
	media: vivid: set min width/height to a value > 0
	bpf: in __bpf_redirect_no_mac pull mac only if present
	ipv6: make icmp6_send() robust against null skb->dev
	LSM: Check for NULL cred-security on free
	media: vb2: vb2_mmap: move lock up
	sunrpc: handle ENOMEM in rpcb_getport_async
	netfilter: ebtables: account ebt_table_info to kmemcg
	block: use rcu_work instead of call_rcu to avoid sleep in softirq
	selinux: fix GPF on invalid policy
	blockdev: Fix livelocks on loop device
	sctp: allocate sctp_sockaddr_entry with kzalloc
	tipc: fix uninit-value in in tipc_conn_rcv_sub
	tipc: fix uninit-value in tipc_nl_compat_link_reset_stats
	tipc: fix uninit-value in tipc_nl_compat_bearer_enable
	tipc: fix uninit-value in tipc_nl_compat_link_set
	tipc: fix uninit-value in tipc_nl_compat_name_table_dump
	tipc: fix uninit-value in tipc_nl_compat_doit
	block/loop: Don't grab "struct file" for vfs_getattr() operation.
	block/loop: Use global lock for ioctl() operation.
	loop: Fold __loop_release into loop_release
	loop: Get rid of loop_index_mutex
	loop: Push lo_ctl_mutex down into individual ioctls
	loop: Split setting of lo_state from loop_clr_fd
	loop: Push loop_ctl_mutex down into loop_clr_fd()
	loop: Push loop_ctl_mutex down to loop_get_status()
	loop: Push loop_ctl_mutex down to loop_set_status()
	loop: Push loop_ctl_mutex down to loop_set_fd()
	loop: Push loop_ctl_mutex down to loop_change_fd()
	loop: Move special partition reread handling in loop_clr_fd()
	loop: Move loop_reread_partitions() out of loop_ctl_mutex
	loop: Fix deadlock when calling blkdev_reread_part()
	loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutex
	loop: Get rid of 'nested' acquisition of loop_ctl_mutex
	loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()
	loop: drop caches if offset or block_size are changed
	drm/fb-helper: Ignore the value of fb_var_screeninfo.pixclock
	selftests: Fix test errors related to lib.mk khdr target
	media: vb2: be sure to unlock mutex on errors
	nbd: Use set_blocksize() to set device blocksize
	Linux 4.19.17

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-23 08:46:58 +01:00
Yufen Yu
4cc66cc4f8 block: use rcu_work instead of call_rcu to avoid sleep in softirq
commit 94a2c3a32b upstream.

We recently got a stack by syzkaller like this:

BUG: sleeping function called from invalid context at mm/slab.h:361
in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
INFO: lockdep is turned off.
CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
 0000000000000000 0000000000000001 0000000000000004 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff81cb2194>] __dump_stack lib/dump_stack.c:15 [inline]
 <IRQ>  [<ffffffff81cb2194>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
 [<ffffffff8129a981>] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
 [<ffffffff8129ac33>] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
 [<ffffffff81794c13>] slab_pre_alloc_hook mm/slab.h:361 [inline]
 [<ffffffff81794c13>] slab_alloc_node mm/slub.c:2610 [inline]
 [<ffffffff81794c13>] slab_alloc mm/slub.c:2692 [inline]
 [<ffffffff81794c13>] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
 [<ffffffff81cbe9a7>] kmalloc include/linux/slab.h:479 [inline]
 [<ffffffff81cbe9a7>] kzalloc include/linux/slab.h:623 [inline]
 [<ffffffff81cbe9a7>] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
 [<ffffffff81cbf84f>] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
 [<ffffffff81cbb5b9>] kobject_cleanup lib/kobject.c:633 [inline]
 [<ffffffff81cbb5b9>] kobject_release+0x229/0x440 lib/kobject.c:675
 [<ffffffff81cbb0a2>] kref_sub include/linux/kref.h:73 [inline]
 [<ffffffff81cbb0a2>] kref_put include/linux/kref.h:98 [inline]
 [<ffffffff81cbb0a2>] kobject_put+0x72/0xd0 lib/kobject.c:692
 [<ffffffff8216f095>] put_device+0x25/0x30 drivers/base/core.c:1237
 [<ffffffff81c4cc34>] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
 [<ffffffff813c08bc>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [<ffffffff813c08bc>] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
 [<ffffffff813c08bc>] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
 [<ffffffff813c08bc>] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
 [<ffffffff813c08bc>] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
 [<ffffffff8120f509>] __do_softirq+0x299/0xe20 kernel/softirq.c:273
 [<ffffffff81210496>] invoke_softirq kernel/softirq.c:350 [inline]
 [<ffffffff81210496>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
 [<ffffffff82c2cd7b>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
 [<ffffffff82c2cd7b>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
 [<ffffffff82c2bc25>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
 <EOI>  [<ffffffff814cbf40>] ? audit_kill_trees+0x180/0x180
 [<ffffffff8187d2f7>] fd_install+0x57/0x80 fs/file.c:626
 [<ffffffff8180989e>] do_sys_open+0x45e/0x550 fs/open.c:1043
 [<ffffffff818099c2>] SYSC_open fs/open.c:1055 [inline]
 [<ffffffff818099c2>] SyS_open+0x32/0x40 fs/open.c:1050
 [<ffffffff82c299e1>] entry_SYSCALL_64_fastpath+0x1e/0x9a

In softirq context, we call rcu callback function delete_partition_rcu_cb(),
which may allocate memory by kzalloc with GFP_KERNEL flag. If the
allocation cannot be satisfied, it may sleep. However, That is not allowed
in softirq contex.

Although we found this problem on linux 4.4, the latest kernel version
seems to have this problem as well. And it is very similar to the
previous one:
	https://lkml.org/lkml/2018/7/9/391

Fix it by using RCU workqueue, which allows sleep.

Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22 21:40:35 +01:00
Adit Ranadive
ec485378a4 RDMA/vmw_pvrdma: Return the correct opcode when creating WR
commit 6325e01b6c upstream.

Since the IB_WR_REG_MR opcode value changed, let's set the PVRDMA device
opcodes explicitly.

Reported-by: Ruishuang Wang <ruishuangw@vmware.com>
Fixes: 9a59739bd0 ("IB/rxe: Revise the ib_wr_opcode enum")
Cc: stable@vger.kernel.org
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Ruishuang Wang <ruishuangw@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22 21:40:34 +01:00
Rafał Miłecki
19f41f32a4 MIPS: BCM47XX: Setup struct device for the SoC
commit 321c46b915 upstream.

So far we never had any device registered for the SoC. This resulted in
some small issues that we kept ignoring like:
1) Not working GPIOLIB_IRQCHIP (gpiochip_irqchip_add_key() failing)
2) Lack of proper tree in the /sys/devices/
3) mips_dma_alloc_coherent() silently handling empty coherent_dma_mask

Kernel 4.19 came with a lot of DMA changes and caused a regression on
bcm47xx. Starting with the commit f8c55dc6e8 ("MIPS: use generic dma
noncoherent ops for simple noncoherent platforms") DMA coherent
allocations just fail. Example:
[    1.114914] bgmac_bcma bcma0:2: Allocation of TX ring 0x200 failed
[    1.121215] bgmac_bcma bcma0:2: Unable to alloc memory for DMA
[    1.127626] bgmac_bcma: probe of bcma0:2 failed with error -12
[    1.133838] bgmac_bcma: Broadcom 47xx GBit MAC driver loaded

The bgmac driver also triggers a WARNING:
[    0.959486] ------------[ cut here ]------------
[    0.964387] WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 bgmac_enet_probe+0x1b4/0x5c4
[    0.973751] Modules linked in:
[    0.976913] CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.9 #0
[    0.982750] Stack : 804a0000 804597c4 00000000 00000000 80458fd8 8381bc2c 838282d4 80481a47
[    0.991367]         8042e3ec 00000001 804d38f0 00000204 83980000 00000065 8381bbe0 6f55b24f
[    0.999975]         00000000 00000000 80520000 00002018 00000000 00000075 00000007 00000000
[    1.008583]         00000000 80480000 000ee811 00000000 00000000 00000000 80432c00 80248db8
[    1.017196]         00000009 00000204 83980000 803ad7b0 00000000 801feeec 00000000 804d0000
[    1.025804]         ...
[    1.028325] Call Trace:
[    1.030875] [<8000aef8>] show_stack+0x58/0x100
[    1.035513] [<8001f8b4>] __warn+0xe4/0x118
[    1.039708] [<8001f9a4>] warn_slowpath_null+0x48/0x64
[    1.044935] [<80248db8>] bgmac_enet_probe+0x1b4/0x5c4
[    1.050101] [<802498e0>] bgmac_probe+0x558/0x590
[    1.054906] [<80252fd0>] bcma_device_probe+0x38/0x70
[    1.060017] [<8020e1e8>] really_probe+0x170/0x2e8
[    1.064891] [<8020e714>] __driver_attach+0xa4/0xec
[    1.069784] [<8020c1e0>] bus_for_each_dev+0x58/0xb0
[    1.074833] [<8020d590>] bus_add_driver+0xf8/0x218
[    1.079731] [<8020ef24>] driver_register+0xcc/0x11c
[    1.084804] [<804b54cc>] bgmac_init+0x1c/0x44
[    1.089258] [<8000121c>] do_one_initcall+0x7c/0x1a0
[    1.094343] [<804a1d34>] kernel_init_freeable+0x150/0x218
[    1.099886] [<803a082c>] kernel_init+0x10/0x104
[    1.104583] [<80005878>] ret_from_kernel_thread+0x14/0x1c
[    1.110107] ---[ end trace f441c0d873d1fb5b ]---

This patch setups a "struct device" (and passes it to the bcma) which
allows fixing all the mentioned problems. It'll also require a tiny bcma
patch which will follow through the wireless tree & its maintainer.

Fixes: f8c55dc6e8 ("MIPS: use generic dma noncoherent ops for simple noncoherent platforms")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-wireless@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22 21:40:33 +01:00
Pablo Neira Ayuso
b01b92417d netfilter: nf_conncount: speculative garbage collection on empty lists
commit c80f10bc97 upstream.

Instead of removing a empty list node that might be reintroduced soon
thereafter, tentatively place the empty list node on the list passed to
tree_nodes_free(), then re-check if the list is empty again before erasing
it from the tree.

[ Florian: rebase on top of pending nf_conncount fixes ]

Fixes: 5c789e131c ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22 21:40:29 +01:00
Florian Westphal
bdc6c893ba netfilter: nf_conncount: merge lookup and add functions
commit df4a902509 upstream.

'lookup' is always followed by 'add'.
Merge both and make the list-walk part of nf_conncount_add().

This also avoids one unneeded unlock/re-lock pair.

Extra care needs to be taken in count_tree, as we only hold rcu
read lock, i.e. we can only insert to an existing tree node after
acquiring its lock and making sure it has a nonzero count.

As a zero count should be rare, just fall back to insert_tree()
(which acquires tree lock).

This issue and its solution were pointed out by Shawn Bohrer
during patch review.

Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22 21:40:29 +01:00
Greg Kroah-Hartman
976f78d572 Merge 4.19.16 into android-4.19
Changes in 4.19.16
	Btrfs: fix deadlock when using free space tree due to block group creation
	staging: rtl8188eu: Fix module loading from tasklet for CCMP encryption
	staging: rtl8188eu: Fix module loading from tasklet for WEP encryption
	cpufreq: scmi: Fix frequency invariance in slow path
	x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
	ALSA: hda/realtek - Support Dell headset mode for New AIO platform
	ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225
	ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225
	CIFS: Fix adjustment of credits for MTU requests
	CIFS: Do not set credits to 1 if the server didn't grant anything
	CIFS: Do not hide EINTR after sending network packets
	CIFS: Fix credit computation for compounded requests
	cifs: Fix potential OOB access of lock element array
	usb: cdc-acm: send ZLP for Telit 3G Intel based modems
	USB: storage: don't insert sane sense for SPC3+ when bad sense specified
	USB: storage: add quirk for SMI SM3350
	USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB
	slab: alien caches must not be initialized if the allocation of the alien cache failed
	mm/usercopy.c: no check page span for stack objects
	mm, memcg: fix reclaim deadlock with writeback
	ACPI: power: Skip duplicate power resource references in _PRx
	ACPI / PMIC: xpower: Fix TS-pin current-source handling
	ACPI/IORT: Fix rc_dma_get_range()
	i2c: dev: prevent adapter retries and timeout being set as minus value
	mtd: rawnand: qcom: fix memory corruption that causes panic
	vfio/type1: Fix unmap overflow off-by-one
	drm/amdgpu: Add new VegaM pci id
	PCI: dwc: Use interrupt masking instead of disabling
	PCI: dwc: Take lock when ACKing an interrupt
	PCI: dwc: Move interrupt acking into the proper callback
	drm/amd/display: Fix MST dp_blank REG_WAIT timeout
	drm/fb_helper: Allow leaking fbdev smem_start
	drm/fb-helper: Partially bring back workaround for bugs of SDL 1.2
	drm/i915: Unwind failure on pinning the gen7 ppgtt
	drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume()
	drm/amdgpu: Don't fail resume process if resuming atomic state fails
	rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set
	ext4: make sure enough credits are reserved for dioread_nolock writes
	ext4: fix a potential fiemap/page fault deadlock w/ inline_data
	ext4: avoid kernel warning when writing the superblock to a dead device
	ext4: use ext4_write_inode() when fsyncing w/o a journal
	ext4: track writeback errors using the generic tracking infrastructure
	ext4: fix special inode number checks in __ext4_iget()
	mm: page_mapped: don't assume compound page is huge or THP
	sunrpc: use-after-free in svc_process_common()
	KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less
	arm64: compat: Don't pull syscall number from regs in arm_compat_syscall
	Btrfs: fix access to available allocation bits when starting balance
	Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation
	Btrfs: use nofs context when initializing security xattrs to avoid deadlock
	Linux 4.19.16

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-16 22:17:03 +01:00
Vasily Averin
44e7bab39f sunrpc: use-after-free in svc_process_common()
commit d4b09acf92 upstream.

if node have NFSv41+ mounts inside several net namespaces
it can lead to use-after-free in svc_process_common()

svc_process_common()
        /* Setup reply header */
        rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE

svc_process_common() can use incorrect rqstp->rq_xprt,
its caller function bc_svc_process() takes it from serv->sv_bc_xprt.
The problem is that serv is global structure but sv_bc_xprt
is assigned per-netnamespace.

According to Trond, the whole "let's set up rqstp->rq_xprt
for the back channel" is nothing but a giant hack in order
to work around the fact that svc_process_common() uses it
to find the xpt_ops, and perform a couple of (meaningless
for the back channel) tests of xpt_flags.

All we really need in svc_process_common() is to be able to run
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()

Bruce J Fields points that this xpo_prep_reply_hdr() call
is an awfully roundabout way just to do "svc_putnl(resv, 0);"
in the tcp case.

This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(),
now it calls svc_process_common() with rqstp->rq_xprt = NULL.

To adjust reply header svc_process_common() just check
rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.

To handle rqstp->rq_xprt = NULL case in functions called from
svc_process_common() patch intruduces net namespace pointer
svc_rqst->rq_bc_net and adjust SVC_NET() definition.
Some other function was also adopted to properly handle described case.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: stable@vger.kernel.org
Fixes: 23c20ecd44 ("NFS: callback up - users counting cleanup")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
v2: added lost extern svc_tcp_prep_reply_hdr()
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16 22:04:37 +01:00
WANG Chao
4bef2bacb1 x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
commit e4f358916d upstream.

Commit

  4cd24de3a0 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")

replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the
remaining pieces.

 [ bp: Massage commit message. ]

Fixes: 4cd24de3a0 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-kbuild@vger.kernel.org
Cc: srinivas.eeda@oracle.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16 22:04:29 +01:00
Joel Fernandes (Google)
dec031f640 UPSTREAM: mm/memfd: Add an F_SEAL_FUTURE_WRITE seal to memfd
Android uses ashmem for sharing memory regions.  We are looking forward to
migrating all usecases of ashmem to memfd so that we can possibly remove
the ashmem driver in the future from staging while also benefiting from
using memfd and contributing to it.  Note staging drivers are also not ABI
and generally can be removed at anytime.

One of the main usecases Android has is the ability to create a region and
mmap it as writeable, then add protection against making any "future"
writes while keeping the existing already mmap'ed writeable-region active.
This allows us to implement a usecase where receivers of the shared
memory buffer can get a read-only view, while the sender continues to
write to the buffer.  See CursorWindow documentation in Android for more
details:
https://developer.android.com/reference/android/database/CursorWindow

This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active.

A better way to do F_SEAL_FUTURE_WRITE seal was discussed [1] last week
where we don't need to modify core VFS structures to get the same
behavior of the seal. This solves several side-effects pointed by Andy.
self-tests are provided in later patch to verify the expected semantics.

[1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/

[Thanks a lot to Andy for suggestions to improve code]
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: John Stultz <john.stultz@linaro.org>

Change-Id: I6710c045954378f87bfbff6311d372a3b8549064
2019-01-16 09:48:42 -05:00
Joel Fernandes
198aac25b7 Revert "UPSTREAM: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd"
This reverts commit 1dc8ca4429.

Change-Id: I513034073c278e2ae58a53352cc2553a256a7ee0
2019-01-16 09:48:42 -05:00
Greg Kroah-Hartman
caf54339d3 Merge 4.19.15 into android-4.19
Changes in 4.19.15
	ARM: dts: sun8i: a83t: bananapi-m3: increase vcc-pd voltage to 3.3V
	pinctrl: meson: fix pull enable register calculation
	arm64: dts: mt7622: fix no more console output on rfb1
	powerpc: Fix COFF zImage booting on old powermacs
	powerpc/mm: Fix linux page tables build with some configs
	HID: ite: Add USB id match for another ITE based keyboard rfkill key quirk
	ARM: dts: imx7d-pico: Describe the Wifi clock
	ARM: imx: update the cpu power up timing setting on i.mx6sx
	ARM: dts: imx7d-nitrogen7: Fix the description of the Wifi clock
	IB/mlx5: Block DEVX umem from the non applicable cases
	Input: restore EV_ABS ABS_RESERVED
	powerpc/mm: Fallback to RAM if the altmap is unusable
	drm/amdgpu: Fix DEBUG_LOCKS_WARN_ON(depth <= 0) in amdgpu_ctx.lock
	IB/core: Fix oops in netdev_next_upper_dev_rcu()
	checkstack.pl: fix for aarch64
	xfrm: Fix error return code in xfrm_output_one()
	xfrm: Fix bucket count reported to userspace
	xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry.
	ieee802154: hwsim: fix off-by-one in parse nested
	netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()
	netfilter: seqadj: re-load tcp header pointer after possible head reallocation
	Revert "scsi: qla2xxx: Fix NVMe Target discovery"
	scsi: bnx2fc: Fix NULL dereference in error handling
	Input: omap-keypad - fix idle configuration to not block SoC idle states
	Input: synaptics - enable RMI on ThinkPad T560
	ibmvnic: Convert reset work item mutex to spin lock
	ibmvnic: Fix non-atomic memory allocation in IRQ context
	ieee802154: ca8210: fix possible u8 overflow in ca8210_rx_done
	x86/mm: Fix guard hole handling
	x86/dump_pagetables: Fix LDT remap address marker
	i40e: fix mac filter delete when setting mac address
	ixgbe: Fix race when the VF driver does a reset
	netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel
	netfilter: nat: can't use dst_hold on noref dst
	netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()
	bnx2x: Clear fip MAC when fcoe offload support is disabled
	bnx2x: Remove configured vlans as part of unload sequence.
	bnx2x: Send update-svid ramrod with retry/poll flags enabled
	scsi: target: iscsi: cxgbit: fix csk leak
	scsi: target: iscsi: cxgbit: add missing spin_lock_init()
	mt76: fix potential NULL pointer dereference in mt76_stop_tx_queues
	x86, hyperv: remove PCI dependency
	drivers: net: xgene: Remove unnecessary forward declarations
	net/tls: Init routines in create_ctx
	w90p910_ether: remove incorrect __init annotation
	net: hns: Incorrect offset address used for some registers.
	net: hns: All ports can not work when insmod hns ko after rmmod.
	net: hns: Some registers use wrong address according to the datasheet.
	net: hns: Fixed bug that netdev was opened twice
	net: hns: Clean rx fbd when ae stopped.
	net: hns: Free irq when exit from abnormal branch
	net: hns: Avoid net reset caused by pause frames storm
	net: hns: Fix ntuple-filters status error.
	net: hns: Add mac pcs config when enable|disable mac
	net: hns: Fix ping failed when use net bridge and send multicast
	mac80211: fix a kernel panic when TXing after TXQ teardown
	SUNRPC: Fix a race with XPRT_CONNECTING
	qed: Fix an error code qed_ll2_start_xmit()
	net: macb: fix random memory corruption on RX with 64-bit DMA
	net: macb: fix dropped RX frames due to a race
	net: macb: add missing barriers when reading descriptors
	lan743x: Expand phy search for LAN7431
	lan78xx: Resolve issue with changing MAC address
	vxge: ensure data0 is initialized in when fetching firmware version information
	nl80211: fix memory leak if validate_pae_over_nl80211() fails
	mac80211: free skb fraglist before freeing the skb
	kbuild: fix false positive warning/error about missing libelf
	m68k: Fix memblock-related crashes
	virtio: fix test build after uio.h change
	lan743x: Remove MAC Reset from initialization
	gpio: mvebu: only fail on missing clk if pwm is actually to be used
	Input: synaptics - enable SMBus for HP EliteBook 840 G4
	net: netxen: fix a missing check and an uninitialized use
	qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
	serial/sunsu: fix refcount leak
	auxdisplay: charlcd: fix x/y command parsing
	scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
	scsi: lpfc: do not set queue->page_count to 0 if pc_sli4_params.wqpcnt is invalid
	fork: record start_time late
	zram: fix double free backing device
	hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
	mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
	mm, devm_memremap_pages: kill mapping "System RAM" support
	mm, devm_memremap_pages: fix shutdown handling
	mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support
	mm, hmm: use devm semantics for hmm_devmem_{add, remove}
	mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
	mm, swap: fix swapoff with KSM pages
	memcg, oom: notify on oom killer invocation from the charge path
	sunrpc: fix cache_head leak due to queued request
	sunrpc: use SVC_NET() in svcauth_gss_* functions
	powerpc: remove old GCC version checks
	powerpc: consolidate -mno-sched-epilog into FTRACE flags
	powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer
	powerpc: Disable -Wbuiltin-requires-header when setjmp is used
	kbuild: add -no-integrated-as Clang option unconditionally
	kbuild: consolidate Clang compiler flags
	Makefile: Export clang toolchain variables
	powerpc/boot: Set target when cross-compiling for clang
	raid6/ppc: Fix build for clang
	dma-direct: do not include SME mask in the DMA supported check
	mt76x0: init hw capabilities
	media: cx23885: only reset DMA on problematic CPUs
	ALSA: cs46xx: Potential NULL dereference in probe
	ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
	ALSA: usb-audio: Check mixer unit descriptors more strictly
	ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
	ALSA: usb-audio: Always check descriptor sizes in parser code
	srcu: Lock srcu_data structure in srcu_gp_start()
	driver core: Add missing dev->bus->need_parent_lock checks
	Fix failure path in alloc_pid()
	block: deactivate blk_stat timer in wbt_disable_default()
	block: mq-deadline: Fix write completion handling
	dlm: fixed memory leaks after failed ls_remove_names allocation
	dlm: possible memory leak on error path in create_lkb()
	dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
	dlm: memory leaks on error path in dlm_user_request()
	gfs2: Get rid of potential double-freeing in gfs2_create_inode
	gfs2: Fix loop in gfs2_rbm_find
	b43: Fix error in cordic routine
	selinux: policydb - fix byte order and alignment issues
	PCI / PM: Allow runtime PM without callback functions
	lockd: Show pid of lockd for remote locks
	nfsd4: zero-length WRITE should succeed
	arm64: drop linker script hack to hide __efistub_ symbols
	arm64: relocatable: fix inconsistencies in linker script and options
	leds: pwm: silently error out on EPROBE_DEFER
	Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing"
	powerpc/tm: Set MSR[TS] just prior to recheckpoint
	iio: dac: ad5686: fix bit shift read register
	9p/net: put a lower bound on msize
	rxe: fix error completion wr_id and qp_num
	RDMA/srpt: Fix a use-after-free in the channel release code
	iommu/vt-d: Handle domain agaw being less than iommu agaw
	sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b
	ceph: don't update importing cap's mseq when handing cap export
	video: fbdev: pxafb: Fix "WARNING: invalid free of devm_ allocated data"
	drivers/perf: hisi: Fixup one DDRC PMU register offset
	genwqe: Fix size check
	intel_th: msu: Fix an off-by-one in attribute store
	power: supply: olpc_battery: correct the temperature units
	of: of_node_get()/of_node_put() nodes held in phandle cache
	of: __of_detach_node() - remove node from phandle cache
	lib: fix build failure in CONFIG_DEBUG_VIRTUAL test
	drm/nouveau/drm/nouveau: Check rc from drm_dp_mst_topology_mgr_resume()
	drm/vc4: Set ->is_yuv to false when num_planes == 1
	drm/rockchip: psr: do not dereference encoder before it is null checked.
	drm/amd/display: Fix unintialized max_bpc state values
	bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
	Linux 4.19.15

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-13 10:25:45 +01:00
Dan Williams
e890a86706 mm, hmm: use devm semantics for hmm_devmem_{add, remove}
commit 58ef15b765 upstream.

devm semantics arrange for resources to be torn down when
device-driver-probe fails or when device-driver-release completes.
Similar to devm_memremap_pages() there is no need to support an explicit
remove operation when the users properly adhere to devm semantics.

Note that devm_kzalloc() automatically handles allocating node-local
memory.

Link: http://lkml.kernel.org/r/154275559545.76910.9186690723515469051.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:04 +01:00
Dan Williams
ec5471c92f mm, devm_memremap_pages: fix shutdown handling
commit a95c90f1e2 upstream.

The last step before devm_memremap_pages() returns success is to allocate
a release action, devm_memremap_pages_release(), to tear the entire setup
down.  However, the result from devm_add_action() is not checked.

Checking the error from devm_add_action() is not enough.  The api
currently relies on the fact that the percpu_ref it is using is killed by
the time the devm_memremap_pages_release() is run.  Rather than continue
this awkward situation, offload the responsibility of killing the
percpu_ref to devm_memremap_pages_release() directly.  This allows
devm_memremap_pages() to do the right thing relative to init failures and
shutdown.

Without this change we could fail to register the teardown of
devm_memremap_pages().  The likelihood of hitting this failure is tiny as
small memory allocations almost always succeed.  However, the impact of
the failure is large given any future reconfiguration, or disable/enable,
of an nvdimm namespace will fail forever as subsequent calls to
devm_memremap_pages() will fail to setup the pgmap_radix since there will
be stale entries for the physical address range.

An argument could be made to require that the ->kill() operation be set in
the @pgmap arg rather than passed in separately.  However, it helps code
readability, tracking the lifetime of a given instance, to be able to grep
the kill routine directly at the devm_memremap_pages() call site.

Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixes: e8d5134833 ("memremap: change devm_memremap_pages interface...")
Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:04 +01:00
Taehee Yoo
cee05c0371 netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()
[ Upstream commit 4c05ec4738 ]

basechain->stats is rcu protected data which is updated from
nft_chain_stats_replace(). This function is executed from the commit
phase which holds the pernet nf_tables commit mutex - not the global
nfnetlink subsystem mutex.

Test commands to reproduce the problem are:
   %iptables-nft -I INPUT
   %iptables-nft -Z
   %iptables-nft -Z

This patch uses RCU calls to handle basechain->stats updates to fix a
splat that looks like:

[89279.358755] =============================
[89279.363656] WARNING: suspicious RCU usage
[89279.368458] 4.20.0-rc2+ #44 Tainted: G        W    L
[89279.374661] -----------------------------
[89279.379542] net/netfilter/nf_tables_api.c:1404 suspicious rcu_dereference_protected() usage!
[...]
[89279.406556] 1 lock held by iptables-nft/5225:
[89279.411728]  #0: 00000000bf45a000 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1f/0x70 [nf_tables]
[89279.424022] stack backtrace:
[89279.429236] CPU: 0 PID: 5225 Comm: iptables-nft Tainted: G        W    L    4.20.0-rc2+ #44
[89279.430135] Call Trace:
[89279.430135]  dump_stack+0xc9/0x16b
[89279.430135]  ? show_regs_print_info+0x5/0x5
[89279.430135]  ? lockdep_rcu_suspicious+0x117/0x160
[89279.430135]  nft_chain_commit_update+0x4ea/0x640 [nf_tables]
[89279.430135]  ? sched_clock_local+0xd4/0x140
[89279.430135]  ? check_flags.part.35+0x440/0x440
[89279.430135]  ? __rhashtable_remove_fast.constprop.67+0xec0/0xec0 [nf_tables]
[89279.430135]  ? sched_clock_cpu+0x126/0x170
[89279.430135]  ? find_held_lock+0x39/0x1c0
[89279.430135]  ? hlock_class+0x140/0x140
[89279.430135]  ? is_bpf_text_address+0x5/0xf0
[89279.430135]  ? check_flags.part.35+0x440/0x440
[89279.430135]  ? __lock_is_held+0xb4/0x140
[89279.430135]  nf_tables_commit+0x2555/0x39c0 [nf_tables]

Fixes: f102d66b33 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-13 09:50:57 +01:00
Peter Hutterer
52e1f29e8b Input: restore EV_ABS ABS_RESERVED
[ Upstream commit c201e3808e ]

ABS_RESERVED was added in d9ca1c990a and accidentally removed as part of
ffe0e7cf29 when the high-resolution scrolling code was removed.

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Reviewed-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-13 09:50:56 +01:00
Greg Kroah-Hartman
8735c21738 Merge 4.19.14 into android-4.19
Changes in 4.19.14
	ax25: fix a use-after-free in ax25_fillin_cb()
	gro_cell: add napi_disable in gro_cells_destroy
	ibmveth: fix DMA unmap error in ibmveth_xmit_start error path
	ieee802154: lowpan_header_create check must check daddr
	ip6mr: Fix potential Spectre v1 vulnerability
	ipv4: Fix potential Spectre v1 vulnerability
	ipv6: explicitly initialize udp6_addr in udp_sock_create6()
	ipv6: tunnels: fix two use-after-free
	ip: validate header length on virtual device xmit
	isdn: fix kernel-infoleak in capi_unlocked_ioctl
	net: clear skb->tstamp in forwarding paths
	net/hamradio/6pack: use mod_timer() to rearm timers
	net: ipv4: do not handle duplicate fragments as overlapping
	net: macb: restart tx after tx used bit read
	net: mvpp2: 10G modes aren't supported on all ports
	net: phy: Fix the issue that netif always links up after resuming
	netrom: fix locking in nr_find_socket()
	net/smc: fix TCP fallback socket release
	net: stmmac: Fix an error code in probe()
	net/tls: allocate tls context using GFP_ATOMIC
	net/wan: fix a double free in x25_asy_open_tty()
	packet: validate address length
	packet: validate address length if non-zero
	ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
	qmi_wwan: Added support for Fibocom NL668 series
	qmi_wwan: Added support for Telit LN940 series
	qmi_wwan: Add support for Fibocom NL678 series
	sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event
	sock: Make sock->sk_stamp thread-safe
	tcp: fix a race in inet_diag_dump_icsk()
	tipc: check tsk->group in tipc_wait_for_cond()
	tipc: compare remote and local protocols in tipc_udp_enable()
	tipc: fix a double free in tipc_enable_bearer()
	tipc: fix a double kfree_skb()
	tipc: use lock_sock() in tipc_sk_reinit()
	vhost: make sure used idx is seen before log in vhost_add_used_n()
	VSOCK: Send reset control packet when socket is partially bound
	xen/netfront: tolerate frags with no data
	net/mlx5: Typo fix in del_sw_hw_rule
	tipc: check group dests after tipc_wait_for_cond()
	net/mlx5e: Remove the false indication of software timestamping support
	ipv6: frags: Fix bogus skb->sk in reassembled packets
	net/ipv6: Fix a test against 'ipv6_find_idev()' return value
	nfp: flower: ensure TCP flags can be placed in IPv6 frame
	ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
	mscc: Configured MAC entries should be locked.
	net/mlx5e: Cancel DIM work on close SQ
	net/mlx5e: RX, Verify MPWQE stride size is in range
	net: mvpp2: fix the phylink mode validation
	qed: Fix command number mismatch between driver and the mfw
	mlxsw: core: Increase timeout during firmware flash process
	net/mlx5e: Remove unused UDP GSO remaining counter
	net/mlx5e: RX, Fix wrong early return in receive queue poll
	net: mvneta: fix operation for 64K PAGE_SIZE
	net: Use __kernel_clockid_t in uapi net_stamp.h
	r8169: fix WoL device wakeup enable
	IB/hfi1: Incorrect sizing of sge for PIO will OOPs
	ALSA: rme9652: Fix potential Spectre v1 vulnerability
	ALSA: emu10k1: Fix potential Spectre v1 vulnerabilities
	ALSA: pcm: Fix potential Spectre v1 vulnerability
	ALSA: emux: Fix potential Spectre v1 vulnerabilities
	powerpc/fsl: Fix spectre_v2 mitigations reporting
	mtd: atmel-quadspi: disallow building on ebsa110
	mtd: rawnand: marvell: prevent timeouts on a loaded machine
	mtd: rawnand: omap2: Pass the parent of pdev to dma_request_chan()
	ALSA: hda: add mute LED support for HP EliteBook 840 G4
	ALSA: hda/realtek: Enable audio jacks of ASUS UX391UA with ALC294
	ALSA: fireface: fix for state to fetch PCM frames
	ALSA: firewire-lib: fix wrong handling payload_length as payload_quadlet
	ALSA: firewire-lib: fix wrong assignment for 'out_packet_without_header' tracepoint
	ALSA: firewire-lib: use the same print format for 'without_header' tracepoints
	ALSA: hda/realtek: Enable the headset mic auto detection for ASUS laptops
	ALSA: hda/tegra: clear pending irq handlers
	usb: dwc2: host: use hrtimer for NAK retries
	USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
	USB: serial: option: add Fibocom NL678 series
	usb: r8a66597: Fix a possible concurrency use-after-free bug in r8a66597_endpoint_disable()
	usb: dwc2: disable power_down on Amlogic devices
	Revert "usb: dwc3: pci: Use devm functions to get the phy GPIOs"
	usb: roles: Add a description for the class to Kconfig
	media: dvb-usb-v2: Fix incorrect use of transfer_flags URB_FREE_BUFFER
	staging: wilc1000: fix missing read_write setting when reading data
	ASoC: intel: cht_bsw_max98090_ti: Add pmc_plt_clk_0 quirk for Chromebook Clapper
	ASoC: intel: cht_bsw_max98090_ti: Add pmc_plt_clk_0 quirk for Chromebook Gnawty
	s390/pci: fix sleeping in atomic during hotplug
	Input: atmel_mxt_ts - don't try to free unallocated kernel memory
	Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G
	x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off
	x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
	KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
	arm64: KVM: Make VHE Stage-2 TLB invalidation operations non-interruptible
	KVM: nVMX: Free the VMREAD/VMWRITE bitmaps if alloc_kvm_area() fails
	platform-msi: Free descriptors in platform_msi_domain_free()
	drm/v3d: Skip debugfs dumping GCA on platforms without GCA.
	DRM: UDL: get rid of useless vblank initialization
	clocksource/drivers/arc_timer: Utilize generic sched_clock
	perf machine: Record if a arch has a single user/kernel address space
	perf thread: Add fallback functions for cases where cpumode is insufficient
	perf tools: Use fallback for sample_addr_correlates_sym() cases
	perf script: Use fallbacks for branch stacks
	perf pmu: Suppress potential format-truncation warning
	perf env: Also consider env->arch == NULL as local operation
	ocxl: Fix endiannes bug in ocxl_link_update_pe()
	ocxl: Fix endiannes bug in read_afu_name()
	ext4: add ext4_sb_bread() to disambiguate ENOMEM cases
	ext4: fix possible use after free in ext4_quota_enable
	ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
	ext4: fix EXT4_IOC_GROUP_ADD ioctl
	ext4: include terminating u32 in size of xattr entries when expanding inodes
	ext4: avoid declaring fs inconsistent due to invalid file handles
	ext4: force inode writes when nfsd calls commit_metadata()
	ext4: check for shutdown and r/o file system in ext4_write_inode()
	spi: bcm2835: Fix race on DMA termination
	spi: bcm2835: Fix book-keeping of DMA termination
	spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
	clk: rockchip: fix typo in rk3188 spdif_frac parent
	clk: sunxi-ng: Use u64 for calculation of NM rate
	crypto: cavium/nitrox - fix a DMA pool free failure
	crypto: chcr - small packet Tx stalls the queue
	crypto: testmgr - add AES-CFB tests
	crypto: cfb - fix decryption
	cgroup: fix CSS_TASK_ITER_PROCS
	cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader.
	btrfs: dev-replace: go back to suspended state if target device is missing
	btrfs: dev-replace: go back to suspend state if another EXCL_OP is running
	btrfs: skip file_extent generation check for free_space_inode in run_delalloc_nocow
	Btrfs: fix fsync of files with multiple hard links in new directories
	btrfs: run delayed items before dropping the snapshot
	Btrfs: send, fix race with transaction commits that create snapshots
	brcmfmac: fix roamoff=1 modparam
	brcmfmac: Fix out of bounds memory access during fw load
	powerpc/tm: Unset MSR[TS] if not recheckpointing
	dax: Don't access a freed inode
	dax: Use non-exclusive wait in wait_entry_unlocked()
	f2fs: read page index before freeing
	f2fs: fix validation of the block count in sanity_check_raw_super
	f2fs: sanity check of xattr entry size
	serial: uartps: Fix interrupt mask issue to handle the RX interrupts properly
	media: cec: keep track of outstanding transmits
	media: cec-pin: fix broken tx_ignore_nack_until_eom error injection
	media: rc: cec devices do not have a lirc chardev
	media: imx274: fix stack corruption in imx274_read_reg
	media: vivid: free bitmap_cap when updating std/timings/etc.
	media: vb2: check memory model for VIDIOC_CREATE_BUFS
	media: v4l2-tpg: array index could become negative
	tools lib traceevent: Fix processing of dereferenced args in bprintk events
	MIPS: math-emu: Write-protect delay slot emulation pages
	MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3
	MIPS: Ensure pmd_present() returns false after pmd_mknotpresent()
	MIPS: Align kernel load address to 64KB
	MIPS: Expand MIPS32 ASIDs to 64 bits
	MIPS: OCTEON: mark RGMII interface disabled on OCTEON III
	MIPS: Fix a R10000_LLSC_WAR logic in atomic.h
	CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem
	smb3: fix large reads on encrypted connections
	arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1
	arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 PPIs/SGIs
	ARM: dts: exynos: Specify I2S assigned clocks in proper node
	rtc: m41t80: Correct alarm month range with RTC reads
	KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabled
	KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximum
	KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring state
	KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq()
	iommu/arm-smmu-v3: Fix big-endian CMD_SYNC writes
	arm64: compat: Avoid sending SIGILL for unallocated syscall numbers
	tpm: tpm_try_transmit() refactor error flow.
	tpm: tpm_i2c_nuvoton: use correct command duration for TPM 2.x
	spi: bcm2835: Unbreak the build of esoteric configs
	MIPS: Only include mmzone.h when CONFIG_NEED_MULTIPLE_NODES=y
	Linux 4.19.14

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-09 18:55:03 +01:00
Hans Verkuil
4651530385 media: cec: keep track of outstanding transmits
commit 32804fcb61 upstream.

I noticed that repeatedly running 'cec-ctl --playback' would occasionally
select 'Playback Device 2' instead of 'Playback Device 1', even though there
were no other Playback devices in the HDMI topology. This happened both with
'real' hardware and with the vivid CEC emulation, suggesting that this was an
issue in the core code that claims a logical address.

What 'cec-ctl --playback' does is to first clear all existing logical addresses,
and immediately after that configure the new desired device type.

The core code will poll the logical addresses trying to find a free address.
When found it will issue a few standard messages as per the CEC spec and return.
Those messages are queued up and will be transmitted asynchronously.

What happens is that if you run two 'cec-ctl --playback' commands in quick
succession, there is still a message of the first cec-ctl command being transmitted
when you reconfigure the adapter again in the second cec-ctl command.

When the logical addresses are cleared, then all information about outstanding
transmits inside the CEC core is also cleared, and the core is no longer aware
that there is still a transmit in flight.

When the hardware finishes the transmit it calls transmit_done and the CEC core
thinks it is actually in response of a POLL messages that is trying to find a
free logical address. The result of all this is that the core thinks that the
logical address for Playback Device 1 is in use, when it is really an earlier
transmit that ended.

The main transmit thread looks at adap->transmitting to check if a transmit
is in progress, but that is set to NULL when the adapter is unconfigured.
adap->transmitting represents the view of userspace, not that of the hardware.
So when unconfiguring the adapter the message is marked aborted from the point
of view of userspace, but seen from the PoV of the hardware it is still ongoing.

So introduce a new bool transmit_in_progress that represents the hardware state
and use that instead of adap->transmitting. Now the CEC core waits until the
hardware finishes the transmit before starting a new transmit.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: <stable@vger.kernel.org>      # for v4.18 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09 17:38:46 +01:00
Theodore Ts'o
bf2fd1f970 ext4: force inode writes when nfsd calls commit_metadata()
commit fde872682e upstream.

Some time back, nfsd switched from calling vfs_fsync() to using a new
commit_metadata() hook in export_operations().  If the file system did
not provide a commit_metadata() hook, it fell back to using
sync_inode_metadata().  Unfortunately doesn't work on all file
systems.  In particular, it doesn't work on ext4 due to how the inode
gets journalled --- the VFS writeback code will not always call
ext4_write_inode().

So we need to provide our own ext4_nfs_commit_metdata() method which
calls ext4_write_inode() directly.

Google-Bug-Id: 121195940
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09 17:38:43 +01:00
Miquel Raynal
6c56e89e4e platform-msi: Free descriptors in platform_msi_domain_free()
commit 81b1e6e6a8 upstream.

Since the addition of platform MSI support, there were two helpers
supposed to allocate/free IRQs for a device:

    platform_msi_domain_alloc_irqs()
    platform_msi_domain_free_irqs()

In these helpers, IRQ descriptors are allocated in the "alloc" routine
while they are freed in the "free" one.

Later, two other helpers have been added to handle IRQ domains on top
of MSI domains:

    platform_msi_domain_alloc()
    platform_msi_domain_free()

Seen from the outside, the logic is pretty close with the former
helpers and people used it with the same logic as before: a
platform_msi_domain_alloc() call should be balanced with a
platform_msi_domain_free() call. While this is probably what was
intended to do, the platform_msi_domain_free() does not remove/free
the IRQ descriptor(s) created/inserted in
platform_msi_domain_alloc().

One effect of such situation is that removing a module that requested
an IRQ will let one orphaned IRQ descriptor (with an allocated MSI
entry) in the device descriptors list. Next time the module will be
inserted back, one will observe that the allocation will happen twice
in the MSI domain, one time for the remaining descriptor, one time for
the new one. It also has the side effect to quickly overshoot the
maximum number of allocated MSI and then prevent any module requesting
an interrupt in the same domain to be inserted anymore.

This situation has been met with loops of insertion/removal of the
mvpp2.ko module (requesting 15 MSIs each time).

Fixes: 552c494a76 ("platform-msi: Allow creation of a MSI-based stacked irq domain")
Cc: stable@vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09 17:38:42 +01:00
Davide Caratti
e4a2ffe902 net: Use __kernel_clockid_t in uapi net_stamp.h
[ Upstream commit e2c4cf7f98 ]

Herton reports the following error when building a userspace program that
includes net_stamp.h:

 In file included from foo.c:2:
 /usr/include/linux/net_tstamp.h:158:2: error: unknown type name
 ‘clockid_t’
   clockid_t clockid; /* reference clockid */
   ^~~~~~~~~

Fix it by using __kernel_clockid_t in place of clockid_t.

Fixes: 80b14dee2b ("net: Add a new socket option for a future transmit time.")
Cc: Timothy Redaelli <tredaelli@redhat.com>
Reported-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09 17:38:36 +01:00
Deepa Dinamani
60f05dddf1 sock: Make sock->sk_stamp thread-safe
[ Upstream commit 3a0ed3e961 ]

Al Viro mentioned (Message-ID
<20170626041334.GZ10672@ZenIV.linux.org.uk>)
that there is probably a race condition
lurking in accesses of sk_stamp on 32-bit machines.

sock->sk_stamp is of type ktime_t which is always an s64.
On a 32 bit architecture, we might run into situations of
unsafe access as the access to the field becomes non atomic.

Use seqlocks for synchronization.
This allows us to avoid using spinlocks for readers as
readers do not need mutual exclusion.

Another approach to solve this is to require sk_lock for all
modifications of the timestamps. The current approach allows
for timestamps to have their own lock: sk_stamp_lock.
This allows for the patch to not compete with already
existing critical sections, and side effects are limited
to the paths in the patch.

The addition of the new field maintains the data locality
optimizations from
commit 9115e8cd2a ("net: reorganize struct sock for better data
locality")

Note that all the instances of the sk_stamp accesses
are either through the ioctl or the syscall recvmsg.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09 17:38:33 +01:00
Cong Wang
6e36567284 ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
[ Upstream commit aff6db4545 ]

__ptr_ring_swap_queue() tries to move pointers from the old
ring to the new one, but it forgets to check if ->producer
is beyond the new size at the end of the operation. This leads
to an out-of-bound access in __ptr_ring_produce() as reported
by syzbot.

Reported-by: syzbot+8993c0fa96d57c399735@syzkaller.appspotmail.com
Fixes: 5d49de5320 ("ptr_ring: resize support")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09 17:38:33 +01:00
Willem de Bruijn
cde81154f8 ip: validate header length on virtual device xmit
[ Upstream commit cb9f1b7838 ]

KMSAN detected read beyond end of buffer in vti and sit devices when
passing truncated packets with PF_PACKET. The issue affects additional
ip tunnel devices.

Extend commit 76c0ddd8c3 ("ip6_tunnel: be careful when accessing the
inner header") and commit ccfec9e5cb ("ip_tunnel: be careful when
accessing the inner header").

Move the check to a separate helper and call at the start of each
ndo_start_xmit function in net/ipv4 and net/ipv6.

Minor changes:
- convert dev_kfree_skb to kfree_skb on error path,
  as dev_kfree_skb calls consume_skb which is not for error paths.
- use pskb_network_may_pull even though that is pedantic here,
  as the same as pskb_may_pull for devices without llheaders.
- do not cache ipv6 hdrs if used only once
  (unsafe across pskb_may_pull, was more relevant to earlier patch)

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09 17:38:31 +01:00
Quentin Perret
9f5e702922 Revert "FROMLIST: PM / EM: Expose the Energy Model in sysfs"
This reverts commit b8c80a0f42. It has not
been accepted upstream and is useful only for debug purpose. An
alternate debugfs interface will be pushed upstream to expose the Energy
Model. Android should use that too. In the meantime, it should be early
enough to remove the sysfs nodes before partners start basing userspace
tools on them.

Bug: 120440300
Change-Id: I8a5f003550df7df9116d4735c8d9ac10ac0aa79c
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2019-01-04 09:19:03 +00:00
Quentin Perret
3ed3d89a25 Revert "FROMLIST: sched: Introduce a sysctl for Energy Aware Scheduling"
This reverts commit b78eec5fb7. It has not
been accepted upstream and is of no particular interest in Android since
EAS is always enabled anyway.

Bug: 120440300
Change-Id: I7d1f2ad206c05041d386fc99f18e29c4fa56cbc8
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2019-01-04 09:17:55 +00:00
Quentin Perret
81dd278ee4 ANDROID: sched: Align EAS with upstream
The core EAS patches have now been accepted upstream. The patches used
in Android are based on a slightly earlier version of the series. In
order to reduce the delta with mainline and ease backports, align the
EAS code paths with their upstream version.

This basically applies the output of git range-diff on the appropriate
commits, and fixes a conflict in schedutil regarding the integration of
schedtune.

Bug: 120440300
Change-Id: I208ebeb4207e3f4f4bbb5103c606b293a464c20f
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2019-01-04 09:17:54 +00:00
Greg Kroah-Hartman
a872d2d074 Merge 4.19.13 into android-4.19
Changes in 4.19.13
	iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"
	Revert "vfs: Allow userns root to call mknod on owned filesystems."
	USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data
	xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only
	USB: xhci: fix 'broken_suspend' placement in struct xchi_hcd
	USB: serial: option: add GosunCn ZTE WeLink ME3630
	USB: serial: option: add HP lt4132
	USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode)
	USB: serial: option: add Fibocom NL668 series
	USB: serial: option: add Telit LN940 series
	ubifs: Handle re-linking of inodes correctly while recovery
	scsi: t10-pi: Return correct ref tag when queue has no integrity profile
	scsi: sd: use mempool for discard special page
	mmc: core: Reset HPI enabled state during re-init and in case of errors
	mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support
	mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl
	mmc: omap_hsmmc: fix DMA API warning
	gpio: max7301: fix driver for use with CONFIG_VMAP_STACK
	gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlers
	posix-timers: Fix division by zero bug
	KVM: X86: Fix NULL deref in vcpu_scan_ioapic
	kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs
	KVM: Fix UAF in nested posted interrupt processing
	Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
	futex: Cure exit race
	x86/mtrr: Don't copy uninitialized gentry fields back to userspace
	x86/mm: Fix decoy address handling vs 32-bit builds
	x86/vdso: Pass --eh-frame-hdr to the linker
	x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
	panic: avoid deadlocks in re-entrant console drivers
	mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
	mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
	mm: introduce mm_[p4d|pud|pmd]_folded
	xfrm_user: fix freeing of xfrm states on acquire
	rtlwifi: Fix leak of skb when processing C2H_BT_INFO
	iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT to old firmwares
	Revert "mwifiex: restructure rx_reorder_tbl_lock usage"
	iwlwifi: add new cards for 9560, 9462, 9461 and killer series
	media: ov5640: Fix set format regression
	mm, memory_hotplug: initialize struct pages for the full memory section
	mm: thp: fix flags for pmd migration when split
	mm, page_alloc: fix has_unmovable_pages for HugePages
	mm: don't miss the last page because of round-off error
	Input: elantech - disable elan-i2c for P52 and P72
	proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
	drm/ioctl: Fix Spectre v1 vulnerabilities
	Linux 4.19.13

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-29 13:46:09 +01:00
Roman Gushchin
a5e8809697 mm: don't miss the last page because of round-off error
commit 68600f623d upstream.

I've noticed, that dying memory cgroups are often pinned in memory by a
single pagecache page.  Even under moderate memory pressure they sometimes
stayed in such state for a long time.  That looked strange.

My investigation showed that the problem is caused by applying the LRU
pressure balancing math:

  scan = div64_u64(scan * fraction[lru], denominator),

where

  denominator = fraction[anon] + fraction[file] + 1.

Because fraction[lru] is always less than denominator, if the initial scan
size is 1, the result is always 0.

This means the last page is not scanned and has
no chances to be reclaimed.

Fix this by rounding up the result of the division.

In practice this change significantly improves the speed of dying cgroups
reclaim.

[guro@fb.com: prevent double calculation of DIV64_U64_ROUND_UP() arguments]
  Link: http://lkml.kernel.org/r/20180829213311.GA13501@castle
Link: http://lkml.kernel.org/r/20180827162621.30187-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29 13:37:59 +01:00
Mathias Krause
5ecdfbb0d9 xfrm_user: fix freeing of xfrm states on acquire
commit 4a135e5389 upstream.

Commit 565f0fa902 ("xfrm: use a dedicated slab cache for struct
xfrm_state") moved xfrm state objects to use their own slab cache.
However, it missed to adapt xfrm_user to use this new cache when
freeing xfrm states.

Fix this by introducing and make use of a new helper for freeing
xfrm_state objects.

Fixes: 565f0fa902 ("xfrm: use a dedicated slab cache for struct xfrm_state")
Reported-by: Pan Bian <bianpan2016@163.com>
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29 13:37:58 +01:00
Martin Schwidefsky
89d6fff074 mm: introduce mm_[p4d|pud|pmd]_folded
[ Upstream commit 1071fc5779 ]

Add three architecture overrideable functions to test if the
p4d, pud, or pmd layer of a page table is folded or not.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29 13:37:58 +01:00
Martin Schwidefsky
ba38c3e788 mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
[ Upstream commit a8874e7e8a ]

Change the currently empty defines for __PAGETABLE_PMD_FOLDED,
__PAGETABLE_PUD_FOLDED and __PAGETABLE_P4D_FOLDED to return 1.
This makes it possible to use __is_defined() to test if the
preprocessor define exists.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29 13:37:57 +01:00
Martin Schwidefsky
28a3b553dd mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
[ Upstream commit 6d212db119 ]

The common mm code calls mm_dec_nr_pmds() and mm_dec_nr_puds()
in free_pgtables() if the address range spans a full pud or pmd.
If mm_dec_nr_puds/mm_dec_nr_pmds are non-empty due to configuration
settings they blindly subtract the size of the pmd or pud table from
pgtable_bytes even if the pud or pmd page table layer is folded.

Add explicit mm_[pmd|pud]_folded checks to the four pgtable_bytes
accounting functions mm_inc_nr_puds, mm_inc_nr_pmds, mm_dec_nr_puds
and mm_dec_nr_pmds. As the check for folded page tables can be
overwritten by the architecture, this allows to keep a correct
pgtable_bytes value for platforms that use a dynamic number of
page table levels.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29 13:37:57 +01:00