Commit Graph

24689 Commits

Author SHA1 Message Date
Quentin Perret
b5ba569f6f sched/fair: select the most energy-efficient CPU candidate on wake-up
The current implementation of the energy-aware wake-up path relies on
find_best_target() to select an ordered list of CPU candidates for task
placement. The first candidate of the list saving energy is then chosen,
disregarding all the others to avoid the overhead of an expensive
energy_diff.

With the recent refactoring of select_energy_cpu_idx(), the cost of
exploring multiple CPUs has been reduced, hence offering the opportunity
to select the most energy-efficient candidate at a lower cost. This commit
seizes this opportunity by allowing to change select_energy_cpu_idx()'s
behaviour as to ignore the order of CPUs returned by find_best_target()
and to pick the best candidate energy-wise.

As this functionality is still considered as experimental, it is hidden
behind a sched_feature named FBT_STRICT_ORDER (like the equivalent
feature in Android 4.14) which defaults to true, hence keeping the
current behaviour by default.

Change-Id: I0cb833bfec1a4a053eddaff1652c0b6cad554f97
Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com>
Suggested-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-02-14 16:21:10 +00:00
Greg Kroah-Hartman
f8bbe517d0 Merge 4.9.81 into android-4.9
Changes in 4.9.81
	powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
	powerpc/64: Add macros for annotating the destination of rfid/hrfid
	powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
	powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
	powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
	powerpc/64s: Add support for RFI flush of L1-D cache
	powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
	powerpc/pseries: Query hypervisor for RFI flush settings
	powerpc/powernv: Check device-tree for RFI flush settings
	powerpc/64s: Wire up cpu_show_meltdown()
	powerpc/64s: Allow control of RFI flush via debugfs
	auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	pinctrl: pxa: pxa2xx: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	kaiser: fix intel_bts perf crashes
	x86/pti: Make unpoison of pgd for trusted boot work for real
	kaiser: allocate pgd with order 0 when pti=off
	serial: core: mark port as initialized after successful IRQ change
	ip6mr: fix stale iterator
	net: igmp: add a missing rcu locking section
	qlcnic: fix deadlock bug
	qmi_wwan: Add support for Quectel EP06
	r8169: fix RTL8168EP take too long to complete driver initialization.
	tcp: release sk_frag.page in tcp_disconnect
	vhost_net: stop device during reset owner
	tcp_bbr: fix pacing_gain to always be unity when using lt_bw
	cls_u32: add missing RCU annotation.
	ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only
	soreuseport: fix mem leak in reuseport_add_sock()
	x86/asm: Fix inline asm call constraints for GCC 4.4
	x86/microcode/AMD: Do not load when running on a hypervisor
	media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	b43: Add missing MODULE_FIRMWARE()
	KEYS: encrypted: fix buffer overread in valid_master_desc()
	x86/retpoline: Remove the esp/rsp thunk
	KVM: x86: Make indirect calls in emulator speculation safe
	KVM: VMX: Make indirect call speculation safe
	module/retpoline: Warn about missing retpoline in module
	x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
	x86/cpufeatures: Add Intel feature bits for Speculation Control
	x86/cpufeatures: Add AMD feature bits for Speculation Control
	x86/msr: Add definitions for new speculation control MSRs
	x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
	x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
	x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
	x86/nospec: Fix header guards names
	x86/bugs: Drop one "mitigation" from dmesg
	x86/cpu/bugs: Make retpoline module warning conditional
	x86/cpufeatures: Clean up Spectre v2 related CPUID flags
	x86/retpoline: Simplify vmexit_fill_RSB()
	x86/spectre: Check CONFIG_RETPOLINE in command line parser
	x86/entry/64: Remove the SYSCALL64 fast path
	x86/entry/64: Push extra regs right away
	x86/asm: Move 'status' from thread_struct to thread_info
	Documentation: Document array_index_nospec
	array_index_nospec: Sanitize speculative array de-references
	x86: Implement array_index_mask_nospec
	x86: Introduce barrier_nospec
	x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
	x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}
	x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
	x86/get_user: Use pointer masking to limit speculation
	x86/syscall: Sanitize syscall table de-references under speculation
	vfs, fdtable: Prevent bounds-check bypass via speculative execution
	nl80211: Sanitize array index in parse_txq_params
	x86/spectre: Report get_user mitigation for spectre_v1
	x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
	x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
	x86/paravirt: Remove 'noreplace-paravirt' cmdline option
	x86/kvm: Update spectre-v1 mitigation
	x86/retpoline: Avoid retpolines for built-in __init functions
	x86/spectre: Simplify spectre_v2 command line parsing
	x86/pti: Mark constant arrays as __initconst
	x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
	KVM: nVMX: kmap() can't fail
	KVM: nVMX: vmx_complete_nested_posted_interrupt() can't fail
	KVM: nVMX: mark vmcs12 pages dirty on L2 exit
	KVM: nVMX: Eliminate vmcs02 pool
	KVM: VMX: introduce alloc_loaded_vmcs
	KVM: VMX: make MSR bitmaps per-VCPU
	KVM/x86: Add IBPB support
	KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
	KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
	KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
	crypto: tcrypt - fix S/G table for test_aead_speed()
	ASoC: simple-card: Fix misleading error message
	ASoC: rsnd: don't call free_irq() on Parent SSI
	ASoC: rsnd: avoid duplicate free_irq()
	drm: rcar-du: Use the VBK interrupt for vblank events
	drm: rcar-du: Fix race condition when disabling planes at CRTC stop
	x86/microcode: Do the family check first
	Linux 4.9.81

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-02-13 12:57:29 +01:00
Andi Kleen
a1745ad92f module/retpoline: Warn about missing retpoline in module
(cherry picked from commit caf7501a1b)

There's a risk that a kernel which has full retpoline mitigations becomes
vulnerable when a module gets loaded that hasn't been compiled with the
right compiler or the right option.

To enable detection of that mismatch at module load time, add a module info
string "retpoline" at build time when the module was compiled with
retpoline support. This only covers compiled C source, but assembler source
or prebuilt object files are not checked.

If a retpoline enabled kernel detects a non retpoline protected module at
load time, print a warning and report it in the sysfs vulnerability file.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: gregkh@linuxfoundation.org
Cc: torvalds@linux-foundation.org
Cc: jeyu@kernel.org
Cc: arjan@linux.intel.com
Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-13 12:35:58 +01:00
Chris Redpath
8a174b4749 sched/fair: prevent possible infinite loop in sched_group_energy
There is a race between hotplug and energy_diff which might result
in endless loop in sched_group_energy. When this happens, the end
condition cannot be detected.

We can store how many CPUs we need to visit at the beginning, and
bail out of the energy calculation if we visit more cpus than expected.

Bug: 72311797 72202633
Change-Id: I8dda75468ee1570da4071cd8165ef5131a8205d8
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2018-02-10 00:45:21 +00:00
Pavankumar Kondeti
de22a05ed0 sched/fair: fix array out of bounds access in select_energy_cpu_idx()
We are using an incorrect index while initializing the nrg_delta for
the previous CPU in select_energy_cpu_idx(). This initialization
it self is not needed as the nrg_delta for the previous CPU is already
initialized to 0 while preparing the energ_env struct.

Change-Id: Iee4e2c62f904050d2680a0a1df646d4d515c62cc
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2018-02-03 05:29:57 +05:30
Ionela Voinescu
f964120739 sched/fair: use min capacity when evaluating active cpus
When we are calculating what the impact of placing a task on a specific
cpu is, we should include the information that there might be a minimum
capacity imposed upon that cpu which could change the performance and/or
energy cost decisions.

When choosing an active target CPU, favour CPUs that won't end up
running at a high OPP due to a min capacity cap imposed by external
actors.

Change-Id: Ibc3302304345b63107f172b1fc3ffdabc19aa9d4
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2018-02-01 18:37:35 +00:00
Ionela Voinescu
88a968ca63 sched/fair: use min capacity when evaluating idle backup cpus
When we are calculating what the impact of placing a task on a specific
cpu is, we should include the information that there might be a minimum
capacity imposed upon that cpu which could change the performance and/or
energy cost decisions.

When choosing an idle backup CPU, favour CPUs that won't end up
running at a high OPP due to a min capacity cap imposed by external
actors.

Change-Id: I566623ffb3a7c5b61a23242dcce1cb4147ef8a4a
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2018-02-01 18:37:32 +00:00
Ionela Voinescu
9e1c648c71 sched/fair: use min capacity when evaluating placement energy costs
Add the ability to track minimim capacity forced onto a sched_group
by some external actor.

group_max_util returns the highest utilisation inside a sched_group
and is used when we are trying to calculate an energy cost estimate
for a specific scheduling scenario. Minimum capacities imposed from
elsewhere will influence this energy cost so we should reflect it
here.

Change-Id: Ibd537a6dbe6d67b11cc9e9be18f40fcb2c0f13de
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2018-02-01 18:37:27 +00:00
Ionela Voinescu
58b761f4c0 sched/fair: introduce minimum capacity capping sched feature
We have the ability to track minimum capacity forced onto a CPU by
userspace or external actors. This is provided though a minimum
frequency scale factor exposed by arch_scale_min_freq_capacity.

The use of this information is enabled through the MIN_CAPACITY_CAPPING
feature. If not enabled, the minimum frequency scale factor will
remain 0 and it will not impact energy estimation or scheduling
decisions.

Change-Id: Ibc61f2bf4fddf186695b72b262e602a6e8bfde37
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2018-02-01 18:37:23 +00:00
Ionela Voinescu
33550efbaa sched: add arch_scale_min_freq_capacity to track minimum capacity caps
If the minimum capacity of a group is capped by userspace or internal
dependencies which are not otherwise visible to the scheduler, we need
a way to see these and integrate this information into the energy
calculations and task placement decisions we make.

Add arch_scale_min_freq_capacity to determine the lowest capacity which
a specific cpu can provide under the current set of known constraints.

Change-Id: Ied4a1dc0982bbf42cb5ea2f27201d4363db59705
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
2018-02-01 18:37:10 +00:00
Dietmar Eggemann
da6833cff7 sched/fair: introduce an arch scaling function for max frequency capping
The max frequency scaling factor is defined as:

  max_freq_scale = policy_max_freq / cpuinfo_max_freq

To be able to scale the cpu capacity by this factor introduce a call to
the new arch scaling function arch_scale_max_freq_capacity() in
update_cpu_capacity() and provide a default implementation which returns
SCHED_CAPACITY_SCALE.

Another subsystem (e.g. cpufreq) can overwrite this default implementation,
exactly as for frequency and cpu invariance. It has to be enabled by the
arch by defining arch_scale_max_freq_capacity to the actual
implementation.

Change-Id: I266cd1f4c1c82f54b80063c36aa5f7662599dd28
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2018-02-01 15:59:24 +00:00
Greg Kroah-Hartman
71f1469722 Merge 4.9.79 into android-4.9
Changes in 4.9.79
	x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
	orangefs: use list_for_each_entry_safe in purge_waiting_ops
	orangefs: initialize op on loop restart in orangefs_devreq_read
	usbip: prevent vhci_hcd driver from leaking a socket pointer address
	usbip: Fix implicit fallthrough warning
	usbip: Fix potential format overflow in userspace tools
	can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
	can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
	KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
	Prevent timer value 0 for MWAITX
	drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
	drivers: base: cacheinfo: fix boot error message when acpi is enabled
	mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
	hwpoison, memcg: forcibly uncharge LRU pages
	cma: fix calculation of aligned offset
	mm, page_alloc: fix potential false positive in __zone_watermark_ok
	ipc: msg, make msgrcv work with LONG_MIN
	ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
	ACPICA: Namespace: fix operand cache leak
	netfilter: nfnetlink_cthelper: Add missing permission checks
	netfilter: xt_osf: Add missing permission checks
	reiserfs: fix race in prealloc discard
	reiserfs: don't preallocate blocks for extended attributes
	fs/fcntl: f_setown, avoid undefined behaviour
	scsi: libiscsi: fix shifting of DID_REQUEUE host byte
	Revert "module: Add retpoline tag to VERMAGIC"
	mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
	Input: trackpoint - force 3 buttons if 0 button is reported
	orangefs: fix deadlock; do not write i_size in read_iter
	um: link vmlinux with -no-pie
	vsyscall: Fix permissions for emulate mode with KAISER/PTI
	eventpoll.h: add missing epoll event masks
	dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
	ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
	ipv6: fix udpv6 sendmsg crash caused by too small MTU
	ipv6: ip6_make_skb() needs to clear cork.base.dst
	lan78xx: Fix failure in USB Full Speed
	net: igmp: fix source address check for IGMPv3 reports
	net: qdisc_pkt_len_init() should be more robust
	net: tcp: close sock if net namespace is exiting
	pppoe: take ->needed_headroom of lower device into account on xmit
	r8169: fix memory corruption on retrieval of hardware statistics.
	sctp: do not allow the v4 socket to bind a v4mapped v6 address
	sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
	tipc: fix a memory leak in tipc_nl_node_get_link()
	vmxnet3: repair memory leak
	net: Allow neigh contructor functions ability to modify the primary_key
	ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
	ppp: unlock all_ppp_mutex before registering device
	be2net: restore properly promisc mode after queues reconfiguration
	ip6_gre: init dev->mtu and dev->hard_header_len correctly
	gso: validate gso_type in GSO handlers
	mlxsw: spectrum_router: Don't log an error on missing neighbor
	tun: fix a memory leak for tfile->tx_array
	flow_dissector: properly cap thoff field
	perf/x86/amd/power: Do not load AMD power module on !AMD platforms
	x86/microcode/intel: Extend BDW late-loading further with LLC size check
	hrtimer: Reset hrtimer cpu base proper on CPU hotplug
	x86: bpf_jit: small optimization in emit_bpf_tail_call()
	bpf: fix bpf_tail_call() x64 JIT
	bpf: introduce BPF_JIT_ALWAYS_ON config
	bpf: arsh is not supported in 32 bit alu thus reject it
	bpf: avoid false sharing of map refcount with max_entries
	bpf: fix divides by zero
	bpf: fix 32-bit divide by zero
	bpf: reject stores into ctx via st and xadd
	nfsd: auth: Fix gid sorting when rootsquash enabled
	Linux 4.9.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-31 14:13:00 +01:00
Daniel Borkmann
f531fbb06a bpf: reject stores into ctx via st and xadd
[ upstream commit f37a8cb84c ]

Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.

The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.

Fixes: d691f9e8d4 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Alexei Starovoitov
265d7657c9 bpf: fix 32-bit divide by zero
[ upstream commit 68fda450a7 ]

due to some JITs doing if (src_reg == 0) check in 64-bit mode
for div/mod operations mask upper 32-bits of src register
before doing the check

Fixes: 622582786c ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Eric Dumazet
4606077802 bpf: fix divides by zero
[ upstream commit c366287ebd ]

Divides by zero are not nice, lets avoid them if possible.

Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Daniel Borkmann
fcabc6d008 bpf: arsh is not supported in 32 bit alu thus reject it
[ upstream commit 7891a87efc ]

The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:

  0: (18) r0 = 0x0
  2: (7b) *(u64 *)(r10 -16) = r0
  3: (cc) (u32) r0 s>>= (u32) r0
  4: (95) exit

Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:57 +01:00
Alexei Starovoitov
a3d6dd6a66 bpf: introduce BPF_JIT_ALWAYS_ON config
[ upstream commit 290af86629 ]

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:56 +01:00
Alexei Starovoitov
5226bb3b95 bpf: fix bpf_tail_call() x64 JIT
[ upstream commit 90caccdd8c ]

- bpf prog_array just like all other types of bpf array accepts 32-bit index.
  Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent

The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40 in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:56 +01:00
Thomas Gleixner
c98ff7299b hrtimer: Reset hrtimer cpu base proper on CPU hotplug
commit d5421ea43d upstream.

The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e49493 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:56 +01:00
Ke Wang
7be1985454 ANDROID: sched: EAS: check energy_aware() before calling select_energy_cpu_brute() in up-migrate path
In up-migrate path, select_energy_cpu_brute() was called directly
without checking energy_aware(). This will make select_energy_cpu_brute()
always worked even disabling energy_aware() on the asymmetric cpu
capacity system.

Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2018-01-29 15:31:47 +00:00
Greg Kroah-Hartman
e9dabe69de Merge 4.9.78 into android-4.9
Changes in 4.9.78
	libnvdimm, btt: Fix an incompatibility in the log layout
	scsi: sg: disable SET_FORCE_LOW_DMA
	futex: Prevent overflow by strengthen input validation
	ALSA: seq: Make ioctls race-free
	ALSA: pcm: Remove yet superfluous WARN_ON()
	ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
	ALSA: hda - Apply the existing quirk to iMac 14,1
	timers: Unconditionally check deferrable base
	af_key: fix buffer overread in verify_address_len()
	af_key: fix buffer overread in parse_exthdrs()
	iser-target: Fix possible use-after-free in connection establishment error
	scsi: hpsa: fix volume offline state
	sched/deadline: Zero out positive runtime after throttling constrained tasks
	x86/retpoline: Fill RSB on context switch for affected CPUs
	x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
	objtool: Improve error message for bad file argument
	x86/cpufeature: Move processor tracing out of scattered features
	module: Add retpoline tag to VERMAGIC
	x86/mm/pkeys: Fix fill_sig_info_pkey
	x86/tsc: Fix erroneous TSC rate on Skylake Xeon
	pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
	x86/apic/vector: Fix off by one in error path
	perf tools: Fix build with ARCH=x86_64
	Input: ALPS - fix multi-touch decoding on SS4 plus touchpads
	Input: 88pm860x-ts - fix child-node lookup
	Input: twl6040-vibra - fix child-node lookup
	Input: twl4030-vibra - fix sibling-node lookup
	tracing: Fix converting enum's from the map in trace_event_eval_update()
	phy: work around 'phys' references to usb-nop-xceiv devices
	ARM: sunxi_defconfig: Enable CMA
	ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7
	can: peak: fix potential bug in packet fragmentation
	scripts/gdb/linux/tasks.py: fix get_thread_info
	proc: fix coredump vs read /proc/*/stat race
	libata: apply MAX_SEC_1024 to all LITEON EP1 series devices
	workqueue: avoid hard lockups in show_workqueue_state()
	dm btree: fix serious bug in btree_split_beneath()
	dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6
	arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
	x86/cpu, x86/pti: Do not enable PTI on AMD processors
	usbip: fix warning in vhci_hcd_probe/lockdep_init_map
	x86/mce: Make machine check speculation protected
	retpoline: Introduce start/end markers of indirect thunk
	kprobes/x86: Blacklist indirect thunk functions for kprobes
	kprobes/x86: Disable optimizing on the function jumps to indirect thunk
	x86/pti: Document fix wrong index
	x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
	MIPS: AR7: ensure the port type's FCR value is used
	Linux 4.9.78

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-23 20:18:30 +01:00
Sergey Senozhatsky
ca2d736867 workqueue: avoid hard lockups in show_workqueue_state()
commit 62635ea8c1 upstream.

show_workqueue_state() can print out a lot of messages while being in
atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console
device is slow it may end up triggering NMI hard lockup watchdog.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:57:08 +01:00
Steven Rostedt (VMware)
9a50ea0ce7 tracing: Fix converting enum's from the map in trace_event_eval_update()
commit 1ebe1eaf2f upstream.

Since enums do not get converted by the TRACE_EVENT macro into their values,
the event format displaces the enum name and not the value. This breaks
tools like perf and trace-cmd that need to interpret the raw binary data. To
solve this, an enum map was created to convert these enums into their actual
numbers on boot up. This is done by TRACE_EVENTS() adding a
TRACE_DEFINE_ENUM() macro.

Some enums were not being converted. This was caused by an optization that
had a bug in it.

All calls get checked against this enum map to see if it should be converted
or not, and it compares the call's system to the system that the enum map
was created under. If they match, then they call is processed.

To cut down on the number of iterations needed to find the maps with a
matching system, since calls and maps are grouped by system, when a match is
made, the index into the map array is saved, so that the next call, if it
belongs to the same system as the previous call, could start right at that
array index and not have to scan all the previous arrays.

The problem was, the saved index was used as the variable to know if this is
a call in a new system or not. If the index was zero, it was assumed that
the call is in a new system and would keep incrementing the saved index
until it found a matching system. The issue arises when the first matching
system was at index zero. The next map, if it belonged to the same system,
would then think it was the first match and increment the index to one. If
the next call belong to the same system, it would begin its search of the
maps off by one, and miss the first enum that should be converted. This left
a single enum not converted properly.

Also add a comment to describe exactly what that index was for. It took me a
bit too long to figure out what I was thinking when debugging this issue.

Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com

Fixes: 0c564a538a ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Teste-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:57:07 +01:00
Xunlei Pang
1ad4f2872c sched/deadline: Zero out positive runtime after throttling constrained tasks
commit ae83b56a56 upstream.

When a contrained task is throttled by dl_check_constrained_dl(),
it may carry the remaining positive runtime, as a result when
dl_task_timer() fires and calls replenish_dl_entity(), it will
not be replenished correctly due to the positive dl_se->runtime.

This patch assigns its runtime to 0 if positive after throttling.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: df8eac8caf ("sched/deadline: Throttle a constrained deadline task activated after the deadline)
Link: http://lkml.kernel.org/r/1494421417-27550-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:57:05 +01:00
Thomas Gleixner
676109b28c timers: Unconditionally check deferrable base
commit ed4bbf7910 upstream.

When the timer base is checked for expired timers then the deferrable base
must be checked as well. This was missed when making the deferrable base
independent of base::nohz_active.

Fixes: ced6d5c11d ("timers: Use deferrable base independent of base::nohz_active")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: rt@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:57:04 +01:00
Li Jinyue
d8a3170db0 futex: Prevent overflow by strengthen input validation
commit fbe0e839d1 upstream.

UBSAN reports signed integer overflow in kernel/futex.c:

 UBSAN: Undefined behaviour in kernel/futex.c:2041:18
 signed integer overflow:
 0 - -2147483648 cannot be represented in type 'int'

Add a sanity check to catch negative values of nr_wake and nr_requeue.

Signed-off-by: Li Jinyue <lijinyue@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: dvhart@infradead.org
Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:57:04 +01:00
Victor Wan
20946741c8 Merge branch 'android-4.9' into amlogic-4.9-dev
Conflicts:
	Makefile
	init/main.c
2018-01-22 20:17:25 +08:00
Greg Kroah-Hartman
033d019ce2 Merge 4.9.77 into android-4.9
Changes in 4.9.77
	dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
	mac80211: Add RX flag to indicate ICV stripped
	ath10k: rebuild crypto header in rx data frames
	KVM: Fix stack-out-of-bounds read in write_mmio
	can: gs_usb: fix return value of the "set_bittiming" callback
	IB/srpt: Disable RDMA access by the initiator
	MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
	MIPS: Factor out NT_PRFPREG regset access helpers
	MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
	MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
	MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
	MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
	MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
	kvm: vmx: Scrub hardware GPRs at VM-exit
	platform/x86: wmi: Call acpi_wmi_init() later
	x86/acpi: Handle SCI interrupts above legacy space gracefully
	ALSA: pcm: Remove incorrect snd_BUG_ON() usages
	ALSA: pcm: Add missing error checks in OSS emulation plugin builder
	ALSA: pcm: Abort properly at pending signal in OSS read/write loops
	ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
	ALSA: aloop: Release cable upon open error path
	ALSA: aloop: Fix inconsistent format due to incomplete rule
	ALSA: aloop: Fix racy hw constraints adjustment
	x86/acpi: Reduce code duplication in mp_override_legacy_irq()
	zswap: don't param_set_charp while holding spinlock
	lan78xx: use skb_cow_head() to deal with cloned skbs
	sr9700: use skb_cow_head() to deal with cloned skbs
	smsc75xx: use skb_cow_head() to deal with cloned skbs
	cx82310_eth: use skb_cow_head() to deal with cloned skbs
	xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
	8021q: fix a memory leak for VLAN 0 device
	ip6_tunnel: disable dst caching if tunnel is dual-stack
	net: core: fix module type in sock_diag_bind
	RDS: Heap OOB write in rds_message_alloc_sgs()
	RDS: null pointer dereference in rds_atomic_free_op
	sh_eth: fix TSU resource handling
	sh_eth: fix SH7757 GEther initialization
	net: stmmac: enable EEE in MII, GMII or RGMII only
	ipv6: fix possible mem leaks in ipv6_make_skb()
	ethtool: do not print warning for applications using legacy API
	mlxsw: spectrum_router: Fix NULL pointer deref
	net/sched: Fix update of lastuse in act modules implementing stats_update
	crypto: algapi - fix NULL dereference in crypto_remove_spawns()
	rbd: set max_segments to USHRT_MAX
	x86/microcode/intel: Extend BDW late-loading with a revision check
	KVM: x86: Add memory barrier on vmcs field lookup
	drm/vmwgfx: Potential off by one in vmw_view_add()
	kaiser: Set _PAGE_NX only if supported
	iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
	target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
	bpf: move fixup_bpf_calls() function
	bpf: refactor fixup_bpf_calls()
	bpf: prevent out-of-bounds speculation
	bpf, array: fix overflow in max_entries and undefined behavior in index_mask
	USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
	USB: serial: cp210x: add new device ID ELV ALC 8xxx
	usb: misc: usb3503: make sure reset is low for at least 100us
	USB: fix usbmon BUG trigger
	usbip: remove kernel addresses from usb device and urb debug msgs
	usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
	staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
	Bluetooth: Prevent stack info leak from the EFS element.
	uas: ignore UAS for Norelsys NS1068(X) chips
	e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
	x86/Documentation: Add PTI description
	x86/cpu: Factor out application of forced CPU caps
	x86/cpufeatures: Make CPU bugs sticky
	x86/cpufeatures: Add X86_BUG_CPU_INSECURE
	x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
	x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
	x86/cpu: Merge bugs.c and bugs_64.c
	sysfs/cpu: Add vulnerability folder
	x86/cpu: Implement CPU vulnerabilites sysfs functions
	x86/cpu/AMD: Make LFENCE a serializing instruction
	x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
	sysfs/cpu: Fix typos in vulnerability documentation
	x86/alternatives: Fix optimize_nops() checking
	x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
	x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
	objtool, modules: Discard objtool annotation sections for modules
	objtool: Detect jumps to retpoline thunks
	objtool: Allow alternatives to be ignored
	x86/asm: Use register variable to get stack pointer value
	x86/retpoline: Add initial retpoline support
	x86/spectre: Add boot time option to select Spectre v2 mitigation
	x86/retpoline/crypto: Convert crypto assembler indirect jumps
	x86/retpoline/entry: Convert entry assembler indirect jumps
	x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
	x86/retpoline/hyperv: Convert assembler indirect jumps
	x86/retpoline/xen: Convert Xen hypercall indirect jumps
	x86/retpoline/checksum32: Convert assembler indirect jumps
	x86/retpoline/irq32: Convert assembler indirect jumps
	x86/retpoline: Fill return stack buffer on vmexit
	selftests/x86: Add test_vsyscall
	x86/retpoline: Remove compile time warning
	objtool: Fix retpoline support for pre-ORC objtool
	x86/pti/efi: broken conversion from efi to kernel page table
	Linux 4.9.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-17 10:29:45 +01:00
Daniel Borkmann
820ef2a0e5 bpf, array: fix overflow in max_entries and undefined behavior in index_mask
commit bbeb6e4323 upstream.

syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.

Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
a9bfac14cd bpf: prevent out-of-bounds speculation
commit b2157399cc upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[ Backported to 4.9 - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
f55093dccd bpf: refactor fixup_bpf_calls()
commit 79741b3bde upstream.

reduce indent and make it iterate over instructions similar to
convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[Backported to 4.9.y - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Alexei Starovoitov
28035366af bpf: move fixup_bpf_calls() function
commit e245c5c6a5 upstream.

no functional change.
move fixup_bpf_calls() to verifier.c
it's being refactored in the next patch

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[backported to 4.9 - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Greg Kroah-Hartman
91549408ce Merge 4.9.76 into android-4.9
Changes in 4.9.76
	kernel/acct.c: fix the acct->needcheck check in check_free_space()
	crypto: n2 - cure use after free
	crypto: chacha20poly1305 - validate the digest size
	crypto: pcrypt - fix freeing pcrypt instances
	sunxi-rsb: Include OF based modalias in device uevent
	fscache: Fix the default for fscache_maybe_release_page()
	nbd: fix use-after-free of rq/bio in the xmit path
	kernel: make groups_sort calling a responsibility group_info allocators
	kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
	kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
	kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
	iommu/arm-smmu-v3: Don't free page table ops twice
	iommu/arm-smmu-v3: Cope with duplicated Stream IDs
	ARC: uaccess: dont use "l" gcc inline asm constraint modifier
	Input: elantech - add new icbody type 15
	x86/microcode/AMD: Add support for fam17h microcode loading
	parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
	parisc: qemu idle sleep support
	x86/tlb: Drop the _GPL from the cpu_tlbstate export
	Map the vsyscall page with _PAGE_USER
	mtd: nand: pxa3xx: Fix READOOB implementation
	Linux 4.9.76

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-10 09:51:38 +01:00
Oleg Nesterov
4d53eb4949 kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
commit 426915796c upstream.

complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.

If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.

After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.

This should hopefully fix the problem reported by Dmitry.  This
test-case

	static int init(void *arg)
	{
		for (;;)
			pause();
	}

	int main(void)
	{
		char stack[16 * 1024];

		for (;;) {
			int pid = clone(init, stack + sizeof(stack)/2,
					CLONE_NEWPID | SIGCHLD, NULL);
			assert(pid > 0);

			assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
			assert(waitpid(-1, NULL, WSTOPPED) == pid);

			assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
			assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
			assert(pid == wait(NULL));
		}
	}

triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop().  do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.

And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.

Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kyle Huey <me@kylehuey.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:29:53 +01:00
Oleg Nesterov
794ac8ef9b kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
commit ac25385089 upstream.

Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T.  This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.

Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:29:52 +01:00
Oleg Nesterov
1453b3ac6c kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
commit 628c1bcba2 upstream.

The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.

Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().

SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work.  Perhaps we will add
another ->ptrace check later, we will see.

Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:29:52 +01:00
Thiago Rafael Becker
79258d9834 kernel: make groups_sort calling a responsibility group_info allocators
commit bdcf0a423e upstream.

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:29:52 +01:00
Oleg Nesterov
790080ce0e kernel/acct.c: fix the acct->needcheck check in check_free_space()
commit 4d9570158b upstream.

As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.

Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.

In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.

This was broken by commit 32dc730860 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8
("acct() should honour the limits from the very beginning") made the
problem more visible.

Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc730860 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:29:51 +01:00
Patrick Bellasi
3b9305de32 sched/fair: reduce rounding errors in energy computations
The SG's energy is obtained by adding busy and idle contributions which
are computed by considering a proper fraction of the
SCHED_CAPACITY_SCALE defined by the SG's utilizations.

By scaling each and every contribution conputed we risk to accumulate
rounding errors which can results into a non null energy_delta also in
cases when the same total accomulated utilization is differently
distributed among different CPUs.

To reduce rouding errors, this patch accumulated non-scaled busy/idle
energy contributions for each visited SG, and scale each of them just
one time at the end.

Change-Id: Idf8367fee0ac11938c6436096f0c1b2d630210d2
Suggested-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-01-08 15:55:12 +00:00
Patrick Bellasi
cf28cf03a3 sched/fair: re-factor energy_diff to use a single (extensible) energy_env
The energy_env data structure is used to cache values required by multiple
different functions involved in energy_diff computation. Some of these
functions require additional parameters which can be easily embedded
into the energy_env itself. The current implementation of energy_diff
hardcodes the usage of two different energy_env structures to estimate and
compare the energy consumption related to a "before" and an "after" CPU.
Moreover, it does this energy estimation by walking multiple times the
SDs/SGs data structures.

A better design can be envisioned by better using the energy_env structure
to support a more efficient and concurrent evaluation of multiple schedule
candidates. To this purpose, this patch provides a complete re-factoring
of the energy_diff implementation to:

1. use a single energy_env structure for the evaluation of all the
   candidate CPUs

2. walk just one time the SDs/SGs, thus improving the overall performance
   to compute the energy estimation for each CPU candidate specified by
   the single used energy_env

3. simplify the code (at least if you look at the new version and not at
   this re-factoring patch) thus providing a more clean code to maintain
   and extend for additional features

This patch updated all the clients of energy_env to use only the data
provided by this structure and an index for one of its CPUs candidates.
Embedding everything within the energy env will make it simple to add
tracepoints for this new version, which can easily provide an holistic
view on how energy_diff evaluated the proposed CPU candidates.

The new proposed structure, for both "struct energy_env" and the functions
using it, is designed in such a way to easily accommodate additional
further extensions (e.g. SchedTune filtering) without requiring an
additional big re-factoring of these core functions.

Finally, the usage of a CPUs candidate array, embedded into the
energy_diff structure, allows also to seamless extend the exploration of
multiple candidate CPUs, for example to support the comparison of a
spread-vs-packing strategy.

Change-Id: Ic04ffb6848b2c763cf1788767f22c6872eb12bee
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
[reworked find_new_capacity() and enforced the respect of
find_best_target() selection order]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-01-08 15:55:08 +00:00
Patrick Bellasi
142ce32a63 sched/fair: cleanup select_energy_cpu_brute to be more consistent
The current definition of select_energy_cpu_brute is a bit confusing in
the definition of the value for the target_cpu to be returned as wakeup
CPU for the specified task.

This cleanup the code by ensuring that we always set target_cpu right
before returning it. rcu_read_lock and check on *sd!=NULL are also moved
around to be exactly where they are required.

Change-Id: I70a4b558b3624a13395da1a87ddc0776fd1d6641
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-01-08 15:55:03 +00:00
Patrick Bellasi
7f44e92d1c sched/fair: remove capacity tracking from energy_diff
In preparation for the energy_diff refactoring, let's remove all the
SchedTune specific bits which are used to keep track of the capacity
variations requited by the PESpace filtering.

This removes also the energy_normalization function and the wrapper of
energy_diff which is used to trigger a PESpace filtering by
schedtune_accept_deltas().

The remaining code is the "original" energy_diff function which
looks just at the energy variations to compare prev_cpu vs next_cpu.

Change-Id: I4fb1d1c5ba45a364e6db9ab8044969349aba0307
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-01-08 15:54:59 +00:00
Patrick Bellasi
4905932b05 sched/fair: remove energy_diff tracepoint in preparation to re-factoring
The format of the energy_diff tracepoint is going to be changed by the
following energ_diff refactoring patches. Let's remove it now to start from
a clean slate.

Change-Id: Id4f537ed60d90a7ddcca0a29a49944bfacb85c8c
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-01-08 15:54:55 +00:00
Chris Redpath
e997bf0fde sched/fair: use *p to reference task_structs
This is a simple renaming patch which just align to the most common code
convention used in fair.c, task_structs pointers are usually named *p.

Change-Id: Id0769e52b6a271014d89353fdb4be9bb721b5b2f
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-01-08 15:54:50 +00:00
Ke Wang
225006a4f7 sched: EAS: Fix the calculation of group util in group_idle_state()
util_delta becomes not zero in eenv_before, which will affect the
calculation of grp_util in group_idle_state(). Fix it under the
new condition.

Change-Id: Ic3853bb45876a8e388afcbe4e72d25fc42b1d7b0
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
(cherry picked from commit 47c87b2654)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-01-08 15:54:46 +00:00
Victor Wan
2c95ea743b Merge branch 'android-4.9' into amlogic-4.9-dev
Conflicts:
	arch/arm/configs/omap2plus_defconfig
	drivers/Makefile
	drivers/android/binder.c
2018-01-08 18:44:19 +08:00
Greg Kroah-Hartman
bc7ff9b998 Merge 4.9.75 into android-4.9
Changes in 4.9.75
	tcp_bbr: reset full pipe detection on loss recovery undo
	tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
	x86/boot: Add early cmdline parsing for options with arguments
	KAISER: Kernel Address Isolation
	kaiser: merged update
	kaiser: do not set _PAGE_NX on pgd_none
	kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
	kaiser: fix build and FIXME in alloc_ldt_struct()
	kaiser: KAISER depends on SMP
	kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
	kaiser: fix perf crashes
	kaiser: ENOMEM if kaiser_pagetable_walk() NULL
	kaiser: tidied up asm/kaiser.h somewhat
	kaiser: tidied up kaiser_add/remove_mapping slightly
	kaiser: align addition to x86/mm/Makefile
	kaiser: cleanups while trying for gold link
	kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
	kaiser: delete KAISER_REAL_SWITCH option
	kaiser: vmstat show NR_KAISERTABLE as nr_overhead
	kaiser: enhanced by kernel and user PCIDs
	kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
	kaiser: PCID 0 for kernel and 128 for user
	kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
	kaiser: paranoid_entry pass cr3 need to paranoid_exit
	kaiser: kaiser_remove_mapping() move along the pgd
	kaiser: fix unlikely error in alloc_ldt_struct()
	kaiser: add "nokaiser" boot option, using ALTERNATIVE
	x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
	x86/kaiser: Check boottime cmdline params
	kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
	kaiser: drop is_atomic arg to kaiser_pagetable_walk()
	kaiser: asm/tlbflush.h handle noPGE at lower level
	kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
	x86/paravirt: Dont patch flush_tlb_single
	x86/kaiser: Reenable PARAVIRT
	kaiser: disabled on Xen PV
	x86/kaiser: Move feature detection up
	KPTI: Rename to PAGE_TABLE_ISOLATION
	KPTI: Report when enabled
	kaiser: Set _PAGE_NX only if supported
	Linux 4.9.75

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-05 22:28:25 +01:00
Hugh Dickins
0994a2cf8f kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:46:32 +01:00
Dave Hansen
8f0baadf2b kaiser: merged update
Merged fixes and cleanups, rebased to 4.9.51 tree (no 5-level paging).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:46:32 +01:00
Richard Fellner
13be4483bb KAISER: Kernel Address Isolation
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:46:32 +01:00