linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-02 11:13:02 +09:00

Author	SHA1	Message	Date
Quentin Perret	b5ba569f6f	sched/fair: select the most energy-efficient CPU candidate on wake-up The current implementation of the energy-aware wake-up path relies on find_best_target() to select an ordered list of CPU candidates for task placement. The first candidate in the list that saves energy is then chosen, and all the others are disregarded to avoid the overhead of an expensive energy_diff. With the recent refactoring of select_energy_cpu_idx(), the cost of exploring multiple CPUs has been reduced, which offers the opportunity to select the most energy-efficient candidate at a lower cost. This commit seizes this opportunity by allowing select_energy_cpu_idx() to ignore the order of the CPUs returned by find_best_target() and to pick the best candidate energy-wise. As this functionality is still considered experimental, it is hidden behind a sched_feature named FBT_STRICT_ORDER (like the equivalent feature in Android 4.14), which defaults to true, hence keeping the current behaviour by default. Change-Id: I0cb833bfec1a4a053eddaff1652c0b6cad554f97 Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com> Suggested-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-02-14 16:21:10 +00:00
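The two selection policies described above can be contrasted in a minimal sketch. It is not the kernel's select_energy_cpu_idx(): the function `pick_candidate`, the `nrg_delta` array, and the convention that a negative delta means energy saved relative to the previous CPU are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the wake-up choice. nrg_delta[i] is the estimated
 * energy change of placing the task on candidate i, relative to keeping it
 * on its previous CPU (negative = saves energy). Candidates are listed in
 * the order find_best_target() would return them.
 * Returns the chosen candidate index, or -1 to stay on the previous CPU.
 */
static int pick_candidate(const int *nrg_delta, size_t n, bool strict_order)
{
    int best = -1;
    int best_delta = 0;   /* staying on the previous CPU costs nothing extra */

    assert(nrg_delta != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nrg_delta[i] >= 0)
            continue;                 /* this candidate does not save energy */
        if (strict_order)
            return (int)i;            /* FBT_STRICT_ORDER: first saver wins */
        if (nrg_delta[i] < best_delta) {
            best_delta = nrg_delta[i];
            best = (int)i;            /* otherwise keep the biggest saver */
        }
    }
    return best;
}
```

With deltas `{-5, -20, 3}`, strict ordering stops at candidate 0, while the relaxed policy scans every candidate and returns candidate 1. The relaxed scan costs one energy estimate per candidate, which is why the commit ties it to the cheaper select_energy_cpu_idx().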
Greg Kroah-Hartman	f8bbe517d0	Merge 4.9.81 into android-4.9 Changes in 4.9.81 powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper powerpc/64: Add macros for annotating the destination of rfid/hrfid powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL powerpc/64s: Add support for RFI flush of L1-D cache powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti powerpc/pseries: Query hypervisor for RFI flush settings powerpc/powernv: Check device-tree for RFI flush settings powerpc/64s: Wire up cpu_show_meltdown() powerpc/64s: Allow control of RFI flush via debugfs auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE pinctrl: pxa: pxa2xx: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE kaiser: fix intel_bts perf crashes x86/pti: Make unpoison of pgd for trusted boot work for real kaiser: allocate pgd with order 0 when pti=off serial: core: mark port as initialized after successful IRQ change ip6mr: fix stale iterator net: igmp: add a missing rcu locking section qlcnic: fix deadlock bug qmi_wwan: Add support for Quectel EP06 r8169: fix RTL8168EP take too long to complete driver initialization. tcp: release sk_frag.page in tcp_disconnect vhost_net: stop device during reset owner tcp_bbr: fix pacing_gain to always be unity when using lt_bw cls_u32: add missing RCU annotation. ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only soreuseport: fix mem leak in reuseport_add_sock() x86/asm: Fix inline asm call constraints for GCC 4.4 x86/microcode/AMD: Do not load when running on a hypervisor media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE b43: Add missing MODULE_FIRMWARE() KEYS: encrypted: fix buffer overread in valid_master_desc() x86/retpoline: Remove the esp/rsp thunk KVM: x86: Make indirect calls in emulator speculation safe KVM: VMX: Make indirect call speculation safe module/retpoline: Warn about missing retpoline in module x86/cpufeatures: Add CPUID_7_EDX CPUID leaf x86/cpufeatures: Add Intel feature bits for Speculation Control x86/cpufeatures: Add AMD feature bits for Speculation Control x86/msr: Add definitions for new speculation control MSRs x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support x86/nospec: Fix header guards names x86/bugs: Drop one "mitigation" from dmesg x86/cpu/bugs: Make retpoline module warning conditional x86/cpufeatures: Clean up Spectre v2 related CPUID flags x86/retpoline: Simplify vmexit_fill_RSB() x86/spectre: Check CONFIG_RETPOLINE in command line parser x86/entry/64: Remove the SYSCALL64 fast path x86/entry/64: Push extra regs right away x86/asm: Move 'status' from thread_struct to thread_info Documentation: Document array_index_nospec array_index_nospec: Sanitize speculative array de-references x86: Implement array_index_mask_nospec x86: Introduce barrier_nospec x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec x86/get_user: Use pointer masking to limit speculation x86/syscall: Sanitize syscall table de-references under speculation vfs, fdtable: Prevent bounds-check bypass via speculative execution nl80211: Sanitize array index in parse_txq_params x86/spectre: Report get_user mitigation for spectre_v1 x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable" x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel x86/paravirt: Remove 'noreplace-paravirt' cmdline option x86/kvm: Update spectre-v1 mitigation x86/retpoline: Avoid retpolines for built-in __init functions x86/spectre: Simplify spectre_v2 command line parsing x86/pti: Mark constant arrays as __initconst x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL KVM: nVMX: kmap() can't fail KVM: nVMX: vmx_complete_nested_posted_interrupt() can't fail KVM: nVMX: mark vmcs12 pages dirty on L2 exit KVM: nVMX: Eliminate vmcs02 pool KVM: VMX: introduce alloc_loaded_vmcs KVM: VMX: make MSR bitmaps per-VCPU KVM/x86: Add IBPB support KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL crypto: tcrypt - fix S/G table for test_aead_speed() ASoC: simple-card: Fix misleading error message ASoC: rsnd: don't call free_irq() on Parent SSI ASoC: rsnd: avoid duplicate free_irq() drm: rcar-du: Use the VBK interrupt for vblank events drm: rcar-du: Fix race condition when disabling planes at CRTC stop x86/microcode: Do the family check first Linux 4.9.81 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-02-13 12:57:29 +01:00
Andi Kleen	a1745ad92f	module/retpoline: Warn about missing retpoline in module (cherry picked from commit `caf7501a1b`) There's a risk that a kernel which has full retpoline mitigations becomes vulnerable when a module gets loaded that hasn't been compiled with the right compiler or the right option. To enable detection of that mismatch at module load time, add a module info string "retpoline" at build time when the module was compiled with retpoline support. This only covers compiled C source, but assembler source or prebuilt object files are not checked. If a retpoline enabled kernel detects a non retpoline protected module at load time, print a warning and report it in the sysfs vulnerability file. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: jeyu@kernel.org Cc: arjan@linux.intel.com Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-02-13 12:35:58 +01:00
Chris Redpath	8a174b4749	sched/fair: prevent possible infinite loop in sched_group_energy There is a race between hotplug and energy_diff which might result in an endless loop in sched_group_energy. When this happens, the end condition cannot be detected. We can store how many CPUs we need to visit at the beginning, and bail out of the energy calculation if we visit more CPUs than expected. Bug: 72311797 72202633 Change-Id: I8dda75468ee1570da4071cd8165ef5131a8205d8 Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-10 00:45:21 +00:00
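The bail-out strategy described above, counting the expected visits up front and stopping once the walk exceeds them, can be sketched on a circular list. This is a simplified model and not sched_group_energy() itself: `struct group`, `ring_energy`, and the per-group `energy` field are hypothetical stand-ins for the sched_group ring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring of sched_groups linked via ->next. */
struct group {
    int energy;
    struct group *next;
};

/*
 * Sum the energy of every group in the ring that starts at head.
 * expected is the number of groups we expect to visit, recorded before the
 * walk starts. If a concurrent change (e.g. CPU hotplug) means the walk
 * never returns to head, we stop once we have visited more groups than
 * expected. Returns 0 on success, -1 if the walk had to be aborted.
 */
static int ring_energy(const struct group *head, int expected, int *total)
{
    const struct group *g = head;
    int visited = 0;

    assert(head != NULL && total != NULL);
    *total = 0;
    do {
        if (++visited > expected)
            return -1;            /* more groups than expected: bail out */
        *total += g->energy;
        g = g->next;
    } while (g != head);
    return 0;
}
```

Without the `visited` counter, a ring whose tail points back into the middle instead of at `head` would make the `do`/`while` loop spin forever, which is the symptom the commit describes.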
Pavankumar Kondeti	de22a05ed0	sched/fair: fix array out of bounds access in select_energy_cpu_idx() We are using an incorrect index while initializing the nrg_delta for the previous CPU in select_energy_cpu_idx(). This initialization itself is not needed, as the nrg_delta for the previous CPU is already initialized to 0 while preparing the energy_env struct. Change-Id: Iee4e2c62f904050d2680a0a1df646d4d515c62cc Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>	2018-02-03 05:29:57 +05:30
Ionela Voinescu	f964120739	sched/fair: use min capacity when evaluating active cpus When we are calculating what the impact of placing a task on a specific cpu is, we should include the information that there might be a minimum capacity imposed upon that cpu which could change the performance and/or energy cost decisions. When choosing an active target CPU, favour CPUs that won't end up running at a high OPP due to a min capacity cap imposed by external actors. Change-Id: Ibc3302304345b63107f172b1fc3ffdabc19aa9d4 Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:35 +00:00
Ionela Voinescu	88a968ca63	sched/fair: use min capacity when evaluating idle backup cpus When we are calculating what the impact of placing a task on a specific cpu is, we should include the information that there might be a minimum capacity imposed upon that cpu which could change the performance and/or energy cost decisions. When choosing an idle backup CPU, favour CPUs that won't end up running at a high OPP due to a min capacity cap imposed by external actors. Change-Id: I566623ffb3a7c5b61a23242dcce1cb4147ef8a4a Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:32 +00:00
Ionela Voinescu	9e1c648c71	sched/fair: use min capacity when evaluating placement energy costs Add the ability to track minimum capacity forced onto a sched_group by some external actor. group_max_util returns the highest utilisation inside a sched_group and is used when we are trying to calculate an energy cost estimate for a specific scheduling scenario. Minimum capacities imposed from elsewhere will influence this energy cost so we should reflect it here. Change-Id: Ibd537a6dbe6d67b11cc9e9be18f40fcb2c0f13de Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:27 +00:00
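A minimal sketch of how an imposed minimum capacity can raise the utilisation that group_max_util reports, assuming per-CPU utilisation and minimum-capacity values are already available as plain arrays. The function `group_max_util_sketch` and its parameters are hypothetical; the `min_capping` flag mirrors the role of the MIN_CAPACITY_CAPPING feature described in the neighbouring commits, under which a disabled feature behaves like a minimum of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: highest "effective" utilisation in a group when a
 * minimum capacity may be imposed on each CPU from outside the scheduler.
 * util[i] is CPU i's utilisation; min_cap[i] is its imposed minimum
 * capacity. With capping disabled the minimum is treated as 0, so it never
 * affects the result.
 */
static unsigned long group_max_util_sketch(const unsigned long *util,
                                           const unsigned long *min_cap,
                                           size_t nr_cpus, bool min_capping)
{
    unsigned long max_util = 0;

    assert(util != NULL && min_cap != NULL);
    for (size_t i = 0; i < nr_cpus; i++) {
        unsigned long min_c = min_capping ? min_cap[i] : 0;

        if (util[i] > max_util)
            max_util = util[i];
        /* The CPU cannot run below its imposed minimum capacity, so the
         * energy estimate should assume at least that level. */
        if (min_c > max_util)
            max_util = min_c;
    }
    return max_util;
}
```

For utilisations `{100, 300}` with a minimum of 512 imposed on the first CPU, the sketch reports 512 when capping is enabled and 300 when it is disabled.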
Ionela Voinescu	58b761f4c0	sched/fair: introduce minimum capacity capping sched feature We have the ability to track minimum capacity forced onto a CPU by userspace or external actors. This is provided through a minimum frequency scale factor exposed by arch_scale_min_freq_capacity. The use of this information is enabled through the MIN_CAPACITY_CAPPING feature. If not enabled, the minimum frequency scale factor will remain 0 and it will not impact energy estimation or scheduling decisions. Change-Id: Ibc61f2bf4fddf186695b72b262e602a6e8bfde37 Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:23 +00:00
Ionela Voinescu	33550efbaa	sched: add arch_scale_min_freq_capacity to track minimum capacity caps If the minimum capacity of a group is capped by userspace or internal dependencies which are not otherwise visible to the scheduler, we need a way to see these and integrate this information into the energy calculations and task placement decisions we make. Add arch_scale_min_freq_capacity to determine the lowest capacity which a specific cpu can provide under the current set of known constraints. Change-Id: Ied4a1dc0982bbf42cb5ea2f27201d4363db59705 Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>	2018-02-01 18:37:10 +00:00
Dietmar Eggemann	da6833cff7	sched/fair: introduce an arch scaling function for max frequency capping The max frequency scaling factor is defined as: max_freq_scale = policy_max_freq / cpuinfo_max_freq To be able to scale the cpu capacity by this factor introduce a call to the new arch scaling function arch_scale_max_freq_capacity() in update_cpu_capacity() and provide a default implementation which returns SCHED_CAPACITY_SCALE. Another subsystem (e.g. cpufreq) can override this default implementation, exactly as for frequency and cpu invariance. It has to be enabled by the arch by defining arch_scale_max_freq_capacity to the actual implementation. Change-Id: I266cd1f4c1c82f54b80063c36aa5f7662599dd28 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 15:59:24 +00:00
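The ratio above is normally kept in the scheduler's fixed-point format, where SCHED_CAPACITY_SCALE (1 << SCHED_CAPACITY_SHIFT, i.e. 1024) stands for 1.0. The sketch below shows that arithmetic; the helper names `max_freq_scale` and `capped_capacity` are hypothetical and are not the arch_scale_max_freq_capacity() implementation.

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/* max_freq_scale = policy_max_freq / cpuinfo_max_freq, as fixed point
 * where SCHED_CAPACITY_SCALE represents 1.0 (no capping). */
static unsigned long max_freq_scale(unsigned long policy_max_khz,
                                    unsigned long cpuinfo_max_khz)
{
    assert(cpuinfo_max_khz != 0);
    return (policy_max_khz << SCHED_CAPACITY_SHIFT) / cpuinfo_max_khz;
}

/* Scale a CPU's capacity by the max-frequency factor. */
static unsigned long capped_capacity(unsigned long capacity,
                                     unsigned long scale)
{
    return (capacity * scale) >> SCHED_CAPACITY_SHIFT;
}
```

For example, a policy limit of 1.5 GHz on a CPU whose hardware maximum is 2.0 GHz gives a scale of 1500000 * 1024 / 2000000 = 768, so a capacity of 1024 is reduced to 768. When the policy limit equals the hardware maximum the scale is SCHED_CAPACITY_SCALE, which matches the default implementation described in the commit.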
Greg Kroah-Hartman	71f1469722	Merge 4.9.79 into android-4.9 Changes in 4.9.79 x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels orangefs: use list_for_each_entry_safe in purge_waiting_ops orangefs: initialize op on loop restart in orangefs_devreq_read usbip: prevent vhci_hcd driver from leaking a socket pointer address usbip: Fix implicit fallthrough warning usbip: Fix potential format overflow in userspace tools can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2 Prevent timer value 0 for MWAITX drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled drivers: base: cacheinfo: fix boot error message when acpi is enabled mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack hwpoison, memcg: forcibly uncharge LRU pages cma: fix calculation of aligned offset mm, page_alloc: fix potential false positive in __zone_watermark_ok ipc: msg, make msgrcv work with LONG_MIN ACPI / scan: Prefer devices without _HID/_CID for _ADR matching ACPICA: Namespace: fix operand cache leak netfilter: nfnetlink_cthelper: Add missing permission checks netfilter: xt_osf: Add missing permission checks reiserfs: fix race in prealloc discard reiserfs: don't preallocate blocks for extended attributes fs/fcntl: f_setown, avoid undefined behaviour scsi: libiscsi: fix shifting of DID_REQUEUE host byte Revert "module: Add retpoline tag to VERMAGIC" mm: fix 100% CPU kswapd busyloop on unreclaimable nodes Input: trackpoint - force 3 buttons if 0 button is reported orangefs: fix deadlock; do not write i_size in read_iter um: link vmlinux with -no-pie vsyscall: Fix permissions for emulate mode with KAISER/PTI eventpoll.h: add missing epoll event masks dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL ipv6: fix udpv6 sendmsg crash caused by too small MTU ipv6: ip6_make_skb() needs to clear cork.base.dst lan78xx: Fix failure in USB Full Speed net: igmp: fix source address check for IGMPv3 reports net: qdisc_pkt_len_init() should be more robust net: tcp: close sock if net namespace is exiting pppoe: take ->needed_headroom of lower device into account on xmit r8169: fix memory corruption on retrieval of hardware statistics. sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf tipc: fix a memory leak in tipc_nl_node_get_link() vmxnet3: repair memory leak net: Allow neigh contructor functions ability to modify the primary_key ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY ppp: unlock all_ppp_mutex before registering device be2net: restore properly promisc mode after queues reconfiguration ip6_gre: init dev->mtu and dev->hard_header_len correctly gso: validate gso_type in GSO handlers mlxsw: spectrum_router: Don't log an error on missing neighbor tun: fix a memory leak for tfile->tx_array flow_dissector: properly cap thoff field perf/x86/amd/power: Do not load AMD power module on !AMD platforms x86/microcode/intel: Extend BDW late-loading further with LLC size check hrtimer: Reset hrtimer cpu base proper on CPU hotplug x86: bpf_jit: small optimization in emit_bpf_tail_call() bpf: fix bpf_tail_call() x64 JIT bpf: introduce BPF_JIT_ALWAYS_ON config bpf: arsh is not supported in 32 bit alu thus reject it bpf: avoid false sharing of map refcount with max_entries bpf: fix divides by zero bpf: fix 32-bit divide by zero bpf: reject stores into ctx via st and xadd nfsd: auth: Fix gid sorting when rootsquash enabled Linux 4.9.79 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-31 14:13:00 +01:00
Daniel Borkmann	f531fbb06a	bpf: reject stores into ctx via st and xadd [ upstream commit `f37a8cb84c` ] Alexei found that the verifier does not reject stores into the context via BPF_ST instead of BPF_STX. And while looking at it, we also should not allow the XADD variant of BPF_STX. The context rewriter only assumes either BPF_LDX_MEM- or BPF_STX_MEM-type operations, so reject anything other than that so that the assumptions in the rewriter properly hold. Add test cases as well for BPF selftests. Fixes: `d691f9e8d4` ("bpf: allow programs to write to certain skb fields") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:57 +01:00
Alexei Starovoitov	265d7657c9	bpf: fix 32-bit divide by zero [ upstream commit `68fda450a7` ] Because some JITs do the if (src_reg == 0) check in 64-bit mode for div/mod operations, mask the upper 32 bits of the src register before doing the check. Fixes: `622582786c` ("net: filter: x86: internal BPF JIT") Fixes: `7a12b5031c` ("sparc64: Add eBPF JIT.") Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:57 +01:00
Eric Dumazet	4606077802	bpf: fix divides by zero [ upstream commit `c366287ebd` ] Divides by zero are not nice, let's avoid them if possible. Also, do_div() does not seem to be needed when dealing with 32-bit operands, but this is a minor detail. Fixes: `bd4cf0ed33` ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:57 +01:00
Daniel Borkmann	fcabc6d008	bpf: arsh is not supported in 32 bit alu thus reject it [ upstream commit `7891a87efc` ] The following snippet was throwing an 'unknown opcode cc' warning in the BPF interpreter: 0: (18) r0 = 0x0 2: (7b) *(u64 *)(r10 -16) = r0 3: (cc) (u32) r0 s>>= (u32) r0 4: (95) exit Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X} generation, not all of them do, and the interpreter does not either. We can leave the existing ones and implement it later in bpf-next for the remaining ones, but reject this properly in the verifier for the time being. Fixes: `17a5267067` ("bpf: verifier (add verifier core)") Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:57 +01:00
Alexei Starovoitov	a3d6dd6a66	bpf: introduce BPF_JIT_ALWAYS_ON config [ upstream commit `290af86629` ] The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715. A quote from the Google Project Zero blog: "At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets." To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config option, which removes the interpreter from the kernel in favor of JIT-only mode. So far eBPF JIT is supported by: x64, arm64, arm32, sparc64, s390, powerpc64, mips64 The start of the JITed program is randomized and the code page is marked as read-only. In addition "constant blinding" can be turned on with net.core.bpf_jit_harden v2->v3: - move __bpf_prog_ret0 under ifdef (Daniel) v1->v2: - fix init order, test_bpf and cBPF (Daniel's feedback) - fix offloaded bpf (Jakub's feedback) - add 'return 0' dummy in case something can invoke prog->bpf_func - retarget bpf tree. For bpf-next the patch would need one extra hunk. It will be sent when the trees are merged back to net-next Considered doing: int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT; but it seems better to land the patch as-is and in bpf-next remove bpf_jit_enable global variable from all JITs, consolidate in one place and remove this jit_init() function. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:56 +01:00
Alexei Starovoitov	5226bb3b95	bpf: fix bpf_tail_call() x64 JIT [ upstream commit `90caccdd8c` ] - bpf prog_array just like all other types of bpf array accepts 32-bit index. Clarify that in the comment. - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes - tighten corresponding check in the interpreter to stay consistent The JIT bug can be triggered after the introduction of the BPF_F_NUMA_NODE flag in commit `96eabe7a40` in 4.14. Before that, map_flags would stay zero, so although the JIT code is wrong, it would still check bounds correctly. Hence the two Fixes tags. All other JITs don't have this problem. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Fixes: `96eabe7a40` ("bpf: Allow selecting numa node during map creation") Fixes: `b52f00e6a7` ("x86: bpf_jit: implement bpf_tail_call() helper") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:56 +01:00
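The effect of the 8-byte load can be reproduced in plain C, assuming (as the commit message implies) that a 32-bit `max_entries` is immediately followed by a 32-bit `map_flags`. `struct map_hdr` and both helpers are hypothetical and are not the kernel's struct bpf_map or JIT code; the flag value 4 matches `BPF_F_NUMA_NODE` (`1U << 2`) in the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 32-bit max_entries directly followed by map_flags. */
struct map_hdr {
    uint32_t max_entries;
    uint32_t map_flags;
};
_Static_assert(sizeof(struct map_hdr) == sizeof(uint64_t), "no padding");

/* Correct: compare the 32-bit index against the 32-bit max_entries. */
static bool index_ok_4byte(const struct map_hdr *m, uint32_t index)
{
    assert(m != NULL);
    return index < m->max_entries;
}

/* Buggy: an 8-byte load at max_entries also picks up map_flags, so any
 * non-zero flag turns the bound into a huge number. */
static bool index_ok_8byte(const struct map_hdr *m, uint32_t index)
{
    uint64_t bound;

    assert(m != NULL);
    memcpy(&bound, m, sizeof(bound));
    return (uint64_t)index < bound;
}
```

With `max_entries = 4` and `map_flags = 4`, the 4-byte check correctly rejects index 10, while the 8-byte check accepts it and would allow an out-of-bounds tail call. With `map_flags = 0` on a little-endian machine both checks agree, which is why the bug stayed latent until the flag existed.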
Thomas Gleixner	c98ff7299b	hrtimer: Reset hrtimer cpu base proper on CPU hotplug commit `d5421ea43d` upstream. The hrtimer interrupt code contains a hang detection and mitigation mechanism, which prevents a long-delayed hrtimer interrupt from causing a continuous retriggering of interrupts that would keep the system from making progress. If a hang is detected then the timer hardware is programmed with a certain delay into the future and a flag is set in the hrtimer cpu base which prevents newly enqueued timers from reprogramming the timer hardware prior to the chosen delay. The subsequent hrtimer interrupt after the delay clears the flag and resumes normal operation. If such a hang happens in the last hrtimer interrupt before a CPU is unplugged then the hang_detected flag is set and stays that way when the CPU is plugged in again. At that point the timer hardware is not armed and it cannot be armed because the hang_detected flag is still active, so nothing clears that flag. As a consequence the CPU does not receive hrtimer interrupts and no timers expire on that CPU which results in RCU stalls and other malfunctions. Clear the flag along with some other less critical members of the hrtimer cpu base to ensure starting from a clean state when a CPU is plugged in. Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the root cause of that hard to reproduce heisenbug. Once understood it's trivial and certainly justifies a brown paperbag. Fixes: `41d2e49493` ("hrtimer: Tune hrtimer_interrupt hang logic") Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:55:56 +01:00
Ke Wang	7be1985454	ANDROID: sched: EAS: check energy_aware() before calling select_energy_cpu_brute() in up-migrate path In the up-migrate path, select_energy_cpu_brute() was called directly without checking energy_aware(). As a result, select_energy_cpu_brute() was always used on asymmetric CPU capacity systems, even with energy_aware() disabled. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2018-01-29 15:31:47 +00:00
Greg Kroah-Hartman	e9dabe69de	Merge 4.9.78 into android-4.9 Changes in 4.9.78 libnvdimm, btt: Fix an incompatibility in the log layout scsi: sg: disable SET_FORCE_LOW_DMA futex: Prevent overflow by strengthen input validation ALSA: seq: Make ioctls race-free ALSA: pcm: Remove yet superfluous WARN_ON() ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant ALSA: hda - Apply the existing quirk to iMac 14,1 timers: Unconditionally check deferrable base af_key: fix buffer overread in verify_address_len() af_key: fix buffer overread in parse_exthdrs() iser-target: Fix possible use-after-free in connection establishment error scsi: hpsa: fix volume offline state sched/deadline: Zero out positive runtime after throttling constrained tasks x86/retpoline: Fill RSB on context switch for affected CPUs x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros objtool: Improve error message for bad file argument x86/cpufeature: Move processor tracing out of scattered features module: Add retpoline tag to VERMAGIC x86/mm/pkeys: Fix fill_sig_info_pkey x86/tsc: Fix erroneous TSC rate on Skylake Xeon pipe: avoid round_pipe_size() nr_pages overflow on 32-bit x86/apic/vector: Fix off by one in error path perf tools: Fix build with ARCH=x86_64 Input: ALPS - fix multi-touch decoding on SS4 plus touchpads Input: 88pm860x-ts - fix child-node lookup Input: twl6040-vibra - fix child-node lookup Input: twl4030-vibra - fix sibling-node lookup tracing: Fix converting enum's from the map in trace_event_eval_update() phy: work around 'phys' references to usb-nop-xceiv devices ARM: sunxi_defconfig: Enable CMA ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7 can: peak: fix potential bug in packet fragmentation scripts/gdb/linux/tasks.py: fix get_thread_info proc: fix coredump vs read /proc/*/stat race libata: apply MAX_SEC_1024 to all LITEON EP1 series devices workqueue: avoid hard lockups in show_workqueue_state() dm btree: fix serious bug in btree_split_beneath() dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls x86/cpu, x86/pti: Do not enable PTI on AMD processors usbip: fix warning in vhci_hcd_probe/lockdep_init_map x86/mce: Make machine check speculation protected retpoline: Introduce start/end markers of indirect thunk kprobes/x86: Blacklist indirect thunk functions for kprobes kprobes/x86: Disable optimizing on the function jumps to indirect thunk x86/pti: Document fix wrong index x86/retpoline: Optimize inline assembler for vmexit_fill_RSB MIPS: AR7: ensure the port type's FCR value is used Linux 4.9.78 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-23 20:18:30 +01:00
Sergey Senozhatsky	ca2d736867	workqueue: avoid hard lockups in show_workqueue_state() commit `62635ea8c1` upstream. show_workqueue_state() can print out a lot of messages while being in atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console device is slow it may end up triggering NMI hard lockup watchdog. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-23 19:57:08 +01:00
Steven Rostedt (VMware)	9a50ea0ce7	tracing: Fix converting enum's from the map in trace_event_eval_update() commit `1ebe1eaf2f` upstream. Since enums do not get converted by the TRACE_EVENT macro into their values, the event format displays the enum name and not the value. This breaks tools like perf and trace-cmd that need to interpret the raw binary data. To solve this, an enum map was created to convert these enums into their actual numbers on boot up. This is done by TRACE_EVENTS() adding a TRACE_DEFINE_ENUM() macro. Some enums were not being converted. This was caused by an optimization that had a bug in it. All calls get checked against this enum map to see if they should be converted or not, and the call's system is compared to the system that the enum map was created under. If they match, then the call is processed. To cut down on the number of iterations needed to find the maps with a matching system, since calls and maps are grouped by system, when a match is made, the index into the map array is saved, so that the next call, if it belongs to the same system as the previous call, could start right at that array index and not have to scan all the previous arrays. The problem was, the saved index was used as the variable to know if this is a call in a new system or not. If the index was zero, it was assumed that the call is in a new system and would keep incrementing the saved index until it found a matching system. The issue arises when the first matching system was at index zero. The next map, if it belonged to the same system, would then think it was the first match and increment the index to one. If the next call belongs to the same system, it would begin its search of the maps off by one, and miss the first enum that should be converted. This left a single enum not converted properly. Also add a comment to describe exactly what that index was for. It took me a bit too long to figure out what I was thinking when debugging this issue. Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com Fixes: `0c564a538a` ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values") Reported-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-23 19:57:07 +01:00
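The off-by-one described above can be reproduced with a simplified model that counts how many (call, map) updates are performed. This is a sketch, not trace_event_eval_update(): systems are represented as non-negative ints, and the functions `count_updates_buggy` and `count_updates_fixed` are hypothetical. The fixed variant uses a separate "first match" flag, which is one way to stop overloading the saved index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: calls[] and maps[] hold system ids (>= 0), calls are
 * grouped by system, and last_i remembers where the current system's maps
 * start. Both functions return how many (call, map) updates were made.
 */

/* Buggy: last_i == 0 doubles as "no match saved yet". */
static int count_updates_buggy(const int *calls, size_t ncalls,
                               const int *maps, int len)
{
    int updates = 0, last_i = 0, last_system = -1;

    assert(len >= 0);
    for (size_t c = 0; c < ncalls; c++) {
        if (calls[c] != last_system) {     /* new system: restart search */
            last_i = 0;
            last_system = calls[c];
        }
        for (int i = last_i; i < len; i++) {
            if (calls[c] == maps[i]) {
                if (!last_i)               /* wrong if first match is at 0 */
                    last_i = i;
                updates++;
            }
        }
    }
    return updates;
}

/* Fixed: a separate flag records whether the first match was saved. */
static int count_updates_fixed(const int *calls, size_t ncalls,
                               const int *maps, int len)
{
    int updates = 0, last_i = 0, last_system = -1;
    bool first = true;

    assert(len >= 0);
    for (size_t c = 0; c < ncalls; c++) {
        if (calls[c] != last_system) {
            first = true;
            last_i = 0;
            last_system = calls[c];
        }
        for (int i = last_i; i < len; i++) {
            if (calls[c] == maps[i]) {
                if (first) {
                    last_i = i;
                    first = false;
                }
                updates++;
            }
        }
    }
    return updates;
}
```

With maps `{1, 1, 2}` and two calls from system 1, the buggy version saves index 1 during the first call, so the second call skips map 0: 3 updates instead of the expected 4.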
Xunlei Pang	1ad4f2872c	sched/deadline: Zero out positive runtime after throttling constrained tasks commit `ae83b56a56` upstream. When a constrained task is throttled by dl_check_constrained_dl(), it may carry the remaining positive runtime; as a result, when dl_task_timer() fires and calls replenish_dl_entity(), it will not be replenished correctly due to the positive dl_se->runtime. This patch sets its runtime to 0 if it is positive after throttling. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `df8eac8caf` ("sched/deadline: Throttle a constrained deadline task activated after the deadline") Link: http://lkml.kernel.org/r/1494421417-27550-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-23 19:57:05 +01:00
Thomas Gleixner	676109b28c	timers: Unconditionally check deferrable base commit `ed4bbf7910` upstream. When the timer base is checked for expired timers then the deferrable base must be checked as well. This was missed when making the deferrable base independent of base::nohz_active. Fixes: `ced6d5c11d` ("timers: Use deferrable base independent of base::nohz_active") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: rt@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-23 19:57:04 +01:00
Li Jinyue	d8a3170db0	futex: Prevent overflow by strengthen input validation commit `fbe0e839d1` upstream. UBSAN reports signed integer overflow in kernel/futex.c: UBSAN: Undefined behaviour in kernel/futex.c:2041:18 signed integer overflow: 0 - -2147483648 cannot be represented in type 'int' Add a sanity check to catch negative values of nr_wake and nr_requeue. Signed-off-by: Li Jinyue <lijinyue@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: dvhart@infradead.org Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-23 19:57:04 +01:00
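The added check is small enough to show in isolation. The sketch below assumes a hypothetical helper, toy_validate_requeue_counts(), that performs the kind of validation the commit describes: reject a negative nr_wake or nr_requeue up front, so that later signed arithmetic such as 0 - nr_requeue never sees INT_MIN.

```c
#include <errno.h>
#include <limits.h>	/* INT_MIN, used in the usage checks */

/*
 * Hypothetical helper: reject negative counts before any arithmetic such
 * as "0 - nr_requeue" can overflow (0 - INT_MIN is not representable).
 */
static int toy_validate_requeue_counts(int nr_wake, int nr_requeue)
{
	if (nr_wake < 0 || nr_requeue < 0)
		return -EINVAL;
	return 0;
}
```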
Victor Wan	20946741c8	Merge branch 'android-4.9' into amlogic-4.9-dev Conflicts: Makefile init/main.c	2018-01-22 20:17:25 +08:00
Greg Kroah-Hartman	033d019ce2	Merge 4.9.77 into android-4.9 Changes in 4.9.77 dm bufio: fix shrinker scans when (nr_to_scan < retain_target) mac80211: Add RX flag to indicate ICV stripped ath10k: rebuild crypto header in rx data frames KVM: Fix stack-out-of-bounds read in write_mmio can: gs_usb: fix return value of the "set_bittiming" callback IB/srpt: Disable RDMA access by the initiator MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task MIPS: Factor out NT_PRFPREG regset access helpers MIPS: Guard against any partial write attempt with PTRACE_SETREGSET MIPS: Consistently handle buffer counter with PTRACE_SETREGSET MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses kvm: vmx: Scrub hardware GPRs at VM-exit platform/x86: wmi: Call acpi_wmi_init() later x86/acpi: Handle SCI interrupts above legacy space gracefully ALSA: pcm: Remove incorrect snd_BUG_ON() usages ALSA: pcm: Add missing error checks in OSS emulation plugin builder ALSA: pcm: Abort properly at pending signal in OSS read/write loops ALSA: pcm: Allow aborting mutex lock at OSS read/write loops ALSA: aloop: Release cable upon open error path ALSA: aloop: Fix inconsistent format due to incomplete rule ALSA: aloop: Fix racy hw constraints adjustment x86/acpi: Reduce code duplication in mp_override_legacy_irq() zswap: don't param_set_charp while holding spinlock lan78xx: use skb_cow_head() to deal with cloned skbs sr9700: use skb_cow_head() to deal with cloned skbs smsc75xx: use skb_cow_head() to deal with cloned skbs cx82310_eth: use skb_cow_head() to deal with cloned skbs xhci: Fix ring leak in failure path of xhci_alloc_virt_device() 8021q: fix a memory leak for VLAN 0 device ip6_tunnel: disable dst caching if tunnel is dual-stack net: core: fix module type in sock_diag_bind RDS: Heap OOB write in rds_message_alloc_sgs() RDS: null pointer 
dereference in rds_atomic_free_op sh_eth: fix TSU resource handling sh_eth: fix SH7757 GEther initialization net: stmmac: enable EEE in MII, GMII or RGMII only ipv6: fix possible mem leaks in ipv6_make_skb() ethtool: do not print warning for applications using legacy API mlxsw: spectrum_router: Fix NULL pointer deref net/sched: Fix update of lastuse in act modules implementing stats_update crypto: algapi - fix NULL dereference in crypto_remove_spawns() rbd: set max_segments to USHRT_MAX x86/microcode/intel: Extend BDW late-loading with a revision check KVM: x86: Add memory barrier on vmcs field lookup drm/vmwgfx: Potential off by one in vmw_view_add() kaiser: Set _PAGE_NX only if supported iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK bpf: move fixup_bpf_calls() function bpf: refactor fixup_bpf_calls() bpf: prevent out-of-bounds speculation bpf, array: fix overflow in max_entries and undefined behavior in index_mask USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ USB: serial: cp210x: add new device ID ELV ALC 8xxx usb: misc: usb3503: make sure reset is low for at least 100us USB: fix usbmon BUG trigger usbip: remove kernel addresses from usb device and urb debug msgs usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl Bluetooth: Prevent stack info leak from the EFS element. uas: ignore UAS for Norelsys NS1068(X) chips e1000e: Fix e1000_check_for_copper_link_ich8lan return value. 
x86/Documentation: Add PTI description x86/cpu: Factor out application of forced CPU caps x86/cpufeatures: Make CPU bugs sticky x86/cpufeatures: Add X86_BUG_CPU_INSECURE x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN x86/cpufeatures: Add X86_BUG_SPECTRE_V[12] x86/cpu: Merge bugs.c and bugs_64.c sysfs/cpu: Add vulnerability folder x86/cpu: Implement CPU vulnerabilites sysfs functions x86/cpu/AMD: Make LFENCE a serializing instruction x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC sysfs/cpu: Fix typos in vulnerability documentation x86/alternatives: Fix optimize_nops() checking x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier objtool, modules: Discard objtool annotation sections for modules objtool: Detect jumps to retpoline thunks objtool: Allow alternatives to be ignored x86/asm: Use register variable to get stack pointer value x86/retpoline: Add initial retpoline support x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline: Fill return stack buffer on vmexit selftests/x86: Add test_vsyscall x86/retpoline: Remove compile time warning objtool: Fix retpoline support for pre-ORC objtool x86/pti/efi: broken conversion from efi to kernel page table Linux 4.9.77 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-17 10:29:45 +01:00
Daniel Borkmann	820ef2a0e5	bpf, array: fix overflow in max_entries and undefined behavior in index_mask commit `bbeb6e4323` upstream. syzkaller tried to alloc a map with 0xfffffffd entries out of a userns, and thus unprivileged. With the recently added logic in `b2157399cc` ("bpf: prevent out-of-bounds speculation") we round this up to the next power of two value for max_entries for unprivileged users such that we can apply proper masking into potentially zeroed out map slots. However, this will generate an index_mask of 0xffffffff, and therefore a + 1 will let this overflow into new max_entries of 0. This will pass allocation, etc., and later on map access we still enforce on the original attr->max_entries value which was 0xfffffffd, therefore triggering GPF all over the place. Thus bail out on overflow in such a case. Moreover, on 32 bit archs roundup_pow_of_two() cannot be used either, since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit space is undefined. Therefore, do this by hand in a 64 bit variable. This fixes all the issues triggered by syzkaller's reproducers. Fixes: `b2157399cc` ("bpf: prevent out-of-bounds speculation") Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-17 09:38:55 +01:00
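A user-space sketch of the overflow-safe sizing logic described above, assuming 32-bit max_entries and index_mask values. toy_fls32() is a portable stand-in for the kernel's fls_long(), and toy_array_sizes() is a hypothetical helper rather than the kernel's array map allocator; the shift is done in a 64-bit variable so that 1 << 32 is well defined, and the rounded-up size is compared with the requested one to catch the wrap to 0.

```c
#include <errno.h>
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 when x == 0. */
static unsigned int toy_fls32(uint32_t x)
{
	unsigned int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/*
 * Derive index_mask and, for unprivileged maps, round max_entries up to a
 * power of two. Returns 0 on success or a negative errno value.
 */
static int toy_array_sizes(uint32_t max_entries, int unpriv,
			   uint32_t *out_entries, uint32_t *out_mask)
{
	uint64_t mask64;
	uint32_t index_mask;

	if (max_entries == 0)
		return -EINVAL;

	/* 64-bit shift: toy_fls32() may return 32. */
	mask64 = 1ULL << toy_fls32(max_entries - 1);
	mask64 -= 1;
	index_mask = (uint32_t)mask64;

	if (unpriv) {
		uint32_t rounded = index_mask + 1;	/* wraps to 0 for 0xffffffff */

		/* Overflow check: rounding must never shrink the map. */
		if (rounded < max_entries)
			return -E2BIG;
		max_entries = rounded;
	}
	*out_entries = max_entries;
	*out_mask = index_mask;
	return 0;
}
```

For the syzkaller value 0xfffffffd, index_mask becomes 0xffffffff, index_mask + 1 wraps to 0, and the helper returns -E2BIG instead of producing a zero-sized map. A privileged request for 5 entries keeps max_entries = 5 but still gets index_mask = 7.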
Alexei Starovoitov	a9bfac14cd	bpf: prevent out-of-bounds speculation commit `b2157399cc` upstream. Under speculation, CPUs may mis-predict branches in bounds checks. Thus, memory accesses under a bounds check may be speculated even if the bounds check fails, providing a primitive for building a side channel. To avoid leaking kernel data, round up array-based maps and mask the index after the bounds check, so a speculated load with an out-of-bounds index will load either a valid value from the array or zero from the padded area. Unconditionally mask the index for all array types, even when max_entries is not rounded to a power of 2 for the root user. When the map is created by an unpriv user, generate a sequence of bpf insns that includes an AND operation to make sure that JITed code includes the same 'index & index_mask' operation. If a prog_array map is created by an unpriv user, replace bpf_tail_call(ctx, map, index); with if (index < max_entries) { index &= map->index_mask; bpf_tail_call(ctx, map, index); } (along with roundup to power 2) to prevent out-of-bounds speculation. There is a secondary redundant 'if (index >= max_entries)' check in the interpreter and in all JITs, but they can be optimized later if necessary. Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array) cannot be used by unpriv, so no changes there. That fixes the bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on all architectures with and without JIT. v2->v3: Daniel noticed that an attack can potentially be crafted via syscall commands without loading the program, so add masking to those paths as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Slaby <jslaby@suse.cz> [ Backported to 4.9 - gregkh ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-17 09:38:55 +01:00
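The lookup side of the masking scheme can be sketched as follows. This assumes a toy array whose backing storage has index_mask + 1 slots, with the slots at and beyond max_entries zero-filled as padding; struct toy_array and toy_array_lookup() are hypothetical, loosely modeled on the "bounds check, then index & index_mask" pattern described above. Architecturally the AND never changes an in-range index; its purpose is to bound the address of a load that the CPU might execute speculatively after mispredicting the bounds check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy array map: values[] holds index_mask + 1 slots. */
struct toy_array {
	uint32_t max_entries;	/* logical size requested by the user */
	uint32_t index_mask;	/* (next power of two >= max_entries) - 1 */
	const int64_t *values;	/* slots >= max_entries are zero padding */
};

static const int64_t *toy_array_lookup(const struct toy_array *a,
				       uint32_t index)
{
	if (index >= a->max_entries)
		return NULL;
	/* Redundant when the branch is predicted correctly; under
	 * misprediction it keeps the load inside the padded allocation. */
	return &a->values[index & a->index_mask];
}
```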
Alexei Starovoitov	f55093dccd	bpf: refactor fixup_bpf_calls() commit `79741b3bde` upstream. reduce indent and make it iterate over instructions similar to convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Jiri Slaby <jslaby@suse.cz> [Backported to 4.9.y - gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-17 09:38:55 +01:00
Alexei Starovoitov	28035366af	bpf: move fixup_bpf_calls() function commit `e245c5c6a5` upstream. No functional change. Move fixup_bpf_calls() to verifier.c; it's being refactored in the next patch. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Jiri Slaby <jslaby@suse.cz> [backported to 4.9 - gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-17 09:38:55 +01:00
Greg Kroah-Hartman	91549408ce	Merge 4.9.76 into android-4.9 Changes in 4.9.76 kernel/acct.c: fix the acct->needcheck check in check_free_space() crypto: n2 - cure use after free crypto: chacha20poly1305 - validate the digest size crypto: pcrypt - fix freeing pcrypt instances sunxi-rsb: Include OF based modalias in device uevent fscache: Fix the default for fscache_maybe_release_page() nbd: fix use-after-free of rq/bio in the xmit path kernel: make groups_sort calling a responsibility group_info allocators kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal() iommu/arm-smmu-v3: Don't free page table ops twice iommu/arm-smmu-v3: Cope with duplicated Stream IDs ARC: uaccess: dont use "l" gcc inline asm constraint modifier Input: elantech - add new icbody type 15 x86/microcode/AMD: Add support for fam17h microcode loading parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel parisc: qemu idle sleep support x86/tlb: Drop the _GPL from the cpu_tlbstate export Map the vsyscall page with _PAGE_USER mtd: nand: pxa3xx: Fix READOOB implementation Linux 4.9.76 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-10 09:51:38 +01:00
Oleg Nesterov	4d53eb4949	kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal() commit `426915796c` upstream. complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy the thread group; today this is wrong in many ways. If nothing else, fatal_signal_pending() should always imply that the whole thread group (except ->group_exit_task if it is not NULL) is killed, and this check breaks the rule. After the previous changes we can rely on sig_task_ignored(); sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want to kill this task and sig == SIGKILL OR it is traced and debugger can intercept the signal. This should hopefully fix the problem reported by Dmitry. This test-case static int init(void *arg) { for (;;) pause(); } int main(void) { char stack[16 * 1024]; for (;;) { int pid = clone(init, stack + sizeof(stack)/2, CLONE_NEWPID | SIGCHLD, NULL); assert(pid > 0); assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0); assert(waitpid(-1, NULL, WSTOPPED) == pid); assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0); assert(syscall(__NR_tkill, pid, SIGKILL) == 0); assert(pid == wait(NULL)); } } triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in task_participate_group_stop(). do_signal_stop()->signal_group_exit() checks SIGNAL_GROUP_EXIT and returns false, but task_set_jobctl_pending() checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING. And this should fix the minor security problem reported by Kyle: SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the task is the root of a pid namespace. 
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Kyle Huey <me@kylehuey.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-10 09:29:53 +01:00
Oleg Nesterov	794ac8ef9b	kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals commit `ac25385089` upstream. Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only() signals even if force == T. This simplifies the next change and this matches the same check in get_signal() which will drop these signals anyway. Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-10 09:29:52 +01:00
Oleg Nesterov	1453b3ac6c	kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL commit `628c1bcba2` upstream. The comment in sig_ignored() says "Tracers may want to know about even ignored signals" but SIGKILL cannot be reported to the debugger and it is just wrong to return 0 in this case: SIGKILL should only kill the SIGNAL_UNKILLABLE task if it comes from the parent ns. Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on sig_task_ignored(). SIGSTOP coming from within the namespace is not really right either, but at least the debugger can intercept it, and we can't drop it here because this will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add another ->ptrace check later, we will see. Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-10 09:29:52 +01:00
Thiago Rafael Becker	79258d9834	kernel: make groups_sort calling a responsibility group_info allocators commit `bdcf0a423e` upstream. In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NeilBrown <neilb@suse.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-10 09:29:52 +01:00
Oleg Nesterov	790080ce0e	kernel/acct.c: fix the acct->needcheck check in check_free_space() commit `4d9570158b` upstream. As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check is very wrong, we need time_is_after_jiffies() to make sys_acct() work. Ignoring the overflows, the code should "goto out" if needcheck > jiffies, while currently it checks "needcheck < jiffies" and thus in the likely case check_free_space() does nothing until jiffies overflow. In particular this means that sys_acct() is simply broken, acct_on() sets acct->needcheck = jiffies and expects that check_free_space() should set acct->active = 1 after the free-space check, but this won't happen if jiffies increments in between. This was broken by commit `32dc730860` ("get rid of timer in kern/acct.c") in 2011, then another (correct) commit `795a2f22a8` ("acct() should honour the limits from the very beginning") made the problem more visible. Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com Fixes: `32dc730860` ("get rid of timer in kern/acct.c") Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp> Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-10 09:29:51 +01:00
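The jiffies comparison at the heart of this fix can be sketched in user space. The helpers below assume the kernel's definitions time_after(a, b) as ((long)((b) - (a)) < 0), time_is_before_jiffies(a) as time_after(jiffies, a) and time_is_after_jiffies(a) as time_before(jiffies, a), with the current jiffies value passed explicitly as now; all toy_* names are hypothetical. check_free_space() should skip its work ("goto out") only while needcheck is still in the future.

```c
#include <stdbool.h>

/* Wraparound-safe comparison in the style of the kernel's time_after(). */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool toy_time_before(unsigned long a, unsigned long b)
{
	return toy_time_after(b, a);
}

/* Old check, time_is_before_jiffies(needcheck): skips once needcheck has passed. */
static bool toy_skip_buggy(unsigned long needcheck, unsigned long now)
{
	return toy_time_after(now, needcheck);
}

/* New check, time_is_after_jiffies(needcheck): skips while needcheck is ahead. */
static bool toy_skip_fixed(unsigned long needcheck, unsigned long now)
{
	return toy_time_before(now, needcheck);
}
```

Right after acct_on() sets needcheck = jiffies, any later tick makes toy_skip_buggy() return true, so the free-space check never runs until jiffies wraps; toy_skip_fixed() skips only while needcheck is ahead of now, including across a wraparound. The (long) cast relies on two's-complement conversion, as the kernel's macros do.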
Patrick Bellasi	3b9305de32	sched/fair: reduce rounding errors in energy computations The SG's energy is obtained by adding busy and idle contributions which are computed by considering a proper fraction of the SCHED_CAPACITY_SCALE defined by the SG's utilizations. By scaling each and every computed contribution we risk accumulating rounding errors, which can result in a non-null energy_delta even in cases where the same total accumulated utilization is just distributed differently among CPUs. To reduce rounding errors, this patch accumulates non-scaled busy/idle energy contributions for each visited SG, and scales each of them just once at the end. Change-Id: Idf8367fee0ac11938c6436096f0c1b2d630210d2 Suggested-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:55:12 +00:00
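A toy illustration of the rounding problem, assuming a single capacity state per CPU, busy energy only, and energy = util * power / SCHED_CAPACITY_SCALE implemented as a right shift by 10. The toy_* functions are hypothetical and much simpler than the EAS energy model; they only contrast "scale each term" with "accumulate, then scale once".

```c
#include <stdint.h>

#define TOY_CAPACITY_SHIFT 10	/* SCHED_CAPACITY_SCALE == 1 << 10 */

/* Each CPU's contribution is scaled (and truncated) separately. */
static uint64_t toy_energy_scale_each(const uint64_t *util, int ncpus,
				      uint64_t busy_power)
{
	uint64_t energy = 0;

	for (int i = 0; i < ncpus; i++)
		energy += (util[i] * busy_power) >> TOY_CAPACITY_SHIFT;
	return energy;
}

/* Contributions are accumulated unscaled and truncated only once. */
static uint64_t toy_energy_scale_once(const uint64_t *util, int ncpus,
				      uint64_t busy_power)
{
	uint64_t acc = 0;

	for (int i = 0; i < ncpus; i++)
		acc += util[i] * busy_power;
	return acc >> TOY_CAPACITY_SHIFT;
}
```

With busy_power = 3, spreading 1024 units of utilization as 512 + 512 yields 2 when each term is scaled, but packing it as 1024 + 0 yields 3: a spurious energy_delta of 1 for the same total utilization. Accumulating first and scaling once gives 3 in both cases.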
Patrick Bellasi	cf28cf03a3	sched/fair: re-factor energy_diff to use a single (extensible) energy_env The energy_env data structure is used to cache values required by multiple different functions involved in energy_diff computation. Some of these functions require additional parameters which can be easily embedded into the energy_env itself. The current implementation of energy_diff hardcodes the usage of two different energy_env structures to estimate and compare the energy consumption related to a "before" and an "after" CPU. Moreover, it does this energy estimation by walking the SDs/SGs data structures multiple times. A better design can be envisioned by making better use of the energy_env structure to support a more efficient and concurrent evaluation of multiple schedule candidates. To this purpose, this patch provides a complete re-factoring of the energy_diff implementation to: 1. use a single energy_env structure for the evaluation of all the candidate CPUs; 2. walk the SDs/SGs just once, thus improving the overall performance to compute the energy estimation for each CPU candidate specified by the single used energy_env; 3. simplify the code (at least if you look at the new version and not at this re-factoring patch), thus providing cleaner code to maintain and extend for additional features. This patch updates all the clients of energy_env to use only the data provided by this structure and an index for one of its CPU candidates. Embedding everything within the energy env will make it simple to add tracepoints for this new version, which can easily provide a holistic view on how energy_diff evaluated the proposed CPU candidates. The new proposed structure, for both "struct energy_env" and the functions using it, is designed in such a way to easily accommodate additional further extensions (e.g. SchedTune filtering) without requiring an additional big re-factoring of these core functions. 
Finally, the usage of a CPU candidate array, embedded into the energy_diff structure, also allows seamlessly extending the exploration of multiple candidate CPUs, for example to support the comparison of a spread-vs-packing strategy. Change-Id: Ic04ffb6848b2c763cf1788767f22c6872eb12bee Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> [reworked find_new_capacity() and enforced the respect of find_best_target() selection order] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:55:08 +00:00
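A heavily simplified sketch of the single-env idea: one structure holds every candidate CPU (index 0 standing in for prev_cpu), a single walk over the CPUs fills in the energy for all candidates at once, and selection keeps find_best_target()'s order by taking the first candidate that saves energy. The toy energy model (energy = sum of util * power, with util[] excluding the waking task) and all toy_* names are assumptions for illustration, not the kernel's struct energy_env or its SD/SG walk.

```c
#include <stdint.h>

#define TOY_MAX_CANDIDATES 4

/* Hypothetical toy env: candidate CPUs plus one energy result per candidate. */
struct toy_energy_env {
	uint32_t task_util;			/* utilization of the waking task */
	int ncand;				/* number of valid candidates */
	int cpu[TOY_MAX_CANDIDATES];		/* cpu[0] plays prev_cpu */
	uint64_t energy[TOY_MAX_CANDIDATES];	/* filled by toy_compute_energy() */
};

/* One pass over all CPUs computes the energy of every candidate placement. */
static void toy_compute_energy(struct toy_energy_env *env, int ncpus,
			       const uint32_t *util, const uint32_t *power)
{
	for (int c = 0; c < env->ncand; c++)
		env->energy[c] = 0;

	for (int i = 0; i < ncpus; i++) {
		for (int c = 0; c < env->ncand; c++) {
			uint64_t u = util[i];

			if (env->cpu[c] == i)
				u += env->task_util;	/* task placed here */
			env->energy[c] += u * power[i];
		}
	}
}

/* First candidate, in the given order, that beats cpu[0]; else 0. */
static int toy_select_strict_order(const struct toy_energy_env *env)
{
	for (int c = 1; c < env->ncand; c++)
		if (env->energy[c] < env->energy[0])
			return c;
	return 0;
}
```

With util = {100, 200, 50}, power = {4, 1, 2}, a task utilization of 10 and candidates {0, 2, 1}, the energies come out as 740, 720 and 710; strict ordering picks CPU 2 even though CPU 1 would be cheaper, which is the trade-off the later FBT_STRICT_ORDER sched_feature makes configurable.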
Patrick Bellasi	142ce32a63	sched/fair: cleanup select_energy_cpu_brute to be more consistent The current definition of select_energy_cpu_brute is a bit confusing in the definition of the value for the target_cpu to be returned as wakeup CPU for the specified task. This cleans up the code by ensuring that we always set target_cpu right before returning it. rcu_read_lock and the check on *sd!=NULL are also moved around to be exactly where they are required. Change-Id: I70a4b558b3624a13395da1a87ddc0776fd1d6641 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:55:03 +00:00
Patrick Bellasi	7f44e92d1c	sched/fair: remove capacity tracking from energy_diff In preparation for the energy_diff refactoring, let's remove all the SchedTune specific bits which are used to keep track of the capacity variations required by the PESpace filtering. This also removes the energy_normalization function and the wrapper of energy_diff which is used to trigger a PESpace filtering by schedtune_accept_deltas(). The remaining code is the "original" energy_diff function which looks just at the energy variations to compare prev_cpu vs next_cpu. Change-Id: I4fb1d1c5ba45a364e6db9ab8044969349aba0307 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:59 +00:00
Patrick Bellasi	4905932b05	sched/fair: remove energy_diff tracepoint in preparation to re-factoring The format of the energy_diff tracepoint is going to be changed by the following energy_diff refactoring patches. Let's remove it now to start from a clean slate. Change-Id: Id4f537ed60d90a7ddcca0a29a49944bfacb85c8c Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:55 +00:00
Chris Redpath	e997bf0fde	sched/fair: use p to reference task_structs This is a simple renaming patch which just aligns to the most common code convention used in fair.c: task_struct pointers are usually named p. Change-Id: Id0769e52b6a271014d89353fdb4be9bb721b5b2f Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:50 +00:00
Ke Wang	225006a4f7	sched: EAS: Fix the calculation of group util in group_idle_state() util_delta becomes non-zero in eenv_before, which affects the calculation of grp_util in group_idle_state(). Fix the calculation to handle this new condition. Change-Id: Ic3853bb45876a8e388afcbe4e72d25fc42b1d7b0 Signed-off-by: Ke Wang <ke.wang@spreadtrum.com> (cherry picked from commit `47c87b2654`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:46 +00:00
Victor Wan	2c95ea743b	Merge branch 'android-4.9' into amlogic-4.9-dev Conflicts: arch/arm/configs/omap2plus_defconfig drivers/Makefile drivers/android/binder.c	2018-01-08 18:44:19 +08:00
Greg Kroah-Hartman	bc7ff9b998	Merge 4.9.75 into android-4.9 Changes in 4.9.75 tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: reset long-term bandwidth sampling on loss recovery undo x86/boot: Add early cmdline parsing for options with arguments KAISER: Kernel Address Isolation kaiser: merged update kaiser: do not set _PAGE_NX on pgd_none kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE kaiser: fix build and FIXME in alloc_ldt_struct() kaiser: KAISER depends on SMP kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER kaiser: fix perf crashes kaiser: ENOMEM if kaiser_pagetable_walk() NULL kaiser: tidied up asm/kaiser.h somewhat kaiser: tidied up kaiser_add/remove_mapping slightly kaiser: align addition to x86/mm/Makefile kaiser: cleanups while trying for gold link kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET kaiser: delete KAISER_REAL_SWITCH option kaiser: vmstat show NR_KAISERTABLE as nr_overhead kaiser: enhanced by kernel and user PCIDs kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user kaiser: PCID 0 for kernel and 128 for user kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user kaiser: paranoid_entry pass cr3 need to paranoid_exit kaiser: kaiser_remove_mapping() move along the pgd kaiser: fix unlikely error in alloc_ldt_struct() kaiser: add "nokaiser" boot option, using ALTERNATIVE x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling x86/kaiser: Check boottime cmdline params kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush kaiser: drop is_atomic arg to kaiser_pagetable_walk() kaiser: asm/tlbflush.h handle noPGE at lower level kaiser: kaiser_flush_tlb_on_return_to_user() check PCID x86/paravirt: Dont patch flush_tlb_single x86/kaiser: Reenable PARAVIRT kaiser: disabled on Xen PV x86/kaiser: Move feature detection up KPTI: Rename to PAGE_TABLE_ISOLATION KPTI: Report when enabled kaiser: Set _PAGE_NX only if supported Linux 4.9.75 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-05 22:28:25 +01:00
Hugh Dickins	0994a2cf8f	kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE Kaiser only needs to map one page of the stack; and kernel/fork.c did not build on powerpc (no __PAGE_KERNEL). It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack() and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's kaiser_add_mapping() and kaiser_remove_mapping(). And use linux/kaiser.h in init/main.c to avoid the #ifdefs there. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-05 15:46:32 +01:00
Dave Hansen	8f0baadf2b	kaiser: merged update Merged fixes and cleanups, rebased to 4.9.51 tree (no 5-level paging). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-05 15:46:32 +01:00
Richard Fellner	13be4483bb	KAISER: Kernel Address Isolation This patch introduces our implementation of KAISER (Kernel Address Isolation to have Side-channels Efficiently Removed), a kernel isolation technique to close hardware side channels on kernel address information. More information about the patch can be found on: https://github.com/IAIK/KAISER From: Richard Fellner <richard.fellner@student.tugraz.at> From: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode Date: Thu, 4 May 2017 14:26:50 +0200 Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2 Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5 To: <linux-kernel@vger.kernel.org> To: <kernel-hardening@lists.openwall.com> Cc: <clementine.maurice@iaik.tugraz.at> Cc: <moritz.lipp@iaik.tugraz.at> Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Cc: Richard Fellner <richard.fellner@student.tugraz.at> Cc: Ingo Molnar <mingo@kernel.org> Cc: <kirill.shutemov@linux.intel.com> Cc: <anders.fogh@gdata-adan.de> After several recent works [1,2,3] KASLR on x86_64 was basically considered dead by many researchers. We have been working on an efficient but effective fix for this problem and found that not mapping the kernel space when running in user mode is the solution to this problem [4] (the corresponding paper [5] will be presented at ESSoS17). With this RFC patch we allow anybody to configure their kernel with the flag CONFIG_KAISER to add our defense mechanism. If there are any questions we would love to answer them. We also appreciate any comments! 
Cheers, Daniel (+ the KAISER team from Graz University of Technology) [1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf [2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf [3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf [4] https://github.com/IAIK/KAISER [5] https://gruss.cc/files/kaiser.pdf [patch based also on https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch] Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at> Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at> Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-05 15:46:32 +01:00

24689 Commits