The current implementation of the energy-aware wake-up path relies on
find_best_target() to select an ordered list of CPU candidates for task
placement. The first candidate of the list saving energy is then chosen,
disregarding all the others to avoid the overhead of an expensive
energy_diff.
With the recent refactoring of select_energy_cpu_idx(), the cost of
exploring multiple CPUs has been reduced, hence offering the opportunity
to select the most energy-efficient candidate at a lower cost. This commit
seizes this opportunity by allowing to change select_energy_cpu_idx()'s
behaviour as to ignore the order of CPUs returned by find_best_target()
and to pick the best candidate energy-wise.
As this functionality is still considered as experimental, it is hidden
behind a sched_feature named FBT_STRICT_ORDER (like the equivalent
feature in Android 4.14) which defaults to true, hence keeping the
current behaviour by default.
Change-Id: I0cb833bfec1a4a053eddaff1652c0b6cad554f97
Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com>
Suggested-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Changes in 4.9.81
powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
powerpc/64: Add macros for annotating the destination of rfid/hrfid
powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
powerpc/64s: Add support for RFI flush of L1-D cache
powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
powerpc/pseries: Query hypervisor for RFI flush settings
powerpc/powernv: Check device-tree for RFI flush settings
powerpc/64s: Wire up cpu_show_meltdown()
powerpc/64s: Allow control of RFI flush via debugfs
auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
pinctrl: pxa: pxa2xx: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
kaiser: fix intel_bts perf crashes
x86/pti: Make unpoison of pgd for trusted boot work for real
kaiser: allocate pgd with order 0 when pti=off
serial: core: mark port as initialized after successful IRQ change
ip6mr: fix stale iterator
net: igmp: add a missing rcu locking section
qlcnic: fix deadlock bug
qmi_wwan: Add support for Quectel EP06
r8169: fix RTL8168EP take too long to complete driver initialization.
tcp: release sk_frag.page in tcp_disconnect
vhost_net: stop device during reset owner
tcp_bbr: fix pacing_gain to always be unity when using lt_bw
cls_u32: add missing RCU annotation.
ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only
soreuseport: fix mem leak in reuseport_add_sock()
x86/asm: Fix inline asm call constraints for GCC 4.4
x86/microcode/AMD: Do not load when running on a hypervisor
media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
b43: Add missing MODULE_FIRMWARE()
KEYS: encrypted: fix buffer overread in valid_master_desc()
x86/retpoline: Remove the esp/rsp thunk
KVM: x86: Make indirect calls in emulator speculation safe
KVM: VMX: Make indirect call speculation safe
module/retpoline: Warn about missing retpoline in module
x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
x86/cpufeatures: Add Intel feature bits for Speculation Control
x86/cpufeatures: Add AMD feature bits for Speculation Control
x86/msr: Add definitions for new speculation control MSRs
x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
x86/nospec: Fix header guards names
x86/bugs: Drop one "mitigation" from dmesg
x86/cpu/bugs: Make retpoline module warning conditional
x86/cpufeatures: Clean up Spectre v2 related CPUID flags
x86/retpoline: Simplify vmexit_fill_RSB()
x86/spectre: Check CONFIG_RETPOLINE in command line parser
x86/entry/64: Remove the SYSCALL64 fast path
x86/entry/64: Push extra regs right away
x86/asm: Move 'status' from thread_struct to thread_info
Documentation: Document array_index_nospec
array_index_nospec: Sanitize speculative array de-references
x86: Implement array_index_mask_nospec
x86: Introduce barrier_nospec
x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}
x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
x86/get_user: Use pointer masking to limit speculation
x86/syscall: Sanitize syscall table de-references under speculation
vfs, fdtable: Prevent bounds-check bypass via speculative execution
nl80211: Sanitize array index in parse_txq_params
x86/spectre: Report get_user mitigation for spectre_v1
x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
x86/paravirt: Remove 'noreplace-paravirt' cmdline option
x86/kvm: Update spectre-v1 mitigation
x86/retpoline: Avoid retpolines for built-in __init functions
x86/spectre: Simplify spectre_v2 command line parsing
x86/pti: Mark constant arrays as __initconst
x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
KVM: nVMX: kmap() can't fail
KVM: nVMX: vmx_complete_nested_posted_interrupt() can't fail
KVM: nVMX: mark vmcs12 pages dirty on L2 exit
KVM: nVMX: Eliminate vmcs02 pool
KVM: VMX: introduce alloc_loaded_vmcs
KVM: VMX: make MSR bitmaps per-VCPU
KVM/x86: Add IBPB support
KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
crypto: tcrypt - fix S/G table for test_aead_speed()
ASoC: simple-card: Fix misleading error message
ASoC: rsnd: don't call free_irq() on Parent SSI
ASoC: rsnd: avoid duplicate free_irq()
drm: rcar-du: Use the VBK interrupt for vblank events
drm: rcar-du: Fix race condition when disabling planes at CRTC stop
x86/microcode: Do the family check first
Linux 4.9.81
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
(cherry picked from commit caf7501a1b)
There's a risk that a kernel which has full retpoline mitigations becomes
vulnerable when a module gets loaded that hasn't been compiled with the
right compiler or the right option.
To enable detection of that mismatch at module load time, add a module info
string "retpoline" at build time when the module was compiled with
retpoline support. This only covers compiled C source, but assembler source
or prebuilt object files are not checked.
If a retpoline enabled kernel detects a non retpoline protected module at
load time, print a warning and report it in the sysfs vulnerability file.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: gregkh@linuxfoundation.org
Cc: torvalds@linux-foundation.org
Cc: jeyu@kernel.org
Cc: arjan@linux.intel.com
Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race between hotplug and energy_diff which might result
in endless loop in sched_group_energy. When this happens, the end
condition cannot be detected.
We can store how many CPUs we need to visit at the beginning, and
bail out of the energy calculation if we visit more cpus than expected.
Bug: 72311797 72202633
Change-Id: I8dda75468ee1570da4071cd8165ef5131a8205d8
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
We are using an incorrect index while initializing the nrg_delta for
the previous CPU in select_energy_cpu_idx(). This initialization
it self is not needed as the nrg_delta for the previous CPU is already
initialized to 0 while preparing the energ_env struct.
Change-Id: Iee4e2c62f904050d2680a0a1df646d4d515c62cc
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
When we are calculating what the impact of placing a task on a specific
cpu is, we should include the information that there might be a minimum
capacity imposed upon that cpu which could change the performance and/or
energy cost decisions.
When choosing an active target CPU, favour CPUs that won't end up
running at a high OPP due to a min capacity cap imposed by external
actors.
Change-Id: Ibc3302304345b63107f172b1fc3ffdabc19aa9d4
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
When we are calculating what the impact of placing a task on a specific
cpu is, we should include the information that there might be a minimum
capacity imposed upon that cpu which could change the performance and/or
energy cost decisions.
When choosing an idle backup CPU, favour CPUs that won't end up
running at a high OPP due to a min capacity cap imposed by external
actors.
Change-Id: I566623ffb3a7c5b61a23242dcce1cb4147ef8a4a
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Add the ability to track minimim capacity forced onto a sched_group
by some external actor.
group_max_util returns the highest utilisation inside a sched_group
and is used when we are trying to calculate an energy cost estimate
for a specific scheduling scenario. Minimum capacities imposed from
elsewhere will influence this energy cost so we should reflect it
here.
Change-Id: Ibd537a6dbe6d67b11cc9e9be18f40fcb2c0f13de
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
We have the ability to track minimum capacity forced onto a CPU by
userspace or external actors. This is provided though a minimum
frequency scale factor exposed by arch_scale_min_freq_capacity.
The use of this information is enabled through the MIN_CAPACITY_CAPPING
feature. If not enabled, the minimum frequency scale factor will
remain 0 and it will not impact energy estimation or scheduling
decisions.
Change-Id: Ibc61f2bf4fddf186695b72b262e602a6e8bfde37
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
If the minimum capacity of a group is capped by userspace or internal
dependencies which are not otherwise visible to the scheduler, we need
a way to see these and integrate this information into the energy
calculations and task placement decisions we make.
Add arch_scale_min_freq_capacity to determine the lowest capacity which
a specific cpu can provide under the current set of known constraints.
Change-Id: Ied4a1dc0982bbf42cb5ea2f27201d4363db59705
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
The max frequency scaling factor is defined as:
max_freq_scale = policy_max_freq / cpuinfo_max_freq
To be able to scale the cpu capacity by this factor introduce a call to
the new arch scaling function arch_scale_max_freq_capacity() in
update_cpu_capacity() and provide a default implementation which returns
SCHED_CAPACITY_SCALE.
Another subsystem (e.g. cpufreq) can overwrite this default implementation,
exactly as for frequency and cpu invariance. It has to be enabled by the
arch by defining arch_scale_max_freq_capacity to the actual
implementation.
Change-Id: I266cd1f4c1c82f54b80063c36aa5f7662599dd28
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Changes in 4.9.79
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
orangefs: use list_for_each_entry_safe in purge_waiting_ops
orangefs: initialize op on loop restart in orangefs_devreq_read
usbip: prevent vhci_hcd driver from leaking a socket pointer address
usbip: Fix implicit fallthrough warning
usbip: Fix potential format overflow in userspace tools
can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
Prevent timer value 0 for MWAITX
drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
drivers: base: cacheinfo: fix boot error message when acpi is enabled
mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
hwpoison, memcg: forcibly uncharge LRU pages
cma: fix calculation of aligned offset
mm, page_alloc: fix potential false positive in __zone_watermark_ok
ipc: msg, make msgrcv work with LONG_MIN
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ACPICA: Namespace: fix operand cache leak
netfilter: nfnetlink_cthelper: Add missing permission checks
netfilter: xt_osf: Add missing permission checks
reiserfs: fix race in prealloc discard
reiserfs: don't preallocate blocks for extended attributes
fs/fcntl: f_setown, avoid undefined behaviour
scsi: libiscsi: fix shifting of DID_REQUEUE host byte
Revert "module: Add retpoline tag to VERMAGIC"
mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
Input: trackpoint - force 3 buttons if 0 button is reported
orangefs: fix deadlock; do not write i_size in read_iter
um: link vmlinux with -no-pie
vsyscall: Fix permissions for emulate mode with KAISER/PTI
eventpoll.h: add missing epoll event masks
dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
ipv6: fix udpv6 sendmsg crash caused by too small MTU
ipv6: ip6_make_skb() needs to clear cork.base.dst
lan78xx: Fix failure in USB Full Speed
net: igmp: fix source address check for IGMPv3 reports
net: qdisc_pkt_len_init() should be more robust
net: tcp: close sock if net namespace is exiting
pppoe: take ->needed_headroom of lower device into account on xmit
r8169: fix memory corruption on retrieval of hardware statistics.
sctp: do not allow the v4 socket to bind a v4mapped v6 address
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
tipc: fix a memory leak in tipc_nl_node_get_link()
vmxnet3: repair memory leak
net: Allow neigh contructor functions ability to modify the primary_key
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
ppp: unlock all_ppp_mutex before registering device
be2net: restore properly promisc mode after queues reconfiguration
ip6_gre: init dev->mtu and dev->hard_header_len correctly
gso: validate gso_type in GSO handlers
mlxsw: spectrum_router: Don't log an error on missing neighbor
tun: fix a memory leak for tfile->tx_array
flow_dissector: properly cap thoff field
perf/x86/amd/power: Do not load AMD power module on !AMD platforms
x86/microcode/intel: Extend BDW late-loading further with LLC size check
hrtimer: Reset hrtimer cpu base proper on CPU hotplug
x86: bpf_jit: small optimization in emit_bpf_tail_call()
bpf: fix bpf_tail_call() x64 JIT
bpf: introduce BPF_JIT_ALWAYS_ON config
bpf: arsh is not supported in 32 bit alu thus reject it
bpf: avoid false sharing of map refcount with max_entries
bpf: fix divides by zero
bpf: fix 32-bit divide by zero
bpf: reject stores into ctx via st and xadd
nfsd: auth: Fix gid sorting when rootsquash enabled
Linux 4.9.79
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ upstream commit f37a8cb84c ]
Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.
The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.
Fixes: d691f9e8d4 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit c366287ebd ]
Divides by zero are not nice, lets avoid them if possible.
Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.
Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit 7891a87efc ]
The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:
0: (18) r0 = 0x0
2: (7b) *(u64 *)(r10 -16) = r0
3: (cc) (u32) r0 s>>= (u32) r0
4: (95) exit
Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit 290af86629 ]
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit 90caccdd8c ]
- bpf prog_array just like all other types of bpf array accepts 32-bit index.
Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent
The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40 in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5421ea43d upstream.
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.
Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.
Fixes: 41d2e49493 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In up-migrate path, select_energy_cpu_brute() was called directly
without checking energy_aware(). This will make select_energy_cpu_brute()
always worked even disabling energy_aware() on the asymmetric cpu
capacity system.
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
Changes in 4.9.78
libnvdimm, btt: Fix an incompatibility in the log layout
scsi: sg: disable SET_FORCE_LOW_DMA
futex: Prevent overflow by strengthen input validation
ALSA: seq: Make ioctls race-free
ALSA: pcm: Remove yet superfluous WARN_ON()
ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
ALSA: hda - Apply the existing quirk to iMac 14,1
timers: Unconditionally check deferrable base
af_key: fix buffer overread in verify_address_len()
af_key: fix buffer overread in parse_exthdrs()
iser-target: Fix possible use-after-free in connection establishment error
scsi: hpsa: fix volume offline state
sched/deadline: Zero out positive runtime after throttling constrained tasks
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
objtool: Improve error message for bad file argument
x86/cpufeature: Move processor tracing out of scattered features
module: Add retpoline tag to VERMAGIC
x86/mm/pkeys: Fix fill_sig_info_pkey
x86/tsc: Fix erroneous TSC rate on Skylake Xeon
pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
x86/apic/vector: Fix off by one in error path
perf tools: Fix build with ARCH=x86_64
Input: ALPS - fix multi-touch decoding on SS4 plus touchpads
Input: 88pm860x-ts - fix child-node lookup
Input: twl6040-vibra - fix child-node lookup
Input: twl4030-vibra - fix sibling-node lookup
tracing: Fix converting enum's from the map in trace_event_eval_update()
phy: work around 'phys' references to usb-nop-xceiv devices
ARM: sunxi_defconfig: Enable CMA
ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7
can: peak: fix potential bug in packet fragmentation
scripts/gdb/linux/tasks.py: fix get_thread_info
proc: fix coredump vs read /proc/*/stat race
libata: apply MAX_SEC_1024 to all LITEON EP1 series devices
workqueue: avoid hard lockups in show_workqueue_state()
dm btree: fix serious bug in btree_split_beneath()
dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
x86/cpu, x86/pti: Do not enable PTI on AMD processors
usbip: fix warning in vhci_hcd_probe/lockdep_init_map
x86/mce: Make machine check speculation protected
retpoline: Introduce start/end markers of indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
x86/pti: Document fix wrong index
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
MIPS: AR7: ensure the port type's FCR value is used
Linux 4.9.78
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 62635ea8c1 upstream.
show_workqueue_state() can print out a lot of messages while being in
atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console
device is slow it may end up triggering NMI hard lockup watchdog.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ebe1eaf2f upstream.
Since enums do not get converted by the TRACE_EVENT macro into their values,
the event format displaces the enum name and not the value. This breaks
tools like perf and trace-cmd that need to interpret the raw binary data. To
solve this, an enum map was created to convert these enums into their actual
numbers on boot up. This is done by TRACE_EVENTS() adding a
TRACE_DEFINE_ENUM() macro.
Some enums were not being converted. This was caused by an optization that
had a bug in it.
All calls get checked against this enum map to see if it should be converted
or not, and it compares the call's system to the system that the enum map
was created under. If they match, then they call is processed.
To cut down on the number of iterations needed to find the maps with a
matching system, since calls and maps are grouped by system, when a match is
made, the index into the map array is saved, so that the next call, if it
belongs to the same system as the previous call, could start right at that
array index and not have to scan all the previous arrays.
The problem was, the saved index was used as the variable to know if this is
a call in a new system or not. If the index was zero, it was assumed that
the call is in a new system and would keep incrementing the saved index
until it found a matching system. The issue arises when the first matching
system was at index zero. The next map, if it belonged to the same system,
would then think it was the first match and increment the index to one. If
the next call belong to the same system, it would begin its search of the
maps off by one, and miss the first enum that should be converted. This left
a single enum not converted properly.
Also add a comment to describe exactly what that index was for. It took me a
bit too long to figure out what I was thinking when debugging this issue.
Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com
Fixes: 0c564a538a ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Teste-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.9.77
dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
mac80211: Add RX flag to indicate ICV stripped
ath10k: rebuild crypto header in rx data frames
KVM: Fix stack-out-of-bounds read in write_mmio
can: gs_usb: fix return value of the "set_bittiming" callback
IB/srpt: Disable RDMA access by the initiator
MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
MIPS: Factor out NT_PRFPREG regset access helpers
MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
kvm: vmx: Scrub hardware GPRs at VM-exit
platform/x86: wmi: Call acpi_wmi_init() later
x86/acpi: Handle SCI interrupts above legacy space gracefully
ALSA: pcm: Remove incorrect snd_BUG_ON() usages
ALSA: pcm: Add missing error checks in OSS emulation plugin builder
ALSA: pcm: Abort properly at pending signal in OSS read/write loops
ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
ALSA: aloop: Release cable upon open error path
ALSA: aloop: Fix inconsistent format due to incomplete rule
ALSA: aloop: Fix racy hw constraints adjustment
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
zswap: don't param_set_charp while holding spinlock
lan78xx: use skb_cow_head() to deal with cloned skbs
sr9700: use skb_cow_head() to deal with cloned skbs
smsc75xx: use skb_cow_head() to deal with cloned skbs
cx82310_eth: use skb_cow_head() to deal with cloned skbs
xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
8021q: fix a memory leak for VLAN 0 device
ip6_tunnel: disable dst caching if tunnel is dual-stack
net: core: fix module type in sock_diag_bind
RDS: Heap OOB write in rds_message_alloc_sgs()
RDS: null pointer dereference in rds_atomic_free_op
sh_eth: fix TSU resource handling
sh_eth: fix SH7757 GEther initialization
net: stmmac: enable EEE in MII, GMII or RGMII only
ipv6: fix possible mem leaks in ipv6_make_skb()
ethtool: do not print warning for applications using legacy API
mlxsw: spectrum_router: Fix NULL pointer deref
net/sched: Fix update of lastuse in act modules implementing stats_update
crypto: algapi - fix NULL dereference in crypto_remove_spawns()
rbd: set max_segments to USHRT_MAX
x86/microcode/intel: Extend BDW late-loading with a revision check
KVM: x86: Add memory barrier on vmcs field lookup
drm/vmwgfx: Potential off by one in vmw_view_add()
kaiser: Set _PAGE_NX only if supported
iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
bpf: move fixup_bpf_calls() function
bpf: refactor fixup_bpf_calls()
bpf: prevent out-of-bounds speculation
bpf, array: fix overflow in max_entries and undefined behavior in index_mask
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
USB: serial: cp210x: add new device ID ELV ALC 8xxx
usb: misc: usb3503: make sure reset is low for at least 100us
USB: fix usbmon BUG trigger
usbip: remove kernel addresses from usb device and urb debug msgs
usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
Bluetooth: Prevent stack info leak from the EFS element.
uas: ignore UAS for Norelsys NS1068(X) chips
e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
x86/Documentation: Add PTI description
x86/cpu: Factor out application of forced CPU caps
x86/cpufeatures: Make CPU bugs sticky
x86/cpufeatures: Add X86_BUG_CPU_INSECURE
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
x86/cpu: Merge bugs.c and bugs_64.c
sysfs/cpu: Add vulnerability folder
x86/cpu: Implement CPU vulnerabilites sysfs functions
x86/cpu/AMD: Make LFENCE a serializing instruction
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
sysfs/cpu: Fix typos in vulnerability documentation
x86/alternatives: Fix optimize_nops() checking
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
objtool, modules: Discard objtool annotation sections for modules
objtool: Detect jumps to retpoline thunks
objtool: Allow alternatives to be ignored
x86/asm: Use register variable to get stack pointer value
x86/retpoline: Add initial retpoline support
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline: Fill return stack buffer on vmexit
selftests/x86: Add test_vsyscall
x86/retpoline: Remove compile time warning
objtool: Fix retpoline support for pre-ORC objtool
x86/pti/efi: broken conversion from efi to kernel page table
Linux 4.9.77
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit bbeb6e4323 upstream.
syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.
However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.
Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.
This fixes all the issues triggered by syzkaller's reproducers.
Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2157399cc upstream.
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.
Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.
If prog_array map is created by unpriv user replace
bpf_tail_call(ctx, map, index);
with
if (index >= max_entries) {
index &= map->index_mask;
bpf_tail_call(ctx, map, index);
}
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.
Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.
That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.
v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[ Backported to 4.9 - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79741b3bde upstream.
reduce indent and make it iterate over instructions similar to
convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[Backported to 4.9.y - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.9.76
kernel/acct.c: fix the acct->needcheck check in check_free_space()
crypto: n2 - cure use after free
crypto: chacha20poly1305 - validate the digest size
crypto: pcrypt - fix freeing pcrypt instances
sunxi-rsb: Include OF based modalias in device uevent
fscache: Fix the default for fscache_maybe_release_page()
nbd: fix use-after-free of rq/bio in the xmit path
kernel: make groups_sort calling a responsibility group_info allocators
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
iommu/arm-smmu-v3: Don't free page table ops twice
iommu/arm-smmu-v3: Cope with duplicated Stream IDs
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
Input: elantech - add new icbody type 15
x86/microcode/AMD: Add support for fam17h microcode loading
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
parisc: qemu idle sleep support
x86/tlb: Drop the _GPL from the cpu_tlbstate export
Map the vsyscall page with _PAGE_USER
mtd: nand: pxa3xx: Fix READOOB implementation
Linux 4.9.76
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 426915796c upstream.
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
static int init(void *arg)
{
for (;;)
pause();
}
int main(void)
{
char stack[16 * 1024];
for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);
assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);
assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kyle Huey <me@kylehuey.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 628c1bcba2 upstream.
The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.
Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().
SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.
Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d9570158b upstream.
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.
This was broken by commit 32dc730860 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8
("acct() should honour the limits from the very beginning") made the
problem more visible.
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc730860 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SG's energy is obtained by adding busy and idle contributions which
are computed by considering a proper fraction of the
SCHED_CAPACITY_SCALE defined by the SG's utilizations.
By scaling each and every contribution conputed we risk to accumulate
rounding errors which can results into a non null energy_delta also in
cases when the same total accomulated utilization is differently
distributed among different CPUs.
To reduce rouding errors, this patch accumulated non-scaled busy/idle
energy contributions for each visited SG, and scale each of them just
one time at the end.
Change-Id: Idf8367fee0ac11938c6436096f0c1b2d630210d2
Suggested-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The energy_env data structure is used to cache values required by multiple
different functions involved in energy_diff computation. Some of these
functions require additional parameters which can be easily embedded
into the energy_env itself. The current implementation of energy_diff
hardcodes the usage of two different energy_env structures to estimate and
compare the energy consumption related to a "before" and an "after" CPU.
Moreover, it does this energy estimation by walking multiple times the
SDs/SGs data structures.
A better design can be envisioned by better using the energy_env structure
to support a more efficient and concurrent evaluation of multiple schedule
candidates. To this purpose, this patch provides a complete re-factoring
of the energy_diff implementation to:
1. use a single energy_env structure for the evaluation of all the
candidate CPUs
2. walk just one time the SDs/SGs, thus improving the overall performance
to compute the energy estimation for each CPU candidate specified by
the single used energy_env
3. simplify the code (at least if you look at the new version and not at
this re-factoring patch) thus providing a more clean code to maintain
and extend for additional features
This patch updated all the clients of energy_env to use only the data
provided by this structure and an index for one of its CPUs candidates.
Embedding everything within the energy env will make it simple to add
tracepoints for this new version, which can easily provide an holistic
view on how energy_diff evaluated the proposed CPU candidates.
The new proposed structure, for both "struct energy_env" and the functions
using it, is designed in such a way to easily accommodate additional
further extensions (e.g. SchedTune filtering) without requiring an
additional big re-factoring of these core functions.
Finally, the usage of a CPUs candidate array, embedded into the
energy_diff structure, allows also to seamless extend the exploration of
multiple candidate CPUs, for example to support the comparison of a
spread-vs-packing strategy.
Change-Id: Ic04ffb6848b2c763cf1788767f22c6872eb12bee
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
[reworked find_new_capacity() and enforced the respect of
find_best_target() selection order]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The current definition of select_energy_cpu_brute is a bit confusing in
the definition of the value for the target_cpu to be returned as wakeup
CPU for the specified task.
This cleanup the code by ensuring that we always set target_cpu right
before returning it. rcu_read_lock and check on *sd!=NULL are also moved
around to be exactly where they are required.
Change-Id: I70a4b558b3624a13395da1a87ddc0776fd1d6641
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
In preparation for the energy_diff refactoring, let's remove all the
SchedTune specific bits which are used to keep track of the capacity
variations requited by the PESpace filtering.
This removes also the energy_normalization function and the wrapper of
energy_diff which is used to trigger a PESpace filtering by
schedtune_accept_deltas().
The remaining code is the "original" energy_diff function which
looks just at the energy variations to compare prev_cpu vs next_cpu.
Change-Id: I4fb1d1c5ba45a364e6db9ab8044969349aba0307
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The format of the energy_diff tracepoint is going to be changed by the
following energ_diff refactoring patches. Let's remove it now to start from
a clean slate.
Change-Id: Id4f537ed60d90a7ddcca0a29a49944bfacb85c8c
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
This is a simple renaming patch which just align to the most common code
convention used in fair.c, task_structs pointers are usually named *p.
Change-Id: Id0769e52b6a271014d89353fdb4be9bb721b5b2f
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
util_delta becomes not zero in eenv_before, which will affect the
calculation of grp_util in group_idle_state(). Fix it under the
new condition.
Change-Id: Ic3853bb45876a8e388afcbe4e72d25fc42b1d7b0
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
(cherry picked from commit 47c87b2654)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Changes in 4.9.75
tcp_bbr: reset full pipe detection on loss recovery undo
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
x86/boot: Add early cmdline parsing for options with arguments
KAISER: Kernel Address Isolation
kaiser: merged update
kaiser: do not set _PAGE_NX on pgd_none
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
kaiser: fix build and FIXME in alloc_ldt_struct()
kaiser: KAISER depends on SMP
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
kaiser: fix perf crashes
kaiser: ENOMEM if kaiser_pagetable_walk() NULL
kaiser: tidied up asm/kaiser.h somewhat
kaiser: tidied up kaiser_add/remove_mapping slightly
kaiser: align addition to x86/mm/Makefile
kaiser: cleanups while trying for gold link
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
kaiser: delete KAISER_REAL_SWITCH option
kaiser: vmstat show NR_KAISERTABLE as nr_overhead
kaiser: enhanced by kernel and user PCIDs
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
kaiser: PCID 0 for kernel and 128 for user
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
kaiser: paranoid_entry pass cr3 need to paranoid_exit
kaiser: kaiser_remove_mapping() move along the pgd
kaiser: fix unlikely error in alloc_ldt_struct()
kaiser: add "nokaiser" boot option, using ALTERNATIVE
x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
x86/kaiser: Check boottime cmdline params
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
kaiser: drop is_atomic arg to kaiser_pagetable_walk()
kaiser: asm/tlbflush.h handle noPGE at lower level
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
x86/paravirt: Dont patch flush_tlb_single
x86/kaiser: Reenable PARAVIRT
kaiser: disabled on Xen PV
x86/kaiser: Move feature detection up
KPTI: Rename to PAGE_TABLE_ISOLATION
KPTI: Report when enabled
kaiser: Set _PAGE_NX only if supported
Linux 4.9.75
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping(). And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>