Preempt and irq trace events can be used for tracing the start and
end of an atomic section which can be used by a trace viewer like
systrace to graphically view the start and end of an atomic section and
correlate them with latencies and scheduling issues.
This also serves as a prelude to using synthetic events or probes to
rewrite the preempt and irqsoff tracers, along with numerous benefits of
using trace events features for these events.
Change-Id: I718d40f7c3c48579adf9d7121b21495a669c89bd
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: kernel-team@android.com
Link: https://patchwork.kernel.org/patch/9988157/
Signed-off-by: Joel Fernandes <joelaf@google.com>
In preparation of adding irqsoff and preemptsoff enable and disable trace
events, move required functions and code to make it easier to add these events
in a later patch. This patch is just code movement and no functional change.
Change-Id: I587d411da5efbc4959bcccd7a05c7a66c231e1e0
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@android.com
Link: https://patchwork.kernel.org/patch/9988159/
Signed-off-by: Joel Fernandes <joelaf@google.com>
Currently, sugov_next_freq_shared() uses last_freq_update_time as a
reference to decide when to start considering CPU contributions as
stale.
However, since last_freq_update_time is set by the last CPU that issued
a frequency transition, this might cause problems in certain cases. In
practice, the detection of stale utilization values fails whenever the
CPU with such values was the last to update the policy. For example (and
please note again that the SCHED_CPUFREQ_RT flag is not the problem
here, but only the detection of after how much time that flag has to be
considered stale), suppose a policy with 2 CPUs:
CPU0 | CPU1
|
| RT task scheduled
| SCHED_CPUFREQ_RT is set
| CPU1->last_update = now
| freq transition to max
| last_freq_update_time = now
|
more than TICK_NSEC nsecs
|
a small CFS wakes up |
CPU0->last_update = now1 |
delta_ns(CPU0) < TICK_NSEC* |
CPU0's util is considered |
delta_ns(CPU1) = |
last_freq_update_time - |
CPU1->last_update = 0 |
< TICK_NSEC |
CPU1 is still considered |
CPU1->SCHED_CPUFREQ_RT is set |
we stay at max (until CPU1 |
exits from idle) |
* delta_ns is actually negative as now1 > last_freq_update_time
While last_freq_update_time is a sensible reference for rate limiting,
it doesn't seem to be useful for working around stale CPU states.
Fix the problem by always considering now (time) as the reference for
deciding when CPUs have stale contributions.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit d86ab9cff8)
This reverts commit c5616f2f87.
If we re-init the per-cpu boostgroup spinlock every time that
we add a new boosted cgroup, we can easily wipe out (reinit)
a spinlock struct while in a critical section. We should only
be setting up the per-cpu boostgroup data, and the spin_lock
initialization need only happen once - which we're already
doing in a postcore_initcall.
For example:
-------- CPU 0 -------- | -------- CPU1 --------
cgroupX boost group added |
schedtune_enqueue_task |
acquires(bg->lock) | cgroupY boost group added
| for_each_cpu()
| raw_spin_lock_init(bg->lock)
releases(bg->lock) |
BUG (already unlocked) |
|
This results in the following BUG from the debug spinlock code:
BUG: spinlock already unlocked on CPU#5, rcuop/6/68
Change-Id: I3016702780b461a0cd95e26c538cd18df27d6316
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit ec8d7c14ea)
Tetsuo has properly noted that mmput slow path might get blocked waiting
for another party (e.g. exit_aio waits for an IO). If that happens the
oom_reaper would be put out of the way and will not be able to process
next oom victim. We should strive for making this context as reliable
and independent on other subsystems as much as possible.
Introduce mmput_async which will perform the slow path from an async
(WQ) context. This will delay the operation but that shouldn't be a
problem because the oom_reaper has reclaimed the victim's address space
for most cases as much as possible and the remaining context shouldn't
bind too much memory anymore. The only exception is when mmap_sem
trylock has failed which shouldn't happen too often.
The issue is only theoretical but not impossible.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only backports mmput_async.
Change-Id: I5fe54abcc629e7d9eab9fe03908903d1174177f1
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Commit b235beea9e ("Clarify naming of thread info/stack allocators")
breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:
kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
kernel/fork.c:355:8: error: assignment from incompatible pointer type
stack = alloc_thread_stack_node(tsk, node);
^
Fix it by renaming free_stack() to free_thread_stack(), and updating the
return type of alloc_thread_stack_node().
Fixes: b235beea9e ("Clarify naming of thread info/stack allocators")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 38331309
Change-Id: I5b7f920b459fb84adf5fc75f83bb488b855c4deb
(cherry picked from commit 9521d39976)
Signed-off-by: Zubin Mithra <zsm@google.com>
commit 50e7663233 upstream.
Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.
On suspend cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about,
there is no need, all tasks are frozen and won't run again until after
we've resumed everything.
But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask and this it will not in fact do _anything_.
So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers what do use
cpusets tend to not suspend-resume much.
An addition problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.
Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: deb7aa308e ("cpuset: reorganize CPU / memory hotplug handling")
Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b0b8499ae upstream.
The trampoline allocated by function tracer was overwriten by function_graph
tracer, and caused a memory leak. The save_global_trampoline should have
saved the previous trampoline in register_ftrace_graph() and restored it in
unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
only used in unregister_ftrace_graph as default value 0, and it overwrote the
previous trampoline's value. Causing the previous allocated trampoline to be
lost.
kmmeleak backtrace:
kmemleak_vmalloc+0x77/0xc0
__vmalloc_node_range+0x1b5/0x2c0
module_alloc+0x7c/0xd0
arch_ftrace_update_trampoline+0xb5/0x290
ftrace_startup+0x78/0x210
register_ftrace_function+0x8b/0xd0
function_trace_init+0x4f/0x80
tracing_set_tracer+0xe6/0x170
tracing_set_trace_write+0x90/0xd0
__vfs_write+0x37/0x170
vfs_write+0xb2/0x1b0
SyS_write+0x55/0xc0
do_syscall_64+0x67/0x180
return_from_SYSCALL_64+0x0/0x6a
[
Looking further into this, I found that this was left over from when the
function and function graph tracers shared the same ftrace_ops. But in
commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
together"), the two were separated, and the save_global_trampoline no
longer was necessary (and it may have been broken back then too).
-- Steven Rostedt
]
Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com
Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together")
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d88d56c58 upstream.
Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.
This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.
Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.
Change-Id: I84f455fc9adbb26c4130205d9ebe3683f638fc1d
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Daniel Mentz <danielmentz@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from LTS linux-4.9.y
commit a53bfdda06)
commit 66a733ea6b upstream.
As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end
up using different filters. Once we drop ->siglock it is possible for
task->seccomp.filter to have been replaced by SECCOMP_FILTER_FLAG_TSYNC.
Fixes: f8e529ed94 ("seccomp, ptrace: add support for dumping seccomp filters")
Reported-by: Chris Salls <chrissalls5@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[tycho: add __get_seccomp_filter vs. open coding refcount_inc()]
Signed-off-by: Tycho Andersen <tycho@docker.com>
[kees: tweak commit log]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75df6e688c upstream.
When reading data from trace_pipe, tracing_wait_pipe() performs a
check to see if tracing has been turned off after some data was read.
Currently, this check always looks at global trace state, but it
should be checking the trace instance where trace_pipe is located at.
Because of this bug, cat instances/i1/trace_pipe in the following
script will immediately exit instead of waiting for data:
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
mkdir -p instances/i1
echo 1 > instances/i1/tracing_on
echo 1 > instances/i1/events/sched/sched_process_exec/enable
cat instances/i1/trace_pipe
Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com
Fixes: 10246fa35d ("tracing: give easy way to clear trace buffer")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit edb096e007 upstream.
If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and a ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits out without doing the proper
clean up. This was found via kmemleak and running the ftrace selftests, as
one of the tests unregisters with function tracing disabled.
# cat kmemleak
unreferenced object 0xffffffffa0020000 (size 4096):
comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s)
hex dump (first 32 bytes):
55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH.
e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L
backtrace:
[<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0
[<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0
[<ffffffff8109697f>] module_alloc+0x4f/0x90
[<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420
[<ffffffff81249947>] ftrace_startup+0xe7/0x300
[<ffffffff81249bd2>] register_ftrace_function+0x72/0x90
[<ffffffff81263786>] trace_selftest_ops+0x204/0x397
[<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624
[<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7
[<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192
[<ffffffff81002230>] do_one_initcall+0x90/0x1e2
[<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe
[<ffffffff81d61ec3>] kernel_init+0x13/0x122
[<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 170b3b1050 upstream.
Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.
Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com
Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46320a6acc upstream.
In the second iteration of trace_selftest_ops(), the error goto label is
wrong in the case where trace_selftest_test_global_cnt is off. In the
case of error, it leaks the dynamic ops that was allocated.
Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).
That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.
Make intel_pstate set transition_delay_us to 500.
BACKPORT: Modified to support the separate up_rate_limit_us and
down_rate_limit_us (upstream just has a single rate_limit_us). Also
dropped the changes for intel_pstate as there's a merge conflict.
Change-Id: I62a8543879a4d8582cdcb31ebd55607705d1c8b1
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit 1b72e7fd30)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
The initial window start needs to be close to ktime ns = 0 to be
aligned with scheduler tick.
Change-Id: Ia91f74efce2f910106622a054a6fcd507e763ca5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
EAS won't allow NOHZ idle balancer until CPU's over utilized. However
nohz_kick_needed() can return true. This causes idle CPU wake up for
nothing.
Change-Id: I6e548442e29e4f85cda695e4c7101dd591b12fe6
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
In order to calculate energy difference we currently iterates CPUs under
the same sched doamin to accumulate total energy cost and compare before
and after :
for_each_domain(cpu)
total_energy_before += (cpu_util * power) >> SCHED_CAPACITY_SHIFT;
for_each_domain(cpu)
total_energy_after += (cpu_util * power) >> SCHED_CAPACITY_SHIFT;
Doing such can incorrectly calculate and report abs(delta) > 0 when
there is actually no energy delta between before and after because the
same total accumulated cpu_util of all the CPUs can be distributed
differently before and after and it causes different amount of rounding
error.
Fix such incorrectness by shifting just once with accumulated
total_energy.
Change-Id: I82f1e2e358367058960938b4ef81714f57e921cf
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(moved part to another commit)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
WALT's function cpu_util(cpu) reports CPU's load without taking into
account of waking task's load. Thus currently cpu_overutilized()
underestimates load on the previous CPU of waking task.
Take into account of task's load to determine whether previous CPU is
overutilzed to bail out early without running energy_diff() which is
expensive.
Change-Id: I30f146984a880ad2cc1b8a4ce35bd239a8c9a607
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(minor rebase conflicts)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
With WALT all the scheduler classes' load are accounted in scr->cfs and
update_cpu_capacity_request() adds capacity margin. At present, at tick
path, scheduler also adds capacity margin. Therefore the margin applied
twice.
Fix such error by using margin applied cpu utilization only for checking
whether frequency increase is needed.
Change-Id: Id7d8cc73b2e4eec70b274ca66e09bb0b16bf6f09
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(trivial rebase conflict)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
WALT CPU utilization reports CPU load of all the scheduler classes.
Therefore adding RT class's load additionally will cause frequency
overshooting. Fix such issue by not accounting RT class load when
requesting capacity.
Change-Id: I29600d7af7ca8c00e0d2ff1e13872024ccaa72bf
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
WALT accounts two major statistics; CPU load and cumulative tasks
demand.
The CPU load which is account of accumulated each CPU's absolute
execution time is for CPU frequency guidance. Whereas cumulative
tasks demand which is each CPU's instantaneous load to reflect
CPU's load at given time is for task placement decision.
Use cumulative tasks demand for cpu_util() for task placement and
introduce cpu_util_freq() for frequency guidance.
Change-Id: Id928f01dbc8cb2a617cdadc584c1f658022565c5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
When running tasks's ravg.demand is changed update_history() adjusts
rq->cumulative_runnable_avg to reflect change of CPU load. Currently
this fixup is broken by accumulating task's new demand without
subtracting the task's old demand.
Fix the fixup logic to subtract the task's old demand.
Change-Id: I61beb32a4850879ccb39b733f5564251e465bfeb
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Account cumulative runnable average for WALT CPU utilization accounting.
Change-Id: I56934894e626dec183740eeaf89a57d2ef638143
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Currently the iowait_boost feature in schedutil makes the frequency go
to max on iowait wakeups. This feature was added to handle a case that
Peter described where the throughput of operations involving continuous
I/O requests [1] is reduced due to running at a lower frequency, however
the lower throughput itself causes utilization to be low and hence
causing frequency to be low hence its "stuck".
Instead of going to max, its also possible to achieve the same effect by
ramping up to max if there are repeated in_iowait wakeups happening.
This patch is an attempt to do that. We start from a lower frequency
(policy->min) and double the boost for every consecutive iowait update
until we reach the maximum iowait boost frequency (iowait_boost_max).
I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86
machine with intel_pstate in passive mode using schedutil. In this test
the iowait_boost value ramped from 800MHz to 4GHz in 60ms. The patch
achieves the desired improved throughput as the existing behavior.
[1] https://patchwork.kernel.org/patch/9735885/
Change-Id: I4a018434a50f4ca29ec15b03465f6dc212e54423
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Fix merge error. kernel/sched/energy.c should not build on non-SMP.
Fixes: f9ae5d202b ("Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git")
Change-Id: I5dba90ec363c02e489a48d76bb97f9f9d62827cd
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
commit 1c08c22c87 upstream.
The memory_pressure control file was incorrectly set up without
a private value (0, by default). As a result, this control
file was treated like memory_migrate on read. By adding back the
FILE_MEMORY_PRESSURE private value, the correct memory pressure value
will be returned.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7dbdb199d3 ("cgroup: replace cftype->mode with CFTYPE_WORLD_WRITABLE")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LSK 17.08 v4.4-android
* tag 'lsk-v4.4-17.08-android': (451 commits)
Linux 4.4.83
pinctrl: samsung: Remove bogus irq_[un]mask from resource management
pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
pnfs/blocklayout: require 64-bit sector_t
iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
usb: core: unlink urbs from the tail of the endpoint's urb_list
USB: Check for dropped connection before switching to full speed
uag: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
iio: light: tsl2563: use correct event code
iio: accel: bmc150: Always restore device to normal mode after suspend-resume
staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
USB: hcd: Mark secondary HCD as dead if the primary one died
usb: musb: fix tx fifo flush handling again
USB: serial: pl2303: add new ATEN device id
USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
USB: serial: option: add D-Link DWM-222 device ID
nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
fuse: initialize the flock flag in fuse_file on allocation
...
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Conflicts:
drivers/android/binder.c
Keep AOSP changes and discard LTS binder changes, since these LTS changes
have already been merged and further refactored in AOSP tree long ago.
commit 64aee2a965 upstream.
Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully intialised) groups.
Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.
For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating thier (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.
This can be triggered with the below test case, resulting in warnings
from arch backends.
#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
int group_fd, unsigned long flags)
{
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
char watched_char;
struct perf_event_attr wp_attr = {
.type = PERF_TYPE_BREAKPOINT,
.bp_type = HW_BREAKPOINT_RW,
.bp_addr = (unsigned long)&watched_char,
.bp_len = 1,
.size = sizeof(wp_attr),
};
int main(int argc, char *argv[])
{
int leader, ret;
cpu_set_t cpus;
/*
* Force use of CPU0 to ensure our CPU0-bound events get scheduled.
*/
CPU_ZERO(&cpus);
CPU_SET(0, &cpus);
ret = sched_setaffinity(0, sizeof(cpus), &cpus);
if (ret) {
printf("Unable to set cpu affinity\n");
return 1;
}
/* open leader event, bound to this task, CPU0 only */
leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
if (leader < 0) {
printf("Couldn't open leader: %d\n", leader);
return 1;
}
/*
* Open a follower event that is bound to the same task, but a
* different CPU. This means that the group should never be possible to
* schedule.
*/
ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
if (ret < 0) {
printf("Couldn't open mismatched follower: %d\n", ret);
return 1;
} else {
printf("Opened leader/follower with mismastched CPUs\n");
}
/*
* Open as many independent events as we can, all bound to the same
* task, CPU0 only.
*/
do {
ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
} while (ret >= 0);
/*
* Force enable/disble all events to trigger the erronoeous
* installation of the follower event.
*/
printf("Opened all events. Toggling..\n");
for (;;) {
prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
}
return 0;
}
Fix this by validating this requirement regardless of whether we're
moving events.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhou Chengming <zhouchengming1@huawei.com>
Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>