linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-14 09:30:40 +09:00

Author	SHA1	Message	Date
Chris Redpath	d8063e7015	HMP: Implement task packing for small tasks in HMP systems If we wake up a task on a little CPU, fill CPUs rather than spread. Adds 2 new files to sys/kernel/hmp to control packing behaviour. packing_enable: task packing enabled (1) or disabled (0) packing_limit: Runqueues will be filled up to this load ratio. This functionality is disabled by default on TC2 as it lacks per-cpu power gating so packing small tasks there doesn't make sense. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-10-11 15:07:18 +01:00
Chris Redpath	cd5c2cc93d	hmp: Remove potential for task_struct access race Accessing the task_struct can be racy in certain conditions, so we need to only acquire the data when needed. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-10-11 15:07:18 +01:00
Chris Redpath	2e14ecb254	sched: HMP: fix potential logical errors The previous API for hmp_up_migration reset the destination CPU every time, regardless of if a migration was desired. The code using it assumed that the value would not be changed unless a migration was required. In one rare circumstance, this could have lead to a task migrating to a little CPU at the wrong time. Fixing that lead to a slight logical tweak to make the surrounding APIs operate a bit more obviously. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Robin Randhawa <robin.randhawa@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-10-11 15:07:18 +01:00
Chris Redpath	5ecaba3d9f	smp: smp_cross_call function pointer tracing generic tracing for smp_cross_call function calls Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-10-11 15:07:18 +01:00
Chris Redpath	7b8e0b3f2a	sched: HMP: Additional trace points for debugging HMP behaviour 1. Replace magic numbers in code for migration trace. Trace points still emit a number as force=<n> field: force=0 : wakeup migration force=1 : forced migration force=2 : offload migration force=3 : idle pull migration 2. Add trace to expose offload decision-making. Also adds tracing rq->nr_running so that you can look back to see what state the RQ was in at the time. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-10-11 15:07:17 +01:00
Chris Redpath	d73babce9a	sched: HMP: Change default HMP thresholds When the up-threshold is at 512 on TC2, behaviour looks OK since the graphic-related tasks are very heavy due to lack of a GPU. Increasing the up-threshold does not reduce power consumption. When a GPU is present, graphic tasks are much less CPU-heavy and so additional power may be saved by having a higher threshold. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-10-11 15:07:17 +01:00
Chris Redpath	c2111520cf	HMP: Update migration timer when we fork-migrate Prevents fork-migration adversely interacting with normal migration (i.e. runqueues containing forked tasks being selected as migration targets when there is a better choice available) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:17 +01:00
Chris Redpath	0d520ee8d4	HMP: Access runqueue task clocks directly. Avoids accesses through cfs_rq going bad when the cpu_rq doesn't have a cfs member. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:17 +01:00
Chris Redpath	1325a370da	HMP: Implement idle pull for HMP When an A15 goes idle, we should up-migrate anything which is above the threshold and running on an A7. Reuses the HMP force-migration spinlock, but adds its own new cpu stopper client. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:17 +01:00
Chris Redpath	70845269b5	sched: HMP change nr_running offload metric rq->nr_running was better than cfs.nr_running, since it includes all tasks actually on the CPU. However, it includes RT tasks which we would rather ignore at this point. Switching to cfs.h_nr_running includes all the CFS tasks but no RT tasks. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:17 +01:00
Chris Redpath	72d74c1196	HMP: Explicitly implement all-load-is-max-load policy for HMP targets Experimentally, one of the best policies for HMP migration CPU selection is to completely ignore part-loaded CPUs and only look for idle ones. If there are no idle ones, we will choose the one which was least-recently-disturbed. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:16 +01:00
Chris Redpath	e580deb7fe	HMP: Modify the runqueue stats to add a new child stat The original intent here was to track unweighted runqueue load with less resolution so we could use the least-recently-disturbed runqueue to choose between 'closely related' load levels. However, after experimenting with the resolution it turns out that the following algorithm is highly beneficial for mobile workloads. In hmp_domain_min_load: * If any CPU is zero, the overall load is zero * If no CPUs are idle, the domain is 'fully loaded' Additionally, the time since last migration count is used to discriminate between idle CPUs. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:16 +01:00
Chris Redpath	add684211e	sched: track per-rq 'last migration time' Track when migrations were performed to runqueues. Use this to decide between runqueues as migration targets when run queues in an hmp domain have equal load. Intention is to spread migration load amongst CPUs more fairly. When all CPUs in an hmp domain are fully loaded, the existing code always selects the last CPU as a migration target - this is unfair and little better than doing no selection. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:16 +01:00
Morten Rasmussen	c05cd3079d	sched: HMP fix traversing the rb-tree from the curr pointer The hmp_get_{lightest,heaviest}_task() need to use __pick_first_entity() to get a pointer to a sched_entity on the rq. The current is not kept on the rq while running, so its rb-tree node pointers are no longer valid. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:16 +01:00
Chris Redpath	83a3cdb6d3	HMP: select 'best' task for migration rather than 'current' When we are looking for a task to migrate up, select the heaviest one in the first 5 runnable on the runqueue. Likewise, when looking for a task to offload, select the lightest one in the first 5 runnable on the runqueue. Ensure task selected is runnable in the target domain. This change is necessary in order to implement idle pull in a sensible manner, but here is used in up-migration and offload to select the correct target task. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2013-09-05 18:09:16 +01:00
Jon Medhurst (Tixy)	0d5ddd14a8	HMP: Check the system has little cpus before forcing rt tasks onto them It is sometimes desirable to run a kernel with HMP scheduling enabled on a system which is not big.LITTLE, e.g. when building a multi-platform kernel, or when testing a big.LITTLE system with one cluster disabled. We should therefore allow for the situation where is no little domain. Signed-off-by: Jon Medhurst <tixy@linaro.org> Signed-off-by: Mark Brown <broonie@linaro.org>	2013-09-05 17:50:06 +01:00
Dietmar Eggemann	4ab2679351	HMP: experimental: Force all rt tasks to start on little domain. This patch restricts the allowed cpu mask for rt tasks initially started with a full cpu mask to the little domain. An rt task is specified as real time in __setscheduler() which is finally called for all rt tasks (kernel and user land). In this function we restrict the allowed cpu mask to the little domain. This also prevents that a rt tasks can later be pushed to the big domain because the function find_lowest_rq() will only recognize the allowed cpu mask of a task to find the new cpu the task runs on. Current kludges of the patch: * Since we do not have an API to get the cpu mask of the A7 cluster, hmp_slow_cpu_mask is made global in arm/kernel/topology.c for now. * The watchdog_enable() function calls sched_setscheduler() before kthread_bind() for the cpu specific watchdog kernel threads. The order of these two calls has to be changed to make this patch work. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>	2013-07-17 11:32:41 +01:00
Chris Redpath	6eada00873	sched: Restrict nohz balance kicks to stay in the HMP domain There is little point in doing a nohz balance kick on a CPU from a different HMP domain, since the unset SD_LOAD_BALANCE flag on the CPU domain level prevents tasks from being balanced across clusters except through the per-task load driven hmp_migrate/hmp_offload paths. Further, the nohz balance kick is actively harmful to power usage if all the tasks fit into the little domain since it causes the big domain to wake up and do a lot of calculation to determine that there is nothing to do. A more generic solution is to walk the sched domain tree and determine the intersection of potential idle balance cpus with visibility of tasks on the current CPU, however HMP domains are more easily accessible. Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:32:30 +01:00
Chris Redpath	954978dd2c	HMP: Force new non-kernel tasks onto big CPUs until load stabilises Initialise the load stats for new tasks so that they do not see the instability in early task life which makes it so hard to decide which CPU is appropriate. Also, change the fork balance algorithm so that the least loaded of the CPUs in the big cluster is chosen regardless of the bigness of the parent task. This is intended to help performance for applications which use many short-lived tasks. Although best practise is usually to use a thread pool, apps which do not do this should not be subject to the randomness of the early stats. We should ignore real-time threads for forking on big CPUs, but it is not possible to figure out if a new thread is real-time or not at the fork stage. Instead, we prevent kernel threads from getting the initial boost - when they later become real-time they will only be on big if their compute requirements demand it. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:32:30 +01:00
Chris Redpath	3f3b210703	HMP: Avoid multiple calls to hmp_domain_min_load in fast path When evaluating a migration we make two calls to hmp_domain_min_load. This is unnecessary if we pass on the target CPU information from the hmp_up_migration path. In hmp_down_migration, we don't consider the load of the target CPUS. Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:32:29 +01:00
Chris Redpath	08d7db89a2	HMP: Select least-loaded CPU when performing HMP Migrations The reference patch set always selects the first CPU in an HMP domain as a migration target. In busy situations, this means that the migrated thread cannot make immediate use of an idle CPU but must share a busy one until the load balancer runs across the big domain. This patch uses the hmp_domain_min_load function introduced in global balancing to figure out which of the CPUs is the least busy and selects that as a migration target - in both directions. This essentially implements a task-spread strategy and is intended to maximise performance of migrated threads but is likely to use more power than the packing strategy previously employed. Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:32:29 +01:00
Chris Redpath	ede58a69a3	HMP: Use unweighted load for hmp migration decisions Normal task and runqueue loading is scaled according to priority to end up with a weighted load, known as the contribution. We want the CPU time to be allotted according to priority, but we also want to make big/little decisions based upon raw load. It is common, for example, for Android apps following the dev guide to end up with all their long-running or async action threads as low priority unless they override the AsyncThread constructor. All these threads are such low priority that they become invisible to the hmp_offload routine. Using unweighted load here allows us to maximise CPU usage in busy situations. Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:32:28 +01:00
Chris Redpath	7e64466300	sched: cfs.nr_running does not contain the intended metric rq->nr_running is the actual number of runnable tasks we wish to use to determine if a task is alone on a CPU. Change-Id: Icaf3022e02924ecdc94e14d4146c6fadd9580e2b Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:12:27 +01:00
Morten Rasmussen	cf71912f48	sched: Basic global balancing support for HMP This patch introduces an extra-check at task up-migration to prevent overloading the cpus in the faster hmp_domain while the slower hmp_domain is not fully utilized. The patch also introduces a periodic balance check that can down-migrate tasks if the faster domain is oversubscribed and the slower is under-utilized. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>	2013-07-17 11:12:27 +01:00
Chris Redpath	ae570aeb1d	ARM: Fix build breakage when big.LITTLE.conf is not used. Change-Id: I8641f5e930c65b5672130bd4a18d9868bb3ca594 Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>	2013-07-17 11:12:27 +01:00
Chris Redpath	71b5dbd6d5	ARM: Experimental Frequency-Invariant Load Scaling Patch Evaluation Patch to investigate using load as a representation of the amount of POTENTIAL cpu compute capacity used rather than a representation of the CURRENT cpu compute capacity. If CPUFreq is enabled, scales load in accordance with frequency. Powersave/performance CPUFreq governors are detected and scaling is disabled while these governors are in use. This is because when a single-frequency governor is in use, potential CPU capacity is static. So long as the governors and CPUFreq subsystem correctly report the frequencies available, the scaling should self tune. Adds an additional file to sysfs to allow this feature to be disabled for experimentation. /sys/kernel/hmp/frequency_invariant_load_scale write 0 to disable, 1 to enable. Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:12:27 +01:00
Olivier Cozette	0e48eed05c	ARM: Change load tracking scale using sysfs These functions allow to change the load average period used in the task load average computation through /sys/kernel/hmp/load_avg_period_ms. This period is the time in ms to go from 0 to 0.5 load average while running or the time from 1 to 0.5 while sleeping. The default one used is 32 and gives the same load_avg_ratio computation than without this patch. These functions also allow to change the up and down threshold of HMP using /sys/kernel/hmp/{up,down}_threshold. Both must be between 0 and 1024. The thresholds are divided by 1024 before being compared to the load_avg_ratio. If /sys/kernel/hmp/load_avg_period_ms is 128 and /sys/kernel/hmp/up_threshold is 512, a task will be migrated to a bigger cluster after running for 128ms. Because after load_avg_period_ms the load average is 0.5 and real up_threshold us 512 / 1024 = 0.5. Signed-off-by: Olivier Cozette <olivier.cozette@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:12:27 +01:00
Chris Redpath	b64cc6f7e5	sched: Ignore offline CPUs in HMP migration & load stats Previously, an offline CPU would always appear to have a zero load and this would distort the offload functionality used for balancing big and little domains. Maintain a mask of online CPUs in each domain and use this instead. Change-Id: I639b564b2f40cb659af8ceb8bd37f84b8a1fe323 Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:12:26 +01:00
Chris Redpath	d2c920023c	sched: Do not ignore grouped tasks during HMP forced migration. If the entity is not a task, it is a cfs group rq. Iterate up to find the task entity. Change-Id: I7cab7aba0798f6f14e38ad32e566d90e5937ffbc Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2013-07-17 11:12:26 +01:00
Morten Rasmussen	eeebbf595c	sched: Only down migrate low priority tasks if allowed by affinity mask Adds an extra check intersection of the task affinity mask and the slower hmp_domain cpumask before down migrating low priority tasks. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>	2013-07-17 11:12:26 +01:00
Morten Rasmussen	76525733b4	sched: SCHED_HMP multi-domain task migration control We need a way to prevent tasks that are migrating up and down the hmp_domains from migrating straight on through before the load has adapted to the new compute capacity of the CPU on the new hmp_domain. This patch adds a next up/down migration delay that prevents the task from doing another migration in the same direction until the delay has expired. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>	2013-07-17 11:12:25 +01:00
Morten Rasmussen	0d811e649a	sched: Add HMP task migration ftrace event Adds ftrace event for tracing task migrations using HMP optimized scheduling. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>	2013-07-17 11:12:25 +01:00
Morten Rasmussen	b9d3d56128	sched: Add ftrace events for entity load-tracking Adds ftrace events for key variables related to the entity load-tracking to help debugging scheduler behaviour. Allows tracing of load contribution and runqueue residency ratio for both entities and runqueues as well as entity CPU usage ratio. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>	2013-07-17 11:12:25 +01:00
Morten Rasmussen	943106d943	sched: Introduce priority-based task migration filter Introduces a priority threshold which prevents low priority task from migrating to faster hmp_domains (cpus). This is useful for user-space software which assigns lower task priority to background task. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>	2013-07-17 11:12:24 +01:00
Morten Rasmussen	2dd22b22c9	sched: Forced task migration on heterogeneous systems This patch introduces forced task migration for moving suitable currently running tasks between hmp_domains. Task behaviour is likely to change over time. Tasks running in a less capable hmp_domain may change to become more demanding and should therefore be migrated up. They are unlikely go through the select_task_rq_fair() path anytime soon and therefore need special attention. This patch introduces a period check (SCHED_TICK) of the currently running task on all runqueues and sets up a forced migration using stop_machine_no_wait() if the task needs to be migrated. Ideally, this should not be implemented by polling all runqueues. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>	2013-07-17 11:12:24 +01:00
Morten Rasmussen	798e82cab1	sched: Task placement for heterogeneous systems based on task load-tracking This patch introduces the basic SCHED_HMP infrastructure. Each class of cpus is represented by a hmp_domain and tasks will only be moved between these domains when their load profiles suggest it is beneficial. SCHED_HMP relies heavily on the task load-tracking introduced in Paul Turners fair group scheduling patch set: <https://lkml.org/lkml/2012/8/23/267> SCHED_HMP requires that the platform implements arch_get_hmp_domains() which should set up the platform specific list of hmp_domains. It is also assumed that the platform disables SD_LOAD_BALANCE for the appropriate sched_domains. Tasks placement takes place every time a task is to be inserted into a runqueue based on its load history. The task placement decision is based on load thresholds. There are no restrictions on the number of hmp_domains, however, multiple (>2) has not been tested and the up/down migration policy is rather simple. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>	2013-07-17 11:12:24 +01:00
Morten Rasmussen	be6ef1d56e	sched: entity load-tracking load_avg_ratio This patch adds load_avg_ratio to each task. The load_avg_ratio is a variant of load_avg_contrib which is not scaled by the task priority. It is calculated like this: runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1). Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>	2013-07-17 11:12:24 +01:00
Paul Turner	0841c6ae0b	sched: implement usage tracking With the frame-work for runnable tracking now fully in place. Per-entity usage tracking is a simple and low-overhead addition. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>	2013-07-17 11:12:23 +01:00
Mathieu Desnoyers	706b23bde2	Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validation This __put_user() could be used by unprivileged processes to write into kernel memory. The issue here is that even if copy_siginfo_to_user() fails, the error code is not checked before __put_user() is executed. Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle, so it has not hit a stable release yet. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Roland McGrath <roland@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Pedro Alves <palves@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-06-29 11:29:08 -07:00
Linus Torvalds	a75930c633	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Correct an ordering issue in the tick broadcast code. I really wish we'd get compensation for pain and suffering for each line of code we write to work around dysfunctional timer hardware." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Fix tick_broadcast_pending_mask not cleared	2013-06-29 10:27:19 -07:00
Linus Torvalds	54faf77d06	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Three small fixlets" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() hw_breakpoint: Fix cpu check in task_bp_pinned(cpu) kprobes: Fix arch_prepare_kprobe to handle copy insn failures	2013-06-26 08:51:44 -10:00
Linus Torvalds	f71194a7d4	Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This series fixes a couple of build failures, and fixes MTRR cleanup and memory setup on very specific memory maps. Finally, it fixes triggering backtraces on all CPUs, which was inadvertently disabled on x86." * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Fix dummy variable buffer allocation x86: Fix trigger_all_cpu_backtrace() implementation x86: Fix section mismatch on load_ucode_ap x86: fix build error and kconfig for ia32_emulation and binfmt range: Do not add new blank slot with add_range_with_merge x86, mtrr: Fix original mtrr range get for mtrr_cleanup	2013-06-21 06:33:48 -10:00
Daniel Lezcano	ea8deb8dfa	tick: Fix tick_broadcast_pending_mask not cleared The recent modification in the cpuidle framework consolidated the timer broadcast code across the different drivers by setting a new flag in the idle state. It tells the cpuidle core code to enter/exit the broadcast mode for the cpu when entering a deep idle state. The broadcast timer enter/exit is no longer handled by the back-end driver. This change made the local interrupt to be enabled before calling CLOCK_EVENT_NOTIFY_EXIT. On a tegra114, a four cores system, when the flag has been introduced in the driver, the following warning appeared: WARNING: at kernel/time/tick-broadcast.c:578 tick_broadcast_oneshot_control CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-rc3-next-20130529+ #15 [<c00667f8>] (tick_broadcast_oneshot_control+0x1a4/0x1d0) from [<c0065cd0>] (tick_notify+0x240/0x40c) [<c0065cd0>] (tick_notify+0x240/0x40c) from [<c0044724>] (notifier_call_chain+0x44/0x84) [<c0044724>] (notifier_call_chain+0x44/0x84) from [<c0044828>] (raw_notifier_call_chain+0x18/0x20) [<c0044828>] (raw_notifier_call_chain+0x18/0x20) from [<c00650cc>] (clockevents_notify+0x28/0x170) [<c00650cc>] (clockevents_notify+0x28/0x170) from [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168) [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168) from [<c000ea94>] (arch_cpu_idle+0x8/0x38) [<c000ea94>] (arch_cpu_idle+0x8/0x38) from [<c005ea80>] (cpu_startup_entry+0x60/0x134) [<c005ea80>] (cpu_startup_entry+0x60/0x134) from [<804fe9a4>] (0x804fe9a4) I don't have the hardware, so I wasn't able to reproduce the warning but after looking a while at the code, I deduced the following: 1. the CPU2 enters a deep idle state and sets the broadcast timer 2. the timer expires, the tick_handle_oneshot_broadcast function is called, setting the tick_broadcast_pending_mask and waking up the idle cpu CPU2 3. the CPU2 exits idle handles the interrupt and then invokes tick_broadcast_oneshot_control with CLOCK_EVENT_NOTIFY_EXIT which runs the following code: [...] if (dev->next_event.tv64 == KTIME_MAX) goto out; if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_pending_mask)) goto out; [...] So if there is no next event scheduled for CPU2, we fulfil the first condition and jump out without clearing the tick_broadcast_pending_mask. 4. CPU2 goes to deep idle again and calls tick_broadcast_oneshot_control with CLOCK_NOTIFY_EVENT_ENTER but with the tick_broadcast_pending_mask set for CPU2, triggering the warning. The issue only surfaced due to the modifications of the cpuidle framework, which resulted in interrupts being enabled before the call to the clockevents code. If the call happens before interrupts have been enabled, the warning cannot trigger, because there is still the event pending which caused the broadcast timer expiry. Move the check for the next event below the check for the pending bit, so the pending bit gets cleared whether an event is scheduled on the cpu or not. [ tglx: Massaged changelog ] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reported-and-tested-by: Joseph Lo <josephl@nvidia.com> Cc: Stephen Warren <swarren@nvidia.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/1371485735-31249-1-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-06-21 13:10:34 +02:00
Linus Torvalds	a3d5c3460a	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two smaller fixes - plus a context tracking tracing fix that is a bit bigger" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/context-tracking: Add preempt_schedule_context() for tracing sched: Fix clear NOHZ_BALANCE_KICK sched/x86: Construct all sibling maps if smt	2013-06-20 08:18:35 -10:00
Linus Torvalds	86c76676cf	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Four fixes. The mmap ones are unfortunately larger than desired - fuzzing uncovered bugs that needed perf context life time management changes to fix properly" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP perf: Fix mmap() accounting hole perf: Fix perf mmap bugs kprobes: Fix to free gone and unused optprobes	2013-06-20 08:17:36 -10:00
Linus Torvalds	805e318548	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu idle fixes from Thomas Gleixner: - Add a missing irq enable. Fallout of the idle conversion - Fix stackprotector wreckage caused by the idle conversion * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: idle: Enable interrupts in the weak arch_cpu_idle() implementation idle: Add the stack canary init to cpu_startup_entry()	2013-06-20 08:16:07 -10:00
Linus Torvalds	4db88eb4c3	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: - Fix inconstinant clock usage in virtual time accounting - Fix a build error in KVM caused by the NOHZ work - Remove a pointless timekeeping duty assignment which breaks NOHZ - Use a proper notifier return value to avoid random behaviour * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Remove useless timekeeping duty attribution to broadcast source nohz: Fix notifier return val that enforce timekeeping kvm: Move guest entry/exit APIs to context_tracking vtime: Use consistent clocks among nohz accounting	2013-06-20 08:15:13 -10:00
Oleg Nesterov	c790b0ad23	hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() fetch_bp_busy_slots() and toggle_bp_slot() use for_each_online_cpu(), this is obviously wrong wrt cpu_up() or cpu_down(), we can over/under account the per-cpu numbers. For example: # echo 0 >> /sys/devices/system/cpu/cpu1/online # perf record -e mem:0x10 -p 1 & # echo 1 >> /sys/devices/system/cpu/cpu1/online # perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a & # taskset -p 0x2 1 triggers the same WARN_ONCE("Can't find any breakpoint slot") in arch_install_hw_breakpoint(). Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20130620155009.GA6327@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-20 17:57:01 +02:00
Oleg Nesterov	8b4d801b2b	hw_breakpoint: Fix cpu check in task_bp_pinned(cpu) trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint slot") in arch_install_hw_breakpoint() but the problem is not arch-specific. The problem is, task_bp_pinned(cpu) checks "cpu == iter->cpu" but this doesn't account the "all cpus" events with iter->cpu < 0. This means that, say, register_user_hw_breakpoint(tsk) can happily create the arbitrary number > HBP_NUM of breakpoints which can not be activated. toggle_bp_task_slot() is equally wrong by the same reason and nr_task_bp_pinned[] can have negative entries. Simple test: # perl -e 'sleep 1 while 1' & # perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl` Before this patch this triggers the same problem/WARN_ON(), after the patch it correctly fails with -ENOSPC. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20130620155006.GA6324@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-20 17:57:00 +02:00
Steven Rostedt	29bb9e5a75	tracing/context-tracking: Add preempt_schedule_context() for tracing Dave Jones hit the following bug report: =============================== [ INFO: suspicious RCU usage. ] 3.10.0-rc2+ #1 Not tainted ------------------------------- include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 2 locks held by cc1/63645: #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0 CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28 ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645 Call Trace: [<ffffffff816ae383>] dump_stack+0x19/0x1b [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130 [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0 [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0 [<ffffffff8108dffc>] update_curr+0xec/0x240 [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480 [<ffffffff816b3a71>] __schedule+0x161/0x9b0 [<ffffffff816b4721>] preempt_schedule+0x51/0x80 [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60 [<ffffffff816b6824>] ? retint_careful+0x12/0x2e [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210 [<ffffffff816be280>] ftrace_call+0x5/0x2f [<ffffffff816b681d>] ? retint_careful+0xb/0x2e [<ffffffff816b4805>] ? schedule_user+0x5/0x70 [<ffffffff816b4805>] ? schedule_user+0x5/0x70 [<ffffffff816b6824>] ? retint_careful+0x12/0x2e ------------[ cut here ]------------ What happened was that the function tracer traced the schedule_user() code that tells RCU that the system is coming back from userspace, and to add the CPU back to the RCU monitoring. Because the function tracer does a preempt_disable/enable_notrace() calls the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set, then preempt_schedule() is called. But this is called before the user_exit() function can inform the kernel that the CPU is no longer in user mode and needs to be accounted for by RCU. The fix is to create a new preempt_schedule_context() that checks if the kernel is still in user mode and if so to switch it to kernel mode before calling schedule. It also switches back to user mode coming back from schedule in need be. The only user of this currently is the preempt_enable_notrace(), which is only used by the tracing subsystem. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-19 12:55:10 +02:00

1 2 3 4 5 ...

15855 Commits