Commit Graph

506 Commits

Author SHA1 Message Date
Mark Brown
91521ab3c9 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-06-27 10:59:38 +01:00
Mark Brown
f056113faf Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk 2014-06-27 10:58:22 +01:00
Alex Shi
65abdc9b50 HMP: use per cpu cpuidle driver to fix deadlock in hmp_idle_pull
Using per cpu cpuidle driver to fix deadlock in hmp_idle_pull.
Otherwise a deadlock happened when do bl_idle_init.

[  113.878664] other info that might help us debug this:
[  113.878667]  Possible unsafe locking scenario:
[  113.878667]
[  113.878670]        CPU0
[  113.878673]        ----
[  113.878681]   lock(cpuidle_driver_lock);
[  113.878684]   <Interrupt>
[  113.878691]     lock(cpuidle_driver_lock);
[  113.878693]
[  113.878693]  *** DEADLOCK ***
[  113.878693]
[  113.878697] 1 lock held by ksoftirqd/4/28:
[  113.878719]  #0:  (hmp_force_migration){+.....}, at: [<c0054da5>]
hmp_idle_pull+0x49/0x508

This patch is just a quick/cheap workaround for cpuidle_driver_lock
deadlock. It works for TC2 and any other platform where the idle
driver cannot be changed at runtime.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-06-27 10:18:30 +01:00
Mark Brown
713c715377 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-06-25 11:42:00 +01:00
Mark Brown
de29502181 Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk 2014-06-25 10:24:55 +01:00
Alex Shi
4a29519b4a Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-06-12 17:36:21 +08:00
Alex Shi
5f00470fca Merge tag v3.10.43 into linux-linaro-lsk
This is the 3.10.43 stable release
2014-06-12 15:01:53 +08:00
Lai Jiangshan
24d52daafc sched: Fix hotplug vs. set_cpus_allowed_ptr()
commit 6acbfb9697 upstream.

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b55 ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b55 ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11 12:03:24 -07:00
Thomas Gleixner
ecaf19a768 sched: Sanitize irq accounting madness
commit 2d513868e2 upstream.

Russell reported, that irqtime_account_idle_ticks() takes ages due to:

       for (i = 0; i < ticks; i++)
               irqtime_account_process_tick(current, 0, rq);

It's sad, that this code was written way _AFTER_ the NOHZ idle
functionality was available. I charge myself guitly for not paying
attention when that crap got merged with commit abb74cefa ("sched:
Export ns irqtimes through /proc/stat")

So instead of looping nr_ticks times just apply the whole thing at
once.

As a side note: The whole cputime_t vs. u64 business in that context
wants to be cleaned up as well. There is no point in having all these
back and forth conversions. Lets standardise on u64 nsec for all
kernel internal accounting and be done with it. Everything else does
not make sense at all for fine grained accounting. Frederic, can you
please take care of that?

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Shaun Ruffell <sruffell@digium.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1405022307000.6261@ionos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11 12:03:21 -07:00
Steven Rostedt (Red Hat)
e58d78f941 sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check
commit 6227cb00cc upstream.

The check at the beginning of cpupri_find() makes sure that the task_pri
variable does not exceed the cp->pri_to_cpu array length. But that length
is CPUPRI_NR_PRIORITIES not MAX_RT_PRIO, where it will miss the last two
priorities in that array.

As task_pri is computed from convert_prio() which should never be bigger
than CPUPRI_NR_PRIORITIES, if the check should cause a panic if it is
hit.

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1397015410.5212.13.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11 12:03:21 -07:00
Chris Redpath
4378062f28 sched: hmp: fix out-of-range CPU possible
If someone hotplugs all the little CPUs while another CPU is handling
a wakeup, we can potentially return new_cpu == NR_CPUS from
hmp_select_slower_cpu (which is called internally by
hmp_best_little_cpu as well). We will use this to deref the
per_cpu rq array in hmp_next_down_delay which can go boom.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-06-11 15:08:35 +01:00
Mark Brown
5b30189298 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-05-15 16:56:03 +01:00
Mark Brown
15fdd2469e Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk 2014-05-15 16:55:19 +01:00
Chris Redpath
d1df056f9e hmp: dont attempt to pull tasks if affinity doesn't allow it
When looking for a task to be idle-pulled, don't consider tasks
where the affinity does not allow that task to be placed on the
target CPU. Also ensure that tasks with restricted affinity
do not block selecting other unrestricted busy tasks.

Use the knowledge of target CPU more effectively in idle pull
by passing to hmp_get_heaviest_task when we know it, otherwise
only checking for general affinity matches with any of the CPUs
in the bigger HMP domain.

We still need to explicitly check affinity is allowed in idle pull
since if we find no match in hmp_get_heaviest_task we will return
the current one, which may not be affine to the new CPU despite
having high enough load. In this case, there is nothing to move.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Chris Redpath
940407d585 hmp: Use idle pull to perform forced up-migrations
When a normal forced up-migration takes place we stop the task to
be migrated while the target CPU becomes available. This delay can
range from 80us to 1500us on TC2 if the target CPU is in a deep idle
state.

Instead, interrupt the target CPU and ask it to pull a task.
This lets the current eligible task continue executing on the
original CPU while the target CPU wakes. Use a pinned timer to
prevent the pulling CPU going back into power-down with pending
up-migrations.

If we trigger for a nohz kick, it doesn't matter about triggering
for an idle pull since the idle_pull flag will be set when we
execute the softirq and we'll still do the idle pull.

If the target CPU is busy, we will not pull any tasks.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Chris Redpath
0168997c0a sched: hmp: unify active migration code
The HMP active migration code is functionally identical to the CFS
active migration code apart from one flag check. Share the code
and make the flag check optional.

Two wrapper functions allow the flag check to be present or not.

Thanks to tixy@linaro.org for pointing out the build break and a
good solution in an earlier version.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Chris Redpath
84efcd0cc5 hmp: sched: Clean up hmp_up_threshold checks into a utility fn
In anticipation of modifying the up_threshold handling, make all
instances use the same utility fn to check if a task is eligible
for up-migration. This also removes the previous difference in
threshold comparison where up-migration used '!<threshold' and
idle pull used '>threshold' to decide up-migration eligibility.
Make them both use '!<threshold' instead for consistency, although
this is unlikely to change any results.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Mark Brown
25612daaa1 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-05-07 14:56:11 +01:00
Mark Brown
2cae6ea4bb Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk 2014-05-07 14:55:58 +01:00
Chris Redpath
1ade57e54e sched: hmp: Change small task packing defaults for all platforms
All platforms other than TC2 default to enabling packing. Since TC2
shows no performance or energy degradation with this feature enabled
make it default enabled the same as everyone else.
Likewise, vendors have been including TC2 support in multi-machine
kernel builds so they expect the default thresholds to remain the
same when the TC2 #ifdef is removed.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-07 11:34:00 +01:00
Mark Brown
b1cdb3ded8 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-04-08 19:52:48 +01:00
Mark Brown
1859631f14 Merge branch 'v3.10/topic/big.LITTLE' of git://git.linaro.org/kernel/linux-linaro-stable into linux-linaro-lsk 2014-04-08 18:47:40 +01:00
Jon Medhurst
db3dba6818 Revert "hmp: sched: Clean up hmp_up_threshold checks into a utility fn"
This reverts commit 765aae26e6.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:25 +01:00
Jon Medhurst
11971ff25f Revert "sched: hmp: unify active migration code"
This reverts commit 0baa5811ba.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:24 +01:00
Jon Medhurst
7e1f7d3d4e Revert "hmp: Use idle pull to perform forced up-migrations"
This reverts commit aae7721f20.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:23 +01:00
Jon Medhurst
8503bfd723 Revert "hmp: dont attempt to pull tasks if affinity doesn't allow it"
This reverts commit 5a570cfc01.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:19 +01:00
Mark Brown
0f15ea2dcd Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-03-31 23:51:39 +01:00
Mark Brown
f3124ba519 Merge tag 'v3.10.35' into linux-linaro-lsk
This is the 3.10.35 stable release
2014-03-31 23:51:22 +01:00
Gerald Schaefer
ccdb5fa37f sched/autogroup: Fix race with task_groups list
commit 41261b6a83 upstream.

In autogroup_create(), a tg is allocated and added to the task_groups
list. If CONFIG_RT_GROUP_SCHED is set, this tg is then modified while on
the list, without locking. This can race with someone walking the list,
like __enable_runtime() during CPU unplug, and result in a use-after-free
bug.

To fix this, move sched_online_group(), which adds the tg to the list,
to the end of the autogroup_create() function after the modification.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369411669-46971-2-git-send-email-gerald.schaefer@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-31 09:58:14 -07:00
Mark Brown
c0a93defcd Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-03-31 16:37:22 +01:00
Mark Brown
7dcba49cf6 Merge branch 'for-lsk' of git://git.linaro.org/arm/big.LITTLE/mp into linux-linaro-lsk 2014-03-31 16:36:42 +01:00
Alex Shi
696e897344 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-03-28 11:07:37 +08:00
Alex Shi
02e11cd173 Merge tag 'v3.10.34' into linux-linaro-lsk
This is 3.10.34 stable release
2014-03-28 10:58:52 +08:00
Chris Redpath
5a570cfc01 hmp: dont attempt to pull tasks if affinity doesn't allow it
When looking for a task to be idle-pulled, don't consider tasks
where the affinity does not allow that task to be placed on the
target CPU. Also ensure that tasks with restricted affinity
do not block selecting other unrestricted busy tasks.

Use the knowledge of target CPU more effectively in idle pull
by passing to hmp_get_heaviest_task when we know it, otherwise
only checking for general affinity matches with any of the CPUs
in the bigger HMP domain.

We still need to explicitly check affinity is allowed in idle pull
since if we find no match in hmp_get_heaviest_task we will return
the current one, which may not be affine to the new CPU despite
having high enough load. In this case, there is nothing to move.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-03-24 15:14:35 +00:00
Chris Redpath
aae7721f20 hmp: Use idle pull to perform forced up-migrations
When a normal forced up-migration takes place we stop the task to
be migrated while the target CPU becomes available. This delay can
range from 80us to 1500us on TC2 if the target CPU is in a deep idle
state.

Instead, interrupt the target CPU and ask it to pull a task.
This lets the current eligible task continue executing on the
original CPU while the target CPU wakes. Use a pinned timer to
prevent the pulling CPU going back into power-down with pending
up-migrations.

If we trigger for a nohz kick, it doesn't matter about triggering
for an idle pull since the idle_pull flag will be set when we
execute the softirq and we'll still do the idle pull.

If the target CPU is busy, we will not pull any tasks.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-03-24 15:14:34 +00:00
Chris Redpath
0baa5811ba sched: hmp: unify active migration code
The HMP active migration code is functionally identical to the CFS
active migration code apart from one flag check. Share the code
and make the flag check optional.

Two wrapper functions allow the flag check to be present or not.

Thanks to tixy@linaro.org for pointing out the build break and a
good solution in an earlier version.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-03-24 15:14:34 +00:00
Chris Redpath
765aae26e6 hmp: sched: Clean up hmp_up_threshold checks into a utility fn
In anticipation of modifying the up_threshold handling, make all
instances use the same utility fn to check if a task is eligible
for up-migration. This also removes the previous difference in
threshold comparison where up-migration used '!<threshold' and
idle pull used '>threshold' to decide up-migration eligibility.
Make them both use '!<threshold' instead for consistency, although
this is unlikely to change any results.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-03-24 15:14:33 +00:00
George McCollister
84bb5b645e sched: Fix double normalization of vruntime
commit 791c9e0292 upstream.

dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
which appears to guarentee that the !se->on_rq condition is met.
If the task has done set_current_state(TASK_INTERRUPTIBLE) without
schedule() the second condition will be met and vruntime will be
incorrectly adjusted twice.

In certain cases this can result in the task's vruntime never increasing
past the vruntime of other tasks on the CFS' run queue, starving them of
CPU time.

This patch changes switched_from_fair() to use !p->on_rq instead of
!se->on_rq.

I'm able to cause a task with a priority of 120 to starve all other
tasks with the same priority on an ARM platform running 3.2.51-rt72
PREEMPT RT by writing one character at time to a serial tty (16550 UART)
in a tight loop. I'm also able to verify making this change corrects the
problem on that platform and kernel version.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:09 -07:00
Mark Brown
80b4f5de42 Merge remote-tracking branch 'lsk/linux-linaro-lsk' into linux-linaro-lsk-android 2014-01-22 15:31:30 +00:00
Mark Brown
4105a61b15 Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk 2014-01-22 12:43:08 +00:00
Dietmar Eggemann
b30814c74c HMP: Fix rt task allowed cpu mask restriction code on 1x1 system
There is an error scenario where on a 1x1 HMP system (weight of the
hmp_slow_cpu_mask is 1) the short-cut of restricting the allowed cpu mask
of an rt tasks leads to triggering a kernel bug in the rt sched class
set_cpus_allowed function set_cpus_allowed_rt().

In case the task is on the run-queue and the weight of the required cpu mask
is 1 and this is different to the p->nr_cpus_allowed value, this back-end
function interprets this in such a way that a task changed from being
migratable to not migratable anymore and decrements the rt_nr_migratory
counter.  There is a BUG_ON(!rq->rt.rt_nr_migratory) check in this code
path which triggers in this situation.

To circumvent this issue, set the number of allowed cpus for a task p to
the weight of the hmp_slow_cpu_mask before calling do_set_cpus_allowed()
in __setscheduler(). It will be set to this value in do_set_cpus_allowed()
after the call to the sched class related backend function any way.  By
doing this, set_cpus_allowed_rt() returns without trying to update the
rt_nr_migratory counter.

This patch has been tested with a test device driver requiring a threaded
irq handler on a TC2 system with a reduced cpu mask (1 Cortex A15, 1
Cortex A7).

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-01-22 09:50:46 +00:00
Chris Redpath
b2fafaba35 sched: hmp: Fix potential task_struct memory leak
We use get_task_struct to increment the ref count on a task_struct
so that even if the task dies with a pending migration we are still
able to read the memory without causing a fault.

In the case of non-running tasks, we forgot to decrement the ref
count when we are done with the task.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-01-22 09:50:45 +00:00
Chris Redpath
ba8ed8301f sched: hmp: Change TC2 packing config to disabled default if present
Since TC2 power curves don't really have a utilisation hotspot where
packing makes sense, if it is present for a TC2 system at least make
it default to disabled.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-01-22 09:50:44 +00:00
Chris Redpath
257e5075a1 sched: hmp: Make idle balance behaviour normal when packing disabled
The presence of packing permanently changed the idle balance
behaviour. Do not restrict idle balance on the smallest CPUs when
packing is present but disabled.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-01-22 09:50:43 +00:00
Chris Redpath
7896b1e659 sched: update runqueue clock before migrations away
If we migrate a sleeping task away from a CPU which has the
tick stopped, then both the clock_task and decay_counter will
be out of date for that CPU and we will not decay load correctly
regardless of how often we update the blocked load.

This is only an issue for tasks which are not on a runqueue
(because otherwise that CPU would be awake) and simultaneously
the CPU the task previously ran on has had the tick stopped.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-01-22 09:50:42 +00:00
Chris Redpath
f720a920e8 sched: reset blocked load decay_count during synchronization
If an entity happens to sleep for less than one tick duration
the tracked load associated with that entity can be decayed by an
unexpectedly large amount if it is later migrated to a different
CPU. This can interfere with correct scheduling when entity load
is used for decision making.

The reason for this is that when an entity is dequeued and enqueued
quickly, such that se.avg.decay_count and cfs_rq.decay_counter
do not differ when that entity is enqueued again,
__synchronize_entity_decay skips the calculation step and also skips
clearing the decay_count. At a later time that entity may be
migrated and its load will be decayed incorrectly.

All users of this function expect decay_count to be zero'ed after
use.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-01-22 09:50:42 +00:00
Kamalesh Babulal
42f95a9ca8 sched/debug: Add load-tracking statistics to task
At present we print per-entity load-tracking statistics for
cfs_rq of cgroups/runqueues. Given that per task statistics
is maintained, it can be used to know the contribution made
by the task to its parenting cfs_rq level.

This patch adds per-task load-tracking statistics to /proc/<PID>/sched.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130625080336.GA20175@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 939fd731eb)

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-01-22 09:50:40 +00:00
Alex Shi
7bbbbe2e4b Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-01-16 09:17:42 +08:00
Alex Shi
c8e95ac690 Merge remote-tracking branch 'stable/linux-3.10.y' 3.10.27 into linux-linaro-lsk 2014-01-16 09:14:57 +08:00
Paul Turner
5ba4542368 sched: Guarantee new group-entities always have weight
commit 0ac9b1c218 upstream.

Currently, group entity load-weights are initialized to zero. This
admits some races with respect to the first time they are re-weighted in
earlty use. ( Let g[x] denote the se for "g" on cpu "x". )

Suppose that we have root->a and that a enters a throttled state,
immediately followed by a[0]->t1 (the only task running on cpu[0])
blocking:

  put_prev_task(group_cfs_rq(a[0]), t1)
  put_prev_entity(..., t1)
  check_cfs_rq_runtime(group_cfs_rq(a[0]))
  throttle_cfs_rq(group_cfs_rq(a[0]))

Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first
time:

  enqueue_task_fair(rq[0], t2)
  enqueue_entity(group_cfs_rq(b[0]), t2)
  enqueue_entity_load_avg(group_cfs_rq(b[0]), t2)
  account_entity_enqueue(group_cfs_ra(b[0]), t2)
  update_cfs_shares(group_cfs_rq(b[0]))
  < skipped because b is part of a throttled hierarchy >
  enqueue_entity(group_cfs_rq(a[0]), b[0])
  ...

We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0
which violates invariants in several code-paths. Eliminate the
possibility of this by initializing group entity weight.

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-15 15:28:54 -08:00