linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-03 11:43:03 +09:00

Author	SHA1	Message	Date
Chris Redpath	db00ec15cb	ANDROID: sched/fair: ensure utilization signals are synchronized before use wake_cap performs task and cpu utilization synchronization which is what allows us to subtract current task util from prev_cpu util and have a sensible number to work with. It looks as though if wake_wide returns 0, we could potentially not execute wake_cap, which would result in unsynced signals we then use for energy calculations. This is not necessarily an issue we've seen in traces, but it looks as though it should be changed. Change-Id: Ic54a3cba2a10d946ea20113a04371dea04115e82 Signed-off-by: Chris Redpath <chris.redpath@arm.com> [Remove _wake_cap variable to match commit `cf6ed9a668`] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:33 -07:00
Chris Redpath	7de1b8372f	ANDROID: sched/fair: remove task util from own cpu when placing waking task When we place a waking task with find_best_target, we calculate the existing and new utilisation of each candidate cpu. However, we do not remove any blocked load resulting from the waking task on the previous cpu which might cause unnecessary migrations. Switch to using cpu_util_wake which does this for us, which requires moving cpu_util_wake a few functions earlier. Also, we have multiple potential cpu utilization signals here, so update the necessary bits to allow WALT to work properly (including not subtracting task util for WALT). When WALT is in use, cpu utilization is the utilization in the previous completed window, whilst the task utilization ignores fully idle windows. There seems to be no way to have a decently accurate estimate of how much (if any) utilization from this task remains on the prev cpu. Instead, just return cpu_util when we're using WALT. Change-Id: I448203ab98ffb5c020dfb6b218581eef1f5601f7 Reported-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:32 -07:00
Chris Redpath	8d40f58e90	ANDROID: trace:sched: Make util_avg in load_avg trace reflect PELT/WALT as used With the ability to choose between WALT and PELT for utilisation tracking we can have the situation where we're using WALT to make all the decisions and reporting PELT figures in the sched_load_avg_(cpu\|task) trace points. This is not too much of an issue, but when analysing trace it is nice to see numbers representing what the scheduler is using rather than needing to add in additional sched_walt_* traces to figure it out. Add reporting for both types, and make the util_avg member reflect what will be seen from cpu or task_util functions in the scheduler. Change-Id: I2abbd2c5fa70822096d0f3372b4c12b1c6af1590 Signed-off-by: Chris Redpath <chris.redpath@arm.com> [renamed macros according to `172895e6b5`] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:31 -07:00
Dietmar Eggemann	af88a165ec	ANDROID: sched/fair: Add eas (& cas) specific rq, sd and task stats The statistic counter are placed in the eas (& cas) wakeup path. Each of them has one representation for the runqueue (rq), the sched_domain (sd) and the task. A task counter is always incremented. A rq counter is always incremented for the rq the scheduler is currently running on. A sd counter is only incremented if a relation to a sd exists. The counters are exposed: (1) In /proc/schedstat for rq's and sd's: $ cat /proc/schedstat ... cpu0 71422 0 2321254 ... eas 44144 0 0 19446 0 24698 568435 51621 156932 133 222011 17459 120279 516814 83 0 156962 359235 176439 139981 <- runqueue for cpu0 ... domain0 3 42430 42331 ... eas 0 0 0 14200 0 0 0 0 0 0 0 0 0 0 0 0 0 0 66355 0 <- MC sched domain for cpu0 ... The per-cpu eas vector has the following elements: sis_attempts sis_idle sis_cache_affine sis_suff_cap sis_idle_cpu sis_count \|\| secb_attempts secb_sync secb_idle_bt secb_insuff_cap secb_no_nrg_sav secb_nrg_sav secb_count \|\| fbt_attempts fbt_no_cpu fbt_no_sd fbt_pref_idle fbt_count \|\| cas_attempts cas_count The following relations exist between these counters (from cpu0 eas vector above): sis_attempts = sis_idle + sis_cache_affine + sis_suff_cap + sis_idle_cpu + sis_count 44144 = 0 + 0 + 19446 + 0 + 24698 secb_attempts = secb_sync + secb_idle_bt + secb_insuff_cap + secb_no_nrg_sav + secb_nrg_sav + secb_count 568435 = 51621 + 156932 + 133 + 222011 + 17459 + 120279 fbt_attempts = fbt_no_cpu + fbt_no_sd + fbt_pref_idle + fbt_count + (return -1) 516814 = 83 + 0 + 156962 + 359235 + (534) cas_attempts = cas_count + (return -1 or smp_processor_id()) 176439 = 139981 + (36458) (2) In /proc/$PROCESS_PID/task/$TASK_PID/sched for a task. example: main thread of system_server $ cat /proc/1083/task/1083/sched ... se.statistics.nr_wakeups_sis_attempts : 945 se.statistics.nr_wakeups_sis_idle : 0 se.statistics.nr_wakeups_sis_cache_affine : 0 se.statistics.nr_wakeups_sis_suff_cap : 219 se.statistics.nr_wakeups_sis_idle_cpu : 0 se.statistics.nr_wakeups_sis_count : 726 se.statistics.nr_wakeups_secb_attempts : 10376 se.statistics.nr_wakeups_secb_sync : 1462 se.statistics.nr_wakeups_secb_idle_bt : 6984 se.statistics.nr_wakeups_secb_insuff_cap : 3 se.statistics.nr_wakeups_secb_no_nrg_sav : 927 se.statistics.nr_wakeups_secb_nrg_sav : 206 se.statistics.nr_wakeups_secb_count : 794 se.statistics.nr_wakeups_fbt_attempts : 8914 se.statistics.nr_wakeups_fbt_no_cpu : 0 se.statistics.nr_wakeups_fbt_no_sd : 0 se.statistics.nr_wakeups_fbt_pref_idle : 6987 se.statistics.nr_wakeups_fbt_count : 1554 se.statistics.nr_wakeups_cas_attempts : 3107 se.statistics.nr_wakeups_cas_count : 1195 ... The same relation between the counters as in the per-cpu case apply. Change-Id: Ie7d01267c78a3f41f60a3ef52917d5a5d463f195 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> [fixed schedstat macros calls to match modifications made in commit `ae92882e56`] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:31 -07:00
Andres Oportus	8990dc7f3c	ANDROID: sched/core: Fix PELT jump to max OPP upon util increase Change-Id: Ic80b588ec466ef707f658dcea039fd0d6b384b63 Signed-off-by: Andres Oportus <andresoportus@google.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:30 -07:00
Dietmar Eggemann	06654998ed	ANDROID: sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability For Energy-Aware Scheduling (EAS) to work properly, even in the case that there is only one cpu per cluster or that cpus are hot-plugged out, the Energy Model (EM) data on all energy-aware sched domains (sd) has to be present for all online cpus. Mainline sd hierarchy setup code will remove sd's which are not useful for task scheduling e.g. in the following situations: 1. Only 1 cpu is/remains in one cluster of a multi cluster system. This remaining cpu only has DIE and no MC sd. 2. A complete cluster in a two cluster system is hot-plugged out. The cpus of the remaining cluster only have MC and no DIE sd. To make sure that all online cpus keep all their energy-aware sd's, the sd degenerate functionality has been changed to not free a sd if its first sched group (sg) contains EM data in case: 1. There is only 1 cpu left in the sd. 2. There have to be at least 2 sg's if certain sd flags are set. Instead of freeing such a sd it now clears only its SD_LOAD_BALANCE flag. This will make sure that the EAS functionality will always see all energy-aware sd's for all online cpus. It will introduce a tiny performance degradation for operations on affected cpus since the hot-path macro for_each_domain() has to deal with sd's not contributing to task scheduling at all now. In most cases the exisiting code makes sure that task scheduling is not invoked on a sd with !SD_LOAD_BALANCE. However, a small change is necessary in update_sd_lb_stats() to make sure that sd->parent is only initialized to !NULL in case the parent sd contains more than 1 sg. The handling of newidle decay values before the SD_LOAD_BALANCE check in rebalance_domains() stays unchanged. Test (w/ CONFIG_SCHED_DEBUG): JUNO r0 default system: $ cat /proc/cpuinfo \| grep "^CPU part" CPU part : 0xd03 CPU part : 0xd07 CPU part : 0xd07 CPU part : 0xd03 CPU part : 0xd03 CPU part : 0xd03 SD names and flags: $ cat /proc/sys/kernel/sched_domain/cpu/domain/name MC DIE MC DIE MC DIE MC DIE MC DIE MC DIE $ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu/domain/flags` 832f 102f 832f 102f 832f 102f 832f 102f 832f 102f 832f 102f Test 1: Hotplug-out one A57 (CPU part 0xd07) cpu: $ echo 0 > /sys/devices/system/cpu/cpu1/online $ cat /proc/cpuinfo \| grep "^CPU part" CPU part : 0xd03 CPU part : 0xd07 CPU part : 0xd03 CPU part : 0xd03 CPU part : 0xd03 SD names and flags for remaining A57 (cpu2) cpu: $ cat /proc/sys/kernel/sched_domain/cpu2/domain/name MC DIE $ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu2/domain/flags` 832e <-- MC SD with !SD_LOAD_BALANCE 102f Test 2: Hotplug-out the entire A57 cluster: $ echo 0 > /sys/devices/system/cpu/cpu1/online $ echo 0 > /sys/devices/system/cpu/cpu2/online $ cat /proc/cpuinfo \| grep "^CPU part" CPU part : 0xd03 CPU part : 0xd03 CPU part : 0xd03 CPU part : 0xd03 SD names and flags for the remaining A53 (CPU part 0xd03) cluster: $ cat /proc/sys/kernel/sched_domain/cpu/domain/name MC DIE MC DIE MC DIE MC DIE $ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu/domain/flags` 832f 102e <-- DIE SD with !SD_LOAD_BALANCE 832f 102e 832f 102e 832f 102e Change-Id: If24aa2b2628f334abbf0207d39e2a86168d9d673 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:29 -07:00
Vincent Guittot	6960f771e9	UPSTREAM: sched/core: Fix group_entity's share update The update of the share of a cfs_rq is done when its load_avg is updated but before the group_entity's load_avg has been updated for the past time slot. This generates wrong load_avg accounting which can be significant when small tasks are involved in the scheduling. Let take the example of a task a that is dequeued of its task group A: root (cfs_rq) \ (se) A (cfs_rq) \ (se) a Task "a" was the only task in task group A which becomes idle when a is dequeued. We have the sequence: - dequeue_entity a->se - update_load_avg(a->se) - dequeue_entity_load_avg(A->cfs_rq, a->se) - update_cfs_shares(A->cfs_rq) A->cfs_rq->load.weight == 0 A->se->load.weight is updated with the new share (0 in this case) - dequeue_entity A->se - update_load_avg(A->se) but its weight is now null so the last time slot (up to a tick) will be accounted with a weight of 0 instead of its real weight during the time slot. The last time slot will be accounted as an idle one whereas it was a running one. If the running time of task a is short enough that no tick happens when it runs, all running time of group entity A->se will be accounted as idle time. Instead, we should update the share of a cfs_rq (in fact the weight of its group entity) only after having updated the load_avg of the group_entity. update_cfs_shares() now takes the sched_entity as a parameter instead of the cfs_rq, and the weight of the group_entity is updated only once its load_avg has been synced with current time. Change-Id: Id3f651220251d3ae3802c84fa59a68cc241db2c3 Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pjt@google.com Link: http://lkml.kernel.org/r/1482335426-7664-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `89ee048f3c`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:29 -07:00
Vincent Guittot	3a34bf5b2d	UPSTREAM: sched/fair: Propagate asynchrous detach A task can be asynchronously detached from cfs_rq when migrating between CPUs. The load of the migrated task is then removed from source cfs_rq during its next update. We use this event to set propagation flag. During the load balance, we take advantage of the update of blocked load to propagate any pending changes. The propagation relies on patch: "sched: Fix hierarchical order in rq->leaf_cfs_rq_list" ... which orders children and parents, to ensure that it's done in one pass. Change-Id: I5d1a9f273111c0bb7503ca3c9ad2406d12867cb5 Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-6-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `4e5160766f`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:28 -07:00
Vincent Guittot	96956e2684	UPSTREAM: sched/fair: Propagate load during synchronous attach/detach When a task moves from/to a cfs_rq, we set a flag which is then used to propagate the change at parent level (sched_entity and cfs_rq) during next update. If the cfs_rq is throttled, the flag will stay pending until the cfs_rq is unthrottled. For propagating the utilization, we copy the utilization of group cfs_rq to the sched_entity. For propagating the load, we have to take into account the load of the whole task group in order to evaluate the load of the sched_entity. Similarly to what was done before the rewrite of PELT, we add a correction factor in case the task group's load is greater than its share so it will contribute the same load of a task of equal weight. Change-Id: I0aeaed29bb880c91d10df82c38ac9fc681de4f76 Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-5-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `09a43ace1f`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:27 -07:00
Vincent Guittot	793cfff02e	UPSTREAM: sched/fair: Factorize attach/detach entity Factorize post_init_entity_util_avg() and part of attach_task_cfs_rq() in one function attach_entity_cfs_rq(). Create symmetric detach_entity_cfs_rq() function. Change-Id: Id258d6055039ecfe7a3a8752dfff0257fc9f6508 Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-2-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `df217913e7`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:26 -07:00
Dietmar Eggemann	56ffdd61dc	ANDROID: sched/fair: Simplify idle_idx handling in select_idle_sibling() Rename best_idle to best_idle_cpu so the same name is used like in find_best_target(). Fix if (best_idle > 0) since best_idle_cpu = 0 is a valid target. Use 'unsigned long' data type for best_idle_capacity. Since we're looking for the shallowest best_idle_cstate initialize best_idle_cstate = INT_MAX. For cpus which are not idle (idle_idx = -1) the condition 'if (idle_idx < best_idle_cstate && ...)' is never executed. Change-Id: Ic5b63d58478696b3d1ec6253cf739a69a574cf99 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit 8bff5e9c0968108d465e1f2a4624fc5ec2f00849) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:25 -07:00
Dietmar Eggemann	2dfb172df5	ANDROID: sched/fair: refactor find_best_target() for simplicity Simplify backup_capacity handling and use 'unsigned long' data type for cpu capacity, simplify target_util handling, simplify idle_idx handling & refactor min_util, new_util. Also return first idle cpu for prefer_idle task immediately. Change-Id: Ic89e140f7b369f3965703fdc8463013d16e9b94a Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:25 -07:00
Dietmar Eggemann	3bfde3b4f8	ANDROID: sched/fair: Change cpu iteration order in find_best_target() The schedtune task parameter 'boosted' is mapped into the cpu iteration order. Currently for 'boosted' equal true the iteration starts at the last cpu (NR_CPUS-1) whereas for 'boosted' equal false it starts at the first cpu (0). This only has the desired effect if the cpu topology oerdering matches the underlying assumption. This e.g. is the case for the Qc snapdragon 821 with its [L0 L1 b0 b1] cpu topology layout (L=lower max freq, b=higher max freq). This results in cpus with higher maximum capacity being given the highest logical cpu ids. However not all big.LITTLE systems enumerate their cpus in the same way. For example, the ARM Versatile Express Juno board has 6 cpus for which the default configuration has topology [L0 b0 b1 L1 L2 L3]. To make this approach independent from the cpu topology layout it now iterates over the cpus in the order of the sched_groups of the EAS sched_domain (sd_ea). The order of cpu iteration is different for the different cpu types in case the cpu is used to dereference sd_ea. Considering the Qc snapdragon 821 again, for cpu L0 and L1 the order is 'b0->b1->L0->L1' whereas for b0 and b1 the order is 'b0->b1->L0->L1'. This approach does not allow the exact same iteration order as with the currently used flat iteration over [0 .. NR_CPUS-1] but the cpus are ordered by the original cpu capacity. The cpu iteration is now done in the sd_ea sched_group order required by the 'boosted' value ['L0->L1->b0->b1'/'b0->b1->L0->L1'] rather than forward/backward over the flat cpu space ['L0->L1->b0->b1'/ 'b1->b0->L1->L0']. Change-Id: I8fbe2073dedd2ecb1c750620c6000c11a5ff4358 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit a0c6a4272c3968c0ff50d3fed65f5865b72d777b) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:24 -07:00
Dietmar Eggemann	14774e7609	ANDROID: sched/core: Add first cpu w/ max/min orig capacity to root domain This will allow to start iterating from a cpu with max or min original capacity in the wakeup path regardless on which cpu the scheduler is currently running (smp_processor_id()) or the previous cpu of the task (task_cpu(p)). This iteration has to happen on a sched_domain spanning all cpus in the order of the sched_groups of this sched_domain seen by the starting cpu. In case of an SMP system the first cpu with max orig capacity and the the one with min orig capacity is the same. This can temporally happen on a big.LITTLE system with hotplug as well. E.g. the different order of cpu iteration can be used to map schedtune task parameter 'boosted' into the cpu iteration order in find_best_target(). Use of READ_ONCE()/WRITE_ONCE() to avoid load/store tearing. Change-Id: I812fbd9c7e5f506617e456c0eec3edcd2c016e92 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit fd6e9543c1fd8971a5e2e68e39b2f6e591d46114) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:23 -07:00
Dietmar Eggemann	52a8450d07	ANDROID: sched/core: Remove remnants of commit fd5c98da1a42 Commit fd5c98da1a42 "WIP: sched: Store system-wide maximum cpu capacity in root domain" was repalced by commit 8148bdfff4f5 "WIP: sched: Update max cpu capacity in case of max frequency constraints" which didn't remove all the now unused bits. Change-Id: I067f6366431f43337cffa7a2a8e0de32dd33d2f9 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit 6d284a607cec51bcafca313bc396bc3103b1e876) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:23 -07:00
Dietmar Eggemann	ea5a7f2df0	ANDROID: sched: Remove sysctl_sched_is_big_little With the new wakeup approach this sysctl is not necessary any more. Change-Id: I52114b3c918791f6a4f9f30f50002919ccbc1a9c Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit 885c0d503bcdf0ef4e9b46822496f16b20aa3bbd) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:22 -07:00
Dietmar Eggemann	8c2e3d8850	ANDROID: sched/fair: Code !is_big_little path into select_energy_cpu_brute() This patch replaces the existing EAS upstream implementation of select_energy_cpu_brute() with the one of find_best_target() used in Android previously. It also removes the cpumask 'and' from select_energy_cpu_brute, see the existing use of 'cpu = smp_processor_id()' in select_task_rq_fair(). Change-Id: If678c002efaa87d1ba3ec9989a4e9f8df98b83ec Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [ added guarding for non-schedtune builds ] Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:21 -07:00
Dietmar Eggemann	52b09b1fde	ANDROID: EAS: sched/fair: Re-integrate 'honor sync wakeups' into wakeup path This patch re-integrates the part which was initially provided by 3b9d7554aeec ("EAS: sched/fair: tunable to honor sync wakeups") into energy_aware_wake_cpu() into select_energy_cpu_brute(). Change-Id: I748fde3ecdeb44651179bce0a5bb8dd82d1903f6 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit b75b7286cb068d5761621ea134c23dd131db953f) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:21 -07:00
Dietmar Eggemann	8e862e1e31	ANDROID: Fixup!: sched/fair.c: Set SchedTune specific struct energy_env.task This has to be done in the caller function of energy_diff() version of SchedTune to avoid Null pointer dereference in energy_diff(). Change-Id: I3f0f68dbd11efb15bbb3b1832f8294419ed85241 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit 14531d4e245d063f713ee5ed835df958e6c7838f) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:20 -07:00
Morten Rasmussen	9e31218f23	ANDROID: sched/fair: Energy-aware wake-up task placement When the systems is not overutilized, place waking tasks on the most energy efficient cpu. Previous attempts reduced the search space by matching task utilization to cpu capacity before consulting the energy model as this is an expensive operation. The search heuristics didn't work very well and lacking any better alternatives this patch takes the brute-force route and tries all potential targets. This approach doesn't scale, but it might be sufficient for many embedded applications while work is continuing on a heuristic that can minimize the necessary computations. The heuristic must be derrived from the platform energy model rather than make additional assumptions, such lower capacity implies better energy efficiency. PeterZ mentioned in the past that we might be able to derrive some simpler deciding functions using mathematical (modal?) analysis. Change-Id: I772bacb4c8fd599f8006fa422f842e66377a9c6c Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> [rebase: on top of msm-google/android-msm-marlin-3.18] Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit a894422dbdb7b77ea2acfe7ff909ccb5ded23514) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:19 -07:00
Morten Rasmussen	5383892305	ANDROID: sched/fair: Add energy_diff dead-zone margin It is not worth the overhead to migrate tasks for tiny insignificant energy savings. To prevent this, an energy margin is introduced in energy_diff() which effectively adds a dead-zone that rounds tiny energy differences to zero. Since no scale is enforced for energy model data the margin can't be absolute. Instead it is defined as +/-1.56% energy saving compared to the current total estimated energy consumption. Change-Id: I6be069c752c701fb825430896b3b768a7ab2fee4 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> [rebase: on top of msm-google/android-msm-marlin-3.18, massage original patch which changes code in energy_diff() into __energy_diff() introduced by SchedTune] Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> (cherry picked from commit 780cb5a5fa47adf13d4fc2b77e8e94448cd56098) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:18 -07:00
Morten Rasmussen	5dbcddecc5	UPSTREAM: sched/fair: Fix incorrect comment for capacity_margin The comment for capacity_margin introduced in: `3273163c67` ("sched/fair: Let asymmetric CPU configurations balance at wake-up") ... got its usage the wrong way round - fix it. Change-Id: I0668ea25e3d95a4ee625a101991610dddcdc2230 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-7-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `893c5d2279`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:17 -07:00
Morten Rasmussen	942295e902	UPSTREAM: sched/fair: Avoid pulling tasks from non-overloaded higher capacity groups For asymmetric CPU capacity systems it is counter-productive for throughput if low capacity CPUs are pulling tasks from non-overloaded CPUs with higher capacity. The assumption is that higher CPU capacity is preferred over running alone in a group with lower CPU capacity. This patch rejects higher CPU capacity groups with one or less task per CPU as potential busiest group which could otherwise lead to a series of failing load-balancing attempts leading to a force-migration. Change-Id: I6c668db8efb23c98cba6a0358a5eae550b247c27 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-5-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `9e0994c0a1`) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:16 -07:00
Morten Rasmussen	3d8cb903db	UPSTREAM: sched/fair: Add per-CPU min capacity to sched_group_capacity struct sched_group_capacity currently represents the compute capacity sum of all CPUs in the sched_group. Unless it is divided by the group_weight to get the average capacity per CPU, it hides differences in CPU capacity for mixed capacity systems (e.g. high RT/IRQ utilization or ARM big.LITTLE). But even the average may not be sufficient if the group covers CPUs of different capacities. Instead, by extending struct sched_group_capacity to indicate min per-CPU capacity in the group a suitable group for a given task utilization can more easily be found such that CPUs with reduced capacity can be avoided for tasks with high utilization (not implemented by this patch). Change-Id: I833725e2b70794384c2b8efac5dc8107f1dbb622 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-4-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `bf475ce0a3`) [Fixed cherry-pick issue] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:30:16 -07:00
Morten Rasmussen	544443501e	UPSTREAM: sched/fair: Consider spare capacity in find_idlest_group() In low-utilization scenarios comparing relative loads in find_idlest_group() doesn't always lead to the most optimum choice. Systems with groups containing different numbers of cpus and/or cpus of different compute capacity are significantly better off when considering spare capacity rather than relative load in those scenarios. In addition to existing load based search an alternative spare capacity based candidate sched_group is found and selected instead if sufficient spare capacity exists. If not, existing behaviour is preserved. Change-Id: I165d49fa8f20b6c8bfab5324f3549aa449e81a17 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-3-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `6a0b19c0f3`) [Fixed cherry-pick issue] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 15:28:30 -07:00
Morten Rasmussen	3557724318	UPSTREAM: sched/fair: Compute task/cpu utilization at wake-up correctly At task wake-up load-tracking isn't updated until the task is enqueued. The task's own view of its utilization contribution may therefore not be aligned with its contribution to the cfs_rq load-tracking which may have been updated in the meantime. Basically, the task's own utilization hasn't yet accounted for the sleep decay, while the cfs_rq may have (partially). Estimating the cfs_rq utilization in case the task is migrated at wake-up as task_rq(p)->cfs.avg.util_avg - p->se.avg.util_avg is therefore incorrect as the two load-tracking signals aren't time synchronized (different last update). To solve this problem, this patch synchronizes the task utilization with its previous rq before the task utilization is used in the wake-up path. Currently the update/synchronization is done _after_ the task has been placed by select_task_rq_fair(). The synchronization is done without having to take the rq lock using the existing mechanism used in remove_entity_load_avg(). Change-Id: I61a79f7f68742026d1bdfd828df13a6165fdae5d Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-2-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `104cb16d9e`) [Fixed cherry-pick issue] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:58:58 -07:00
Quentin Perret	7fb3e0c96a	ANDROID: Partial Revert: "ANDROID: sched: Add cpu capacity awareness to wakeup balancing" Revert most of changes of commit `b9ac009489`. Keep infrastructure bits used in following EAS patches. Fix task_util return type. Change-Id: Ic545ac745d00f19f7b6567d4b4b8d4bde32a95ca Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:58:51 -07:00
Dietmar Eggemann	0df28983b8	ANDROID: sched/fair: Decommission energy_aware_wake_cpu() The EAS functionality in the wakeup path will be brought back by the following patch ("sched/fair: Energy-aware wake-up task placement") providing the function select_energy_cpu_brute(). Change-Id: I927fb9e8261cfacfe404695f853941c7959aa146 [ Trivial merge conflicts resolved. ] Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit 80aee424fb7765a777267e144037642625a71304) Signed-off-by: Chris Redpath <chris.redpath@arm.com> [removed remaining energy_aware() call from select_task_rq_fair()] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:58:38 -07:00
Dietmar Eggemann	e5eb12acc7	ANDROID: Revert "WIP: sched: Consider spare cpu capacity at task wake-up" This reverts commit 75a9695b619741019363f889c99c97c7bb823797. Change-Id: I846b21f2bdeb0b0ca30ad65683564ed07a429428 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [ minor merge changes ] Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:58:23 -07:00
Viresh Kumar	8c29c1a327	FROM-LIST: cpufreq: schedutil: Redefine the rate_limit_us tunable The rate_limit_us tunable is intended to reduce the possible overhead from running the schedutil governor. However, that overhead can be divided into two separate parts: the governor computations and the invocation of the scaling driver to set the CPU frequency. The latter is where the real overhead comes from. The former is much less expensive in terms of execution time and running it every time the governor callback is invoked by the scheduler, after rate_limit_us interval has passed since the last frequency update, would not be a problem. For this reason, redefine the rate_limit_us tunable so that it means the minimum time that has to pass between two consecutive invocations of the scaling driver by the schedutil governor (to set the CPU frequency). Change-Id: Iced64116b826c25441ef537c27a3dabfcf81919e Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [pulled from linux-pm linux-next https://patchwork.kernel.org/patch/9583949/ ] Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:58:18 -07:00
Steve Muckle	4152c22c4f	ANDROID: cpufreq: schedutil: add up/down frequency transition rate limits The rate-limit tunable in the schedutil governor applies to transitions to both lower and higher frequencies. On several platforms it is not the ideal tunable though, as it is difficult to get best power/performance figures using the same limit in both directions. It is common on mobile platforms with demanding user interfaces to want to increase frequency rapidly for example but decrease slowly. One of the example can be a case where we have short busy periods followed by similar or longer idle periods. If we keep the rate-limit high enough, we will not go to higher frequencies soon enough. On the other hand, if we keep it too low, we will have too many frequency transitions, as we will always reduce the frequency after the busy period. It would be very useful if we can set low rate-limit while increasing the frequency (so that we can respond to the short busy periods quickly) and high rate-limit while decreasing frequency (so that we don't reduce the frequency immediately after the short busy period and that may avoid frequency transitions before the next busy period). Implement separate up/down transition rate limits. Note that the governor avoids frequency recalculations for a period equal to minimum of up and down rate-limit. A global mutex is also defined to protect updates to min_rate_limit_us via two separate sysfs files. Note that this wouldn't change behavior of the schedutil governor for the platforms which wish to keep same values for both up and down rate limits. This is tested with the rt-app [1] on ARM Exynos, dual A15 processor platform. Testcase: Run a SCHED_OTHER thread on CPU0 which will emulate work-load for X ms of busy period out of the total period of Y ms, i.e. Y - X ms of idle period. The values of X/Y taken were: 20/40, 20/50, 20/70, i.e idle periods of 20, 30 and 50 ms respectively. These were tested against values of up/down rate limits as: 10/10 ms and 10/40 ms. For every test we noticed a performance increase of 5-10% with the schedutil governor, which was very much expected. [Viresh]: Simplified user interface and introduced min_rate_limit_us + mutex, rewrote commit log and included test results. [1] https://github.com/scheduler-tools/rt-app/ Change-Id: I18720a83855b196b8e21dcdc8deae79131635b84 Signed-off-by: Steve Muckle <smuckle.linux@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> (applied from https://marc.info/?l=linux-kernel&m=147936011103832&w=2) [trivial adaptations] Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:58:12 -07:00
Juri Lelli	c6e9438301	ANDROID: sched/cpufreq: make schedutil use WALT signal If WALT is available and enabled, make schedutil governor use its utilization signal. Change-Id: I92bc37989447a76616e9bcc4e9e8616774fb9925 Signed-off-by: Juri Lelli <juri.lelli@arm.com> [we need to use boosted_cpu_util for schedutil, so make it not static] Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:57:47 -07:00
Steve Muckle	8d408128c7	ANDROID: sched: cpufreq: use rt_avg as estimate of required RT CPU capacity A policy of going to fmax on any RT activity will be detrimental for power on many platforms. Often RT accounts for only a small amount of CPU activity so sending the CPU frequency to fmax is overkill. Worse still, some platforms may not be able to even complete the CPU frequency change before the RT activity has already completed. Cpufreq governors have not treated RT activity this way in the past so it is not part of the expected semantics of the RT scheduling class. The DL class offers guarantees about task completion and could be used for this purpose. Modify the schedutil algorithm to instead use rt_avg as an estimate of RT utilization of the CPU. Based on previous work by Vincent Guittot <vincent.guittot@linaro.org>. Change-Id: I1ed605a3e2512a94d34217a8e57c3fd97cca60be Signed-off-by: Steve Muckle <smuckle@linaro.org> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:57:00 -07:00
Viresh Kumar	29d892d7f5	UPSTREAM: cpufreq: schedutil: move slow path from workqueue to SCHED_FIFO task If slow path frequency changes are conducted in a SCHED_OTHER context then they may be delayed for some amount of time, including indefinitely, when real time or deadline activity is taking place. Move the slow path to a real time kernel thread. In the future the thread should be made SCHED_DEADLINE. The RT priority is arbitrarily set to 50 for now. Hackbench results on ARM Exynos, dual core A15 platform for 10 iterations: $ hackbench -s 100 -l 100 -g 10 -f 20 Before After --------------------------------- 1.808 1.603 1.847 1.251 2.229 1.590 1.952 1.600 1.947 1.257 1.925 1.627 2.694 1.620 1.258 1.621 1.919 1.632 1.250 1.240 Average: 1.8829 1.5041 Based on initial work by Steve Muckle. Change-Id: I8bb111435b0871e961a6f26db1151c91624817b8 Signed-off-by: Steve Muckle <smuckle.linux@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry-picked from commit `02a7b1ee3b`) [fixed cherry-pick issue] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:54:36 -07:00
Steve Muckle	4e685a28e7	ANDROID: sched/cpufreq: fix tunables for schedfreq governor The schedfreq governor does not currently handle cpufreq drivers which use a global set of tunables (!have_governor_per_policy). For example on x86 and using the acpi cpufreq driver, doing this cat /sys/devices/system/cpu/cpufreq/sched/up_throttle_nsec will result in a bad pointer access. Update the tunable code using the upstream schedutil tunable code by Rafael Wysocki as a guide. Includes a partial backport of the reorganized cpufreq tunable infrastructure. Change-Id: I7e6f8de1dac297077ad43f37dd2f6ddbfe921c98 Signed-off-by: Steve Muckle <smuckle@linaro.org> [fixed cherry-pick issue] Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 12:53:07 -07:00
Brendan Jackman	b224647f87	DEBUG: sched/fair: Fix sched_load_avg_cpu events for task_groups The current sched_load_avg_cpu event traces the load for any cfs_rq that is updated. This is not representative of the CPU load - instead we should only trace this event when the cfs_rq being updated is in the root_task_group. Change-Id: I345c2f13f6b5718cb4a89beb247f7887ce97ed6b Signed-off-by: Brendan Jackman <brendan.jackman@arm.com> (cherry picked from commit `1cb392e103`) (cherry picked from commit 835f5ebd2f7170c87dc1387976ed1b4e46215251) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 19:09:52 +01:00
Brendan Jackman	0f493a73ed	DEBUG: sched/fair: Fix missing sched_load_avg_cpu events update_cfs_rq_load_avg is called from update_blocked_averages without triggering the sched_load_avg_cpu event. Move the event trigger to inside update_cfs_rq_load_avg to avoid this missing event. Change-Id: I6c4f66f687a644e4e7f798db122d28a8f5919b7b Signed-off-by: Brendan Jackman <brendan.jackman@arm.com> (cherry picked from commit `7f18f0963d`) (cherry picked from commit 998d092b3385f1e62fa5cb5f491c7c633a4bbc3e) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 19:09:52 +01:00
Morten Rasmussen	b19cdb902d	sched: Consider misfit tasks when load-balancing With the new group_misfit_task load-balancing scenario additional policy conditions are needed when load-balancing. Misfit task balancing only makes sense between source group with lower capacity than the target group. If capacities are the same, fallback to normal group_other balancing. The aim is to balance tasks such that no task has its throughput hindered by compute capacity if a cpu with more capacity is available. Load-balancing is generally based on average load in the sched_groups, but for misfitting tasks it is necessary to introduce exceptions to migrate tasks against usual metrics and optimize throughput. This patch ensures the following load-balance for mixed capacity systems (e.g. ARM big.LITTLE) for always-running tasks: 1. Place a task on each cpu starting in order from cpus with highest capacity to lowest until all cpus are in use (i.e. one task on each cpu). 2. Once all cpus are in use balance according to compute capacity such that load per capacity is approximately the same regardless of the compute capacity (i.e. big cpus get more tasks than little cpus). Necessary changes are introduced in find_busiest_group(), calculate_imbalance(), and find_busiest_queue(). This includes passing the group_type on to find_busiest_queue() through struct lb_env, which is currently only considers imbalance and not the imbalance situation (group_type). To avoid taking remote rq locks to examine source sched_groups for misfit tasks, each cpu is responsible for tracking misfit tasks themselves and update the rq->misfit_task flag. This means checking task utilization when tasks are scheduled and on sched_tick. Change-Id: I29b35783765b0e51ae3ede74a8160b07f885bfde Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> (cherry picked from commit `0d2b1cdfa1`) [Fixed cherry-pick issue] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-08-03 19:09:51 +01:00
Greg Kroah-Hartman	9ae2c670d8	Merge 4.9.40 into android-4.9 Changes in 4.9.40 disable new gcc-7.1.1 warnings for now ir-core: fix gcc-7 warning on bool arithmetic dm mpath: cleanup -Wbool-operation warning in choose_pgpath() s5p-jpeg: don't return a random width/height thermal: max77620: fix device-node reference imbalance thermal: cpu_cooling: Avoid accessing potentially freed structures ath9k: fix tx99 use after free ath9k: fix tx99 bus error ath9k: fix an invalid pointer dereference in ath9k_rng_stop() NFC: fix broken device allocation NFC: nfcmrvl_uart: add missing tty-device sanity check NFC: nfcmrvl: do not use device-managed resources NFC: nfcmrvl: use nfc-device for firmware download NFC: nfcmrvl: fix firmware-management initialisation nfc: Ensure presence of required attributes in the activate_target handler nfc: Fix the sockaddr length sanitization in llcp_sock_connect NFC: Add sockaddr length checks before accessing sa_family in bind handlers perf intel-pt: Move decoder error setting into one condition perf intel-pt: Improve sample timestamp perf intel-pt: Fix missing stack clear perf intel-pt: Ensure IP is zero when state is INTEL_PT_STATE_NO_IP perf intel-pt: Fix last_ip usage perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero perf intel-pt: Use FUP always when scanning for an IP perf intel-pt: Clear FUP flag on error Bluetooth: use constant time memory comparison for secret values wlcore: fix 64K page support btrfs: Don't clear SGID when inheriting ACLs igb: Explicitly select page 0 at initialization ASoC: compress: Derive substream from stream based on direction PM / Domains: Fix unsafe iteration over modified list of device links PM / Domains: Fix unsafe iteration over modified list of domain providers PM / Domains: Fix unsafe iteration over modified list of domains scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails. scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state iscsi-target: Add login_keys_workaround attribute for non RFC initiators xen/scsiback: Fix a TMR related use-after-free powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp() powerpc/64: Fix atomic64_inc_not_zero() to return an int powerpc: Fix emulation of mcrf in emulate_step() powerpc: Fix emulation of mfocrf in emulate_step() powerpc/asm: Mark cr0 as clobbered in mftb() powerpc/mm/radix: Properly clear process table entry af_key: Fix sadb_x_ipsecrequest parsing PCI: Work around poweroff & suspend-to-RAM issue on Macbook Pro 11 PCI: rockchip: Use normal register bank for config accessors PCI/PM: Restore the status of PCI devices across hibernation ipvs: SNAT packet replies only for NATed connections xhci: fix 20000ms port resume timeout xhci: Fix NULL pointer dereference when cleaning up streams for removed host xhci: Bad Ethernet performance plugged in ASM1042A host mxl111sf: Fix driver to use heap allocate buffers for USB messages usb: storage: return on error to avoid a null pointer dereference USB: cdc-acm: add device-id for quirky printer usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL usb: renesas_usbhs: gadget: disable all eps when the driver stops md: don't use flush_signals in userspace processes x86/xen: allow userspace access during hypercalls cx88: Fix regression in initial video standard setting libnvdimm, btt: fix btt_rw_page not returning errors libnvdimm: fix badblock range handling of ARS range ext2: Don't clear SGID when inheriting ACLs Raid5 should update rdev->sectors after reshape s390/syscalls: Fix out of bounds arguments access drm/amd/amdgpu: Return error if initiating read out of range on vram drm/radeon/ci: disable mclk switching for high refresh rates (v2) drm/radeon: Fix eDP for single-display iMac10,1 (v2) ipmi: use rcu lock around call to intf->handlers->sender() ipmi:ssif: Add missing unlock in error branch xfs: Don't clear SGID when inheriting ACLs f2fs: sanity check size of nat and sit cache f2fs: Don't clear SGID when inheriting ACLs drm/ttm: Fix use-after-free in ttm_bo_clean_mm ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials vfio: Fix group release deadlock vfio: New external user group/file match nvme-rdma: remove race conditions from IB signalling ftrace: Fix uninitialized variable in match_records() MIPS: Fix mips_atomic_set() retry condition MIPS: Fix mips_atomic_set() with EVA MIPS: Negate error syscall return in trace ubifs: Don't leak kernel memory to the MTD ACPI / EC: Drop EC noirq hooks to fix a regression Revert "ACPI / EC: Enable event freeze mode..." to fix a regression x86/acpi: Prevent out of bound access caused by broken ACPI tables x86/ioapic: Pass the correct data to unmask_ioapic_irq() MIPS: Fix MIPS I ISA /proc/cpuinfo reporting MIPS: Save static registers before sysmips MIPS: Actually decode JALX in `__compute_return_epc_for_insn' MIPS: Fix unaligned PC interpretation in `compute_return_epc' MIPS: math-emu: Prevent wrong ISA mode instruction emulation MIPS: Send SIGILL for BPOSGE32 in `__compute_return_epc_for_insn' MIPS: Rename `sigill_r6' to `sigill_r2r6' in `__compute_return_epc_for_insn' MIPS: Send SIGILL for linked branches in `__compute_return_epc_for_insn' MIPS: Send SIGILL for R6 branches in `__compute_return_epc_for_insn' MIPS: Fix a typo: s/preset/present/ in r2-to-r6 emulation error message Input: i8042 - fix crash at boot time IB/iser: Fix connection teardown race condition IB/core: Namespace is mandatory input for address resolution sunrpc: use constant time memory comparison for mac NFS: only invalidate dentrys that are clearly invalid. udf: Fix deadlock between writeback and udf_setsize() target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target Revert "perf/core: Drop kernel samples even though :u is specified" staging: rtl8188eu: add TL-WN722N v2 support staging: comedi: ni_mio_common: fix AO timer off-by-one regression staging: sm750fb: avoid conflicting vesafb staging: lustre: ko2iblnd: check copy_from_iter/copy_to_iter return code ceph: fix race in concurrent readdir RDMA/core: Initialize port_num in qp_attr drm/mst: Fix error handling during MST sideband message reception drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req() drm/mst: Avoid processing partially received up/down message transactions mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array hfsplus: Don't clear SGID when inheriting ACLs ovl: fix random return value on mount acpi/nfit: Fix memory corruption/Unregister mce decoder on failure of: device: Export of_device_{get_modalias, uvent_modalias} to modules spmi: Include OF based modalias in device uevent reiserfs: Don't clear SGID when inheriting ACLs PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present tracing: Fix kmemleak in instance_rmdir alarmtimer: don't rate limit one-shot timers Linux 4.9.40 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-07-27 15:24:43 -07:00
Greg Hackmann	91af5f04cd	alarmtimer: don't rate limit one-shot timers Commit `ff86bf0c65` ("alarmtimer: Rate limit periodic intervals") sets a minimum bound on the alarm timer interval. This minimum bound shouldn't be applied if the interval is 0. Otherwise, one-shot timers will be converted into periodic ones. Fixes: `ff86bf0c65` ("alarmtimer: Rate limit periodic intervals") Reported-by: Ben Fennema <fennema@google.com> Signed-off-by: Greg Hackmann <ghackmann@google.com> Cc: stable@vger.kernel.org Cc: John Stultz <john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:08:08 -07:00
Chunyu Hu	919e481152	tracing: Fix kmemleak in instance_rmdir commit `db9108e054` upstream. Hit the kmemleak when executing instance_rmdir, it forgot releasing mem of tracing_cpumask. With this fix, the warn does not appear any more. unreferenced object 0xffff93a8dfaa7c18 (size 8): comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s) hex dump (first 8 bytes): ff ff ff ff ff ff ff ff ........ backtrace: [<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280 [<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30 [<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10 [<ffffffff88571ab0>] instance_mkdir+0x90/0x240 [<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70 [<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0 [<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100 [<ffffffff88403857>] do_syscall_64+0x67/0x150 [<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/1500546969-12594-1-git-send-email-chuhu@redhat.com Fixes: `ccfe9e42e4` ("tracing: Make tracing_cpumask available for all instances") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:08:08 -07:00
Ingo Molnar	a76a032300	Revert "perf/core: Drop kernel samples even though :u is specified" commit `6a8a75f323` upstream. This reverts commit `cc1582c231`. This commit introduced a regression that broke rr-project, which uses sampling events to receive a signal on overflow (but does not care about the contents of the sample). These signals are critical to the correct operation of rr. There's been some back and forth about how to fix it - but to not keep applications in limbo queue up a revert. Reported-by: Kyle Huey <me@kylehuey.com> Acked-by: Kyle Huey <me@kylehuey.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170628105600.GC5981@leverpostej Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:08:06 -07:00
Dan Carpenter	198bd494ce	ftrace: Fix uninitialized variable in match_records() commit `2e028c4fe1` upstream. My static checker complains that if "func" is NULL then "clear_filter" is uninitialized. This seems like it could be true, although it's possible something subtle is happening that I haven't seen. kernel/trace/ftrace.c:3844 match_records() error: uninitialized symbol 'clear_filter'. Link: http://lkml.kernel.org/r/20170712073556.h6tkpjcdzjaozozs@mwanda Fixes: `f0a3b154bd` ("ftrace: Clarify code for mod command") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:08:03 -07:00
Daniel Mentz	5b07c2d252	Revert "ANDROID: proc: smaps: Allow smaps access for CAP_SYS_RESOURCE" This reverts commit `ff8b80819c`. This fixes CVE-2017-0710. SELinux allows more fine grained control: We grant processes that need access to smaps CAP_SYS_PTRACE but prohibit them from using ptrace attach(). Bug: 34951864 Bug: 36468447 Change-Id: I00a513188245a30bc63dcbdafbb9746bc6d9d6ff Signed-off-by: Daniel Mentz <danielmentz@google.com>	2017-07-21 11:10:17 -07:00
Greg Kroah-Hartman	14accea70e	Merge 4.9.39 into android-4.9 Changes in 4.9.39 xen-netfront: Rework the fix for Rx stall during OOM and network stress net_sched: fix error recovery at qdisc creation net: sched: Fix one possible panic when no destroy callback net/phy: micrel: configure intterupts after autoneg workaround ipv6: avoid unregistering inet6_dev for loopback net: dp83640: Avoid NULL pointer dereference. tcp: reset sk_rx_dst in tcp_disconnect() net: prevent sign extension in dev_get_stats() bridge: mdb: fix leak on complete_info ptr on fail path rocker: move dereference before free bpf: prevent leaking pointer via xadd on unpriviledged net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish() net/mlx5: Cancel delayed recovery work when unloading the driver liquidio: fix bug in soft reset failure detection net/mlx5e: Fix TX carrier errors report in get stats ndo ipv6: dad: don't remove dynamic addresses if link is down vxlan: fix hlist corruption net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64 net: ipv6: Compare lwstate in detecting duplicate nexthops vrf: fix bug_on triggered by rx when destroying a vrf rds: tcp: use sock_create_lite() to create the accept socket brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx() brcmfmac: Fix a memory leak in error handling path in 'brcmf_cfg80211_attach' brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain sfc: don't read beyond unicast address list cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES cfg80211: Check if PMKID attribute is of expected size cfg80211: Check if NAN service ID is of expected size irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity parisc: Report SIGSEGV instead of SIGBUS when running out of stack parisc: use compat_sys_keyctl() parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs parisc/mm: Ensure IRQs are off in switch_mm() tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth thp, mm: fix crash due race in MADV_FREE handling kernel/extable.c: mark core_kernel_text notrace mm/list_lru.c: fix list_lru_count_node() to be race free fs/dcache.c: fix spin lockup issue on nlru->lock checkpatch: silence perl 5.26.0 unescaped left brace warnings binfmt_elf: use ELF_ET_DYN_BASE only for PIE arm: move ELF_ET_DYN_BASE to 4MB arm64: move ELF_ET_DYN_BASE to 4GB / 4MB powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB s390: reduce ELF_ET_DYN_BASE exec: Limit arg stack to at most 75% of _STK_LIM ARM64: dts: marvell: armada37xx: Fix timer interrupt specifiers vt: fix unchecked __put_user() in tioclinux ioctls rcu: Add memory barriers for NOCB leader wakeup nvmem: core: fix leaks on registration errors mnt: In umount propagation reparent in a separate pass mnt: In propgate_umount handle visiting mounts in any order mnt: Make propagate_umount less slow for overlapping mount propagation trees selftests/capabilities: Fix the test_execve test mm: fix overflow check in expand_upwards() crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD crypto: atmel - only treat EBUSY as transient if backlog crypto: sha1-ssse3 - Disable avx2 crypto: caam - properly set IV after {en,de}crypt crypto: caam - fix signals handling Revert "sched/core: Optimize SCHED_SMT" sched/fair, cpumask: Export for_each_cpu_wrap() sched/topology: Fix building of overlapping sched-groups sched/topology: Optimize build_group_mask() sched/topology: Fix overlapping sched_group_mask PM / wakeirq: Convert to SRCU PM / QoS: return -EINVAL for bogus strings tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results kvm: vmx: Do not disable intercepts for BNDCFGS kvm: x86: Guest BNDCFGS requires guest MPX support kvm: vmx: Check value written to IA32_BNDCFGS kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS 4.9.39 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-07-21 08:55:50 +02:00
Pavankumar Kondeti	04e002a5f6	tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results commit `c59f29cb14` upstream. The 's' flag is supposed to indicate that a softirq is running. This can be detected by testing the preempt_count with SOFTIRQ_OFFSET. The current code tests the preempt_count with SOFTIRQ_MASK, which would be true even when softirqs are disabled but not serving a softirq. Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-21 07:42:24 +02:00
Peter Zijlstra	758dc6a8da	sched/topology: Fix overlapping sched_group_mask commit `73bb059f9b` upstream. The point of sched_group_mask is to select those CPUs from sched_group_cpus that can actually arrive at this balance domain. The current code gets it wrong, as can be readily demonstrated with a topology like: node 0 1 2 3 0: 10 20 30 20 1: 20 10 20 30 2: 30 20 10 20 3: 20 30 20 10 Where (for example) domain 1 on CPU1 ends up with a mask that includes CPU0: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 (mask: 1), 2, 0 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072) This causes sched_balance_cpu() to compute the wrong CPU and consequently should_we_balance() will terminate early resulting in missed load-balance opportunities. The fixed topology looks like: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 (mask: 1), 2, 0 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072) (note: this relies on OVERLAP domains to always have children, this is true because the regular topology domains are still here -- this is before degenerate trimming) Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: `e3589f6c81` ("sched: Allow for overlapping sched_domain spans") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-21 07:42:24 +02:00
Lauro Ramos Venancio	3e165b2322	sched/topology: Optimize build_group_mask() commit `f32d782e31` upstream. The group mask is always used in intersection with the group CPUs. So, when building the group mask, we don't have to care about CPUs that are not part of the group. Signed-off-by: Lauro Ramos Venancio <lvenanci@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: lwang@redhat.com Cc: riel@redhat.com Link: http://lkml.kernel.org/r/1492717903-5195-2-git-send-email-lvenanci@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-21 07:42:23 +02:00
Peter Zijlstra	7c3f08eadc	sched/topology: Fix building of overlapping sched-groups commit `0372dd2736` upstream. When building the overlapping groups, we very obviously should start with the previous domain of _this_ @cpu, not CPU-0. This can be readily demonstrated with a topology like: node 0 1 2 3 0: 10 20 30 20 1: 20 10 20 30 2: 30 20 10 20 3: 20 30 20 10 Where (for example) CPU1 ends up generating the following nonsensical groups: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 2 0 [] domain 1: span 0-3 level NUMA [] groups: 1-3 (cpu_capacity = 3072) 0-1,3 (cpu_capacity = 3072) Where the fact that domain 1 doesn't include a group with span 0-2 is the obvious fail. With patch this looks like: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 0 2 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (cpu_capacity = 3072) 0,2-3 (cpu_capacity = 3072) Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: `e3589f6c81` ("sched: Allow for overlapping sched_domain spans") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-21 07:42:23 +02:00
Peter Zijlstra	542ebc96c2	sched/fair, cpumask: Export for_each_cpu_wrap() commit c6508a39640b9a27fc2bc10cb708152672c82045 upstream. commit `c743f0a5c5` upstream. More users for for_each_cpu_wrap() have appeared. Promote the construct to generic cpumask interface. The implementation is slightly modified to reduce arguments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Lauro Ramos Venancio <lvenanci@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: lwang@redhat.com Link: http://lkml.kernel.org/r/20170414122005.o35me2h5nowqkxbv@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-21 07:42:23 +02:00

1 2 3 4 5 ...

23809 Commits