linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-02 03:03:00 +09:00

Author	SHA1	Message	Date
Odin Ugedal	60f0b82677	sched/fair: Fix CFS bandwidth hrtimer expiry type [ Upstream commit `72d0ad7cb5` ] The time remaining until expiry of the refresh_timer can be negative. Casting the type to an unsigned 64-bit value will cause integer underflow, making the runtime_refresh_within return false instead of true. These situations are rare, but they do happen. This does not cause user-facing issues or errors; other than possibly unthrottling cfs_rq's using runtime from the previous period(s), making the CFS bandwidth enforcement less strict in those (special) situations. Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Link: https://lore.kernel.org/r/20210629121452.18429-1-odin@uged.al Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-16 11:33:40 +09:00
Vincent Guittot	1ec0012772	sched/fair: handle case of task_h_load() returning 0 commit `01cfcde9c2` upstream. task_h_load() can return 0 in some situations like running stress-ng mmapfork, which forks thousands of threads, in a sched group on a 224 cores system. The load balance doesn't handle this correctly because env->imbalance never decreases and it will stop pulling tasks only after reaching loop_max, which can be equal to the number of running tasks of the cfs. Make sure that imbalance will be decreased by at least 1. misfit task is the other feature that doesn't handle correctly such situation although it's probably more difficult to face the problem because of the smaller number of CPUs and running tasks on heterogenous system. We can't simply ensure that task_h_load() returns at least one because it would imply to handle underflow in other places. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: <stable@vger.kernel.org> # v4.4+ Link: https://lkml.kernel.org/r/20200710152426.16981-1-vincent.guittot@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-16 08:31:08 +09:00
Jens Axboe	3a641a476e	sched/fair: Don't NUMA balance for kthreads [ Upstream commit `18f855e574` ] Stefano reported a crash with using SQPOLL with io_uring: BUG: kernel NULL pointer dereference, address: 00000000000003b0 CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 #11 RIP: 0010:task_numa_work+0x4f/0x2c0 Call Trace: task_work_run+0x68/0xa0 io_sq_thread+0x252/0x3d0 kthread+0xf9/0x130 ret_from_fork+0x35/0x40 which is task_numa_work() oopsing on current->mm being NULL. The task work is queued by task_tick_numa(), which checks if current->mm is NULL at the time of the call. But this state isn't necessarily persistent, if the kthread is using use_mm() to temporarily adopt the mm of a task. Change the task_tick_numa() check to exclude kernel threads in general, as it doesn't make sense to attempt ot balance for kthreads anyway. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/865de121-8190-5d30-ece5-3b097dc74431@kernel.dk Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-15 17:32:04 +09:00
Xuewei Zhang	9ae6c47d3c	sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision commit `4929a4e6fa` upstream. The quota/period ratio is used to ensure a child task group won't get more bandwidth than the parent task group, and is calculated as: normalized_cfs_quota() = [(quota_us << 20) / period_us] If the quota/period ratio was changed during this scaling due to precision loss, it will cause inconsistency between parent and child task groups. See below example: A userspace container manager (kubelet) does three operations: 1) Create a parent cgroup, set quota to 1,000us and period to 10,000us. 2) Create a few children cgroups. 3) Set quota to 1,000us and period to 10,000us on a child cgroup. These operations are expected to succeed. However, if the scaling of 147/128 happens before step 3, quota and period of the parent cgroup will be changed: new_quota: 1148437ns, 1148us new_period: 11484375ns, 11484us And when step 3 comes in, the ratio of the child cgroup will be 104857, which will be larger than the parent cgroup ratio (104821), and will fail. Scaling them by a factor of 2 will fix the problem. Tested-by: Phil Auld <pauld@redhat.com> Signed-off-by: Xuewei Zhang <xueweiz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Phil Auld <pauld@redhat.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Fixes: `2e8e192263` ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup") Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 16:10:08 +09:00
Valentin Schneider	f18fe54d12	sched/fair: Don't increase sd->balance_interval on newidle balance [ Upstream commit `3f130a37c4` ] When load_balance() fails to move some load because of task affinity, we end up increasing sd->balance_interval to delay the next periodic balance in the hopes that next time we look, that annoying pinned task(s) will be gone. However, idle_balance() pays no attention to sd->balance_interval, yet it will still lead to an increase in balance_interval in case of pinned tasks. If we're going through several newidle balances (e.g. we have a periodic task), this can lead to a huge increase of the balance_interval in a very small amount of time. To prevent that, don't increase the balance interval when going through a newidle balance. This is a similar approach to what is done in commit `58b26c4c02` ("sched: Increment cache_nice_tries only on periodic lb"), where we disregard newidle balance and rely on periodic balance for more stable results. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar.Eggemann@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: patrick.bellasi@arm.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1537974727-30788-2-git-send-email-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-15 15:18:39 +09:00
Vincent Guittot	9fb9eaa490	sched/fair: Fix imbalance due to CPU affinity [ Upstream commit `f6cad8df6b` ] The load_balance() has a dedicated mecanism to detect when an imbalance is due to CPU affinity and must be handled at parent level. In this case, the imbalance field of the parent's sched_group is set. The description of sg_imbalanced() gives a typical example of two groups of 4 CPUs each and 4 tasks each with a cpumask covering 1 CPU of the first group and 3 CPUs of the second group. Something like: { 0 1 2 3 } { 4 5 6 7 } * * * * But the load_balance fails to fix this UC on my octo cores system made of 2 clusters of quad cores. Whereas the load_balance is able to detect that the imbalanced is due to CPU affinity, it fails to fix it because the imbalance field is cleared before letting parent level a chance to run. In fact, when the imbalance is detected, the load_balance reruns without the CPU with pinned tasks. But there is no other running tasks in the situation described above and everything looks balanced this time so the imbalance field is immediately cleared. The imbalance field should not be cleared if there is no other task to move when the imbalance is detected. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1561996022-28829-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-15 14:32:59 +09:00
Liangyan	a199ce2af3	sched/fair: Don't assign runtime for throttled cfs_rq commit `5e2d2cc258` upstream. do_sched_cfs_period_timer() will refill cfs_b runtime and call distribute_cfs_runtime to unthrottle cfs_rq, sometimes cfs_b->runtime will allocate all quota to one cfs_rq incorrectly, then other cfs_rqs attached to this cfs_b can't get runtime and will be throttled. We find that one throttled cfs_rq has non-negative cfs_rq->runtime_remaining and cause an unexpetced cast from s64 to u64 in snippet: distribute_cfs_runtime() { runtime = -cfs_rq->runtime_remaining + 1; } The runtime here will change to a large number and consume all cfs_b->runtime in this cfs_b period. According to Ben Segall, the throttled cfs_rq can have account_cfs_rq_runtime called on it because it is throttled before idle_balance, and the idle_balance calls update_rq_clock to add time that is accounted to the task. This commit prevents cfs_rq to be assgined new runtime if it has been throttled until that distribute_cfs_runtime is called. Signed-off-by: Liangyan <liangyan.peng@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: shanpeic@linux.alibaba.com Cc: stable@vger.kernel.org Cc: xlpang@linux.alibaba.com Fixes: `d3d9dc3302` ("sched: Throttle entities exceeding their allowed bandwidth") Link: https://lkml.kernel.org/r/20190826121633.6538-1-liangyan.peng@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 14:29:00 +09:00
Jann Horn	9ebc8cd5f7	sched/fair: Don't free p->numa_faults with concurrent readers commit `16d51a590a` upstream. When going through execve(), zero out the NUMA fault statistics instead of freeing them. During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU. Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: `82727018b0` ("sched/numa: Call task_numa_free() from do_execve()") Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 14:08:49 +09:00
Xie XiuQi	a67712d343	sched/numa: Fix a possible divide-by-zero commit `a860fa7b96` upstream. sched_clock_cpu() may not be consistent between CPUs. If a task migrates to another CPU, then se.exec_start is set to that CPU's rq_clock_task() by update_stats_curr_start(). Specifically, the new value might be before the old value due to clock skew. So then if in numa_get_avg_runtime() the expression: 'now - p->last_task_numa_placement' ends up as -1, then the divider '*period + 1' in task_numa_placement() is 0 and things go bang. Similar to update_curr(), check if time goes backwards to avoid this. [ peterz: Wrote new changelog. ] [ mingo: Tweaked the code comment. ] Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cj.chengjian@huawei.com Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 12:35:05 +09:00
Phil Auld	8023469412	sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup [ Upstream commit `2e8e192263` ] With extremely short cfs_period_us setting on a parent task group with a large number of children the for loop in sched_cfs_period_timer() can run until the watchdog fires. There is no guarantee that the call to hrtimer_forward_now() will ever return 0. The large number of children can make do_sched_cfs_period_timer() take longer than the period. NMI watchdog: Watchdog detected hard LOCKUP on cpu 24 RIP: 0010:tg_nop+0x0/0x10 <IRQ> walk_tg_tree_from+0x29/0xb0 unthrottle_cfs_rq+0xe0/0x1a0 distribute_cfs_runtime+0xd3/0xf0 sched_cfs_period_timer+0xcb/0x160 ? sched_cfs_slack_timer+0xd0/0xd0 __hrtimer_run_queues+0xfb/0x270 hrtimer_interrupt+0x122/0x270 smp_apic_timer_interrupt+0x6a/0x140 apic_timer_interrupt+0xf/0x20 </IRQ> To prevent this we add protection to the loop that detects when the loop has run too many times and scales the period and quota up, proportionally, so that the timer can complete before then next period expires. This preserves the relative runtime quota while preventing the hard lockup. A warning is issued reporting this state and the new values. Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-15 12:28:05 +09:00
Mel Gorman	fd57ab642c	sched/fair: Do not re-read ->h_load_next during hierarchical load calculation commit `0e9f02450d` upstream. A NULL pointer dereference bug was reported on a distribution kernel but the same issue should be present on mainline kernel. It occured on s390 but should not be arch-specific. A partial oops looks like: Unable to handle kernel pointer dereference in virtual kernel address space ... Call Trace: ... try_to_wake_up+0xfc/0x450 vhost_poll_wakeup+0x3a/0x50 [vhost] __wake_up_common+0xbc/0x178 __wake_up_common_lock+0x9e/0x160 __wake_up_sync_key+0x4e/0x60 sock_def_readable+0x5e/0x98 The bug hits any time between 1 hour to 3 days. The dereference occurs in update_cfs_rq_h_load when accumulating h_load. The problem is that cfq_rq->h_load_next is not protected by any locking and can be updated by parallel calls to task_h_load. Depending on the compiler, code may be generated that re-reads cfq_rq->h_load_next after the check for NULL and then oops when reading se->avg.load_avg. The dissassembly showed that it was possible to reread h_load_next after the check for NULL. While this does not appear to be an issue for later compilers, it's still an accident if the correct code is generated. Full locking in this path would have high overhead so this patch uses READ_ONCE to read h_load_next only once and check for NULL before dereferencing. It was confirmed that there were no further oops after 10 days of testing. As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any potential problems with store tearing. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Fixes: `685207963b` ("sched: Move h_load calculation to task_h_load()") Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 12:22:49 +09:00
Song Muchun	b9cc54e80d	sched/fair: Fix the min_vruntime update logic in dequeue_entity() [ Upstream commit `9845c49cc9` ] The comment and the code around the update_min_vruntime() call in dequeue_entity() are not in agreement. >From commit: `b60205c7c5` ("sched/fair: Fix min_vruntime tracking") I think that we want to update min_vruntime when a task is sleeping/migrating. So, the check is inverted there - fix it. Signed-off-by: Song Muchun <smuchun@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `b60205c7c5` ("sched/fair: Fix min_vruntime tracking") Link: http://lkml.kernel.org/r/20181014112612.2614-1-smuchun@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 09:19:12 +09:00
Phil Auld	d296f53c35	sched/fair: Fix throttle_list starvation with low CFS quota commit `baa9be4ffb` upstream. With a very low cpu.cfs_quota_us setting, such as the minimum of 1000, distribute_cfs_runtime may not empty the throttled_list before it runs out of runtime to distribute. In that case, due to the change from `c06f04c704` to put throttled entries at the head of the list, later entries on the list will starve. Essentially, the same X processes will get pulled off the list, given CPU time and then, when expired, get put back on the head of the list where distribute_cfs_runtime will give runtime to the same set of processes leaving the rest. Fix the issue by setting a bit in struct cfs_bandwidth when distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can decide to put the throttled entry on the tail or the head of the list. The bit is set/cleared by the callers of distribute_cfs_runtime while they hold cfs_bandwidth->lock. This is easy to reproduce with a handful of CPU consumers. I use 'crash' on the live system. In some cases you can simply look at the throttled list and see the later entries are not changing: crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining \| paste - - \| awk '{print $1" "$4}' \| pr -t -n3 1 ffff90b56cb2d200 -976050 2 ffff90b56cb2cc00 -484925 3 ffff90b56cb2bc00 -658814 4 ffff90b56cb2ba00 -275365 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining \| paste - - \| awk '{print $1" "$4}' \| pr -t -n3 1 ffff90b56cb2d200 -994147 2 ffff90b56cb2cc00 -306051 3 ffff90b56cb2bc00 -961321 4 ffff90b56cb2ba00 -24490 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 Sometimes it is easier to see by finding a process getting starved and looking at the sched_info: crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, Signed-off-by: Phil Auld <pauld@redhat.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: `c06f04c704` ("sched: Fix potential near-infinite distribute_cfs_runtime() loop") Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 09:15:24 +09:00
Steve Muckle	5282278601	sched/fair: Fix vruntime_normalized() for remote non-migration wakeup commit `d0cdb3ce88` upstream. When a task which previously ran on a given CPU is remotely queued to wake up on that same CPU, there is a period where the task's state is TASK_WAKING and its vruntime is not normalized. This is not accounted for in vruntime_normalized() which will cause an error in the task's vruntime if it is switched from the fair class during this time. For example if it is boosted to RT priority via rt_mutex_setprio(), rq->min_vruntime will not be subtracted from the task's vruntime but it will be added again when the task returns to the fair class. The task's vruntime will have been erroneously doubled and the effective priority of the task will be reduced. Note this will also lead to inflation of all vruntimes since the doubled vruntime value will become the rq's min_vruntime when other tasks leave the rq. This leads to repeated doubling of the vruntime and priority penalty. Fix this by recognizing a WAKING task's vruntime as normalized only if sched_remote_wakeup is true. This indicates a migration, in which case the vruntime would have been normalized in migrate_task_rq_fair(). Based on a similar patch from John Dias <joaodias@google.com>. Suggested-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Steve Muckle <smuckle@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Redpath <Chris.Redpath@arm.com> Cc: John Dias <joaodias@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miguel de Dios <migueldedios@google.com> Cc: Morten Rasmussen <Morten.Rasmussen@arm.com> Cc: Patrick Bellasi <Patrick.Bellasi@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: kernel-team@android.com Fixes: `b5179ac70d` ("sched/fair: Prepare to fix fairness problems on migration") Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-15 08:25:26 +09:00
Victor Wan	cc7b1eac54	Merge branch 'android-4.9' into amlogic-4.9-dev Signed-off-by: Victor Wan <victor.wan@amlogic.com> Conflicts: drivers/md/dm-bufio.c drivers/media/dvb-core/dvb_frontend.c drivers/usb/dwc3/core.c drivers/usb/gadget/function/f_fs.c	2018-08-07 14:43:24 +08:00
Patrick Bellasi	c964a2ba92	ANDROID: sched/fair: cosmetics Change-Id: Ibbefdf1cdb4d72c7075c7b3edf6789630475a8e2 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2018-06-07 13:44:47 +00:00
Joel Fernandes	7fd40752c3	UPSTREAM: sched/fair: Consider RT/IRQ pressure in capacity_spare_wake capacity_spare_wake in the slow path influences choice of idlest groups, as we search for groups with maximum spare capacity. In scenarios where RT pressure is high, a sub optimal group can be chosen and hurt performance of the task being woken up. Several tests with results are included below to show improvements with this change. 1) Hackbench on Pixel 2 Android device (4x4 ARM64 Octa core) ------------------------------------------------------------ Here we have RT activity running on big CPU cluster induced with rt-app, and running hackbench in parallel. The RT tasks are bound to 4 CPUs on the big cluster (cpu 4,5,6,7) and have 100ms periodicity with runtime=20ms sleep=80ms. Hackbench shows big benefit (30%) improvement when number of tasks is 8 and 32: Note: data is completion time in seconds (lower is better). Number of loops for 8 and 16 tasks is 50000, and for 32 tasks its 20000. +--------+-----+-------+-------------------+---------------------------+ \| groups \| fds \| tasks \| Without Patch \| With Patch \| +--------+-----+-------+---------+---------+-----------------+---------+ \| \| \| \| Mean \| Stdev \| Mean \| Stdev \| \| \| \| +-------------------+-----------------+---------+ \| 1 \| 8 \| 8 \| 1.0534 \| 0.13722 \| 0.7293 (+30.7%) \| 0.02653 \| \| 2 \| 8 \| 16 \| 1.6219 \| 0.16631 \| 1.6391 (-1%) \| 0.24001 \| \| 4 \| 8 \| 32 \| 1.2538 \| 0.13086 \| 1.1080 (+11.6%) \| 0.16201 \| +--------+-----+-------+---------+---------+-----------------+---------+ 2) Rohit ran barrier.c test (details below) with following improvements: ------------------------------------------------------------------------ This was Rohit's original use case for a patch he posted at [1] however from his recent tests he showed my patch can replace his slow path changes [1] and there's no need to selectively scan/skip CPUs in find_idlest_group_cpu in the slow path to get the improvement he sees. barrier.c (open_mp code) as a micro-benchmark. It does a number of iterations and barrier sync at the end of each for loop. Here barrier,c is running in along with ping on CPU 0 and 1 as: 'ping -l 10000 -q -s 10 -f hostX' barrier.c can be found at: http://www.spinics.net/lists/kernel/msg2506955.html Following are the results for the iterations per second with this micro-benchmark (higher is better), on a 44 core, 2 socket 88 Threads Intel x86 machine: +--------+------------------+---------------------------+ \|Threads \| Without patch \| With patch \| \| \| \| \| +--------+--------+---------+-----------------+---------+ \| \| Mean \| Std Dev \| Mean \| Std Dev \| +--------+--------+---------+-----------------+---------+ \|1 \| 539.36 \| 60.16 \| 572.54 (+6.15%) \| 40.95 \| \|2 \| 481.01 \| 19.32 \| 530.64 (+10.32%)\| 56.16 \| \|4 \| 474.78 \| 22.28 \| 479.46 (+0.99%) \| 18.89 \| \|8 \| 450.06 \| 24.91 \| 447.82 (-0.50%) \| 12.36 \| \|16 \| 436.99 \| 22.57 \| 441.88 (+1.12%) \| 7.39 \| \|32 \| 388.28 \| 55.59 \| 429.4 (+10.59%)\| 31.14 \| \|64 \| 314.62 \| 6.33 \| 311.81 (-0.89%) \| 11.99 \| +--------+--------+---------+-----------------+---------+ 3) ping+hackbench test on bare-metal sever (Rohit ran this test) ---------------------------------------------------------------- Here hackbench is running in threaded mode along with, running ping on CPU 0 and 1 as: 'ping -l 10000 -q -s 10 -f hostX' This test is running on 2 socket, 20 core and 40 threads Intel x86 machine: Number of loops is 10000 and runtime is in seconds (Lower is better). +--------------+-----------------+--------------------------+ \|Task Groups \| Without patch \| With patch \| \| +-------+---------+----------------+---------+ \|(Groups of 40)\| Mean \| Std Dev \| Mean \| Std Dev \| +--------------+-------+---------+----------------+---------+ \|1 \| 0.851 \| 0.007 \| 0.828 (+2.77%)\| 0.032 \| \|2 \| 1.083 \| 0.203 \| 1.087 (-0.37%)\| 0.246 \| \|4 \| 1.601 \| 0.051 \| 1.611 (-0.62%)\| 0.055 \| \|8 \| 2.837 \| 0.060 \| 2.827 (+0.35%)\| 0.031 \| \|16 \| 5.139 \| 0.133 \| 5.107 (+0.63%)\| 0.085 \| \|25 \| 7.569 \| 0.142 \| 7.503 (+0.88%)\| 0.143 \| +--------------+-------+---------+----------------+---------+ [1] https://patchwork.kernel.org/patch/9991635/ Matt Fleming also ran cyclictest and several different hackbench tests on his test machines to santiy-check that the patch doesn't harm any of his usecases. Change-Id: I1de1e40a3a92a54a9adc1303ccc98104704c1e35 Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Morten Ramussen <morten.rasmussen@arm.com> Cc: Brendan Jackman <brendan.jackman@arm.com> Tested-by: Rohit Jain <rohit.k.jain@oracle.com> Tested-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>	2018-05-29 09:00:30 -07:00
Victor Wan	324524de04	Merge branch 'android-4.9' into amlogic-4.9-dev	2018-05-22 10:48:42 +08:00
Victor Wan	810c6dd972	Merge branch 'android-4.9' into amlogic-4.9-dev Signed-off-by: Victor Wan <victor.wan@amlogic.com> Conflicts: arch/arm/configs/bcm2835_defconfig arch/arm/configs/sunxi_defconfig include/linux/cpufreq.h init/main.c	2018-04-24 17:43:19 +08:00
Greg Kroah-Hartman	8683408f8e	Merge 4.9.94 into android-4.9 Changes in 4.9.94 qed: Fix overriding of supported autoneg value. cfg80211: make RATE_INFO_BW_20 the default md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock rtc: snvs: fix an incorrect check of return value x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility ovl: persistent inode numbers for upper hardlinks NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION x86/boot: Declare error() as noreturn IB/srpt: Fix abort handling IB/srpt: Avoid that aborting a command triggers a kernel warning af_key: Fix slab-out-of-bounds in pfkey_compile_policy. mac80211: bail out from prep_connection() if a reconfig is ongoing bna: Avoid reading past end of buffer qlge: Avoid reading past end of buffer ubi: fastmap: Fix slab corruption ipmi_ssif: unlock on allocation failure net: cdc_ncm: Fix TX zero padding net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control lockd: fix lockd shutdown race drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() s390: move _text symbol to address higher than zero net/mlx4_en: Avoid adding steering rules with invalid ring qed: Correct doorbell configuration for !4Kb pages NFSv4.1: Work around a Linux server bug... CIFS: silence lockdep splat in cifs_relock_file() perf/callchain: Force USER_DS when invoking perf_callchain_user() blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op net: qca_spi: Fix alignment issues in rx path netxen_nic: set rcode to the return status from the call to netxen_issue_cmd mdio: mux: Correct mdio_mux_init error path issues Input: elan_i2c - check if device is there before really probing Input: elantech - force relative mode on a certain module KVM: PPC: Book3S PR: Check copy_to/from_user return values irqchip/mbigen: Fix the clear register offset calculation vmxnet3: ensure that adapter is in proper state during force_close mm, vmstat: Remove spurious WARN() during zoneinfo print SMB2: Fix share type handling bus: brcmstb_gisb: Use register offsets with writes too bus: brcmstb_gisb: correct support for 64-bit address output PowerCap: Fix an error code in powercap_register_zone() iio: pressure: zpa2326: report interrupted case as failure ARM: dts: imx53-qsrb: Pulldown PMIC IRQ pin staging: wlan-ng: prism2mgmt.c: fixed a double endian conversion before calling hfa384x_drvr_setconfig16, also fixes relative sparse warning clk: renesas: rcar-gen2: Fix PLL0 on R-Car V2H and E2 x86/tsc: Provide 'tsc=unstable' boot parameter powerpc/modules: If mprofile-kernel is enabled add it to vermagic ARM: dts: imx6qdl-wandboard: Fix audio channel swap i2c: mux: reg: put away the parent i2c adapter on probe failure arm64: perf: Ignore exclude_hv when kernel is running in HYP mdio: mux: fix device_node_continue.cocci warnings ipv6: avoid dad-failures for addresses with NODAD async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome() KVM: arm: Restore banked registers and physical timer access on hyp_panic() KVM: arm64: Restore host physical timer access on hyp_panic() usb: dwc3: keystone: check return value btrfs: fix incorrect error return ret being passed to mapping_set_error ata: libahci: properly propagate return value of platform_get_irq() ipmr: vrf: Find VIFs using the actual device uio: fix incorrect memory leak cleanup neighbour: update neigh timestamps iff update is effective arp: honour gratuitous ARP _replies_ ARM: dts: rockchip: fix rk322x i2s1 pinctrl error usb: chipidea: properly handle host or gadget initialization failure pxa_camera: fix module remove codepath for v4l2 clock USB: ene_usb6250: fix first command execution net: x25: fix one potential use-after-free issue USB: ene_usb6250: fix SCSI residue overwriting serial: 8250: omap: Disable DMA for console UART serial: sh-sci: Fix race condition causing garbage during shutdown net/wan/fsl_ucc_hdlc: fix unitialized variable warnings net/wan/fsl_ucc_hdlc: fix incorrect memory allocation fsl/qe: add bit description for SYNL register for GUMR sh_eth: Use platform device for printing before register_netdev() mlxsw: spectrum: Avoid possible NULL pointer dereference scsi: csiostor: fix use after free in csio_hw_use_fwconfig() powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash ath5k: fix memory leak on buf on failed eeprom read selftests/powerpc: Fix TM resched DSCR test with some compilers xfrm: fix state migration copy replay sequence numbers ASoC: simple-card: fix mic jack initialization iio: hi8435: avoid garbage event at first enable iio: hi8435: cleanup reset gpio iio: light: rpr0521 poweroff for probe fails ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors md-cluster: fix potential lock issue in add_new_disk ARM: davinci: da8xx: Create DSP device only when assigned memory ray_cs: Avoid reading past end of buffer net/wan/fsl_ucc_hdlc: fix muram allocation error leds: pca955x: Correct I2C Functionality perf/core: Fix error handling in perf_event_alloc() sched/numa: Use down_read_trylock() for the mmap_sem gpio: crystalcove: Do not write regular gpio registers for virtual GPIOs net/mlx5: Tolerate irq_set_affinity_hint() failures selinux: do not check open permission on sockets block: fix an error code in add_partition() mlx5: fix bug reading rss_hash_type from CQE net: ieee802154: fix net_device reference release too early libceph: NULL deref on crush_decode() error path perf report: Fix off-by-one for non-activation frames netfilter: ctnetlink: fix incorrect nf_ct_put during hash resize pNFS/flexfiles: missing error code in ff_layout_alloc_lseg() ASoC: rsnd: SSI PIO adjust to 24bit mode scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats() fix race in drivers/char/random.c:get_reg() ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff() ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path tcp: better validation of received ack sequences net: move somaxconn init from sysctl code Input: elan_i2c - clear INT before resetting controller bonding: Don't update slave->link until ready to commit cpuhotplug: Link lock stacks for hotplug callbacks PCI/msi: fix the pci_alloc_irq_vectors_affinity stub KVM: X86: Fix preempt the preemption timer cancel KVM: nVMX: Fix handling of lmsw instruction net: llc: add lock_sock in llc_ui_bind to avoid a race condition drm/msm: Take the mutex before calling msm_gem_new_impl i40iw: Fix sequence number for the first partial FPDU i40iw: Correct Q1/XF object count equation ARM: dts: ls1021a: add "fsl,ls1021a-esdhc" compatible string to esdhc node thermal: power_allocator: fix one race condition issue for thermal_instances list perf probe: Add warning message if there is unexpected event name l2tp: fix missing print session offset info rds; Reset rs->rs_bound_addr in rds_add_bound() failure path ACPI / video: Default lcd_only to true on Win8-ready and newer machines net/mlx4_en: Change default QoS settings VFS: close race between getcwd() and d_move() PM / devfreq: Fix potential NULL pointer dereference in governor_store hwmon: (ina2xx) Make calibration register value fixed media: videobuf2-core: don't go out of the buffer range ASoC: Intel: Skylake: Disable clock gating during firmware and library download ASoC: Intel: cht_bsw_rt5645: Analog Mic support scsi: libiscsi: Allow sd_shutdown on bad transport scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag. irqchip/gic-v3: Fix the driver probe() fail due to disabled GICC entry ACPI: EC: Fix debugfs_create_*() usage mac80211: Fix setting TX power on monitor interfaces vfb: fix video mode and line_length being set when loaded gpio: label descriptors using the device name IB/rdmavt: Allocate CQ memory on the correct node blk-mq: fix race between updating nr_hw_queues and switching io sched backlight: tdo24m: Fix the SPI CS between transfers pinctrl: baytrail: Enable glitch filter for GPIOs used as interrupts ASoC: Intel: sst: Fix the return value of 'sst_send_byte_stream_mrfld()' rt2x00: do not pause queue unconditionally on error path wl1251: check return from call to wl1251_acx_arp_ip_filter hdlcdrv: Fix divide by zero in hdlcdrv_ioctl x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map netfilter: conntrack: don't call iter for non-confirmed conntracks HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices ovl: filter trusted xattr for non-admin powerpc/[booke\|4xx]: Don't clobber TCR[WP] when setting TCR[DIE] dmaengine: imx-sdma: Handle return value of clk_prepare_enable backlight: Report error on failure arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage net/mlx5: avoid build warning for uniprocessor cxgb4: FW upgrade fixes cxgb4: Fix netdev_features flag rtc: m41t80: fix SQW dividers override when setting a date i40evf: fix merge error in older patch rtc: opal: Handle disabled TPO in opal_get_tpo_time() rtc: interface: Validate alarm-time before handling rollover SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() net: freescale: fix potential null pointer dereference clk: at91: fix clk-generated parenting drm/sun4i: Ignore the generic connectors for components dt-bindings: display: sun4i: Add allwinner,tcon-channel property mtd: nand: gpmi: Fix gpmi_nand_init() error path mtd: nand: check ecc->total sanity in nand_scan_tail KVM: SVM: do not zero out segment attributes if segment is unusable or not present clk: scpi: fix return type of __scpi_dvfs_round_rate clk: Fix __set_clk_rates error print-string powerpc/spufs: Fix coredump of SPU contexts drm/amdkfd: NULL dereference involving create_process() ath10k: add BMI parameters to fix calibration from DT/pre-cal perf trace: Add mmap alias for s390 qlcnic: Fix a sleep-in-atomic bug in qlcnic_82xx_hw_write_wx_2M and qlcnic_82xx_hw_read_wx_2M arm64: kernel: restrict /dev/mem read() calls to linear region mISDN: Fix a sleep-in-atomic bug net: phy: micrel: Restore led_mode and clk_sel on resume RDMA/iw_cxgb4: Avoid touch after free error in ARP failure handlers RDMA/hfi1: fix array termination by appending NULL to attr array drm/omap: fix tiled buffer stride calculations powerpc/8xx: fix mpc8xx_get_irq() return on no irq cxgb4: fix incorrect cim_la output for T6 Fix serial console on SNI RM400 machines bio-integrity: Do not allocate integrity context for bio w/o data ip6_tunnel: fix traffic class routing for tunnels skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow macsec: check return value of skb_to_sgvec always sit: reload iphdr in ipip6_rcv net/mlx4: Fix the check in attaching steering rules net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport perf header: Set proper module name when build-id event found perf report: Ensure the perf DSO mapping matches what libdw sees iwlwifi: mvm: fix firmware debug restart recording watchdog: f71808e_wdt: Add F71868 support iwlwifi: mvm: Fix command queue number on d0i3 flow iwlwifi: tt: move ucode_loaded check under mutex iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3 iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265 tags: honor COMPILED_SOURCE with apart output directory ARM: dts: qcom: ipq4019: fix i2c_0 node e1000e: fix race condition around skb_tstamp_tx() igb: fix race condition with PTP_TX_IN_PROGRESS bits cxl: Unlock on error in probe cx25840: fix unchecked return values mceusb: sporadic RX truncation corruption fix net: phy: avoid genphy_aneg_done() for PHYs without clause 22 support ARM: imx: Add MXC_CPU_IMX6ULL and cpu_is_imx6ull nvme-pci: fix multiple ctrl removal scheduling nvme: fix hang in remove path KVM: nVMX: Update vmcs12->guest_linear_address on nested VM-exit e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails perf/core: Correct event creation with PERF_FORMAT_GROUP sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks MIPS: mm: fixed mappings: correct initialisation MIPS: mm: adjust PKMAP location MIPS: kprobes: flush_insn_slot should flush only if probe initialised ARM: dts: armadillo800eva: Split LCD mux and gpio Fix loop device flush before configure v3 net: emac: fix reset timeout with AR8035 phy perf tools: Decompress kernel module when reading DSO data perf tests: Decompress kernel module before objdump skbuff: only inherit relevant tx_flags xen: avoid type warning in xchg_xen_ulong X.509: Fix error code in x509_cert_parse() pinctrl: meson-gxbb: remove non-existing pin GPIOX_22 coresight: Fix reference count for software sources coresight: tmc: Configure DMA mask appropriately stmmac: fix ptp header for GMAC3 hw timestamp geneve: add missing rx stats accounting crypto: omap-sham - buffer handling fixes for hashing later crypto: omap-sham - fix closing of hash with separate finalize call bnx2x: Allow vfs to disable txvlan offload sctp: fix recursive locking warning in sctp_do_peeloff net: fec: Add a fec_enet_clear_ethtool_stats() stub for CONFIG_M5272 sparc64: ldc abort during vds iso boot iio: magnetometer: st_magn_spi: fix spi_device_id table net: ena: fix rare uncompleted admin command false alarm net: ena: fix race condition between submit and completion admin command net: ena: add missing return when ena_com_get_io_handlers() fails net: ena: add missing unmap bars on device removal net: ena: disable admin msix while working in polling mode clk: meson: meson8b: add compatibles for Meson8 and Meson8m2 Bluetooth: Send HCI Set Event Mask Page 2 command only when needed cpuidle: dt: Add missing 'of_node_put()' ACPICA: OSL: Add support to exclude stdarg.h ACPICA: Events: Add runtime stub support for event APIs ACPICA: Disassembler: Abort on an invalid/unknown AML opcode s390/dasd: fix hanging safe offline vxlan: dont migrate permanent fdb entries during learn hsr: fix incorrect warning selftests: kselftest_harness: Fix compile warning drm/vc4: Fix resource leak in 'vc4_get_hang_state_ioctl()' in error handling path bcache: stop writeback thread after detaching bcache: segregate flash only volume write streams scsi: libsas: fix memory leak in sas_smp_get_phy_events() scsi: libsas: fix error when getting phy events scsi: libsas: initialize sas_phy status according to response of DISCOVER blk-mq: fix kernel oops in blk_mq_tag_idle() tty: n_gsm: Allow ADM response in addition to UA for control dlci EDAC, mv64x60: Fix an error handling path cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages sdhci: Advertise 2.0v supply on SDIO host controller Input: goodix - disable IRQs while suspended mtd: mtd_oobtest: Handle bitflips during reads perf tools: Fix copyfile_offset update of output offset ipsec: check return value of skb_to_sgvec always rxrpc: check return value of skb_to_sgvec always virtio_net: check return value of skb_to_sgvec always virtio_net: check return value of skb_to_sgvec in one more location random: use lockless method of accessing and updating f->reg_idx clk: at91: fix clk-generated compilation arp: fix arp_filter on l3slave devices ipv6: the entire IPv6 header chain must fit the first fragment net: fix possible out-of-bound read in skb_network_protocol() net/ipv6: Fix route leaking between VRFs net/ipv6: Increment OUTxxx counters after netfilter hook netlink: make sure nladdr has correct size in netlink_connect() net/sched: fix NULL dereference in the error path of tcf_bpf_init() pptp: remove a buggy dst release in pptp_connect() r8169: fix setting driver_data after register_netdev sctp: do not leak kernel memory to user space sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6 sky2: Increase D3 delay to sky2 stops working after suspend vhost: correctly remove wait queue during poll failure vlan: also check phy_driver ts_info for vlan's real device bonding: fix the err path for dev hwaddr sync in bond_enslave bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave bonding: process the err returned by dev_set_allmulti properly in bond_enslave net: fool proof dev_valid_name() ip_tunnel: better validate user provided tunnel names ipv6: sit: better validate user provided tunnel names ip6_gre: better validate user provided tunnel names ip6_tunnel: better validate user provided tunnel names vti6: better validate user provided tunnel names net/mlx5e: Sync netdev vxlan ports at open net/sched: fix NULL dereference in the error path of tunnel_key_init() net/sched: fix NULL dereference on the error path of tcf_skbmod_init() net/mlx4_en: Fix mixed PFC and Global pause user control requests vhost: validate log when IOTLB is enabled route: check sysctl_fib_multipath_use_neigh earlier than hash team: move dev_mc_sync after master_upper_dev_link in team_port_add vhost_net: add missing lock nesting notation net/mlx4_core: Fix memory leak while delete slave's resources strparser: Fix sign of err codes net sched actions: fix dumping which requires several messages to user space vrf: Fix use after free and double free in vrf_finish_output Revert "xhci: plat: Register shutdown for xhci_plat" Linux 4.9.94 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-04-14 15:40:56 +02:00
Vlastimil Babka	a1e7a9e2e3	sched/numa: Use down_read_trylock() for the mmap_sem [ Upstream commit `8655d54977` ] A customer has reported a soft-lockup when running an intensive memory stress test, where the trace on multiple CPU's looks like this: RIP: 0010:[<ffffffff810c53fe>] [<ffffffff810c53fe>] native_queued_spin_lock_slowpath+0x10e/0x190 ... Call Trace: [<ffffffff81182d07>] queued_spin_lock_slowpath+0x7/0xa [<ffffffff811bc331>] change_protection_range+0x3b1/0x930 [<ffffffff811d4be8>] change_prot_numa+0x18/0x30 [<ffffffff810adefe>] task_numa_work+0x1fe/0x310 [<ffffffff81098322>] task_work_run+0x72/0x90 Further investigation showed that the lock contention here is pmd_lock(). The task_numa_work() function makes sure that only one thread is let to perform the work in a single scan period (via cmpxchg), but if there's a thread with mmap_sem locked for writing for several periods, multiple threads in task_numa_work() can build up a convoy waiting for mmap_sem for read and then all get unblocked at once. This patch changes the down_read() to the trylock version, which prevents the build up. For a workload experiencing mmap_sem contention, it's probably better to postpone the NUMA balancing work anyway. This seems to have fixed the soft lockups involving pmd_lock(), which is in line with the convoy theory. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170515131316.21909-1-vbabka@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-04-13 19:48:04 +02:00
Quentin Perret	21bd85cd95	Merge remote-tracking branch 'android-4.9' into 'android-4.9-eas-dev' Fixed merge conflicts caused by `8a174b4749` ("sched/fair: prevent possible infinite loop in sched_group_energy) Change-Id: Iba35631a213b88f6361838f28573af3407568919 Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-02-22 18:04:18 +00:00
Quentin Perret	b5ba569f6f	sched/fair: select the most energy-efficient CPU candidate on wake-up The current implementation of the energy-aware wake-up path relies on find_best_target() to select an ordered list of CPU candidates for task placement. The first candidate of the list saving energy is then chosen, disregarding all the others to avoid the overhead of an expensive energy_diff. With the recent refactoring of select_energy_cpu_idx(), the cost of exploring multiple CPUs has been reduced, hence offering the opportunity to select the most energy-efficient candidate at a lower cost. This commit seizes this opportunity by allowing to change select_energy_cpu_idx()'s behaviour as to ignore the order of CPUs returned by find_best_target() and to pick the best candidate energy-wise. As this functionality is still considered as experimental, it is hidden behind a sched_feature named FBT_STRICT_ORDER (like the equivalent feature in Android 4.14) which defaults to true, hence keeping the current behaviour by default. Change-Id: I0cb833bfec1a4a053eddaff1652c0b6cad554f97 Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com> Suggested-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-02-14 16:21:10 +00:00
Chris Redpath	8a174b4749	sched/fair: prevent possible infinite loop in sched_group_energy There is a race between hotplug and energy_diff which might result in endless loop in sched_group_energy. When this happens, the end condition cannot be detected. We can store how many CPUs we need to visit at the beginning, and bail out of the energy calculation if we visit more cpus than expected. Bug: 72311797 72202633 Change-Id: I8dda75468ee1570da4071cd8165ef5131a8205d8 Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-10 00:45:21 +00:00
Pavankumar Kondeti	de22a05ed0	sched/fair: fix array out of bounds access in select_energy_cpu_idx() We are using an incorrect index while initializing the nrg_delta for the previous CPU in select_energy_cpu_idx(). This initialization it self is not needed as the nrg_delta for the previous CPU is already initialized to 0 while preparing the energ_env struct. Change-Id: Iee4e2c62f904050d2680a0a1df646d4d515c62cc Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>	2018-02-03 05:29:57 +05:30
Ionela Voinescu	f964120739	sched/fair: use min capacity when evaluating active cpus When we are calculating what the impact of placing a task on a specific cpu is, we should include the information that there might be a minimum capacity imposed upon that cpu which could change the performance and/or energy cost decisions. When choosing an active target CPU, favour CPUs that won't end up running at a high OPP due to a min capacity cap imposed by external actors. Change-Id: Ibc3302304345b63107f172b1fc3ffdabc19aa9d4 Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:35 +00:00
Ionela Voinescu	88a968ca63	sched/fair: use min capacity when evaluating idle backup cpus When we are calculating what the impact of placing a task on a specific cpu is, we should include the information that there might be a minimum capacity imposed upon that cpu which could change the performance and/or energy cost decisions. When choosing an idle backup CPU, favour CPUs that won't end up running at a high OPP due to a min capacity cap imposed by external actors. Change-Id: I566623ffb3a7c5b61a23242dcce1cb4147ef8a4a Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:32 +00:00
Ionela Voinescu	9e1c648c71	sched/fair: use min capacity when evaluating placement energy costs Add the ability to track minimim capacity forced onto a sched_group by some external actor. group_max_util returns the highest utilisation inside a sched_group and is used when we are trying to calculate an energy cost estimate for a specific scheduling scenario. Minimum capacities imposed from elsewhere will influence this energy cost so we should reflect it here. Change-Id: Ibd537a6dbe6d67b11cc9e9be18f40fcb2c0f13de Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:27 +00:00
Ionela Voinescu	58b761f4c0	sched/fair: introduce minimum capacity capping sched feature We have the ability to track minimum capacity forced onto a CPU by userspace or external actors. This is provided though a minimum frequency scale factor exposed by arch_scale_min_freq_capacity. The use of this information is enabled through the MIN_CAPACITY_CAPPING feature. If not enabled, the minimum frequency scale factor will remain 0 and it will not impact energy estimation or scheduling decisions. Change-Id: Ibc61f2bf4fddf186695b72b262e602a6e8bfde37 Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 18:37:23 +00:00
Dietmar Eggemann	da6833cff7	sched/fair: introduce an arch scaling function for max frequency capping The max frequency scaling factor is defined as: max_freq_scale = policy_max_freq / cpuinfo_max_freq To be able to scale the cpu capacity by this factor introduce a call to the new arch scaling function arch_scale_max_freq_capacity() in update_cpu_capacity() and provide a default implementation which returns SCHED_CAPACITY_SCALE. Another subsystem (e.g. cpufreq) can overwrite this default implementation, exactly as for frequency and cpu invariance. It has to be enabled by the arch by defining arch_scale_max_freq_capacity to the actual implementation. Change-Id: I266cd1f4c1c82f54b80063c36aa5f7662599dd28 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-02-01 15:59:24 +00:00
Ke Wang	7be1985454	ANDROID: sched: EAS: check energy_aware() before calling select_energy_cpu_brute() in up-migrate path In up-migrate path, select_energy_cpu_brute() was called directly without checking energy_aware(). This will make select_energy_cpu_brute() always worked even disabling energy_aware() on the asymmetric cpu capacity system. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2018-01-29 15:31:47 +00:00
Victor Wan	20946741c8	Merge branch 'android-4.9' into amlogic-4.9-dev Conflicts: Makefile init/main.c	2018-01-22 20:17:25 +08:00
Patrick Bellasi	3b9305de32	sched/fair: reduce rounding errors in energy computations The SG's energy is obtained by adding busy and idle contributions which are computed by considering a proper fraction of the SCHED_CAPACITY_SCALE defined by the SG's utilizations. By scaling each and every contribution conputed we risk to accumulate rounding errors which can results into a non null energy_delta also in cases when the same total accomulated utilization is differently distributed among different CPUs. To reduce rouding errors, this patch accumulated non-scaled busy/idle energy contributions for each visited SG, and scale each of them just one time at the end. Change-Id: Idf8367fee0ac11938c6436096f0c1b2d630210d2 Suggested-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:55:12 +00:00
Patrick Bellasi	cf28cf03a3	sched/fair: re-factor energy_diff to use a single (extensible) energy_env The energy_env data structure is used to cache values required by multiple different functions involved in energy_diff computation. Some of these functions require additional parameters which can be easily embedded into the energy_env itself. The current implementation of energy_diff hardcodes the usage of two different energy_env structures to estimate and compare the energy consumption related to a "before" and an "after" CPU. Moreover, it does this energy estimation by walking multiple times the SDs/SGs data structures. A better design can be envisioned by better using the energy_env structure to support a more efficient and concurrent evaluation of multiple schedule candidates. To this purpose, this patch provides a complete re-factoring of the energy_diff implementation to: 1. use a single energy_env structure for the evaluation of all the candidate CPUs 2. walk just one time the SDs/SGs, thus improving the overall performance to compute the energy estimation for each CPU candidate specified by the single used energy_env 3. simplify the code (at least if you look at the new version and not at this re-factoring patch) thus providing a more clean code to maintain and extend for additional features This patch updated all the clients of energy_env to use only the data provided by this structure and an index for one of its CPUs candidates. Embedding everything within the energy env will make it simple to add tracepoints for this new version, which can easily provide an holistic view on how energy_diff evaluated the proposed CPU candidates. The new proposed structure, for both "struct energy_env" and the functions using it, is designed in such a way to easily accommodate additional further extensions (e.g. SchedTune filtering) without requiring an additional big re-factoring of these core functions. Finally, the usage of a CPUs candidate array, embedded into the energy_diff structure, allows also to seamless extend the exploration of multiple candidate CPUs, for example to support the comparison of a spread-vs-packing strategy. Change-Id: Ic04ffb6848b2c763cf1788767f22c6872eb12bee Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> [reworked find_new_capacity() and enforced the respect of find_best_target() selection order] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:55:08 +00:00
Patrick Bellasi	142ce32a63	sched/fair: cleanup select_energy_cpu_brute to be more consistent The current definition of select_energy_cpu_brute is a bit confusing in the definition of the value for the target_cpu to be returned as wakeup CPU for the specified task. This cleanup the code by ensuring that we always set target_cpu right before returning it. rcu_read_lock and check on *sd!=NULL are also moved around to be exactly where they are required. Change-Id: I70a4b558b3624a13395da1a87ddc0776fd1d6641 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:55:03 +00:00
Patrick Bellasi	7f44e92d1c	sched/fair: remove capacity tracking from energy_diff In preparation for the energy_diff refactoring, let's remove all the SchedTune specific bits which are used to keep track of the capacity variations requited by the PESpace filtering. This removes also the energy_normalization function and the wrapper of energy_diff which is used to trigger a PESpace filtering by schedtune_accept_deltas(). The remaining code is the "original" energy_diff function which looks just at the energy variations to compare prev_cpu vs next_cpu. Change-Id: I4fb1d1c5ba45a364e6db9ab8044969349aba0307 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:59 +00:00
Patrick Bellasi	4905932b05	sched/fair: remove energy_diff tracepoint in preparation to re-factoring The format of the energy_diff tracepoint is going to be changed by the following energ_diff refactoring patches. Let's remove it now to start from a clean slate. Change-Id: Id4f537ed60d90a7ddcca0a29a49944bfacb85c8c Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:55 +00:00
Chris Redpath	e997bf0fde	sched/fair: use p to reference task_structs This is a simple renaming patch which just align to the most common code convention used in fair.c, task_structs pointers are usually named p. Change-Id: Id0769e52b6a271014d89353fdb4be9bb721b5b2f Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:50 +00:00
Ke Wang	225006a4f7	sched: EAS: Fix the calculation of group util in group_idle_state() util_delta becomes not zero in eenv_before, which will affect the calculation of grp_util in group_idle_state(). Fix it under the new condition. Change-Id: Ic3853bb45876a8e388afcbe4e72d25fc42b1d7b0 Signed-off-by: Ke Wang <ke.wang@spreadtrum.com> (cherry picked from commit `47c87b2654`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-01-08 15:54:46 +00:00
Victor Wan	2c95ea743b	Merge branch 'android-4.9' into amlogic-4.9-dev Conflicts: arch/arm/configs/omap2plus_defconfig drivers/Makefile drivers/android/binder.c	2018-01-08 18:44:19 +08:00
Greg Kroah-Hartman	3f1d77ca5f	Merge 4.9.69 into android-4.9 Changes in 4.9.69 usb: gadget: udc: renesas_usb3: fix number of the pipes can: ti_hecc: Fix napi poll return value for repoll can: kvaser_usb: free buf in error paths can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback() can: kvaser_usb: ratelimit errors if incomplete messages are received can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: usb_8dev: cancel urb on -EPIPE and -EPROTO virtio: release virtio index when fail to device_register hv: kvp: Avoid reading past allocated blocks from KVP file isa: Prevent NULL dereference in isa_bus driver callbacks scsi: dma-mapping: always provide dma_get_cache_alignment scsi: use dma_get_cache_alignment() as minimum DMA alignment scsi: libsas: align sata_device's rps_resp on a cacheline efi: Move some sysfs files to be read-only by root efi/esrt: Use memunmap() instead of kfree() to free the remapping ASN.1: fix out-of-bounds read when parsing indefinite length item ASN.1: check for error from ASN1_OP_END__ACT actions KEYS: add missing permission check for request_key() destination X.509: reject invalid BIT STRING for subjectPublicKey X.509: fix comparisons of ->pkey_algo x86/PCI: Make broadcom_postcore_init() check acpi_disabled KVM: x86: fix APIC page invalidation btrfs: fix missing error return in btrfs_drop_snapshot ALSA: pcm: prevent UAF in snd_pcm_info ALSA: seq: Remove spurious WARN_ON() at timer check ALSA: usb-audio: Fix out-of-bound error ALSA: usb-audio: Add check return value for usb_string() iommu/vt-d: Fix scatterlist offset handling smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place s390: fix compat system call table KVM: s390: Fix skey emulation permission check powerpc/64s: Initialize ISAv3 MMU registers before setting partition table brcmfmac: change driver unbind order of the sdio function devices kdb: Fix handling of kallsyms_symbol_next() return value drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU media: dvb: i2c transfers over usb cannot be done from stack arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one KVM: VMX: remove I/O port 0x80 bypass on Intel hosts KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocation KVM: arm/arm64: vgic-its: Check result of allocation before use arm64: fpsimd: Prevent registers leaking from dead tasks bus: arm-cci: Fix use of smp_processor_id() in preemptible context bus: arm-ccn: Check memory allocation failure bus: arm-ccn: Fix use of smp_processor_id() in preemptible context bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left. crypto: talitos - fix AEAD test failures crypto: talitos - fix memory corruption on SEC2 crypto: talitos - fix setkey to check key weakness crypto: talitos - fix AEAD for sha224 on non sha224 capable chips crypto: talitos - fix use of sg_link_tbl_len crypto: talitos - fix ctr-aes-talitos usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT ARM: BUG if jumping to usermode address in kernel mode ARM: avoid faulting on qemu thp: reduce indentation level in change_huge_pmd() thp: fix MADV_DONTNEED vs. numa balancing race mm: drop unused pmdp_huge_get_and_clear_notify() Revert "drm/armada: Fix compile fail" Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA" ARM: 8657/1: uaccess: consistently check object sizes vti6: Don't report path MTU below IPV6_MIN_MTU. ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure x86/selftests: Add clobbers for int80 on x86_64 x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register sched/fair: Make select_idle_cpu() more aggressive x86/hpet: Prevent might sleep splat on resume powerpc/64: Invalidate process table caching after setting process table selftest/powerpc: Fix false failures for skipped tests powerpc: Fix compiling a BE kernel with a powerpc64le toolchain lirc: fix dead lock between open and wakeup_filter module: set __jump_table alignment to 8 powerpc/64: Fix checksum folding in csum_add() ARM: OMAP2+: Fix device node reference counts ARM: OMAP2+: Release device node after it is no longer needed. ASoC: rcar: avoid SSI_MODEx settings for SSI8 gpio: altera: Use handle_level_irq when configured as a level_high HID: chicony: Add support for another ASUS Zen AiO keyboard usb: gadget: configs: plug memory leak USB: gadgetfs: Fix a potential memory leak in 'dev_config()' usb: dwc3: gadget: Fix system suspend/resume on TI platforms usb: gadget: pxa27x: Test for a valid argument pointer usb: gadget: udc: net2280: Fix tmp reusage in net2280 driver kvm: nVMX: VMCLEAR should not cause the vCPU to shut down libata: drop WARN from protocol error in ata_sff_qc_issue() workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq scsi: qla2xxx: Fix ql_dump_buffer scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters irqchip/crossbar: Fix incorrect type of register size KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset arm: KVM: Survive unknown traps from guests arm64: KVM: Survive unknown traps from guests KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled spi_ks8995: fix "BUG: key accdaa28 not in .data!" spi_ks8995: regs_size incorrect for some devices bnx2x: prevent crash when accessing PTP with interface down bnx2x: fix possible overrun of VFPF multicast addresses array bnx2x: fix detection of VLAN filtering feature for VF bnx2x: do not rollback VF MAC/VLAN filters we did not configure rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races ibmvnic: Fix overflowing firmware/hardware TX queue ibmvnic: Allocate number of rx/tx buffers agreed on by firmware ipv6: reorder icmpv6_init() and ip6_mr_init() crypto: s5p-sss - Fix completing crypto request in IRQ handler i2c: riic: fix restart condition blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() zram: set physical queue limits to avoid array out of bounds accesses netfilter: don't track fragmented packets axonram: Fix gendisk handling drm/amd/amdgpu: fix console deadlock if late init failed powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro EDAC, i5000, i5400: Fix definition of NRECMEMB register kbuild: pkg: use --transform option to prefix paths in tar coccinelle: fix parallel build with CHECK=scripts/coccicheck x86/mpx/selftests: Fix up weird arrays mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl() gre6: use log_ecn_error module parameter in ip6_tnl_rcv() route: also update fnhe_genid when updating a route cache route: update fnhe_expires for redirect when the fnhe exists drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()' lib/genalloc.c: make the avail variable an atomic_long_t dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0 NFS: Fix a typo in nfs_rename() sunrpc: Fix rpc_task_begin trace point xfs: fix forgotten rcu read unlock when skipping inode reclaim dt-bindings: usb: fix reg-property port-number range block: wake up all tasks blocked in get_request() sparc64/mm: set fields in deferred pages zsmalloc: calling zs_map_object() from irq is a bug sctp: do not free asoc when it is already dead in sctp_sendmsg sctp: use the right sk after waking up from wait_buf sleep bpf: fix lockdep splat clk: uniphier: fix DAPLL2 clock rate of Pro5 atm: horizon: Fix irq release error jump_label: Invoke jump_label_test() via early_initcall() xfrm: Copy policy family in clone_policy IB/mlx4: Increase maximal message size under UD QP IB/mlx5: Assign send CQ and recv CQ of UMR QP afs: Connect up the CB.ProbeUuid Linux 4.9.69 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-12-14 09:58:43 +01:00
Peter Zijlstra	4e4a9ebe33	sched/fair: Make select_idle_cpu() more aggressive [ Upstream commit `4c77b18cf8` ] Kitsunyan reported desktop latency issues on his Celeron 887 because of commit: `1b568f0aab` ("sched/core: Optimize SCHED_SMT") ... even though his CPU doesn't do SMT. The effect of running the SMT code on a !SMT part is basically a more aggressive select_idle_cpu(). Removing the avg condition fixed things for him. I also know FB likes this test gone, even though other workloads like having it. For now, take it out by default, until we get a better idea. Reported-by: kitsunyan <kitsunyan@inbox.ru> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-14 09:28:17 +01:00
Victor Wan	1e547d2935	Merge branch 'android-4.9' into amlogic-4.9-dev	2017-12-02 16:52:23 +08:00
Ke Wang	b763480947	sched: EAS/WALT: Don't take into account of running task's util For upmigrating misfit running task case, the currently running task's util has been counted into cpu_util(). Thus currently __cpu_overutilized() which add task's uitl twice is overestimated. Change-Id: I5326f4c736a55679009d2e7293f8792311c04294 Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2017-12-01 16:14:01 +00:00
Joonwoo Park	b41e1ca689	sched: EAS/WALT: take into account of waking task's load WALT's function cpu_util(cpu) reports CPU's load without taking into account of waking task's load. Thus currently cpu_overutilized() underestimates load on the previous CPU of waking task. Take into account of task's load to determine whether previous CPU is overutilzed to bail out early without running energy_diff() which is expensive. Change-Id: I30f146984a880ad2cc1b8a4ce35bd239a8c9a607 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> (minor rebase conflicts) Signed-off-by: Chris Redpath <chris.redpath@arm.com> (cherry picked from commit `94e5c96507`) [trivial cherry-pick issues] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-12-01 16:11:57 +00:00
Joonwoo Park	4f0693ad08	sched: EAS: upmigrate misfit current task Upmigrate misfit current task upon scheduler tick with stopper. We can kick an random (not necessarily big CPU) NOHZ idle CPU when a CPU bound task is in need of upmigration. But it's not efficient as that way needs following unnecessary wakeups: 1. Busy little CPU A to kick idle B 2. B runs idle balancer and enqueue migration/A 3. B goes idle 4. A runs migration/A, enqueues busy task on B. 5. B wakes up again. This change makes active upmigration more efficiently by doing: 1. Busy little CPU A find target CPU B upon tick. 2. CPU A enqueues migration/A. Change-Id: Ie865738054ea3296f28e6ba01710635efa7193c0 [joonwoop: The original version had logic to reserve CPU. The logic is omitted in this version.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> (cherry picked from commit `9e293db052`) [trivial cherry-pick issues] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-12-01 16:11:45 +00:00
Prasad Sodagudi	d345372a29	sched: avoid pushing tasks to an offline CPU Currently active_load_balance_cpu_stop is run by cpu stopper and it pushes running tasks off the busiest CPU onto idle target CPU. But there is no check to see whether target cpu is offline or not before pushing the tasks. With the introduction of active migration in the scheduler tick path (see check_for_migration()) there have been instances of attempts to migrate tasks to offline CPUs. Add a check as to whether the target cpu is online or not to prevent scheduling on offline CPUs. Change-Id: Ib8ac7f8aeabd3ca7365f3eae977075952dab4f21 Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> (cherry picked from commit `dc626b28ee`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-12-01 16:11:35 +00:00
Srivatsa Vaddagiri	70e14af60c	sched: Extend active balance to accept 'push_task' argument Active balance currently picks one task to migrate from busy cpu to a chosen cpu (push_cpu). This patch extends active load balance to recognize a particular task ('push_task') that needs to be migrated to 'push_cpu'. This capability will be leveraged by HMP-aware task placement in a subsequent patch. Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> (cherry picked from commit `2da014c0d8`) [ trivial cherry-pick issues, fix "may be used uninitialized" warning for push_task ] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-12-01 16:11:25 +00:00
Joonwoo Park	a8611936de	sched/fair: prevent meaningless active migration At present need_active_balance() determines whether an active upmigration is needed by using capacity_of(). A CPU's capacity may be reduced by RT pressure, and therefore distinguishing capability differences with capacity_of() may lead to suboptimal active migrations to less capable CPUs. Use capacity_orig_of to distinguish differently capable CPUs in addition to capacity_of(), thus avoiding placing tasks on less capable CPUs due to instantaneous RT pressure. Change-Id: I3e1435246a8edc3ad618ef98a34866cfbd8c16a5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> [markivx: Reworked the commit text a bit] Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> (cherry picked from commit `7ab48e4c8d`) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-12-01 16:10:47 +00:00
Joonwoo Park	cd76b21535	sched: EAS: kill incorrect nohz idle cpu kick EAS won't allow NOHZ idle balancer until CPU's over utilized. However nohz_kick_needed() can return true. This causes idle CPU wake up for nothing. Change-Id: I6e548442e29e4f85cda695e4c7101dd591b12fe6 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> (cherry picked from commit `3989a247e2`) [trivial cherry-pick issues] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2017-11-14 16:38:38 +00:00

1 2 3 4 5 ...

763 Commits