linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-05 12:43:09 +09:00

Author	SHA1	Message	Date
Joonwoo Park	2067342dbc	sched/fair: prevent meaningless active migration At present need_active_balance() determines whether an active upmigration is needed by using capacity_of(). A CPU's capacity may be reduced by RT pressure, and therefore distinguishing capability differences with capacity_of() may lead to suboptimal active migrations to less capable CPUs. Use capacity_orig_of to distinguish differently capable CPUs in addition to capacity_of(), thus avoiding placing tasks on less capable CPUs due to instantaneous RT pressure. Change-Id: I3e1435246a8edc3ad618ef98a34866cfbd8c16a5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> [markivx: Reworked the commit text a bit] Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>	2017-11-20 21:15:59 +05:30
Vikram Mulukutla	1765eacee8	sched: walt: Leverage existing helper APIs to apply invariance There's no need for a separate hierarchy of notifiers, APIs and variables in walt.c for the purpose of applying frequency and IPC invariance. Let's just use capacity_curr_of and get rid of a lot of the infrastructure relating to capacity, load_scale_factor etc. Change-Id: Ia220e2c896373fa535db05bff60f9aa33aefc978 Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>	2017-11-20 21:15:59 +05:30
Olav Haugan	b7d6f8f22b	sched: Update task->on_rq when tasks are moving between runqueues Task->on_rq has three states: 0 - Task is not on runqueue (rq) 1 (TASK_ON_RQ_QUEUED) - Task is on rq 2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the process of being migrated to another rq When a task is moving between rqs task->on_rq state should be TASK_ON_RQ_MIGRATING in order for WALT to account rq's cumulative runnable average correctly. Without such state marking for all the classes, WALT's update_history() would try to fixup task's demand which was never contributed to any of CPUs during migration. Change-Id: Iced3428f3924fe8ab5d0075698273ead04f12d5b Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [joonwoop: Reinforced changelog to explain why this is needed by WALT. Fixed conflicts in deadline.c] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2017-11-20 21:15:59 +05:30
Amit Pundir	839f249169	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android Conflicts due to AOSP's backported commits: fs/f2fs/crypto.c fs/f2fs/crypto_fname.c Deleted by AOSP commit `c1286ff41c` ("f2fs: backport from (`4c1fad64` - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs)") fs/f2fs/crypto_key.c fs/f2fs/data.c fs/f2fs/file.c AOSP commit `13f002354d` ("f2fs: catch up to v4.14-rc1") override most of stable 4.4.y changes. Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-11-20 20:53:19 +05:30
Alex Shi	d983367513	Merge tag 'v4.4.98' into linux-linaro-lsk-v4.4 This is the 4.4.98 stable release	2017-11-16 12:03:04 +08:00
Li Bin	44540ead8a	workqueue: Fix NULL pointer dereference commit `cef572ad9b` upstream. When queue_work() is used in irq (not in task context), there is a potential case that triggers a NULL pointer dereference. ---------------------------------------------------------------- worker_thread() \|-spin_lock_irq() \|-process_one_work() \|-worker->current_pwq = pwq \|-spin_unlock_irq() \|-worker->current_func(work) \|-spin_lock_irq() \|-worker->current_pwq = NULL \|-spin_unlock_irq() //interrupt here \|-irq_handler \|-__queue_work() //assuming that the wq is draining \|-is_chained_work(wq) \|-current_wq_worker() //Here, 'current' is the interrupted worker! \|-current->current_pwq is NULL here! \|-schedule() ---------------------------------------------------------------- Avoid it by checking for task context in current_wq_worker(), and if not in task context, we shouldn't use the 'current' to check the condition. Reported-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Li Bin <huawei.libin@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `8d03ecfe47` ("workqueue: reimplement is_chained_work() using current_wq_worker()") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-15 17:13:11 +01:00
Alex Shi	2f68ef7576	Merge tag 'v4.4.96' into linux-linaro-lsk-v4.4 This is the 4.4.96 stable release	2017-11-03 12:02:27 +08:00
Tejun Heo	fce67b31c7	workqueue: replace pool->manager_arb mutex with a flag commit `692b48258d` upstream. Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by lockdep: [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted [ 1270.473240] ----------------------------------------------------- [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 1270.474239] (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280 [ 1270.474994] [ 1270.474994] and this task is already holding: [ 1270.475440] (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0 [ 1270.476046] which would create a new lock dependency: [ 1270.476436] (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.} [ 1270.476949] [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock: [ 1270.477553] (&pool->lock/1){-.-.} ... [ 1270.488900] to a HARDIRQ-irq-unsafe lock: [ 1270.489327] (&(&lock->wait_lock)->rlock){+.+.} ... [ 1270.494735] Possible interrupt unsafe locking scenario: [ 1270.494735] [ 1270.495250] CPU0 CPU1 [ 1270.495600] ---- ---- [ 1270.495947] lock(&(&lock->wait_lock)->rlock); [ 1270.496295] local_irq_disable(); [ 1270.496753] lock(&pool->lock/1); [ 1270.497205] lock(&(&lock->wait_lock)->rlock); [ 1270.497744] <Interrupt> [ 1270.497948] lock(&pool->lock/1); , which will cause an irq inversion deadlock if the above lock scenario happens. The root cause of this safe -> unsafe lock order is the mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock held. Unlocking a mutex while holding an irq spinlock was never safe and this problem has been around forever, but it never got noticed because the mutex is usually only trylocked while holding irqlock, making actual failures very unlikely, and lockdep annotation missed the condition until the recent `b9c16a0e1f` ("locking/mutex: Fix lockdep_assert_held() fail"). Using a mutex for pool->manager_arb has always been a bit of a stretch. It primarily is a mechanism to arbitrate managership between workers, which can easily be done with a pool flag. The only reason it became a mutex is that the pool destruction path wants to exclude parallel managing operations. This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE and makes the destruction path wait for the current manager on a wait queue. v2: Drop unnecessary flag clearing before pool destruction as suggested by Boqun. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 09:40:48 +01:00
Alex Shi	51f5845319	Merge tag 'v4.4.95' into linux-linaro-lsk-v4.4 This is the 4.4.95 stable release	2017-10-28 12:06:21 +08:00
Oleg Nesterov	0f85c0954b	sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task() commit `18f649ef34` upstream. The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove it, but see the next patch. However the comment is correct in that autogroup_move_group() must always change task_group() for every thread so the sysctl_ check is very wrong; we can race with cgroups and even sys_setsid() is not safe because a task running with task_group() == ag->tg must participate in refcounting: int main(void) { int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY); assert(sctl > 0); if (fork()) { wait(NULL); // destroy the child's ag/tg pause(); } assert(pwrite(sctl, "1\n", 2, 0) == 2); assert(setsid() > 0); if (fork()) pause(); kill(getppid(), SIGKILL); sleep(1); // The child has gone, the grandchild runs with kref == 1 assert(pwrite(sctl, "0\n", 2, 0) == 2); assert(setsid() > 0); // runs with the freed ag/tg for (;;) sleep(1); return 0; } crashes the kernel. It doesn't really need sleep(1), it doesn't matter if autogroup_move_group() actually frees the task_group or this happens later. Reported-by: Vern Lovejoy <vlovejoy@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hartsjc@redhat.com Cc: vbendel@redhat.com Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> [sumits: submit to 4.4 LTS, post testing on Hikey] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-27 10:23:17 +02:00
Alex Shi	fb596ec8d3	Merge branch 'v4.4/topic/kexec-kdump' into linux-linaro-lsk-v4.4	2017-10-25 13:27:31 +08:00
Alex Shi	3ad68227f5	Merge tag 'v4.4.94' into linux-linaro-lsk-v4.4 This is the 4.4.94 stable release	2017-10-25 11:50:26 +08:00
Xunlei Pang	8702f7853b	s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() Commit 3f625002581b ("kexec: introduce a protection mechanism for the crashkernel reserved memory") is a similar mechanism for protecting the crash kernel reserved memory to previous crash_map/unmap_reserved_pages() implementation, the new one is more generic in name and cleaner in code (besides, some arch may not be allowed to unmap the pgtable). Therefore, this patch consolidates them, and uses the new arch_kexec_protect(unprotect)_crashkres() to replace former crash_map/unmap_reserved_pages() which by now has been only used by S390. The consolidation work needs the crash memory to be mapped initially, this is done in machine_kdump_pm_init() which is after reserve_crashkernel(). Once kdump kernel is loaded, the new arch_kexec_protect_crashkres() implemented for S390 will actually unmap the pgtable like before. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Minfei Huang <mhuang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `7a0058ec78`) Signed-off-by: Alex Shi <alex.shi@linaro.org>	2017-10-25 11:26:32 +08:00
Minfei Huang	d54d9726ca	kexec: do a cleanup for function kexec_load There is a lot of work to be done in function kexec_load, not only allocating structs and loading initram, but also some miscellaneous tasks. To make it clearer, introduce a new function do_kexec_load which is used to allocate structs and load initram. The preparatory work is still done in kexec_load. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `0eea08678e`) Signed-off-by: Alex Shi <alex.shi@linaro.org>	2017-10-25 11:26:03 +08:00
Minfei Huang	b67152e919	kexec: make a pair of map/unmap reserved pages in error path For some arch, kexec shall map the reserved pages, then use them, when we try to start the kdump service. kexec may return directly, without unmapping the reserved pages, if it fails during starting service. To fix it, we make a pair of map/unmap reserved pages both in generic path and error path. This patch only affects s390. Other architectures don't implement the interface of crash_unmap_reserved_pages and crash_map_reserved_pages. It isn't an urgent patch. Kernel can work well without any risk, although the reserved pages are not unmapped before returning in error path. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `917a35605f`) Signed-off-by: Alex Shi <alex.shi@linaro.org>	2017-10-25 11:25:42 +08:00
Xunlei Pang	4b97cecd54	kexec: introduce a protection mechanism for the crashkernel reserved memory For the cases that some kernel (module) path stamps the crash reserved memory(already mapped by the kernel) where has been loaded the second kernel data, the kdump kernel will probably fail to boot when panic happens (or even not happens) leaving the culprit at large, this is unacceptable. The patch introduces a mechanism for detecting such cases: 1) After each crash kexec loading, it simply marks the reserved memory regions readonly since we no longer access it after that. When someone stamps the region, the first kernel will panic and trigger the kdump. The weak arch_kexec_protect_crashkres() is introduced to do the actual protection. 2) To allow multiple loading, once 1) was done we also need to remark the reserved memory to readwrite each time a system call related to kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced to do the actual protection. The architecture can make its specific implementation by overriding arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(). Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Minfei Huang <mhuang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `9b492cf580`) Signed-off-by: Alex Shi <alex.shi@linaro.org>	2017-10-25 11:23:52 +08:00
Peter Zijlstra	28eab3db72	locking/lockdep: Add nest_lock integrity test [ Upstream commit `7fb4a2cea6` ] Boqun reported that hlock->references can overflow. Add a debug test for that to generate a clear error when this happens. Without this, lockdep is likely to report a mysterious failure on unlock. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-21 17:09:03 +02:00
Yonghong Song	1a4f1ecdb2	bpf: one perf event close won't free bpf program attached by another perf event [ Upstream commit `ec9dd352d5` ] This patch fixes a bug exhibited by the following scenario: 1. fd1 = perf_event_open with attr.config = ID1 2. attach bpf program prog1 to fd1 3. fd2 = perf_event_open with attr.config = ID1 <this will be successful> 4. user program closes fd2 and prog1 is detached from the tracepoint. 5. user program with fd1 does not work properly as the tracepoint produces no output any more. The issue happens at step 4. Multiple perf_event_open can be called successfully, but there is only one bpf prog pointer in the tp_event. In the current logic, any fd release for the same tp_event will free the tp_event->prog. The fix is to free tp_event->prog only when the closing fd corresponds to the one which registered the program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-21 17:09:02 +02:00
Edward Cree	2ec54b21dd	bpf/verifier: reject BPF_ALU64\|BPF_END [ Upstream commit `e67b8a685c` ] Neither ___bpf_prog_run nor the JITs accept it. Also adds a new test case. Fixes: `17a5267067` ("bpf: verifier (add verifier core)") Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-21 17:09:02 +02:00
Alex Shi	218dea0148	Merge tag 'v4.4.93' into linux-linaro-lsk-v4.4 This is the 4.4.93 stable release	2017-10-19 14:23:00 +08:00
Paul E. McKenney	5fd4551659	rcu: Allow for page faults in NMI handlers commit `28585a8326` upstream. A number of architectures invoke rcu_irq_enter() on exception entry in order to allow RCU read-side critical sections in the exception handler when the exception is from an idle or nohz_full CPU. This works, at least unless the exception happens in an NMI handler. In that case, rcu_nmi_enter() would already have exited the extended quiescent state, which would mean that rcu_irq_enter() would (incorrectly) cause RCU to think that it is again in an extended quiescent state. This will in turn result in lockdep splats in response to later RCU read-side critical sections. This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to take no action if there is an rcu_nmi_enter() in effect, thus avoiding the unscheduled return to RCU quiescent state. This in turn should make the kernel safe for on-demand RCU voyeurism. Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: `0be964be0` ("module: Sanitize RCU usage and locking") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-18 09:20:41 +02:00
Joel Fernandes	43227089a4	FROMLIST: tracing: Add support for preempt and irq enable/disable events Preempt and irq trace events can be used for tracing the start and end of an atomic section which can be used by a trace viewer like systrace to graphically view the start and end of an atomic section and correlate them with latencies and scheduling issues. This also serves as a prelude to using synthetic events or probes to rewrite the preempt and irqsoff tracers, along with numerous benefits of using trace events features for these events. Change-Id: I718d40f7c3c48579adf9d7121b21495a669c89bd Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@android.com Link: https://patchwork.kernel.org/patch/9988157/ Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-10-15 23:53:55 +05:30
Joel Fernandes	6ef223a680	FROMLIST: tracing: Prepare to add preempt and irq trace events In preparation of adding irqsoff and preemptsoff enable and disable trace events, move required functions and code to make it easier to add these events in a later patch. This patch is just code movement and no functional change. Change-Id: I587d411da5efbc4959bcccd7a05c7a66c231e1e0 Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@android.com Link: https://patchwork.kernel.org/patch/9988159/ Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-10-15 23:53:23 +05:30
Juri Lelli	406cbca78c	UPSTREAM: cpufreq: schedutil: use now as reference when aggregating shared policy requests Currently, sugov_next_freq_shared() uses last_freq_update_time as a reference to decide when to start considering CPU contributions as stale. However, since last_freq_update_time is set by the last CPU that issued a frequency transition, this might cause problems in certain cases. In practice, the detection of stale utilization values fails whenever the CPU with such values was the last to update the policy. For example (and please note again that the SCHED_CPUFREQ_RT flag is not the problem here, but only the detection of after how much time that flag has to be considered stale), suppose a policy with 2 CPUs: CPU0 \| CPU1 \| \| RT task scheduled \| SCHED_CPUFREQ_RT is set \| CPU1->last_update = now \| freq transition to max \| last_freq_update_time = now \| more than TICK_NSEC nsecs \| a small CFS wakes up \| CPU0->last_update = now1 \| delta_ns(CPU0) < TICK_NSEC* \| CPU0's util is considered \| delta_ns(CPU1) = \| last_freq_update_time - \| CPU1->last_update = 0 \| < TICK_NSEC \| CPU1 is still considered \| CPU1->SCHED_CPUFREQ_RT is set \| we stay at max (until CPU1 \| exits from idle) \| * delta_ns is actually negative as now1 > last_freq_update_time While last_freq_update_time is a sensible reference for rate limiting, it doesn't seem to be useful for working around stale CPU states. Fix the problem by always considering now (time) as the reference for deciding when CPUs have stale contributions. Signed-off-by: Juri Lelli <juri.lelli@arm.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit `d86ab9cff8`)	2017-10-15 23:28:32 +05:30
Vikram Mulukutla	060add8804	Revert "ANDROID: sched/tune: Initialize raw_spin_lock in boosted_groups" This reverts commit `c5616f2f87`. If we re-init the per-cpu boostgroup spinlock every time that we add a new boosted cgroup, we can easily wipe out (reinit) a spinlock struct while in a critical section. We should only be setting up the per-cpu boostgroup data, and the spin_lock initialization need only happen once - which we're already doing in a postcore_initcall. For example: -------- CPU 0 -------- \| -------- CPU1 -------- cgroupX boost group added \| schedtune_enqueue_task \| acquires(bg->lock) \| cgroupY boost group added \| for_each_cpu() \| raw_spin_lock_init(bg->lock) releases(bg->lock) \| BUG (already unlocked) \| \| This results in the following BUG from the debug spinlock code: BUG: spinlock already unlocked on CPU#5, rcuop/6/68 Change-Id: I3016702780b461a0cd95e26c538cd18df27d6316 Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>	2017-10-15 23:21:09 +05:30
Michal Hocko	37c7a3c876	BACKPORT: partial: mm, oom_reaper: do not mmput synchronously from the oom reaper context (cherry picked from commit `ec8d7c14ea`) Tetsuo has properly noted that mmput slow path might get blocked waiting for another party (e.g. exit_aio waits for an IO). If that happens the oom_reaper would be put out of the way and will not be able to process next oom victim. We should strive for making this context as reliable and independent on other subsystems as much as possible. Introduce mmput_async which will perform the slow path from an async (WQ) context. This will delay the operation but that shouldn't be a problem because the oom_reaper has reclaimed the victim's address space for most cases as much as possible and the remaining context shouldn't bind too much memory anymore. The only exception is when mmap_sem trylock has failed which shouldn't happen too often. The issue is only theoretical but not impossible. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Only backports mmput_async. Change-Id: I5fe54abcc629e7d9eab9fe03908903d1174177f1 Signed-off-by: Arve Hjønnevåg <arve@android.com>	2017-10-15 23:21:09 +05:30
Michael Ellerman	e0e5d1e258	UPSTREAM: Fix build break in fork.c when THREAD_SIZE < PAGE_SIZE Commit `b235beea9e` ("Clarify naming of thread info/stack allocators") breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE: kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack' kernel/fork.c:355:8: error: assignment from incompatible pointer type stack = alloc_thread_stack_node(tsk, node); ^ Fix it by renaming free_stack() to free_thread_stack(), and updating the return type of alloc_thread_stack_node(). Fixes: `b235beea9e` ("Clarify naming of thread info/stack allocators") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 38331309 Change-Id: I5b7f920b459fb84adf5fc75f83bb488b855c4deb (cherry picked from commit `9521d39976`) Signed-off-by: Zubin Mithra <zsm@google.com>	2017-10-15 23:21:09 +05:30
Alex Shi	80cd9be34a	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android Conflicts: included wakeup_reason.h file of `57caa2ad5c` in kernel/power/process.c	2017-10-13 23:14:45 +08:00
Alex Shi	6c7d89c123	Merge tag 'v4.4.92' into linux-linaro-lsk-v4.4 This is the 4.4.92 stable release	2017-10-13 12:04:17 +08:00
Peter Zijlstra	90fd673873	sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs commit `50e7663233` upstream. Cpusets vs. suspend-resume is _completely_ broken. And it got noticed because it now resulted in non-cpuset usage breaking too. On suspend cpuset_cpu_inactive() doesn't call into cpuset_update_active_cpus() because it doesn't want to move tasks about, there is no need, all tasks are frozen and won't run again until after we've resumed everything. But this means that when we finally do call into cpuset_update_active_cpus() after resuming the last frozen cpu in cpuset_cpu_active(), the top_cpuset will not have any difference with the cpu_active_mask and thus it will not in fact do _anything_. So the cpuset configuration will not be restored. This was largely hidden because we would unconditionally create identity domains and mobile users would not in fact use cpusets much. And servers that do use cpusets tend to not suspend-resume much. An additional problem is that we'd not in fact wait for the cpuset work to finish before resuming the tasks, allowing spurious migrations outside of the specified domains. Fix the rebuild by introducing cpuset_force_rebuild() and fix the ordering with cpuset_wait_for_hotplug(). Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `deb7aa308e` ("cpuset: reorganize CPU / memory hotplug handling") Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-12 11:27:35 +02:00
Shu Wang	87509592ec	ftrace: Fix kmemleak in unregister_ftrace_graph commit `2b0b8499ae` upstream. The trampoline allocated by function tracer was overwritten by function_graph tracer, and caused a memory leak. The save_global_trampoline should have saved the previous trampoline in register_ftrace_graph() and restored it in unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was only used in unregister_ftrace_graph as default value 0, and it overwrote the previous trampoline's value, causing the previously allocated trampoline to be lost. kmemleak backtrace: kmemleak_vmalloc+0x77/0xc0 __vmalloc_node_range+0x1b5/0x2c0 module_alloc+0x7c/0xd0 arch_ftrace_update_trampoline+0xb5/0x290 ftrace_startup+0x78/0x210 register_ftrace_function+0x8b/0xd0 function_trace_init+0x4f/0x80 tracing_set_tracer+0xe6/0x170 tracing_set_trace_write+0x90/0xd0 __vfs_write+0x37/0x170 vfs_write+0xb2/0x1b0 SyS_write+0x55/0xc0 do_syscall_64+0x67/0x180 return_from_SYSCALL_64+0x0/0x6a [ Looking further into this, I found that this was left over from when the function and function graph tracers shared the same ftrace_ops. But in commit `5f151b2401` ("ftrace: Fix function_profiler and function tracer together"), the two were separated, and the save_global_trampoline no longer was necessary (and it may have been broken back then too). -- Steven Rostedt ] Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com Fixes: `5f151b2401` ("ftrace: Fix function_profiler and function tracer together") Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-12 11:27:33 +02:00
Alex Shi	fb68750598	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-10-09 14:32:35 +08:00
Alex Shi	11c5615ee5	Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4	2017-10-09 14:30:34 +08:00
Myungho Jung	5e9b526fcc	timer/sysclt: Restrict timer migration sysctl values to 0 and 1 commit `b94bf594cf` upstream. timer_migration sysctl acts as a boolean switch, so the allowed values should be restricted to 0 and 1. Add the necessary extra fields to the sysctl table entry to enforce that. [ tglx: Rewrote changelog ] Signed-off-by: Myungho Jung <mhjungk@gmail.com> Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kazuhiro Hayashi <kazuhiro3.hayashi@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:41:47 +02:00
Oleg Nesterov	9237605e0b	seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() commit `66a733ea6b` upstream. As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end up using different filters. Once we drop ->siglock it is possible for task->seccomp.filter to have been replaced by SECCOMP_FILTER_FLAG_TSYNC. Fixes: `f8e529ed94` ("seccomp, ptrace: add support for dumping seccomp filters") Reported-by: Chris Salls <chrissalls5@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> [tycho: add __get_seccomp_filter vs. open coding refcount_inc()] Signed-off-by: Tycho Andersen <tycho@docker.com> [kees: tweak commit log] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:41:46 +02:00
Bo Yan	68a4a52899	tracing: Erase irqsoff trace with empty write commit `8dd33bcb70` upstream. One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is the irqsoff tracer. This is because the irqsoff tracer uses max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com Cc: <mingo@redhat.com> Fixes: `4acd4d00f` ("tracing: give easy way to clear trace buffer") Signed-off-by: Bo Yan <byan@nvidia.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:41:44 +02:00
Tahsin Erdogan	9c5afa726a	tracing: Fix trace_pipe behavior for instance traces commit `75df6e688c` upstream. When reading data from trace_pipe, tracing_wait_pipe() performs a check to see if tracing has been turned off after some data was read. Currently, this check always looks at global trace state, but it should be checking the trace instance where trace_pipe is located at. Because of this bug, cat instances/i1/trace_pipe in the following script will immediately exit instead of waiting for data: cd /sys/kernel/debug/tracing echo 0 > tracing_on mkdir -p instances/i1 echo 1 > instances/i1/tracing_on echo 1 > instances/i1/events/sched/sched_process_exec/enable cat instances/i1/trace_pipe Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com Fixes: `10246fa35d` ("tracing: give easy way to clear trace buffer") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:41:44 +02:00
Alex Shi	a759573d34	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-10-04 12:03:25 +08:00
Alex Shi	cda2b94814	Merge tag 'v4.4.89' into linux-linaro-lsk-v4.4 This is the 4.4.89 stable release	2017-10-04 12:03:22 +08:00
Steven Rostedt (VMware)	ed1bf4397d	ftrace: Fix memleak when unregistering dynamic ops when tracing disabled commit `edb096e007` upstream. If function tracing is disabled by the user via the function-trace option or the proc sysctl file, and a ftrace_ops that was allocated on the heap is unregistered, then the shutdown code exits out without doing the proper clean up. This was found via kmemleak and running the ftrace selftests, as one of the tests unregisters with function tracing disabled. # cat kmemleak unreferenced object 0xffffffffa0020000 (size 4096): comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s) hex dump (first 32 bytes): 55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH. e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L backtrace: [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0 [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0 [<ffffffff8109697f>] module_alloc+0x4f/0x90 [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420 [<ffffffff81249947>] ftrace_startup+0xe7/0x300 [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90 [<ffffffff81263786>] trace_selftest_ops+0x204/0x397 [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624 [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7 [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192 [<ffffffff81002230>] do_one_initcall+0x90/0x1e2 [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe [<ffffffff81d61ec3>] kernel_init+0x13/0x122 [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Fixes: `12cce594fa` ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-27 11:00:17 +02:00
Baohong Liu	d28e96be7c	tracing: Apply trace_clock changes to instance max buffer commit `170b3b1050` upstream. Currently trace_clock timestamps are applied to both regular and max buffers only for global trace. For instance trace, trace_clock timestamps are applied only to regular buffer. But, regular and max buffers can be swapped, for example, following a snapshot. So, for instance trace, bad timestamps can be seen following a snapshot. Let's apply trace_clock timestamps to instance max buffer as well. Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com Fixes: `277ba0446` ("tracing: Add interface to allow multiple trace buffers") Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-27 11:00:16 +02:00
Steven Rostedt (VMware)	753154fcfe	ftrace: Fix selftest goto location on error commit `46320a6acc` upstream. In the second iteration of trace_selftest_ops(), the error goto label is wrong in the case where trace_selftest_test_global_cnt is off. In the case of error, it leaks the dynamic ops that was allocated. Fixes: `95950c2e` ("ftrace: Add self-tests for multiple function trace users") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-27 11:00:16 +02:00
Rafael J. Wysocki	aa603fc147	BACKPORT: cpufreq: schedutil: Use policy-dependent transition delays Make the schedutil governor take the initial (default) value of the rate_limit_us sysfs attribute from the (new) transition_delay_us policy parameter (to be set by the scaling driver). That will allow scaling drivers to make schedutil use smaller default values of rate_limit_us and reduce the default average time interval between consecutive frequency changes. Make intel_pstate set transition_delay_us to 500. BACKPORT: Modified to support the separate up_rate_limit_us and down_rate_limit_us (upstream just has a single rate_limit_us). Also dropped the changes for intel_pstate as there's a merge conflict. Change-Id: I62a8543879a4d8582cdcb31ebd55607705d1c8b1 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> (cherry picked from commit `1b72e7fd30`) Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>	2017-09-18 21:14:35 +01:00
Joonwoo Park	7530f3548d	sched: WALT: fix window mis-alignment The initial window start needs to be close to ktime ns = 0 to be aligned with scheduler tick. Change-Id: Ia91f74efce2f910106622a054a6fcd507e763ca5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2017-09-18 21:14:34 +01:00
Joonwoo Park	d5a3aedb50	sched: EAS: kill incorrect nohz idle cpu kick EAS won't allow NOHZ idle balancer until CPU's over utilized. However nohz_kick_needed() can return true. This causes idle CPU wake up for nothing. Change-Id: I6e548442e29e4f85cda695e4c7101dd591b12fe6 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2017-09-18 21:14:34 +01:00
Joonwoo Park	9b792188f9	sched: EAS: fix incorrect energy delta calculation due to rounding error In order to calculate energy difference we currently iterate CPUs under the same sched domain to accumulate total energy cost and compare before and after: for_each_domain(cpu) total_energy_before += (cpu_util * power) >> SCHED_CAPACITY_SHIFT; for_each_domain(cpu) total_energy_after += (cpu_util * power) >> SCHED_CAPACITY_SHIFT; Doing such can incorrectly calculate and report abs(delta) > 0 when there is actually no energy delta between before and after because the same total accumulated cpu_util of all the CPUs can be distributed differently before and after and it causes different amount of rounding error. Fix such incorrectness by shifting just once with accumulated total_energy. Change-Id: I82f1e2e358367058960938b4ef81714f57e921cf Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> (moved part to another commit) Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-09-18 21:14:33 +01:00
Joonwoo Park	77ce105b15	sched: EAS/WALT: take into account of waking task's load WALT's function cpu_util(cpu) reports CPU's load without taking into account of waking task's load. Thus currently cpu_overutilized() underestimates load on the previous CPU of waking task. Take into account of task's load to determine whether previous CPU is overutilized to bail out early without running energy_diff() which is expensive. Change-Id: I30f146984a880ad2cc1b8a4ce35bd239a8c9a607 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> (minor rebase conflicts) Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-09-18 21:14:33 +01:00
Joonwoo Park	d6614c4e2c	cpufreq: sched: WALT: don't apply capacity margin twice With WALT all the scheduler classes' load are accounted in scr->cfs and update_cpu_capacity_request() adds capacity margin. At present, at tick path, scheduler also adds capacity margin. Therefore the margin applied twice. Fix such error by using margin applied cpu utilization only for checking whether frequency increase is needed. Change-Id: Id7d8cc73b2e4eec70b274ca66e09bb0b16bf6f09 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> (trivial rebase conflict) Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-09-18 21:14:33 +01:00
Joonwoo Park	9305d63017	sched: WALT: fix potential overflow Task demand and CPU util are in u64. Change-Id: If7ec1623e723026d3346201122aab0303a6d2ba2 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2017-09-18 21:14:32 +01:00
Joonwoo Park	55f3247db9	sched: EAS: schedfreq: fix CPU util over estimation WALT CPU utilization reports CPU load of all the scheduler classes. Therefore adding RT class's load additionally will cause frequency overshooting. Fix such issue by not accounting RT class load when requesting capacity. Change-Id: I29600d7af7ca8c00e0d2ff1e13872024ccaa72bf Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2017-09-18 21:14:32 +01:00

22173 Commits