linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-09 22:53:28 +09:00

Author	SHA1	Message	Date
Tao Huang	3430c68a33	Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git * linux-linaro-lsk-v4.4-android: (660 commits) ANDROID: keychord: Check for write data size ANDROID: sdcardfs: Set num in extension_details during make_item ANDROID: sdcardfs: Hold i_mutex for i_size_write BACKPORT, FROMGIT: crypto: speck - add test vectors for Speck64-XTS BACKPORT, FROMGIT: crypto: speck - add test vectors for Speck128-XTS BACKPORT, FROMGIT: crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS FROMGIT: crypto: speck - export common helpers BACKPORT, FROMGIT: crypto: speck - add support for the Speck block cipher UPSTREAM: ANDROID: binder: synchronize_rcu() when using POLLFREE. f2fs: updates on v4.16-rc1 BACKPORT: tee: shm: Potential NULL dereference calling tee_shm_register() BACKPORT: tee: shm: don't put_page on null shm->pages BACKPORT: tee: shm: make function __tee_shm_alloc static BACKPORT: tee: optee: check type of registered shared memory BACKPORT: tee: add start argument to shm_register callback BACKPORT: tee: optee: fix header dependencies BACKPORT: tee: shm: inline tee_shm_get_id() BACKPORT: tee: use reference counting for tee_context BACKPORT: tee: optee: enable dynamic SHM support BACKPORT: tee: optee: add optee-specific shared pool implementation ... Conflicts: drivers/irqchip/Kconfig drivers/media/i2c/tc35874x.c drivers/media/v4l2-core/v4l2-compat-ioctl32.c drivers/usb/gadget/function/f_fs.c fs/f2fs/node.c Change-Id: Icecd73a515821b536fa3d81ea91b63d9b3699916	2018-03-09 19:10:14 +08:00
Chris Redpath	daa3eacc9e	sched/fair: prevent possible infinite loop in sched_group_energy There is a race between hotplug and energy_diff which might result in endless loop in sched_group_energy. When this happens, the end condition cannot be detected. We can store how many CPUs we need to visit at the beginning, and bail out of the energy calculation if we visit more cpus than expected. Bug: 72311797 72202633 Change-Id: I8dda75468ee1570da4071cd8165ef5131a8205d8 Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2018-03-05 21:56:13 +05:30
Joel Fernandes	6c48002863	ANDROID: sched/rt: schedtune: Add boost retention to RT Boosted RT tasks can be deboosted quickly, this makes boost usless for RT tasks and causes lots of glitching. Use timers to prevent de-boost too soon and wait for long enough such that next enqueue happens after a threshold. While this can be solved in the governor, there are following advantages: - The approach used is governor-independent - Reduces boost group lock contention for frequently sleepers/wakers Note: Fixed build breakage due to schedfreq dependency which isn't used for RT anymore. Bug: 30210506 Change-Id: I428a2695cac06cc3458cdde0dea72315e4e66c00 Signed-off-by: Joel Fernandes <joelaf@google.com>	2018-03-05 21:52:26 +05:30
Ke Wang	272395d80b	ANDROID: sched: EAS: check energy_aware() before calling select_energy_cpu_brute() in up-migrate path In up-migrate path, select_energy_cpu_brute() was called directly without checking energy_aware(). This will make select_energy_cpu_brute() always worked even disabling energy_aware() on the asymmetric cpu capacity system. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2018-03-05 21:52:13 +05:30
Amit Pundir	24740dab5c	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Conflicts: fs/f2fs/extent_cache.c Pick changes from AOSP Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5 ("f2fs: catch up to v4.14-rc1") fs/f2fs/namei.c Pick changes from AOSP F2FS backport commit `7d5c08fd91` ("f2fs: backport from (`4c1fad64` - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs)")	2018-03-05 20:20:17 +05:30
Steven Rostedt (VMware)	911357aed6	sched/rt: Up the root domain ref count when passing it around via IPIs commit `364f566537` upstream. When issuing an IPI RT push, where an IPI is sent to each CPU that has more than one RT task scheduled on it, it references the root domain's rto_mask, that contains all the CPUs within the root domain that has more than one RT task in the runable state. The problem is, after the IPIs are initiated, the rq->lock is released. This means that the root domain that is associated to the run queue could be freed while the IPIs are going around. Add a sched_get_rd() and a sched_put_rd() that will increment and decrement the root domain's ref count respectively. This way when initiating the IPIs, the scheduler will up the root domain's ref count before releasing the rq->lock, ensuring that the root domain does not go away until the IPI round is complete. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `4bdced5c9a` ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-02-16 20:09:40 +01:00
Steven Rostedt (VMware)	af9de1a10f	sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() commit `ad0f1d9d65` upstream. When the rto_push_irq_work_func() is called, it looks at the RT overloaded bitmask in the root domain via the runqueue (rq->rd). The problem is that during CPU up and down, nothing here stops rq->rd from changing between taking the rq->rd->rto_lock and releasing it. That means the lock that is released is not the same lock that was taken. Instead of using this_rq()->rd to get the root domain, as the irq work is part of the root domain, we can simply get the root domain from the irq work that is passed to the routine: container_of(work, struct root_domain, rto_push_work) This keeps the root domain consistent. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `4bdced5c9a` ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-02-16 20:09:40 +01:00
Tao Huang	8d2d0b6a51	Merge tag 'lsk-v4.4-18.02-android' of git://git.linaro.org/kernel/linux-linaro-stable.git LSK 18.02 v4.4-android * tag 'lsk-v4.4-18.02-android': (131 commits) Linux 4.4.114 nfsd: auth: Fix gid sorting when rootsquash enabled net: tcp: close sock if net namespace is exiting flow_dissector: properly cap thoff field ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY net: Allow neigh contructor functions ability to modify the primary_key vmxnet3: repair memory leak sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf sctp: do not allow the v4 socket to bind a v4mapped v6 address r8169: fix memory corruption on retrieval of hardware statistics. pppoe: take ->needed_headroom of lower device into account on xmit net: qdisc_pkt_len_init() should be more robust tcp: __tcp_hdrlen() helper net: igmp: fix source address check for IGMPv3 reports lan78xx: Fix failure in USB Full Speed ipv6: ip6_make_skb() needs to clear cork.base.dst ipv6: fix udpv6 sendmsg crash caused by too small MTU ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state hrtimer: Reset hrtimer cpu base proper on CPU hotplug ...	2018-02-07 20:59:20 +08:00
Alex Shi	59e35359ec	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2018-02-01 12:02:38 +08:00
Daniel Bristot de Oliveira	1d00e3d9b7	sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks commit `3effcb4247` upstream. We have been facing some problems with self-suspending constrained deadline tasks. The main reason is that the original CBS was not designed for such sort of tasks. One problem reported by Xunlei Pang takes place when a task suspends, and then is awakened before the deadline, but so close to the deadline that its remaining runtime can cause the task to have an absolute density higher than allowed. In such situation, the original CBS assumes that the task is facing an early activation, and so it replenishes the task and set another deadline, one deadline in the future. This rule works fine for implicit deadline tasks. Moreover, it allows the system to adapt the period of a task in which the external event source suffered from a clock drift. However, this opens the window for bandwidth leakage for constrained deadline tasks. For instance, a task with the following parameters: runtime = 5 ms deadline = 7 ms [density] = 5 / 7 = 0.71 period = 1000 ms If the task runs for 1 ms, and then suspends for another 1ms, it will be awakened with the following parameters: remaining runtime = 4 laxity = 5 presenting a absolute density of 4 / 5 = 0.80. In this case, the original CBS would assume the task had an early wakeup. Then, CBS will reset the runtime, and the absolute deadline will be postponed by one relative deadline, allowing the task to run. The problem is that, if the task runs this pattern forever, it will keep receiving bandwidth, being able to run 1ms every 2ms. Following this behavior, the task would be able to run 500 ms in 1 sec. Thus running more than the 5 ms / 1 sec the admission control allowed it to run. Trying to address the self-suspending case, Luca Abeni, Giuseppe Lipari, and Juri Lelli [1] revisited the CBS in order to deal with self-suspending tasks. In the new approach, rather than replenishing/postponing the absolute deadline, the revised wakeup rule adjusts the remaining runtime, reducing it to fit into the allowed density. A revised version of the idea is: At a given time t, the maximum absolute density of a task cannot be higher than its relative density, that is: runtime / (deadline - t) <= dl_runtime / dl_deadline Knowing the laxity of a task (deadline - t), it is possible to move it to the other side of the equality, thus enabling to define max remaining runtime a task can use within the absolute deadline, without over-running the allowed density: runtime = (dl_runtime / dl_deadline) * (deadline - t) For instance, in our previous example, the task could still run: runtime = ( 5 / 7 ) * 5 runtime = 3.57 ms Without causing damage for other deadline tasks. It is note worthy that the laxity cannot be negative because that would cause a negative runtime. Thus, this patch depends on the patch: `df8eac8caf` ("sched/deadline: Throttle a constrained deadline task activated after the deadline") Which throttles a constrained deadline task activated after the deadline. Finally, it is also possible to use the revised wakeup rule for all other tasks, but that would require some more discussions about pros and cons. [The main difference from the original commit is that the BW_SHIFT define was not present yet. As BW_SHIFT was introduced in a new feature, I just used the value (20), likewise we used to use before the #define. Other changes were required because of comments. - bistrot] Reported-by: Xunlei Pang <xpang@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> [peterz: replaced dl_is_constrained with dl_is_implicit] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/5c800ab3a74a168a84ee5f3f84d12a02e11383be.1495803804.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 12:06:07 +01:00
Tao Huang	640193f76b	Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git * linux-linaro-lsk-v4.4-android: (733 commits) LSK-ANDROID: memcg: Remove wrong ->attach callback LSK-ANDROID: arm64: mm: Fix __create_pgd_mapping() call ANDROID: sdcardfs: Move default_normal to superblock blkdev: Refactoring block io latency histogram codes FROMLIST: arm64: kpti: Fix the interaction between ASID switching and software PAN FROMLIST: arm64: Move post_ttbr_update_workaround to C code FROMLIST: arm64: mm: Rename post_ttbr0_update_workaround sched: EAS: Initialize push_task as NULL to avoid direct reference on out_unlock path fscrypt: updates on 4.15-rc4 ANDROID: uid_sys_stats: fix the comment BACKPORT: tee: indicate privileged dev in gen_caps BACKPORT: tee: optee: sync with new naming of interrupts BACKPORT: tee: tee_shm: Constify dma_buf_ops structures. BACKPORT: tee: optee: interruptible RPC sleep BACKPORT: tee: optee: add const to tee_driver_ops and tee_desc structures BACKPORT: tee.txt: standardize document format BACKPORT: tee: add forward declaration for struct device BACKPORT: tee: optee: fix uninitialized symbol 'parg' BACKPORT: tee: add ARM_SMCCC dependency BACKPORT: selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables ... Conflicts: arch/arm64/kernel/vdso.c drivers/usb/host/xhci-plat.c include/drm/drmP.h include/linux/kasan.h kernel/time/timekeeping.c mm/kasan/kasan.c security/selinux/nlmsgtab.c Also add this commit: `0bcdc0987c` ("time: Fix ktime_get_raw() incorrect base accumulation")	2018-01-26 19:26:47 +08:00
Xunlei Pang	8bd58b61d2	sched/deadline: Zero out positive runtime after throttling constrained tasks commit `ae83b56a56` upstream. When a contrained task is throttled by dl_check_constrained_dl(), it may carry the remaining positive runtime, as a result when dl_task_timer() fires and calls replenish_dl_entity(), it will not be replenished correctly due to the positive dl_se->runtime. This patch assigns its runtime to 0 if positive after throttling. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `df8eac8caf` ("sched/deadline: Throttle a constrained deadline task activated after the deadline) Link: http://lkml.kernel.org/r/1494421417-27550-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-23 19:50:15 +01:00
Ke Wang	204bbd5268	sched: EAS: Initialize push_task as NULL to avoid direct reference on out_unlock path After applying up-migrate patches(`dc626b2` sched: avoid pushing tasks to an offline CPU, `2da014c` sched: Extend active balance to accept 'push_task' argument), leaving EAS disabled and doing a stability test which includes some random cpu plugin/plugout. There are two types crashes happened as below: TYPE 1: [ 2072.653091] c1 ------------[ cut here ]------------ [ 2072.653133] c1 WARNING: CPU: 1 PID: 13 at kernel/fork.c:252 __put_task_struct+0x30/0x124() [ 2072.653173] c1 CPU: 1 PID: 13 Comm: migration/1 Tainted: G W O 4.4.83-01066-g04c5403-dirty #17 [ 2072.653215] c1 [<c011141c>] (unwind_backtrace) from [<c010ced8>] (show_stack+0x20/0x24) [ 2072.653235] c1 [<c010ced8>] (show_stack) from [<c043d7f8>] (dump_stack+0xa8/0xe0) [ 2072.653255] c1 [<c043d7f8>] (dump_stack) from [<c012be04>] (warn_slowpath_common+0x98/0xc4) [ 2072.653273] c1 [<c012be04>] (warn_slowpath_common) from [<c012beec>] (warn_slowpath_null+0x2c/0x34) [ 2072.653291] c1 [<c012beec>] (warn_slowpath_null) from [<c01293b4>] (__put_task_struct+0x30/0x124) [ 2072.653310] c1 [<c01293b4>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314) [ 2072.653331] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144) [ 2072.653352] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270) [ 2072.653370] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c) [ 2072.653388] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24) [ 2072.653400] c1 ---[ end trace 49c3d154890763fc ]--- [ 2072.653418] c1 Unable to handle kernel NULL pointer dereference at virtual address 00000000 ... [ 2072.832804] c1 [<c01ba00c>] (put_css_set) from [<c01be870>] (cgroup_free+0x6c/0x78) [ 2072.832823] c1 [<c01be870>] (cgroup_free) from [<c01293f8>] (__put_task_struct+0x74/0x124) [ 2072.832844] c1 [<c01293f8>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314) [ 2072.832860] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144) [ 2072.832879] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270) [ 2072.832896] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c) [ 2072.832914] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24) [ 2072.832930] c1 Code: f57ff05b f590f000 e3e02000 e3a03001 (e1941f9f) [ 2072.839208] c1 ---[ end trace 49c3d154890763fd ]--- TYPE 2: [ 214.742695] c1 ------------[ cut here ]------------ [ 214.742709] c1 kernel BUG at kernel/smpboot.c:136! [ 214.742718] c1 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [ 214.748785] c1 CPU: 1 PID: 18 Comm: migration/2 Tainted: G W O 4.4.83-00912-g370f62c #1 [ 214.748805] c1 task: ef2d9680 task.stack: ee862000 [ 214.748821] c1 PC is at smpboot_thread_fn+0x168/0x270 [ 214.748832] c1 LR is at smpboot_thread_fn+0xe4/0x270 ... [ 214.821339] c1 [<c014d71c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c) [ 214.821363] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24) [ 214.821378] c1 Code: e5950000 e5943010 e1500003 0a000000 (e7f001f2) [ 214.827676] c1 ---[ end trace da87539f59bab8de ]--- For the first type crash, the root cause is the push_task pointer will be used without initialization on the out_lock path. And maybe cpu hotplug in/out make this happen more easily. For the second type crash, it hits 'BUG_ON(td->cpu != smp_processor_id());' in smpboot_thread_fn(). It seems that OOPS was caused by migration/2 which actually running on cpu1. And I haven't found what actually happened. However, after this fix, the second type crash seems gone too. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2018-01-22 13:16:20 +05:30
Dmitry Vyukov	8c8a5ee21a	BACKPORT: kernel: add kcov code coverage kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode'] [akpm@linux-foundation.org: unbreak allmodconfig] [akpm@linux-foundation.org: follow x86 Makefile layout standards] Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@google.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Drysdale <drysdale@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 64145065 (cherry-picked from `5c9a8750a6`) Change-Id: I17b5e04f6e89b241924e78ec32ead79c38b860ce Signed-off-by: Paul Lawrence <paullawrence@google.com>	2018-01-22 13:15:43 +05:30
Ke Wang	2cc4764f3f	sched: EAS/WALT: Don't take into account of running task's util For upmigrating misfit running task case, the currently running task's util has been counted into cpu_util(). Thus currently __cpu_overutilized() which add task's uitl twice is overestimated. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2018-01-22 13:15:43 +05:30
Viresh Kumar	56c4ae908c	BACKPORT: schedutil: Reset cached freq if it is not in sync with next_freq 'cached_raw_freq' is used to get the next frequency quickly but should always be in sync with sg_policy->next_freq. There are cases where it is not and in such cases it should be reset to avoid switching to incorrect frequencies. Consider this case for example: - policy->cur is 1.2 GHz (Max) - New request comes for 780 MHz and we store that in cached_raw_freq. - Based on 780 MHz, we calculate the effective frequency as 800 MHz. - We then decide not to update the frequency as sugov_up_down_rate_limit() return true. - Here cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz. - Now if the utilization doesn't change in next request, then the next target frequency will still be 780 MHz and it will match with cached_raw_freq and so we will directly return 1.2 GHz instead of 800 MHz. BACKPORT of upstream commit `07458f6a51` ("cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq"). This also updates sugov_update_commit() for handling up/down tunables, which aren't present in mainline. Change-Id: I70bca2c5dfdb545a0471d1c9e4c5addb30ab5494 Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2018-01-22 13:15:43 +05:30
Amit Pundir	395ae9f4e6	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Conflicts: kernel/fork.c Conflict due to Kaiser implementation in LTS 4.4.110. net/ipv4/raw.c Minor conflict due to LTS commit `be27b620a8` ("net: ipv4: fix for a race condition in raw_sendmsg")	2018-01-22 11:50:22 +05:30
Andy Lutomirski	18a5348d49	sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off() commit `252d2a4117` upstream. idle_task_exit() can be called with IRQs on x86 on and therefore should use switch_mm(), not switch_mm_irqs_off(). This doesn't seem to cause any problems right now, but it will confuse my upcoming TLB flush changes. Nonetheless, I think it should be backported because it's trivial. There won't be any meaningful performance impact because idle_task_exit() is only used when offlining a CPU. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: `f98db6013c` ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") Link: http://lkml.kernel.org/r/ca3d1a9fa93a0b49f5a8ff729eda3640fb6abdf9.1497034141.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:22:09 +01:00
Andy Lutomirski	425f13a366	sched/core: Add switch_mm_irqs_off() and use it in the scheduler commit `f98db6013c` upstream. By default, this is the same thing as switch_mm(). x86 will override it as an optimization. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:22:09 +01:00
Alex Shi	317c5652a7	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-12-21 12:06:53 +08:00
Steven Rostedt (VMware)	51b3eac39a	sched/deadline: Use deadline instead of period when calculating overflow [ Upstream commit `2317d5f1c3` ] I was testing Daniel's changes with his test case, and tweaked it a little. Instead of having the runtime equal to the deadline, I increased the deadline ten fold. Daniel's test case had: attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms / attr.sched_deadline = 2 1000 * 1000; /* 2 ms / attr.sched_period = 2 1000 * 1000 * 1000; /* 2 s / To make it more interesting, I changed it to: attr.sched_runtime = 2 1000 * 1000; /* 2 ms / attr.sched_deadline = 20 1000 * 1000; /* 20 ms / attr.sched_period = 2 1000 * 1000 * 1000; /* 2 s */ The results were rather surprising. The behavior that Daniel's patch was fixing came back. The task started using much more than .1% of the CPU. More like 20%. Looking into this I found that it was due to the dl_entity_overflow() constantly returning true. That's because it uses the relative period against relative runtime vs the absolute deadline against absolute runtime. runtime / (deadline - t) > dl_runtime / dl_period There's even a comment mentioning this, and saying that when relative deadline equals relative period, that the equation is the same as using deadline instead of period. That comment is backwards! What we really want is: runtime / (deadline - t) > dl_runtime / dl_deadline We care about if the runtime can make its deadline, not its period. And then we can say "when the deadline equals the period, the equation is the same as using dl_period instead of dl_deadline". After correcting this, now when the task gets enqueued, it can throttle correctly, and Daniel's fix to the throttling of sleeping deadline tasks works even when the runtime and deadline are not the same. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/02135a27f1ae3fe5fd032568a5a2f370e190e8d7.1488392936.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-20 10:04:55 +01:00
Daniel Bristot de Oliveira	ca91884bcf	sched/deadline: Throttle a constrained deadline task activated after the deadline [ Upstream commit `df8eac8caf` ] During the activation, CBS checks if it can reuse the current task's runtime and period. If the deadline of the task is in the past, CBS cannot use the runtime, and so it replenishes the task. This rule works fine for implicit deadline tasks (deadline == period), and the CBS was designed for implicit deadline tasks. However, a task with constrained deadline (deadine < period) might be awakened after the deadline, but before the next period. In this case, replenishing the task would allow it to run for runtime / deadline. As in this case deadline < period, CBS enables a task to run for more than the runtime / period. In a very loaded system, this can cause a domino effect, making other tasks miss their deadlines. To avoid this problem, in the activation of a constrained deadline task after the deadline but before the next period, throttle the task and set the replenishing timer to the begin of the next period, unless it is boosted. Reproducer: --------------- %< --------------- int main (int argc, char *argv) { int ret; int flags = 0; unsigned long l = 0; struct timespec ts; struct sched_attr attr; memset(&attr, 0, sizeof(attr)); attr.size = sizeof(attr); attr.sched_policy = SCHED_DEADLINE; attr.sched_runtime = 2 1000 * 1000; /* 2 ms / attr.sched_deadline = 2 1000 * 1000; /* 2 ms / attr.sched_period = 2 1000 * 1000 * 1000; /* 2 s / ts.tv_sec = 0; ts.tv_nsec = 2000 1000; /* 2 ms / ret = sched_setattr(0, &attr, flags); if (ret < 0) { perror("sched_setattr"); exit(-1); } for(;;) { / XXX: you may need to adjust the loop / for (l = 0; l < 150000; l++); / * The ideia is to go to sleep right before the deadline * and then wake up before the next period to receive * a new replenishment. */ nanosleep(&ts, NULL); } exit(0); } --------------- >% --------------- On my box, this reproducer uses almost 50% of the CPU time, which is obviously wrong for a task with 2/2000 reservation. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/edf58354e01db46bf42df8d2dd32418833f68c89.1488392936.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-20 10:04:55 +01:00
Daniel Bristot de Oliveira	cd0e18d2f2	sched/deadline: Make sure the replenishment timer fires in the next period [ Upstream commit `5ac69d3778` ] Currently, the replenishment timer is set to fire at the deadline of a task. Although that works for implicit deadline tasks because the deadline is equals to the begin of the next period, that is not correct for constrained deadline tasks (deadline < period). For instance: f.c: --------------- %< --------------- int main (void) { for(;;); } --------------- >% --------------- # gcc -o f f.c # trace-cmd record -e sched:sched_switch \ -e syscalls:sys_exit_sched_setattr \ chrt -d --sched-runtime 490000000 \ --sched-deadline 500000000 \ --sched-period 1000000000 0 ./f # trace-cmd report \| grep "{pid of ./f}" After setting parameters, the task is replenished and continue running until being throttled: f-11295 [003] 13322.113776: sys_exit_sched_setattr: 0x0 The task is throttled after running 492318 ms, as expected: f-11295 [003] 13322.606094: sched_switch: f:11295 [-1] R ==> watchdog/3:32 [0] But then, the task is replenished 500719 ms after the first replenishment: <idle>-0 [003] 13322.614495: sched_switch: swapper/3:0 [120] R ==> f:11295 [-1] Running for 490277 ms: f-11295 [003] 13323.104772: sched_switch: f:11295 [-1] R ==> swapper/3:0 [120] Hence, in the first period, the task runs 2 * runtime, and that is a bug. During the first replenishment, the next deadline is set one period away. So the runtime / period starts to be respected. However, as the second replenishment took place in the wrong instant, the next replenishment will also be held in a wrong instant of time. Rather than occurring in the nth period away from the first activation, it is taking place in the (nth period - relative deadline). Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Luca Abeni <luca.abeni@santannapisa.it> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/ac50d89887c25285b47465638354b63362f8adff.1488392936.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-20 10:04:55 +01:00
Steven Rostedt	af36d95af5	sched/rt: Do not pull from current CPU if only one CPU to pull commit `f73c52a5bc` upstream. Daniel Wagner reported a crash on the BeagleBone Black SoC. This is a single CPU architecture, and does not have a functional arch_send_call_function_single_ipi() implementation which can crash the kernel if that is called. As it only has one CPU, it shouldn't be called, but if the kernel is compiled for SMP, the push/pull RT scheduling logic now calls it for irq_work if the one CPU is overloaded, it can use that function to call itself and crash the kernel. Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system only has a single CPU. But SCHED_FEAT is a constant if sched debugging is turned off. Another fix can also be used, and this should also help with normal SMP machines. That is, do not initiate the pull code if there's only one RT overloaded CPU, and that CPU happens to be the current CPU that is scheduling in a lower priority task. Even on a system with many CPUs, if there's many RT tasks waiting to run on a single CPU, and that CPU schedules in another RT task of lower priority, it will initiate the PULL logic in case there's a higher priority RT task on another CPU that is waiting to run. But if there is no other CPU with waiting RT tasks, it will initiate the RT pull logic on itself (as it still has RT tasks waiting to run). This is a wasted effort. Not only does this help with SMP code where the current CPU is the only one with RT overloaded tasks, it should also solve the issue that Daniel encountered, because it will prevent the PULL logic from executing, as there's only one CPU on the system, and the check added here will cause it to exit the RT pull code. Reported-by: Daniel Wagner <wagi@monom.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Fixes: `4bdced5c9` ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-20 10:04:52 +01:00
Tao Huang	1be021d3fd	sched/fair: fix push_task uninitialized in active_load_balance_cpu_stop Fixes: `a80b8c7559` ("sched: Extend active balance to accept 'push_task' argument") Change-Id: I002bd443c1dbb1d63f195f4b8cb7e9998987a6af Signed-off-by: Tao Huang <huangtao@rock-chips.com>	2017-12-11 19:18:52 +08:00
Tao Huang	afd240d168	Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git * linux-linaro-lsk-v4.4-android: (510 commits) Linux 4.4.103 Revert "sctp: do not peel off an assoc from one netns to another one" xen: xenbus driver must not accept invalid transaction ids s390/kbuild: enable modversions for symbols exported from asm ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data btrfs: return the actual error value from from btrfs_uuid_tree_iterate ASoC: rsnd: don't double free kctrl netfilter: nf_tables: fix oob access netfilter: nft_queue: use raw_smp_processor_id() spi: SPI_FSL_DSPI should depend on HAS_DMA staging: iio: cdc: fix improper return value iio: light: fix improper return value mac80211: Suppress NEW_PEER_CANDIDATE event if no room mac80211: Remove invalid flag operations in mesh TSF synchronization drm: Apply range restriction after color adjustment when allocation ALSA: hda - Apply ALC269_FIXUP_NO_SHUTUP on HDA_FIXUP_ACT_PROBE ath10k: set CTS protection VDEV param only if VDEV is up ath10k: fix potential memory leak in ath10k_wmi_tlv_op_pull_fw_stats() ath10k: ignore configuring the incorrect board_id ath10k: fix incorrect txpower set by P2P_DEVICE interface ... Conflicts: drivers/media/v4l2-core/v4l2-ctrls.c kernel/sched/fair.c Change-Id: I48152b2a0ab1f9f07e1da7823119b94f9b9e1751	2017-12-01 11:04:13 +08:00
Alex Shi	0f646885c3	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-12-01 01:02:04 +08:00
Steven Rostedt (Red Hat)	cb1831a83e	sched/rt: Simplify the IPI based RT balancing logic commit `4bdced5c9a` upstream. When a CPU lowers its priority (schedules out a high priority task for a lower priority one), a check is made to see if any other CPU has overloaded RT tasks (more than one). It checks the rto_mask to determine this and if so it will request to pull one of those tasks to itself if the non running RT task is of higher priority than the new priority of the next task to run on the current CPU. When we deal with large number of CPUs, the original pull logic suffered from large lock contention on a single CPU run queue, which caused a huge latency across all CPUs. This was caused by only having one CPU having overloaded RT tasks and a bunch of other CPUs lowering their priority. To solve this issue, commit: `b6366f048e` ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") changed the way to request a pull. Instead of grabbing the lock of the overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work. Although the IPI logic worked very well in removing the large latency build up, it still could suffer from a large number of IPIs being sent to a single CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test that had lots of RT tasks scheduling on all CPUs, it actually triggered the hard lockup detector! One CPU had so many IPIs sent to it, and due to the restart mechanism that is triggered when the source run queue has a priority status change, the CPU spent minutes! processing the IPIs. Thinking about this further, I realized there's no reason for each run queue to send its own IPI. As all CPUs with overloaded tasks must be scanned regardless if there's one or many CPUs lowering their priority, because there's no current way to find the CPU with the highest priority task that can schedule to one of these CPUs, there really only needs to be one IPI being sent around at a time. This greatly simplifies the code! The new approach is to have each root domain have its own irq work, as the rto_mask is per root domain. The root domain has the following fields attached to it: rto_push_work - the irq work to process each CPU set in rto_mask rto_lock - the lock to protect some of the other rto fields rto_loop_start - an atomic that keeps contention down on rto_lock the first CPU scheduling in a lower priority task is the one to kick off the process. rto_loop_next - an atomic that gets incremented for each CPU that schedules in a lower priority task. rto_loop - a variable protected by rto_lock that is used to compare against rto_loop_next rto_cpu - The cpu to send the next IPI to, also protected by the rto_lock. When a CPU schedules in a lower priority task and wants to make sure overloaded CPUs know about it. It increments the rto_loop_next. Then it atomically sets rto_loop_start with a cmpxchg. If the old value is not "0", then it is done, as another CPU is kicking off the IPI loop. If the old value is "0", then it will take the rto_lock to synchronize with a possible IPI being sent around to the overloaded CPUs. If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no IPI being sent around, or one is about to finish. Then rto_cpu is set to the first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set in rto_mask, then there's nothing to be done. When the CPU receives the IPI, it will first try to push any RT tasks that is queued on the CPU but can't run because a higher priority RT task is currently running on that CPU. Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it finds one, it simply sends an IPI to that CPU and the process continues. If there's no more CPUs in the rto_mask, then rto_loop is compared with rto_loop_next. If they match, everything is done and the process is over. If they do not match, then a CPU scheduled in a lower priority task as the IPI was being passed around, and the process needs to start again. The first CPU in rto_mask is sent the IPI. This change removes this duplication of work in the IPI logic, and greatly lowers the latency caused by the IPIs. This removed the lockup happening on the 120 CPU machine. It also simplifies the code tremendously. What else could anyone ask for? Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and supplying me with the rto_start_trylock() and rto_start_unlock() helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:37:25 +00:00
Paul E. McKenney	8ff3471878	sched: Make resched_cpu() unconditional commit `7c2102e56a` upstream. The current implementation of synchronize_sched_expedited() incorrectly assumes that resched_cpu() is unconditional, which it is not. This means that synchronize_sched_expedited() can hang when resched_cpu()'s trylock fails as follows (analysis by Neeraj Upadhyay): o CPU1 is waiting for expedited wait to complete: sync_rcu_exp_select_cpus rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5 IPI sent to CPU5 synchronize_sched_expedited_wait ret = swait_event_timeout(rsp->expedited_wq, sync_rcu_preempt_exp_done(rnp_root), jiffies_stall); expmask = 0x20, CPU 5 in idle path (in cpuidle_enter()) o CPU5 handles IPI and fails to acquire rq lock. Handles IPI sync_sched_exp_handler resched_cpu returns while failing to try lock acquire rq->lock need_resched is not set o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to idle (schedule() is not called). o CPU 1 reports RCU stall. Given that resched_cpu() is now used only by RCU, this commit fixes the assumption by making resched_cpu() unconditional. Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org> Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:37:19 +00:00
Todd Kjos	f353dca59d	Revert "ANDROID: sched/rt: schedtune: Add boost retention to RT" This reverts commit `d194ba5d71`. Reason for revert: Broke some builds. Will fix and resubmit. Change-Id: I4e6fa1562346eda1bbf058f1d5ace5ba6256ce07	2017-11-20 21:15:59 +05:30
Joel Fernandes	4053577123	ANDROID: sched/rt: schedtune: Add boost retention to RT Boosted RT tasks can be deboosted quickly, this makes boost usless for RT tasks and causes lots of glitching. Use timers to prevent de-boost too soon and wait for long enough such that next enqueue happens after a threshold. While this can be solved in the governor, there are following advantages: - The approach used is governor-independent - Reduces boost group lock contention for frequently sleepers/wakers - Works with schedfreq without any other schedfreq hacks. Bug: 30210506 Change-Id: I41788b235586988be446505deb7c0529758a9898 Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-11-20 21:15:59 +05:30
Viresh Kumar	0dae60cb4e	cpufreq: Drop schedfreq governor We all should be using (and improving) the schedutil governor now. Get rid of the non-upstream governor. Tested on Hikey. Change-Id: Ic660756536e5da51952738c3c18b94e31f58cd57 Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2017-11-20 21:15:59 +05:30
Joel Fernandes	d07b5deabc	ANDROID: sched/rt: add schedtune accounting This patch adds schedtune enqueue/dequeue to RT scheduling class. Change-Id: If416e64319d62191f3aedd675d3e9a21fe2102fb Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-11-20 21:15:59 +05:30
Ke Wang	0dd512ae00	sched: EAS: Fix the calculation of group util in group_idle_state() util_delta becomes not zero in eenv_before, which will affect the calculation of grp_util in group_idle_state(). Fix it under the new condition. Change-Id: Ic3853bb45876a8e388afcbe4e72d25fc42b1d7b0 Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2017-11-20 21:15:59 +05:30
Ke Wang	4e8e7d6d23	sched: EAS: update trg_cpu to backup_cpu if no energy saving for target_cpu If no energy saving for target_cpu in the calculation of energy_diff(), backup_cpu will be set as the new dst_cpu for the next calculation. At this point, we also need update the new trg_cpu as backup_cpu to make sure the subsequent calculation of energy_diff() is correct. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2017-11-20 21:15:59 +05:30
Ke Wang	bf9ce3c71f	sched: EAS: Fix the condition to distinguish energy before/after Before commit `5f8b3a757d` ("sched/fair: consider task utilization in group_norm_util()"), eenv->util_delta is used to distinguish energy before and energy after in sched_group_energy(). After that commit, eenv->util_delta can not do that any more. In this commit, use trg_cpu to distinguish energy before/after in sched_group_energy(). Before apply this commit, cap_before/cap_delta is not correct: <idle>-0 [001] 147504.608920: sched_energy_diff: pid=7 comm=rcu_preempt src_cpu=1 dst_cpu=3 usage_delta=7 nrg_before=250 nrg_after=250 nrg_diff=0 cap_before=0 cap_after=528 cap_delta=1056 nrg_delta=0 nrg_payoff=0 After apply this commit, cap_before/cap_delta retrun to normal: <idle>-0 [001] 220.494011: sched_energy_diff: pid=7 comm=rcu_preempt src_cpu=1 dst_cpu=2 usage_delta=3 nrg_before=248 nrg_after=248 nrg_diff=0 cap_before=528 cap_after=528 cap_delta=0 nrg_delta=0 nrg_payoff=0 Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>	2017-11-20 21:15:59 +05:30
Joonwoo Park	dbf572f159	sched: EAS: upmigrate misfit current task Upmigrate misfit current task upon scheduler tick with stopper. We can kick an random (not necessarily big CPU) NOHZ idle CPU when a CPU bound task is in need of upmigration. But it's not efficient as that way needs following unnecessary wakeups: 1. Busy little CPU A to kick idle B 2. B runs idle balancer and enqueue migration/A 3. B goes idle 4. A runs migration/A, enqueues busy task on B. 5. B wakes up again. This change makes active upmigration more efficiently by doing: 1. Busy little CPU A find target CPU B upon tick. 2. CPU A enqueues migration/A. Change-Id: Ie865738054ea3296f28e6ba01710635efa7193c0 [joonwoop: The original version had logic to reserve CPU. The logic is omitted in this version.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>	2017-11-20 21:15:59 +05:30
Prasad Sodagudi	b03fa7ff06	sched: avoid pushing tasks to an offline CPU Currently active_load_balance_cpu_stop is run by cpu stopper and it pushes running tasks off the busiest CPU onto idle target CPU. But there is no check to see whether target cpu is offline or not before pushing the tasks. With the introduction of active migration in the scheduler tick path (see check_for_migration()) there have been instances of attempts to migrate tasks to offline CPUs. Add a check as to whether the target cpu is online or not to prevent scheduling on offline CPUs. Change-Id: Ib8ac7f8aeabd3ca7365f3eae977075952dab4f21 Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>	2017-11-20 21:15:59 +05:30
Srivatsa Vaddagiri	a80b8c7559	sched: Extend active balance to accept 'push_task' argument Active balance currently picks one task to migrate from busy cpu to a chosen cpu (push_cpu). This patch extends active load balance to recognize a particular task ('push_task') that needs to be migrated to 'push_cpu'. This capability will be leveraged by HMP-aware task placement in a subsequent patch. Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2017-11-20 21:15:59 +05:30
Joel Fernandes	ace670d5c6	Revert "sched/core: Warn if ENERGY_AWARE is enabled but data is missing" This reverts commit `a21299785a`. Change-Id: Idb707c80788d8b6d26d400a59d9b14f854cce89f	2017-11-20 21:15:59 +05:30
Joel Fernandes	05a1b4ba04	Revert "sched/core: fix have_sched_energy_data build warning" This reverts commit `a899b9085c`. Reverting for now due to suspend warning issues. Change-Id: I9387297b52552a73d5a253b9e8e4467b366479a5	2017-11-20 21:15:59 +05:30
Joel Fernandes	27f6af2701	sched/core: fix have_sched_energy_data build warning have_sched_energy_data is defined only for CONFIG_SMP, so declare it only with CONFIG_SMP. Fixes warning from intel bot: tree: https://android.googlesource.com/kernel/msm android-4.4 head: `a21299785a` commit: `a21299785a` [5/5] sched/core: Warn if ENERGY_AWARE is enabled but data is missing config: i386-randconfig-x002-201743 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: git checkout `a21299785a` # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): >> kernel//sched/core.c:94:13: warning: 'have_sched_energy_data' used but never defined static bool have_sched_energy_data(void); ^~~~~~~~~~~~~~~~~~~~~~ vim +/have_sched_energy_data +94 kernel//sched/core.c 93 > 94 static bool have_sched_energy_data(void); 95 Change-Id: I266b63ece6fb31d2b5b11821a8244e147ba6d3a4 Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-11-20 21:15:59 +05:30
Brendan Jackman	3dcbf5e447	sched/core: Warn if ENERGY_AWARE is enabled but data is missing If the EAS energy model is missing or incomplete, i.e. sd_scs is NULL, then sched_group_energy will return -EINVAL on the assumption that it raced with a CPU hotplug event. In that case, energy_diff will return 0 and the energy-aware wake path will silently fail to trigger any migrations. This case can be triggered by disabling CONFIG_SCHED_MC on existing platforms, so that there are no sched_groups with the SD_SHARE_CAP_STATES flag, so that sd_scs is NULL. Add checks so that a warning is printed if EAS is ever enabled while the necessary data is not present. Change-Id: Id233a510b5ad8b7fcecac0b1d789e730bbfc7c4a Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>	2017-11-20 21:15:59 +05:30
Vikram Mulukutla	907c5489f9	sched: walt: Correct WALT window size initialization It is preferable that WALT window rollover occurs just before a tick, since the tick is an opportune moment to record a complete window's statistics, as well as report those stats to the cpu frequency governor. When CONFIG_HZ results in a TICK_NSEC that isn't a integral number, this requirement may be violated. Account for this by reducing the WALT window size to the nearest multiple of TICK_NSEC. Commit `d368c6faa1` ("sched: walt: fix window misalignment when HZ=300") attempted to do this but WALT isn't using MIN_SCHED_RAVG_WINDOW as the window size and the patch was doing nothing. Also, change the type of 'walt_disabled' to bool and warn if an invalid window size causes WALT to be disabled. Change-Id: Ie3dcfc21a3df4408254ca1165a355bbe391ed5c7 Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>	2017-11-20 21:15:59 +05:30
Brendan Jackman	2b022c3747	FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide (from https://patchwork.kernel.org/patch/9895261/) This patch adds a parameter to select_task_rq, sibling_count_hint allowing the caller, where it has this information, to inform the sched_class the number of tasks that are being woken up as part of the same event. The wake_q mechanism is one case where this information is available. select_task_rq_fair can then use the information to detect that it needs to widen the search space for task placement in order to avoid overloading the last-level cache domain's CPUs. * * * The reason I am investigating this change is the following use case on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which all repeatedly do X amount of work then pthread_barrier_wait (i.e. sleep until the last task finishes its X and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU finish faster, and then those CPUs pull over the tasks that are still running: v CPU v ->time-> ------------- 0 (big) 11111 /333 ------------- 1 (big) 22222 /444\| ------------- 2 (LITTLE) 333333/ ------------- 3 (LITTLE) 444444/ ------------- Now when task 4 hits the barrier (at \|) and wakes the others up, there are 4 tasks with prev_cpu=<big> and 0 tasks with prev_cpu=<little>. want_affine therefore means that we'll only look in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled on the bigs until the next load balance, something like this: v CPU v ->time-> ------------------------ 0 (big) 11111 /333 31313\33333 ------------------------ 1 (big) 22222 /444\|424\4444444 ------------------------ 2 (LITTLE) 333333/ \222222 ------------------------ 3 (LITTLE) 444444/ \1111 ------------------------ ^^^ underutilization So, I'm trying to get want_affine = 0 for these tasks. I don't _think_ any incarnation of the wakee_flips mechanism can help us here because which task is waker and which tasks are wakees generally changes with each iteration. However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the nice property that we know exactly how many tasks are being woken, so we can cheat. It might be a disadvantage that we "widen" _every_ task that's woken in an event, while select_idle_sibling would work fine for the first sd_llc_size - 1 tasks. IIUC, if wake_affine() behaves correctly this trick wouldn't be necessary on SMP systems, so it might be best guarded by the presence of SD_ASYM_CPUCAPACITY? * * * Final note.. In order to observe "perfect" behaviour for this use case, I also had to disable the TTWU_QUEUE sched feature. Suppose during the wakeup above we are working through the work queue and have placed tasks 3 and 2, and are about to place task 1: v CPU v ->time-> -------------- 0 (big) 11111 /333 3 -------------- 1 (big) 22222 /444\|4 -------------- 2 (LITTLE) 333333/ 2 -------------- 3 (LITTLE) 444444/ <- Task 1 should go here -------------- If TTWU_QUEUE is enabled, we will not yet have enqueued task 2 (having instead sent a reschedule IPI) or attached its load to CPU 2. So we are likely to also place task 1 on cpu 2. Disabling TTWU_QUEUE means that we enqueue task 2 before placing task 1, solving this issue. TTWU_QUEUE is there to minimise rq lock contention, and I guess that this contention is less of an issue on big.LITTLE systems since they have relatively few CPUs, which suggests the trade-off makes sense here. Change-Id: I2080302839a263e0841a89efea8589ea53bbda9c Signed-off-by: Brendan Jackman <brendan.jackman@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Matt Fleming <matt@codeblueprint.co.uk>	2017-11-20 21:15:59 +05:30
Joonwoo Park	ae0177aba1	sched: WALT: account cumulative window demand Energy cost estimation has been a long lasting challenge for WALT because WALT guides CPU frequency based on the CPU utilization of previous window. Consequently it's not possible to know newly waking-up task's energy cost until WALT's end of the current window. The WALT already tracks 'Previous Runnable Sum' (prev_runnable_sum) and 'Cumulative Runnable Average' (cr_avg). They are designed for CPU frequency guidance and task placement but unfortunately both are not suitable for the energy cost estimation. It's because using prev_runnable_sum for energy cost calculation would make us to account CPU and task's energy solely based on activity in the previous window so for example, any task didn't have an activity in the previous window will be accounted as a 'zero energy cost' task. Energy estimation with cr_avg is what energy_diff() relies on at present. However cr_avg can only represent instantaneous picture of energy cost thus for example, if a CPU was fully occupied for an entire WALT window and became idle just before window boundary, and if there is a wake-up, energy_diff() accounts that CPU is a 'zero energy cost' CPU. As a result, introduce a new accounting unit 'Cumulative Window Demand'. The cumulative window demand tracks all the tasks' demands have seen in current window which is neither instantaneous nor actual execution time. Because task demand represents estimated scaled execution time when the task runs a full window, accumulation of all the demands represents predicted CPU load at the end of window. Thus we can estimate CPU's frequency at the end of current WALT window with the cumulative window demand. The use of prev_runnable_sum for the CPU frequency guidance and cr_avg for the task placement have not changed and these are going to be used for both purpose while this patch aims to add an additional statistics. Change-Id: I9908c77ead9973a26dea2b36c001c2baf944d4f5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2017-11-20 21:15:59 +05:30
Leo Yan	3e5b2955be	sched/fair: remove useless variable in find_best_target Patch `5680f23f20` ("sched/fair: streamline find_best_target heuristics") has reworked function find_best_target, as result the variable "target_util" is useless now. So remove it. Change-Id: I5447062419e5828a49115119984fac6cd37db034 Signed-off-by: Leo Yan <leo.yan@linaro.org>	2017-11-20 21:15:59 +05:30
Russ Weight	b3501ab084	sched/tune: access schedtune_initialized under CGROUP_SCHEDTUNE schedtune_initialized is protected by CONFIG_CGROUP_SCHEDTUNE, but is being used without CONFIG_CGROUP_SCHEDTUNE being defined. Add appropriate ifdefs around the usage of schedtune_initialized to avoid a compilation error when CONFIG_CGROUP_SCHEDTUNE is not defined. Change-Id: Iab79bf053d74db3eeb84c09d71d43b4e39746ed2 Signed-off-by: Russ Weight <russell.h.weight@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com>	2017-11-20 21:15:59 +05:30
Patrick Bellasi	ab4dd38143	sched/fair: consider task utilization in group_max_util() The group_max_util() function is used to compute the maximum utilization across the CPUs of a certain energy_env configuration. Its main client is the energy_diff function when it needs to compute the SG capacity for one of the before/after scheduling candidates. Currently, the energy_diff function sets util_delta = 0 when it wants to compute the energy corresponding to the scheduling candidate where the task runs in the previous CPU. This implies that, for the task waking up in the previous CPU we consider only its blocked load tracked by the CPU RQ. However, in case of a medium-big task which is waking up on a long time idle CPU, this blocked load can be already completely decayed. More in general, the current approach is biased towards under-estimating the capacity requirements for the "before" scheduling candidate. This patch fixes this by: - always use the cpu_util_wake() to properly get the utilization of a CPU without any (partially decayed) contribution of the waking up task - adding the task utilization to the cpu_util_wake just for the target cpu The "target CPU" is defined by the energy_env to be either the src_cpu or the dst_cpu, depending on which scheduling candidate we are considering. Finally, since this update removes the last usage of calc_util_delta() this function is now safely removed. Change-Id: I20ee1bcf40cee6bf6e265fb2d32ef79061ad6ced Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-11-20 21:15:59 +05:30
Chris Redpath	df0d98c6ab	sched/fair: consider task utilization in group_norm_util() The group_norm_util() function is used to compute the normalized utilization of a SG given a certain energy_env configuration. The main client of this function is the energy_diff function when it comes to compute the SG energy for one of the before/after scheduling candidates. Currently, the energy_diff function sets util_delta = 0 when it wants to compute the energy corresponding to the scheduling candidate where the task runs in the previous CPU. This implies that, for the task waking up in the previous CPU we consider only its blocked load tracked by the CPU RQ. However, in case of a medium-big task which is waking up on a long time idle CPU, this blocked load can be already completely decayed. More in general, the current approach is biased towards under-estimating the energy consumption for the "before" scheduling candidate. This patch fixes this by: - always use the cpu_util_wake() to properly get the utilization of a CPU without any (partially decayed) contribution of the waking up task - adding the task utilization to the cpu_util_wake just for the target cpu The "target CPU" is defined by the energy_env to be either the src_cpu or the dst_cpu, depending on which scheduling candidate we are considering. This patch update also the definition of __cpu_norm_util(), which is currently called just by the group_norm_util() function. This allows to simplify the code by using this function just to normalize a specified utilization with respect to a given capacity. This update allows to completely remove any dependency of group_norm_util() from calc_util_delta(). Change-Id: I3b6ec50ce8decb1521faae660e326ab3319d3c82 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-11-20 21:15:59 +05:30

1 2 3 4 5 ...

1725 Commits