linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-09 06:33:38 +09:00

Author	SHA1	Message	Date
Christoph Lameter	e86992e170	vmstat: make vmstat_updater deferrable again and shut down on idle Currently the vmstat updater is not deferrable as a result of commit `ba4877b9ca` ("vmstat: do not use deferrable delayed work for vmstat_update"). This in turn can cause multiple interruptions of the applications because the vmstat updater may run at Make vmstate_update deferrable again and provide a function that folds the differentials when the processor is going to idle mode thus addressing the issue of the above commit in a clean way. Note that the shepherd thread will continue scanning the differentials from another processor and will reenable the vmstat workers if it detects any changes. Change-Id: Idf256cfacb40b4dc8dbb6795cf06b34e8fec7a06 Fixes: `ba4877b9ca` ("vmstat: do not use deferrable delayed work for vmstat_update") Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Git-commit: `0eb77e9880` [shashim@codeaurora.org: resolve minor merge conflicts] Signed-off-by: Shiraz Hashim <shashim@codeaurora.org> [jstultz: fwdport to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>	2016-09-14 15:02:22 +05:30
Patrick Bellasi	49e4bbf3cc	DEBUG: sched: add tracepoint for RD overutilized Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 15:02:22 +05:30
Srinath Sridharan	cf8449f421	sched/walt: Accounting for number of irqs pending on each core Schedules on a core whose irq count is less than a threshold. Improves I/O performance of EAS. Change-Id: I08ff7dd0d22502a0106fc636b1af2e6fe9e758b5	2016-09-14 15:02:22 +05:30
Srivatsa Vaddagiri	b41fa2aec5	sched: Introduce Window Assisted Load Tracking (WALT) use a window based view of time in order to track task demand and CPU utilization in the scheduler. Window Assisted Load Tracking (WALT) implementation credits: Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park, Pavan Kumar Kondeti, Olav Haugan 2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla and Todd Kjos Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708 Includes fixes for issues: eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a race resulting in watchdog resets BUG: 29353986 Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb Handle walt accounting anomoly during resume During resume, there is a corner case where on wakeup, a task's prev_runnable_sum can go negative. This is a workaround that fixes the condition and warns (instead of crashing). BUG: 29464099 Change-Id: I173e7874324b31a3584435530281708145773508 Signed-off-by: Todd Kjos <tkjos@google.com> Signed-off-by: Srinath Sridharan <srinathsr@google.com> Signed-off-by: Juri Lelli <juri.lelli@arm.com> [jstultz: fwdported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>	2016-09-14 15:02:22 +05:30
Srinath Sridharan	3fc52a99e7	sched/tune: Add support for negative boost values Change-Id: I164ee04ba98c3a776605f18cb65ee61b3e917939 Contains also: eas/stune: schedtune cpu boost_max must be non-negative. This is to avoid under-accounting cpu capacity which may cause task stacking and frequency spikes. Change-Id: Ie1c1cbd52a6edb77b4c15a830030aa748dff6f29	2016-09-14 14:59:32 +05:30
Todd Kjos	b312c991e9	sched/fair: add tunable to set initial task load The choice of initial task load upon fork has a large influence on CPU and OPP selection when scheduler-driven DVFS is in use. Make this tuneable by adding a new sysctl "sched_initial_task_util". If the sched governor is not used, the default remains at SCHED_LOAD_SCALE Otherwise, the value from the sysctl is used. This defaults to 0. Signed-off-by: "Todd Kjos <tkjos@google.com>"	2016-09-14 14:59:32 +05:30
Juri Lelli	d42fb8f959	sched/fair: add tunable to force selection at cpu granularity EAS assumes that clusters with smaller capacity cores are more energy-efficient. This may not be true on non-big-little devices, so EAS can make incorrect cluster selections when finding a CPU to wake. The "sched_is_big_little" hint can be used to cause a cpu-based selection instead of cluster-based selection. This change incorporates the addition of the sync hint enable patch EAS did not honour synchronous wakeup hints, a new sysctl is created to ask EAS to use this information when selecting a CPU. The control is called "sched_sync_hint_enable". Also contains: EAS: sched/fair: for SMP bias toward idle core with capacity For SMP devices, on wakeup bias towards idle cores that have capacity vs busy devices that need a higher OPP eas: favor idle cpus for boosted tasks BUG: 29533997 BUG: 29512132 Change-Id: I0cc9a1b1b88fb52916f18bf2d25715bdc3634f9c Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Srinath Sridharan <srinathsr@google.com> eas/sched/fair: Favoring busy cpus with low OPPs BUG: 29533997 BUG: 29512132 Change-Id: I9305b3239698d64278db715a2e277ea0bb4ece79 Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2016-09-14 14:59:32 +05:30
Srinath Sridharan	7169e3a073	sched: EAS: take cstate into account when selecting idle core Introduce a new sysctl for this option, 'sched_cstate_aware'. When this is enabled, select_idle_sibling in CFS is modified to choose the idle CPU in the sibling group which has the lowest idle state index - idle state indexes are assumed to increase as sleep depth and hence wakeup latency increase. In this way, we attempt to minimise wakeup latency when an idle CPU is required. Signed-off-by: Srinath Sridharan <srinathsr@google.com> Includes: sched: EAS: fix select_idle_sibling when sysctl_sched_cstate_aware is enabled, best_idle cpu will not be chosen in the original flow because it will goto done directly Bug: 30107557 Change-Id: Ie09c2e3960cafbb976f8d472747faefab3b4d6ac Signed-off-by: martin_liu <martin_liu@htc.com>	2016-09-14 14:59:32 +05:30
Patrick Bellasi	f0ba6a5d0c	FIXUP: sched: fix build for non-SMP target Currently the build for a single-core (e.g. user-mode) Linux is broken and this configuration is required (at least) to run some network tests. The main issues for the current code support on single-core systems are: 1. {se,rq}::sched_avg is not available nor maintained for !SMP systems This means that load and utilisation signals are NOT available in single core systems. All the EAS code depends on these signals. 2. sched_group_energy is also SMP dependant. Again this means that all the EAS setup and preparation code (energyn model initialization) has to be properly guarded/disabled for !SMP systems. 3. SchedFreq depends on utilization signal, which is not available on !SMP systems. 4. SchedTune is useless on unicore systems if SchedFreq is not available. 5. WALT machinery is not required on single-core systems. This patch addresses all these issues by enforcing some constraints for single-core systems: a) WALT, SchedTune and SchedTune are now dependant on SMP b) The default governor for !SMP systems is INTERACTIVE c) The energy model initialisation/build functions are d) Other minor code re-arrangements and CONFIG_SMP guarding to enable single core builds. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:59:32 +05:30
Patrick Bellasi	13a60dc148	DEBUG: sched/tune: add tracepoint on P-E space filtering Change-Id: I31dfed67c0486713b88efb75df767329f2802e06 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:59:32 +05:30
Patrick Bellasi	486c7afa76	DEBUG: sched/tune: add tracepoint for energy_diff() values Change-Id: Id8fafbd85f6d81248f322e073ee790a7ceec0bf7 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:59:32 +05:30
Patrick Bellasi	88c4572243	DEBUG: sched/tune: add tracepoint for task boost signal Change-Id: I545d3bf5569fc41c0fa70f51dff9a19c11d532ee Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:59:32 +05:30
Joseph Lo	75f2b9bac8	CHROMIUM: sched: update the average of nr_running Doing a Exponential moving average per nr_running++/-- does not guarantee a fixed sample rate which induces errors if there are lots of threads being enqueued/dequeued from the rq (Linpack mt). Instead of keeping track of the avg, the scheduler now keeps track of the integral of nr_running and allows the readers to perform filtering on top. Original-author: Sai Charan Gurrappadi <sgurrappadi@nvidia.com> Change-Id: Id946654f32fa8be0eaf9d8fa7c9a8039b5ef9fab Signed-off-by: Joseph Lo <josephl@nvidia.com> Signed-off-by: Andrew Bresticker <abrestic@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/174694 Reviewed-on: https://chromium-review.googlesource.com/272853 [jstultz: fwdported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>	2016-09-14 14:59:32 +05:30
Patrick Bellasi	a727f6b626	DEBUG: schedtune: add tracepoint for schedtune_tasks_update() values Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:59:32 +05:30
Patrick Bellasi	37437e7367	DEBUG: schedtune: add tracepoint for CPU boost signal Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:59:32 +05:30
Patrick Bellasi	9b8dc08c88	DEBUG: schedtune: add tracepoint for SchedTune configuration update Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:59:32 +05:30
Juri Lelli	bd818ccdee	DEBUG: sched,cpufreq: add cpu_capacity change tracepoint This is useful when we want to compare cpu utilization and cpu curr capacity side by side. Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2016-09-14 14:59:32 +05:30
Juri Lelli	69852bd9a3	DEBUG: sched: add tracepoint for CPU load/util signals Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2016-09-14 14:58:22 +05:30
Juri Lelli	c604a5fa23	DEBUG: sched: add tracepoint for task load/util signals Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2016-09-14 14:58:22 +05:30
Juri Lelli	f4886c38ff	DEBUG: sched: add tracepoint for cpu/freq scale invariance Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2016-09-14 14:58:22 +05:30
Patrick Bellasi	92757bdea5	sched/tune: add initial support for CGroups based boosting To support task performance boosting, the usage of a single knob has the advantage to be a simple solution, both from the implementation and the usability standpoint. However, on a real system it can be difficult to identify a single value for the knob which fits the needs of multiple different tasks. For example, some kernel threads and/or user-space background services should be better managed the "standard" way while we still want to be able to boost the performance of specific workloads. In order to improve the flexibility of the task boosting mechanism this patch is the first of a small series which extends the previous implementation to introduce a "per task group" support. This first patch introduces just the basic CGroups support, a new "schedtune" CGroups controller is added which allows to configure different boost value for different groups of tasks. To keep the implementation simple but still effective for a boosting strategy, the new controller: 1. allows only a two layer hierarchy 2. supports only a limited number of boost groups A two layer hierarchy allows to place each task either: a) in the root control group thus being subject to a system-wide boosting value b) in a child of the root group thus being subject to the specific boost value defined by that "boost group" The limited number of "boost groups" supported is mainly motivated by the observation that in a real system it could be useful to have only few classes of tasks which deserve different treatment. For example, background vs foreground or interactive vs low-priority. As an additional benefit, a limited number of boost groups allows also to have a simpler implementation especially for the code required to compute the boost value for CPUs which have runnable tasks belonging to different boost groups. cc: Tejun Heo <tj@kernel.org> cc: Li Zefan <lizefan@huawei.com> cc: Johannes Weiner <hannes@cmpxchg.org> cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:58:22 +05:30
Patrick Bellasi	724d562ae0	sched/tune: add sysctl interface to define a boost value The current (CFS) scheduler implementation does not allow "to boost" tasks performance by running them at a higher OPP compared to the minimum required to meet their workload demands. To support tasks performance boosting the scheduler should provide a "knob" which allows to tune how much the system is going to be optimised for energy efficiency vs performance. This patch is the first of a series which provides a simple interface to define a tuning knob. One system-wide "boost" tunable is exposed via: /proc/sys/kernel/sched_cfs_boost which can be configured in the range [0..100], to define a percentage where: - 0% boost requires to operate in "standard" mode by scheduling tasks at the minimum capacities required by the workload demand - 100% boost requires to push at maximum the task performances, "regardless" of the incurred energy consumption A boost value in between these two boundaries is used to bias the power/performance trade-off, the higher the boost value the more the scheduler is biased toward performance boosting instead of energy efficiency. cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>	2016-09-14 14:58:22 +05:30
Steve Muckle	6e4178facd	sched/cpufreq_sched: add trace events Trace events will aid in debugging, profiling and tuning. Signed-off-by: Steve Muckle <smuckle@linaro.org>	2016-09-14 14:58:22 +05:30
Michael Turquette	5c905a0861	sched: scheduler-driven cpu frequency selection Scheduler-driven CPU frequency selection hopes to exploit both per-task and global information in the scheduler to improve frequency selection policy, achieving lower power consumption, improved responsiveness/performance, and less reliance on heuristics and tunables. For further discussion on the motivation of this integration see [0]. This patch implements a shim layer between the Linux scheduler and the cpufreq subsystem. The interface accepts capacity requests from the CFS, RT and deadline sched classes. The requests from each sched class are summed on each CPU with a margin applied to the CFS and RT capacity requests to provide some headroom. Deadline requests are expected to be precise enough given their nature to not require headroom. The maximum total capacity request for a CPU in a frequency domain drives the requested frequency for that domain. Policy is determined by both the sched classes and this shim layer. Note that this algorithm is event-driven. There is no polling loop to check cpu idle time nor any other method which is unsynchronized with the scheduler, aside from a throttling mechanism to ensure frequency changes are not attempted faster than the hardware can accommodate them. Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas, code and test results, and to Ricky Liang <jcliang@chromium.org> for initialization and static key inc/dec fixes. [0] http://article.gmane.org/gmane.linux.kernel/1499836 [smuckle@linaro.org: various additions and fixes, revised commit text] CC: Ricky Liang <jcliang@chromium.org> Signed-off-by: Michael Turquette <mturquette@baylibre.com> Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Steve Muckle <smuckle@linaro.org>	2016-09-14 14:58:22 +05:30
Michael Turquette	10cbfd68e2	cpufreq: introduce cpufreq_driver_is_slow Some architectures and platforms perform CPU frequency transitions through a non-blocking method, while some might block or sleep. Even when frequency transitions do not block or sleep they may be very slow. This distinction is important when trying to change frequency from a non-interruptible context in a scheduler hot path. Describe this distinction with a cpufreq driver flag, CPUFREQ_DRIVER_FAST. The default is to not have this flag set, thus erring on the side of caution. cpufreq_driver_is_slow() is also introduced in this patch. Setting the above flag will allow this function to return false. [smuckle@linaro.org: change flag/API to include drivers that are too slow for scheduler hot paths, in addition to those that block/sleep] Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Michael Turquette <mturquette@baylibre.com> Signed-off-by: Steve Muckle <smuckle@linaro.org>	2016-09-14 14:48:50 +05:30
Dietmar Eggemann	74a07a6950	cpufreq: Max freq invariant scheduler load-tracking and cpu capacity support Implements cpufreq_scale_max_freq_capacity() to provide the scheduler with a maximum frequency scaling correction factor for more accurate load-tracking and cpu capacity handling by being able to deal with frequency capping. This scaling factor describes the influence of running a cpu with a current maximum frequency lower than the absolute possible maximum frequency on load tracking and cpu capacity. The factor is: current_max_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu) In fact, max_freq_scale should be a struct cpufreq_policy data member. But this would require that the scheduler hot path (__update_load_avg()) would have to grab the cpufreq lock. This can be avoided by using per-cpu data initialized to SCHED_CAPACITY_SCALE for max_freq_scale. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>	2016-09-14 14:48:50 +05:30
Robin Randhawa	907f4d5e6a	sched: Support for extracting EAS energy costs from DT This patch implements support for extracting energy cost data from DT. The data should conform to the DT bindings for energy cost data needed by EAS (energy aware scheduling). Signed-off-by: Robin Randhawa <robin.randhawa@arm.com>	2016-09-14 14:48:50 +05:30
Morten Rasmussen	df05d846a4	sched, cpuidle: Track cpuidle state index in the scheduler The idle-state of each cpu is currently pointed to by rq->idle_state but there isn't any information in the struct cpuidle_state that can used to look up the idle-state energy model data stored in struct sched_group_energy. For this purpose is necessary to store the idle state index as well. Ideally, the idle-state data should be unified. cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>	2016-09-14 14:48:50 +05:30
Morten Rasmussen	e2cee0c2f0	sched: Introduce SD_SHARE_CAP_STATES sched_domain flag cpufreq is currently keeping it a secret which cpus are sharing clock source. The scheduler needs to know about clock domains as well to become more energy aware. The SD_SHARE_CAP_STATES domain flag indicates whether cpus belonging to the sched_domain share capacity states (P-states). There is no connection with cpufreq (yet). The flag must be set by the arch specific topology code. cc: Russell King <linux@arm.linux.org.uk> cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>	2016-09-14 14:48:50 +05:30
Dietmar Eggemann	029e7b086e	sched: Introduce energy data structures The struct sched_group_energy represents the per sched_group related data which is needed for energy aware scheduling. It contains: (1) number of elements of the idle state array (2) pointer to the idle state array which comprises 'power consumption' for each idle state (3) number of elements of the capacity state array (4) pointer to the capacity state array which comprises 'compute capacity and power consumption' tuples for each capacity state The struct sched_group obtains a pointer to a struct sched_group_energy. The function pointer sched_domain_energy_f is introduced into struct sched_domain_topology_level which will allow the arch to pass a particular struct sched_group_energy from the topology shim layer into the scheduler core. The function pointer sched_domain_energy_f has an 'int cpu' parameter since the folding of two adjacent sd levels via sd degenerate doesn't work for all sd levels. I.e. it is not possible for example to use this feature to provide per-cpu energy in sd level DIE on ARM's TC2 platform. It was discussed that the folding of sd levels approach is preferable over the cpu parameter approach, simply because the user (the arch specifying the sd topology table) can introduce less errors. But since it is not working, the 'int cpu' parameter is the only way out. It's possible to use the folding of sd levels approach for sched_domain_flags_f and the cpu parameter approach for the sched_domain_energy_f at the same time though. With the use of the 'int cpu' parameter, an extra check function has to be provided to make sure that all cpus spanned by a sched group are provisioned with the same energy data. cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>	2016-09-14 14:48:50 +05:30
Dietmar Eggemann	e3c1175d91	cpufreq: Frequency invariant scheduler load-tracking support Implements cpufreq_scale_freq_capacity() to provide the scheduler with a frequency scaling correction factor for more accurate load-tracking. The factor is: current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu) In fact, freq_scale should be a struct cpufreq_policy data member. But this would require that the scheduler hot path (__update_load_avg()) would have to grab the cpufreq lock. This can be avoided by using per-cpu data initialized to SCHED_CAPACITY_SCALE for freq_scale. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>	2016-09-14 14:48:50 +05:30
Mark Salyzyn	872ffdd406	FROMLIST: pstore: drop pmsg bounce buffer (from https://lkml.org/lkml/2016/9/1/428) (cherry pick from android-3.10 commit b58133100b38f2bf83cad2d7097417a3a196ed0b) Removing a bounce buffer copy operation in the pmsg driver path is always better. We also gain in overall performance by not requesting a vmalloc on every write as this can cause precious RT tasks, such as user facing media operation, to stall while memory is being reclaimed. Added a write_buf_user to the pstore functions, a backup platform write_buf_user that uses the small buffer that is part of the instance, and implemented a ramoops write_buf_user that only supports PSTORE_TYPE_PMSG. Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 31057326 Change-Id: I4cdee1cd31467aa3e6c605bce2fbd4de5b0f8caa	2016-09-14 14:44:33 +05:30
Kees Cook	0c9f69c443	UPSTREAM: usercopy: force check_object_size() inline Just for good measure, make sure that check_object_size() is always inlined too, as already done for copy__user() and __copy__user(). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org> Change-Id: Ibfdf4790d03fe426e68d9a864c55a0d1bbfb7d61 (cherry picked from commit `a85d6b8242`) Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2016-09-14 14:44:29 +05:30
Kees Cook	ab8010a1ae	BACKPORT: usercopy: fold builtin_const check into inline function Instead of having each caller of check_object_size() need to remember to check for a const size parameter, move the check into check_object_size() itself. This actually matches the original implementation in PaX, though this commit cleans up the now-redundant builtin_const() calls in the various architectures. Signed-off-by: Kees Cook <keescook@chromium.org> Change-Id: I348809399c10ffa051251866063be674d064b9ff (cherry picked from `81409e9e28`) Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2016-09-14 14:44:29 +05:30
Mohan Srinivasan	34363937f7	Android: Fix build breakages. The IO latency histogram change broke allmodconfig and allnoconfig builds. This fixes those breakages. Change-Id: I9cdae655b40ed155468f3cef25cdb74bb56c4d3e Signed-off-by: Mohan Srinivasan <srmohan@google.com>	2016-09-14 14:44:29 +05:30
Joonsoo Kim	e08e07aec2	UPSTREAM: mm/slub: support left redzone SLUB already has a redzone debugging feature. But it is only positioned at the end of object (aka right redzone) so it cannot catch left oob. Although current object's right redzone acts as left redzone of next object, first object in a slab cannot take advantage of this effect. This patch explicitly adds a left red zone to each object to detect left oob more precisely. Background: Someone complained to me that left OOB doesn't catch even if KASAN is enabled which does page allocation debugging. That page is out of our control so it would be allocated when left OOB happens and, in this case, we can't find OOB. Moreover, SLUB debugging feature can be enabled without page allocator debugging and, in this case, we will miss that OOB. Before trying to implement, I expected that changes would be too complex, but, it doesn't look that complex to me now. Almost changes are applied to debug specific functions so I feel okay. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: Ib893a17ecabd692e6c402e864196bf89cd6781a5 (cherry picked from commit `d86bd1bece`) Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2016-09-14 14:44:29 +05:30
Mohan Srinivasan	8e4b2f84a8	Android: MMC/UFS IO Latency Histograms. This patch adds a new sysfs node (latency_hist) and reports IO (svc time) latency histograms. Disabled by default, can be enabled by echoing 0 into latency_hist, stats can be cleared by writing 2 into latency_hist. This commit fixes the 32 bit build breakage in the previous commit. Tested on both 32 bit and 64 bit arm devices. Bug: 30677035 Change-Id: I9a615a16616d80f87e75676ac4d078a5c429dcf9 Signed-off-by: Mohan Srinivasan <srmohan@google.com>	2016-09-14 14:43:34 +05:30
Linus Torvalds	1cbefb3fb1	UPSTREAM: unsafe_[get\|put]_user: change interface to use a error target label When I initially added the unsafe_[get\|put]_user() helpers in commit `5b24a7a2aa` ("Add 'unsafe' user access functions for batched accesses"), I made the mistake of modeling the interface on our traditional __[get\|put]_user() functions, which return zero on success, or -EFAULT on failure. That interface is fairly easy to use, but it's actually fairly nasty for good code generation, since it essentially forces the caller to check the error value for each access. In particular, since the error handling is already internally implemented with an exception handler, and we already use "asm goto" for various other things, we could fairly easily make the error cases just jump directly to an error label instead, and avoid the need for explicit checking after each operation. So switch the interface to pass in an error label, rather than checking the error value in the caller. Best do it now before we start growing more users (the signal handling code in particular would be a good place to use the new interface). So rather than if (unsafe_get_user(x, ptr)) ... handle error .. the interface is now unsafe_get_user(x, ptr, label); where an error during the user mode fetch will now just cause a jump to 'label' in the caller. Right now the actual _implementation_ of this all still ends up being a "if (err) goto label", and does not take advantage of any exception label tricks, but for "unsafe_put_user()" in particular it should be fairly straightforward to convert to using the exception table model. Note that "unsafe_get_user()" is much harder to convert to a clever exception table model, because current versions of gcc do not allow the use of "asm goto" (for the exception) with output values (for the actual value to be fetched). But that is hopefully not a limitation in the long term. [ Also note that it might be a good idea to switch unsafe_get_user() to actually _return_ the value it fetches from user space, but this commit only changes the error handling semantics ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: Ib905a84a04d46984320f6fd1056da4d72f3d6b53 (cherry picked from commit `1bd4403d86`) Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2016-09-14 14:43:17 +05:30
Eric Dumazet	2a111a3c24	UPSTREAM: tcp: fix use after free in tcp_xmit_retransmit_queue() (cherry picked from commit `bb1fceca22`) When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the tail of the write queue using tcp_add_write_queue_tail() Then it attempts to copy user data into this fresh skb. If the copy fails, we undo the work and remove the fresh skb. Unfortunately, this undo lacks the change done to tp->highest_sack and we can leave a dangling pointer (to a freed skb) Later, tcp_xmit_retransmit_queue() can dereference this pointer and access freed memory. For regular kernels where memory is not unmapped, this might cause SACK bugs because tcp_highest_sack_seq() is buggy, returning garbage instead of tp->snd_nxt, but with various debug features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel. This bug was found by Marco Grassi thanks to syzkaller. Fixes: `6859d49475` ("[TCP]: Abstract tp->highest_sack accessing & point to next skb") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Change-Id: I58bb02d6e4e399612e8580b9e02d11e661df82f5 Bug: 31183296	2016-09-14 14:26:42 +05:30
Peter Zijlstra	a81c69e149	RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write() The current percpu-rwsem read side is entirely free of serializing insns at the cost of having a synchronize_sched() in the write path. The latency of the synchronize_sched() is too high for cgroups. The commit `1ed1328792` talks about the write path being a fairly cold path but this is not the case for Android which moves task to the foreground cgroup and back around binder IPC calls from foreground processes to background processes, so it is significantly hotter than human initiated operations. Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the problem, hopefully it should not be that slow after another commit `80127a3968` ("locking/percpu-rwsem: Optimize readers and reduce global impact"). We could just add rcu_sync_enter() into cgroup_init() but we do not want another synchronize_sched() at boot time, so this patch adds the new helper which doesn't block but currently can only be called before the first use. Cc: Tejun Heo <tj@kernel.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Reported-by: John Stultz <john.stultz@linaro.org> Reported-by: Dmitry Shmidt <dimitrysh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> [jstultz: backported to 4.4] Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2 Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557 Signed-off-by: John Stultz <john.stultz@linaro.org>	2016-09-14 14:26:20 +05:30
Peter Zijlstra	d4d74af4b8	RFC: FROMLIST: locking/percpu-rwsem: Optimize readers and reduce global impact Currently the percpu-rwsem switches to (global) atomic ops while a writer is waiting; which could be quite a while and slows down releasing the readers. This patch cures this problem by ordering the reader-state vs reader-count (see the comments in __percpu_down_read() and percpu_down_write()). This changes a global atomic op into a full memory barrier, which doesn't have the global cacheline contention. This also enables using the percpu-rwsem with rcu_sync disabled in order to bias the implementation differently, reducing the writer latency by adding some cost to readers. Mailing-list-URL: https://lkml.org/lkml/2016/8/9/181 Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [jstultz: Backported to 4.4] Change-Id: I8ea04b4dca2ec36f1c2469eccafde1423490572f Signed-off-by: John Stultz <john.stultz@linaro.org>	2016-09-14 14:26:20 +05:30
Alex Shi	b56111f481	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android Conflicts: arch/arm/Kconfig	2016-08-30 10:27:13 +08:00
Alex Shi	59e65b4bbf	Merge remote-tracking branch 'v4.4/topic/mm-kaslr-pax_usercopy' into linux-linaro-lsk-v4.4	2016-08-27 11:27:14 +08:00
Linus Torvalds	798522d907	Add 'unsafe' user access functions for batched accesses The naming is meant to discourage random use: the helper functions are not really any more "unsafe" than the traditional double-underscore functions (which need the address range checking), but they do need even more infrastructure around them, and should not be used willy-nilly. In addition to checking the access range, these user access functions require that you wrap the user access with a "user_acess_{begin,end}()" around it. That allows architectures that implement kernel user access control (x86: SMAP, arm64: PAN) to do the user access control in the wrapping user_access_begin/end part, and then batch up the actual user space accesses using the new interfaces. The main (and hopefully only) use for these are for core generic access helpers, initially just the generic user string functions (strnlen_user() and strncpy_from_user()). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `5b24a7a2aa`) Signed-off-by: Alex Shi <alex.shi@linaro.org>	2016-08-27 11:23:38 +08:00
Kees Cook	799abb4f95	mm: Hardened usercopy This is the start of porting PAX_USERCOPY into the mainline kernel. This is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The work is based on code by PaX Team and Brad Spengler, and an earlier port from Casey Schaufler. Additional non-slab page tests are from Rik van Riel. This patch contains the logic for validating several conditions when performing copy_to_user() and copy_from_user() on the kernel object being copied to/from: - address range doesn't wrap around - address range isn't NULL or zero-allocated (with a non-zero copy size) - if on the slab allocator: - object size must be less than or equal to copy size (when check is implemented in the allocator, which appear in subsequent patches) - otherwise, object must not span page allocations (excepting Reserved and CMA ranges) - if on the stack - object must not extend before/after the current process stack - object must be contained by a valid stack frame (when there is arch/build support for identifying stack frames) - object must not overlap with kernel text Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Tested-by: Michael Ellerman <mpe@ellerman.id.au> (cherry picked from commit `f5509cc18d`) Signed-off-by: Alex Shi <alex.shi@linaro.org> Conflicts: skip debug_page_ref and KCOV_INSTRUMENT in mm/Makefile	2016-08-27 11:23:38 +08:00
Kees Cook	fdb92b0de3	mm: Implement stack frame object validation This creates per-architecture function arch_within_stack_frames() that should validate if a given object is contained by a kernel stack frame. Initial implementation is on x86. This is based on code from PaX. Signed-off-by: Kees Cook <keescook@chromium.org> (cherry picked from commit `0f60a8efe4`) Signed-off-by: Alex Shi <alex.shi@linaro.org> Conflicts: skip EBPF_JIT in arch/x86/Kconfig	2016-08-27 11:23:38 +08:00
Laura Abbott	472dd6904d	mm: Add is_migrate_cma_page Code such as hardened user copy[1] needs a way to tell if a page is CMA or not. Add is_migrate_cma_page in a similar way to is_migrate_isolate_page. [1]http://article.gmane.org/gmane.linux.kernel.mm/155238 Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> (cherry picked from commit `7c15d9bb82`) Signed-off-by: Alex Shi <alex.shi@linaro.org>	2016-08-27 11:23:38 +08:00
Amit Pundir	6acba5b6a1	ANDROID: net: core: fix UID-based routing Fix RTA_UID enum to match it with the Android userspace code which assumes RTA_UID=18. With this patch all Android kernel networking unit tests mentioned here https://source.android.com/devices/tech/config/kernel_network_tests.html are success. Without this patch multinetwork_test.py unit test fails. Change-Id: I3ff36670f7d4e5bf5f01dce584ae9d53deabb3ed Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2016-08-18 18:56:03 +05:30
Amit Pundir	439bce3122	ANDROID: net: fib: remove duplicate assignment Remove duplicate FRA_GOTO assignment. Fixes: `fd2cf795f3` ("net: core: Support UID-based routing.") Change-Id: I462c24b16fdef42ae2332571a0b95de3ef9d2e25 Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2016-08-18 18:56:03 +05:30
Will Drewry	5d4ac07c90	CHROMIUM: dm: boot time specification of dm= This is a wrap-up of three patches pending upstream approval. I'm bundling them because they are interdependent, and it'll be easier to drop it on rebase later. 1. dm: allow a dm-fs-style device to be shared via dm-ioctl Integrates feedback from Alisdair, Mike, and Kiyoshi. Two main changes occur here: - One function is added which allows for a programmatically created mapped device to be inserted into the dm-ioctl hash table. This binds the device to a name and, optional, uuid which is needed by udev and allows for userspace management of the mapped device. - dm_table_complete() was extended to handle all of the final functional changes required for the table to be operational once called. 2. init: boot to device-mapper targets without an initr* Add a dm= kernel parameter modeled after the md= parameter from do_mounts_md. It allows for device-mapper targets to be configured at boot time for use early in the boot process (as the root device or otherwise). It also replaces /dev/XXX calls with major:minor opportunistically. The format is dm="name uuid ro,table line 1,table line 2,...". The parser expects the comma to be safe to use as a newline substitute but, otherwise, uses the normal separator of space. Some attempt has been made to make it forgiving of additional spaces (using skip_spaces()). A mapped device created during boot will be assigned a minor of 0 and may be access via /dev/dm-0. An example dm-linear root with no uuid may look like: root=/dev/dm-0 dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dv/ubdc 0" Once udev is started, /dev/dm-0 will become /dev/mapper/lroot. Older upstream threads: http://marc.info/?l=dm-devel&m=127429492521964&w=2 http://marc.info/?l=dm-devel&m=127429499422096&w=2 http://marc.info/?l=dm-devel&m=127429493922000&w=2 Latest upstream threads: https://patchwork.kernel.org/patch/104859/ https://patchwork.kernel.org/patch/104860/ https://patchwork.kernel.org/patch/104861/ Bug: 27175947 Signed-off-by: Will Drewry <wad@chromium.org> Review URL: http://codereview.chromium.org/2020011 Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31	2016-08-18 18:56:03 +05:30

1 2 3 4 5 ...

79760 Commits