the mm operation for logo memory will conflict with userspace
memory operation, so we add mm_lock to guaranteed no conflict.
Change-Id: If02310e3a9e48478124201edd94af38dd900944a
Signed-off-by: Sandy Huang <hjc@rock-chips.com>
We need to enable lvds mode before calling phy_power_on.
Change-Id: I38625c44998bde81bb4c98b70b8be5995d64b477
Signed-off-by: Wyon Bi <bivvy.bi@rock-chips.com>
Add legency ion driver support, and rockchip ion driver
only(default) used by the legency ion.
Change-Id: I428d856e96033004943ee024e8d280f3e96753a1
Signed-off-by: Jianqun Xu <jay.xu@rock-chips.com>
The recommended max voltage of vdd_npu is 0.88V in the datasheet.
Change-Id: I9713810692c5d32b8c41b0b0e0d02405c01dd0b7
Signed-off-by: Liang Chen <cl@rock-chips.com>
CONFIG_CPU_FREQ_GOV_PERFORMANCE is auto selected by
CONFIG_CPU_FREQ_DEFAULT_GOV_INTERACTIVE.
Change-Id: Ic5315e7fe80da246761a7da1628a7e705cb8aef4
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
CONFIG_CPU_FREQ_GOV_PERFORMANCE is auto selected by
CONFIG_CPU_FREQ_DEFAULT_GOV_INTERACTIVE.
Change-Id: Ibc2bf7079759be158aa4f4999813f72e2730f933
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
HDR panel metadata is in connector, not in drm_hdmi_info. So HDR
panel metadata won't be reset when parsing edid. If hdmi is connected
from one HDR TV to another TV that does not support HDR, HDR panel
metadata is still from HDR TV, that will cause hdmi display error.
Change-Id: I2c38240cdb8a7ab6517fa5987db21bc28940da38
Signed-off-by: Algea Cao <algea.cao@rock-chips.com>
[ Upstream commit def97da136 ]
Commit f92b070f2d ("printk: Do not miss new messages when replaying
the log") introduced a new variable @exclusive_console_stop_seq to
store when an exclusive console should stop printing. It should be
set to the @console_seq value at registration. However, @console_seq
is previously set to @syslog_seq so that the exclusive console knows
where to begin. This results in the exclusive console immediately
reactivating all the other consoles and thus repeating the messages
for those consoles.
Set @console_seq after @exclusive_console_stop_seq has stored the
current @console_seq value.
Fixes: f92b070f2d ("printk: Do not miss new messages when replaying the log")
Change-Id: Ibea8f5ee2c1a5fc77f46ea11e8769c2d3aa418d0
Link: http://lkml.kernel.org/r/20191219115322.31160-1-john.ogness@linutronix.de
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
https://source.android.com/security/bulletin/2020-02-01
CVE-2020-0030
CVE-2019-11599
* tag 'ASB-2020-02-05_4.19': (4206 commits)
UPSTREAM: sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases
ANDROID: Re-use SUGOV_RT_MAX_FREQ to control uclamp rt behavior
BACKPORT: sched/fair: Make EAS wakeup placement consider uclamp restrictions
BACKPORT: sched/fair: Make task_fits_capacity() consider uclamp restrictions
ANDROID: sched/core: Move SchedTune task API into UtilClamp wrappers
ANDROID: sched/core: Add a latency-sensitive flag to uclamp
ANDROID: sched/tune: Move SchedTune cpu API into UtilClamp wrappers
ANDROID: init: kconfig: Only allow sched tune if !uclamp
FROMGIT: sched/core: Fix size of rq::uclamp initialization
FROMGIT: sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
FROMGIT: sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
FROMGIT: sched/uclamp: Make uclamp util helpers use and return UL values
FROMGIT: sched/uclamp: Remove uclamp_util()
BACKPORT: sched/rt: Make RT capacity-aware
UPSTREAM: tools headers UAPI: Sync sched.h with the kernel
UPSTREAM: sched/uclamp: Fix overzealous type replacement
UPSTREAM: sched/uclamp: Fix incorrect condition
UPSTREAM: sched/core: Fix compilation error when cgroup not selected
UPSTREAM: sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
UPSTREAM: sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
...
Conflicts:
drivers/char/random.c
drivers/devfreq/devfreq.c
drivers/gpu/drm/drm_fb_helper.c
drivers/media/i2c/ov2680.c
drivers/media/i2c/ov2685.c
drivers/media/i2c/ov5670.c
drivers/media/i2c/ov5695.c
drivers/media/usb/uvc/uvc_driver.c
drivers/mmc/host/cqhci.c
drivers/spi/spi-rockchip.c
drivers/usb/dwc2/params.c
drivers/usb/dwc3/debugfs.c
drivers/usb/dwc3/gadget.c
drivers/usb/serial/usb_wwan.c
include/linux/clk-provider.h
include/linux/mfd/rk808.h
kernel/cpu.c
sound/usb/quirks.c
- Export symbol mm_trace_rss_stat on mm/memory.c for GPU drivers.
- Fix sound/usb/pcm.c for SNDRV_PCM_TRIGGER_SUSPEND.
- Enable DEBUG_FS which is not selected by TRACING.
- Disable of_devlink which broken boot. of_devlink is enabled by commit
ba3aa33b8f ("ANDROID: of: property: Enable of_devlink by default").
- Add CLK_DONT_HOLD_STATE and CLK_KEEP_REQ_RATE to clk_flags
on drivers/clk/clk.c.
Change-Id: I500ca1bbc735753f9c8251ed2ac8ad757d5a24a4
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
This reverts commit 4f29b1cfa6.
Maybe fixed by commit be41df88a5 ("HID: core: check whether Usage Page item is after Usage ID items").
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
If want to use i2c2, we must write i2c2 register bit with 1 at GRF.
Change-Id: Ia7e59c105647304162bde283a3fb98d9e0db75c3
Signed-off-by: David Wu <david.wu@rock-chips.com>
This fixed some Rockchip SoCs just shared one irq with
all mailbox channels.
Change-Id: Ie78a3372b58ad20a20e75046aca379a2db65260f
Signed-off-by: Frank Wang <frank.wang@rock-chips.com>
This change amend the below features.
- Checked the mailbox channel status before send message.
- Used the con_priv variable to handle the channel private data.
- Added the spinlock cfg_lock to protect the register R/W.
- Optimized the interrupt handler can receive B2A message proactively.
Change-Id: If1939e51e821307788ab59dd4ef874a20a6568e2
Signed-off-by: Frank Wang <frank.wang@rock-chips.com>
The estimated utilization for a task:
util_est = max(util_avg, est.enqueue, est.ewma)
is defined based on:
- util_avg: the PELT defined utilization
- est.enqueued: the util_avg at the end of the last activation
- est.ewma: a exponential moving average on the est.enqueued samples
According to this definition, when a task suddenly changes its bandwidth
requirements from small to big, the EWMA will need to collect multiple
samples before converging up to track the new big utilization.
This slow convergence towards bigger utilization values is not
aligned to the default scheduler behavior, which is to optimize for
performance. Moreover, the est.ewma component fails to compensate for
temporarely utilization drops which spans just few est.enqueued samples.
To let util_est do a better job in the scenario depicted above, change
its definition by making util_est directly follow upward motion and
only decay the est.ewma on downward.
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@matbug.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191023205630.14469-1-patrick.bellasi@matbug.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit b8c9636140)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I5c0bdd401f3fe599a2b7b9215c9a3a621f91002d
By default uclamp RT tasks will use the max frequency, which is not the
desired default behavior on mobile devices.
Re-use the SUGOV_RT_MAX_FREQ sched_feat to control the default behavior.
When SUGOV_RT_MAX_FREQ is NOT selected, the uclamp_min value of the RT
tasks will be 0.
Note, since now we use SUGOV_RT_MAX_FREQ to enforce the default max
frequency for RT when uclamp is compiled in; the condition in
schedutil_cpu_util() needs to be inverted so that max no longer
unconditionally applied when uclamp is compiled in && SUGOV_RT_MAX_FREQ
is true. This unconditional application means uclamp values are always
ignored which is not what we want when uclamp is compiled in.
Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I3d36f1ebed6ef35a6299af32bbf4462d0353e783
Signed-off-by: Quentin Perret <qperret@google.com>
task_fits_capacity() has just been made uclamp-aware, and
find_energy_efficient_cpu() needs to go through the same treatment.
Things are somewhat different here however - using the task max clamp isn't
sufficient. Consider the following setup:
The target runqueue, rq:
rq.cpu_capacity_orig = 512
rq.cfs.avg.util_avg = 200
rq.uclamp.max = 768 // the max p.uclamp.max of all enqueued p's is 768
The waking task, p (not yet enqueued on rq):
p.util_est = 600
p.uclamp.max = 100
Now, consider the following code which doesn't use the rq clamps:
util = uclamp_task_util(p);
// Does the task fit in the spare CPU capacity?
cpu = cpu_of(rq);
fits_capacity(util, cpu_capacity(cpu) - cpu_util(cpu))
This would lead to:
util = 100;
fits_capacity(100, 512 - 200)
fits_capacity() would return true. However, enqueuing p on that CPU *will*
cause it to become overutilized since rq clamp values are max-aggregated,
so we'd remain with
rq.uclamp.max = 768
which comes from the other tasks already enqueued on rq. Thus, we could
select a high enough frequency to reach beyond 0.8 * 512 utilization
(== overutilized) after enqueuing p on rq. What find_energy_efficient_cpu()
needs here is uclamp_rq_util_with() which lets us peek at the future
utilization landscape, including rq-wide uclamp values.
Make find_energy_efficient_cpu() use uclamp_rq_util_with() for its
fits_capacity() check. This is in line with what compute_energy() ends up
using for estimating utilization.
[QP: moved changes to select_cpu_candidates(), which is the equivalent
to the mainline path, and fix missing dependency on fits_capacity() by
using the open coded version]
Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-6-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1d42509e47)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ibe1643cd5e6c97daceceae9733344e54bf4a4857
task_fits_capacity() drives CPU selection at wakeup time, and is also used
to detect misfit tasks. Right now it does so by comparing task_util_est()
with a CPU's capacity, but doesn't take into account uclamp restrictions.
There's a few interesting uses that can come out of doing this. For
instance, a low uclamp.max value could prevent certain tasks from being
flagged as misfit tasks, so they could merrily remain on low-capacity CPUs.
Similarly, a high uclamp.min value would steer tasks towards high capacity
CPUs at wakeup (and, should that fail, later steered via misfit balancing),
so such "boosted" tasks would favor CPUs of higher capacity.
Introduce uclamp_task_util() and make task_fits_capacity() use it.
[QP: fixed missing dependency on fits_capacity() by using the open coded
alternative]
Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-5-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a7008c07a5)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iabde2eda7252c3bcc273e61260a7a12a7de991b1
The main SchedTune API calls realted to task tuning attributes are now
wrapped by more generic and mainlinish UtilClamp calls.
The new APIs are:
- uclamp_task(p) <= boosted_task_util(p)
- uclamp_boosted(p) <= schedtune_task_boost(p) > 0
- uclamp_latency_sensitive(p) <= schedtune_prefer_idle(p)
Let's provide also an implementation of the same API based on the new
uclamp.uclamp_latency_sensitive flag.
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
[Modified the patch to use uclamp.latency_sensitive instead mainline
attributes]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ib1a6902e1c07a82a370e36bf1776d895b7528cbc
Signed-off-by: Quentin Perret <qperret@google.com>
Add a 'latency_sensitive' flag to uclamp in order to express the need
for some tasks to find a CPU where they can wake-up quickly. This is not
expected to be used without cgroup support, so add solely a cgroup
interface for it.
As this flag represents a boolean attribute and not an amount of
resources to be shared, it is not clear what the delegation logic should
be. As such, it is kept simple: every new cgroup starts with
latency_sensitive set to false, regardless of the parent.
In essence, this is similar to SchedTune's prefer-idle flag which was
used in android-4.19 and prior.
Bug: 120440300
Change-Id: I722d8ecabb428bb7b95a5b54bc70a87f182dde2a
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
(cherry picked from commit ad7dd648fc7dbe11f23673a3463af2468a274998)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The SchedTune CPU boosting API is currently used from sugov_get_util()
to get the boosted utilization and to pass it into schedutil_cpu_util().
When UtilClamp is in use instead we call schedutil_cpu_util() by
passing in just the CFS utilization and the clamping is done internally
on the aggregated CFS+RT utilization for FREQUENCY_UTIL calls.
This asymmetry is not required moreover, schedutil code is polluted by
non-mainline SchedTune code.
Wrap SchedTune API call related to cpu utilization boosting with a more
generic and mainlinish UtilClamp call:
- uclamp_rq_util_with(cpu, util, p) <= boosted_cpu_util(cpu)
This new API is already used in schedutil_cpu_util() to clamp the
aggregated RT+CFS utilization on FREQUENCY_UTIL calls.
Move the cpu boosting into uclamp_rq_util_with() so that we remove any
SchedTune specific bit from kernel/sched/cpufreq_schedutil.c.
Get rid of the no more required boosted_cpu_util(cpu) method and replace
it with a stune_util(cpu, util) which signature is better aligned with
its uclamp_rq_util_with(cpu, util, p) counterpart.
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I45b0f0f54123fe0a2515fa9f1683842e6b99234f
[Removed superfluous __maybe_unused for capacity_orig_of]
Signed-off-by: Quentin Perret <qperret@google.com>
Uclamap and sched_tune features are mutually exclusive. The kernel must
be compiled with one or the other.
Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Iec209516aadb984fd3dead48ea3f3f3ca117335e
Signed-off-by: Quentin Perret <qperret@google.com>
When a new cgroup is created, the effective uclamp value wasn't updated
with a call to cpu_util_update_eff() that looks at the hierarchy and
update to the most restrictive values.
Fix it by ensuring to call cpu_util_update_eff() when a new cgroup
becomes online.
Without this change, the newly created cgroup uses the default
root_task_group uclamp values, which is 1024 for both uclamp_{min, max},
which will cause the rq to to be clamped to max, hence cause the
system to run at max frequency.
The problem was observed on Ubuntu server and was reproduced on Debian
and Buildroot rootfs.
By default, Ubuntu and Debian create a cpu controller cgroup hierarchy
and add all tasks to it - which creates enough noise to keep the rq
uclamp value at max most of the time. Imitating this behavior makes the
problem visible in Buildroot too which otherwise looks fine since it's a
minimal userspace.
Bug: 120440300
Fixes: 0b60ba2dd3 ("sched/uclamp: Propagate parent clamps")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Doug Smythies <dsmythies@telus.net>
Link: https://lore.kernel.org/lkml/000701d5b965$361b6c60$a2524520$@net/
(cherry picked from commit 7226017ad3https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I9636c60e04d58bbfc5041df1305b34a12b5a3f46
Signed-off-by: Quentin Perret <qperret@google.com>
Capacity Awareness refers to the fact that on heterogeneous systems
(like Arm big.LITTLE), the capacity of the CPUs is not uniform, hence
when placing tasks we need to be aware of this difference of CPU
capacities.
In such scenarios we want to ensure that the selected CPU has enough
capacity to meet the requirement of the running task. Enough capacity
means here that capacity_orig_of(cpu) >= task.requirement.
The definition of task.requirement is dependent on the scheduling class.
For CFS, utilization is used to select a CPU that has >= capacity value
than the cfs_task.util.
capacity_orig_of(cpu) >= cfs_task.util
DL isn't capacity aware at the moment but can make use of the bandwidth
reservation to implement that in a similar manner CFS uses utilization.
The following patchset implements that:
https://lore.kernel.org/lkml/20190506044836.2914-1-luca.abeni@santannapisa.it/
capacity_orig_of(cpu)/SCHED_CAPACITY >= dl_deadline/dl_runtime
For RT we don't have a per task utilization signal and we lack any
information in general about what performance requirement the RT task
needs. But with the introduction of uclamp, RT tasks can now control
that by setting uclamp_min to guarantee a minimum performance point.
ATM the uclamp value are only used for frequency selection; but on
heterogeneous systems this is not enough and we need to ensure that the
capacity of the CPU is >= uclamp_min. Which is what implemented here.
capacity_orig_of(cpu) >= rt_task.uclamp_min
Note that by default uclamp.min is 1024, which means that RT tasks will
always be biased towards the big CPUs, which make for a better more
predictable behavior for the default case.
Must stress that the bias acts as a hint rather than a definite
placement strategy. For example, if all big cores are busy executing
other RT tasks we can't guarantee that a new RT task will be placed
there.
On non-heterogeneous systems the original behavior of RT should be
retained. Similarly if uclamp is not selected in the config.
[ mingo: Minor edits to comments. ]
Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191009104611.15363-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 804d402fb6https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git)
[Qais: resolved minor conflict in kernel/sched/cpupri.c]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ifc9da1c47de1aec9b4d87be2614e4c8968366900
Signed-off-by: Quentin Perret <qperret@google.com>
When cgroup is disabled the following compilation error was hit
kernel/sched/core.c: In function ‘uclamp_update_active_tasks’:
kernel/sched/core.c:1081:23: error: storage size of ‘it’ isn’t known
struct css_task_iter it;
^~
kernel/sched/core.c:1084:2: error: implicit declaration of function ‘css_task_iter_start’; did you mean ‘__sg_page_iter_start’? [-Werror=implicit-function-declaration]
css_task_iter_start(css, 0, &it);
^~~~~~~~~~~~~~~~~~~
__sg_page_iter_start
kernel/sched/core.c:1085:14: error: implicit declaration of function ‘css_task_iter_next’; did you mean ‘__sg_page_iter_next’? [-Werror=implicit-function-declaration]
while ((p = css_task_iter_next(&it))) {
^~~~~~~~~~~~~~~~~~
__sg_page_iter_next
kernel/sched/core.c:1091:2: error: implicit declaration of function ‘css_task_iter_end’; did you mean ‘get_task_cred’? [-Werror=implicit-function-declaration]
css_task_iter_end(&it);
^~~~~~~~~~~~~~~~~
get_task_cred
kernel/sched/core.c:1081:23: warning: unused variable ‘it’ [-Wunused-variable]
struct css_task_iter it;
^~
cc1: some warnings being treated as errors
make[2]: *** [kernel/sched/core.o] Error 1
Fix by protetion uclamp_update_active_tasks() with
CONFIG_UCLAMP_TASK_GROUP
Bug: 120440300
Fixes: babbe170e0 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20191105112212.596-1-qais.yousef@arm.com
(cherry picked from commit e3b8b6a0d1)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ia4c0f801d68050526f9f117ec9189e448b01345a
Signed-off-by: Quentin Perret <qperret@google.com>
Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:
$ chrt -p $$
chrt: failed to get pid 26306's policy: Argument list too long
and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:
a509a7cd79 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.
Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:
- if user-attributes have the same size as kernel attributes then the
logic is unchanged.
- if user-attributes are larger than the kernel knows about then simply
skip the extra bits, but set attr->size to the (smaller) kernel size
so that tooling can (in principle) handle older kernel as well.
- if user-attributes are smaller than the kernel knows about then just
copy whatever user-space can accept.
Also clean up the whole logic:
- Simplify the code flow - there's no need for 'ret' for example.
- Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
always know which side we are dealing with.
- Why is it called 'read' when what it does is to copy to user? This
code is so far away from VFS read() semantics that the naming is
actively confusing. Name it sched_attr_copy_to_user() instead, which
mirrors other copy_to_user() functionality.
- Move the attr->size assignment from the head of sched_getattr() to the
sched_attr_copy_to_user() function. Nothing else within the kernel
should care about the size of the structure.
With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.
As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.
Bug: 120440300
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a509a7cd79 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1251201c0d)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I67e653c4f69db0140e9651c125b60e2b8cfd62f1
Signed-off-by: Quentin Perret <qperret@google.com>
When a task specific clamp value is configured via sched_setattr(2), this
value is accounted in the corresponding clamp bucket every time the task is
{en,de}qeued. However, when cgroups are also in use, the task specific
clamp values could be restricted by the task_group (TG) clamp values.
Update uclamp_cpu_inc() to aggregate task and TG clamp values. Every time a
task is enqueued, it's accounted in the clamp bucket tracking the smaller
clamp between the task specific value and its TG effective value. This
allows to:
1. ensure cgroup clamps are always used to restrict task specific requests,
i.e. boosted not more than its TG effective protection and capped at
least as its TG effective limit.
2. implement a "nice-like" policy, where tasks are still allowed to request
less than what enforced by their TG effective limits and protections
Do this by exploiting the concept of "effective" clamp, which is already
used by a TG to track parent enforced restrictions.
Apply task group clamp restrictions only to tasks belonging to a child
group. While, for tasks in the root group or in an autogroup, system
defaults are still enforced.
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-5-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3eac870a32)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I0215e0a68cc0fa7c441e33052757f8571b7c99b9
Signed-off-by: Quentin Perret <qperret@google.com>
The clamp values are not tunable at the level of the root task group.
That's for two main reasons:
- the root group represents "system resources" which are always
entirely available from the cgroup standpoint.
- when tuning/restricting "system resources" makes sense, tuning must
be done using a system wide API which should also be available when
control groups are not.
When a system wide restriction is available, cgroups should be aware of
its value in order to know exactly how much "system resources" are
available for the subgroups.
Utilization clamping supports already the concepts of:
- system defaults: which define the maximum possible clamp values
usable by tasks.
- effective clamps: which allows a parent cgroup to constraint (maybe
temporarily) its descendants without losing the information related
to the values "requested" from them.
Exploit these two concepts and bind them together in such a way that,
whenever system default are tuned, the new values are propagated to
(possibly) restrict or relax the "effective" value of nested cgroups.
When cgroups are in use, force an update of all the RUNNABLE tasks.
Otherwise, keep things simple and do just a lazy update next time each
task will be enqueued.
Do that since we assume a more strict resource control is required when
cgroups are in use. This allows also to keep "effective" clamp values
updated in case we need to expose them to user-space.
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-4-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7274a5c1bb)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ibf7ce5c46b67c79765b56b792ee22ed9595802c3
Signed-off-by: Quentin Perret <qperret@google.com>