Add the update_rq_clock() call at the top of the callstack instead of
at the bottom where we find it missing, this to aid later effort to
minimize the number of update_rq_lock() calls.
WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
rq->clock_update_flags < RQCF_ACT_SKIP
Call Trace:
dump_stack()
__warn()
warn_slowpath_fmt()
assert_clock_updated.isra.63.part.64()
can_migrate_task()
load_balance()
pick_next_task_fair()
__schedule()
schedule()
worker_thread()
kthread()
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3bed5e2166)
Change-Id: Ief5070dcce486535334dcb739ee16b989ea9df42
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Instead of adding the update_rq_clock() all the way at the bottom of
the callstack, add one at the top, this to aid later effort to
minimize update_rq_lock() calls.
WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
rq->clock_update_flags < RQCF_ACT_SKIP
Call Trace:
dump_stack()
__warn()
warn_slowpath_fmt()
detach_task_cfs_rq()
switched_from_fair()
__sched_setscheduler()
_sched_setscheduler()
sched_set_stop_task()
cpu_stop_create()
__smpboot_create_thread.part.2()
smpboot_register_percpu_thread_cpumask()
cpu_stop_init()
do_one_initcall()
? print_cpu_info()
kernel_init_freeable()
? rest_init()
kernel_init()
ret_from_fork()
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 80f5c1b84b)
Change-Id: Ibffde077d18eabec4c2984158bd9d6d73bd0fb96
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Vincent and Yuyang found another few scenarios in which entity
tracking goes wobbly.
The scenarios are basically due to the fact that new tasks are not
immediately attached and thereby differ from the normal situation -- a
task is always attached to a cfs_rq load average (such that it
includes its blocked contribution) and are explicitly
detached/attached on migration to another cfs_rq.
Scenario 1: switch to fair class
p->sched_class = fair_class;
if (queued)
enqueue_task(p);
...
enqueue_entity()
enqueue_entity_load_avg()
migrated = !sa->last_update_time (true)
if (migrated)
attach_entity_load_avg()
check_class_changed()
switched_from() (!fair)
switched_to() (fair)
switched_to_fair()
attach_entity_load_avg()
If @p is a new task that hasn't been fair before, it will have
!last_update_time and, per the above, end up in
attach_entity_load_avg() _twice_.
Scenario 2: change between cgroups
sched_move_group(p)
if (queued)
dequeue_task()
task_move_group_fair()
detach_task_cfs_rq()
detach_entity_load_avg()
set_task_rq()
attach_task_cfs_rq()
attach_entity_load_avg()
if (queued)
enqueue_task();
...
enqueue_entity()
enqueue_entity_load_avg()
migrated = !sa->last_update_time (true)
if (migrated)
attach_entity_load_avg()
Similar as with scenario 1, if @p is a new task, it will have
!load_update_time and we'll end up in attach_entity_load_avg()
_twice_.
Furthermore, notice how we do a detach_entity_load_avg() on something
that wasn't attached to begin with.
As stated above; the problem is that the new task isn't yet attached
to the load tracking and thereby violates the invariant assumption.
This patch remedies this by ensuring a new task is indeed properly
attached to the load tracking on creation, through
post_init_entity_util_avg().
Of course, this isn't entirely as straightforward as one might think,
since the task is hashed before we call wake_up_new_task() and thus
can be poked at. We avoid this by adding TASK_NEW and teaching
cpu_cgroup_can_attach() to refuse such tasks.
.:: BACKPORT
Complicated by the fact that mch of the lines changed by the original
of this commit were then changed by:
df217913e7 sched/fair: Factorize attach/detach entity <Vincent Guittot>
and then
d31b1a66cb sched/fair: Factorize PELT update <Vincent Guittot>
, which have both already been backported here.
Reported-by: Yuyang Du <yuyang.du@intel.com>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7dc603c902)
Change-Id: Ibc59eb52310a62709d49a744bd5a24e8b97c4ae8
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
A new fair task is detached and attached from/to task_group with:
cgroup_post_fork()
ss->fork(child) := cpu_cgroup_fork()
sched_move_task()
task_move_group_fair()
Which is wrong, because at this point in fork() the task isn't fully
initialized and it cannot 'move' to another group, because its not
attached to any group as yet.
In fact, cpu_cgroup_fork() needs a small part of sched_move_task() so we
can just call this small part directly instead sched_move_task(). And
the task doesn't really migrate because it is not yet attached so we
need the following sequence:
do_fork()
sched_fork()
__set_task_cpu()
cgroup_post_fork()
set_task_rq() # set task group and runqueue
wake_up_new_task()
select_task_rq() can select a new cpu
__set_task_cpu
post_init_entity_util_avg
attach_task_cfs_rq()
activate_task
enqueue_task
This patch makes that happen.
BACKPORT: Difference from original commit:
- Removed use of DEQUEUE_MOVE (which isn't defined in 4.4) in
dequeue_task flags
- Replaced "struct rq_flags rf" with "unsigned long flags".
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[ Added TASK_SET_GROUP to set depth properly. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ea86cb4b76)
Change-Id: I8126fd923288acf961218431ffd29d6bf6fd8d72
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
The task_fork_fair() callback already calls __set_task_cpu() and takes
rq->lock.
If we move the sched_class::task_fork callback in sched_fork() under
the existing p->pi_lock, right after its set_task_cpu() call, we can
avoid doing two such calls and omit the IRQ disabling on the rq->lock.
Change to __set_task_cpu() to skip the migration bits, this is a new
task, not a migration. Similarly, make wake_up_new_task() use
__set_task_cpu() for the same reason, the task hasn't actually
migrated as it hasn't ever ran.
This cures the problem of calling migrate_task_rq_fair(), which does
remove_entity_from_load_avg() on tasks that have never been added to
the load avg to begin with.
This bug would result in transiently messed up load_avg values, averaged
out after a few dozen milliseconds. This is probably the reason why
this bug was not found for such a long time.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e210bffd39)
Change-Id: Icbddbaa6e8c1071859673d8685bc3f38955cf144
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.
The original commit that adds it:
fab476228b ("sched: Force balancing on newidle balance if local group has capacity")
explains:
Under certain situations, such as a niced down task (i.e. nice =
-15) in the presence of nr_cpus NICE0 tasks, the niced task lands
on a sched group and kicks away other tasks because of its large
weight. This leads to sub-optimal utilization of the
machine. Even though the sched group has capacity, it does not
pull tasks because sds.this_load >> sds.max_load, and f_b_g()
returns NULL.
A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".
The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.
Change-Id: I807ba5cba0ef1b8bbec02cbcd4755fd32af10135
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 583ffd99d7 tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
When using schedfreq on cpus with max capacity significantly smaller than
1024, the tick update uses non-normalised capacities - this leads to
selecting an incorrect OPP as we were scaling the frequency as if the
max capacity achievable was 1024 rather than the max for that particular
cpu or group. This could result in a cpu being stuck at the lowest OPP
and unable to generate enough utilisation to climb out if the max
capacity is significantly smaller than 1024.
Instead, normalize the capacity to be in the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.
Also comments updated to be clearer about what is needed.
Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
While set_task_rq_fair() is introduced in mainline by commit ad936d8658
("sched/fair: Make it possible to account fair load avg consistently"),
the function results to be introduced here by the backport of
commit 09a43ace1f ("sched/fair: Propagate load during synchronous
attach/detach"). The problem (apart from the confusion introduced by the
backport) is actually that set_task_rq_fair() is currently not called at
all.
Fix the problem by backporting again commit ad936d8658
("sched/fair: Make it possible to account fair load avg consistently").
Original change log:
The current code accounts for the time a task was absent from the fair
class (per ATTACH_AGE_LOAD). However it does not work correctly when a
task got migrated or moved to another cgroup while outside of the fair
class.
This patch tries to address that by aging on migration. We locklessly
read the 'last_update_time' stamp from both the old and new cfs_rq,
ages the load upto the old time, and sets it to the new time.
These timestamps should in general not be more than 1 tick apart from
one another, so there is a definite bound on things.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Changelog, a few edits and !SMP build fix ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked from ad936d8658)
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I17294ab0ada3901d35895014715fd60952949358
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
This flag determines whether the thread should currently
process the work in the thread->todo worklist.
The prime usecase for this is improving the performance
of synchronous transactions: all synchronous transactions
post a BR_TRANSACTION_COMPLETE to the calling thread,
but there's no reason to return that command to userspace
right away - userspace anyway needs to wait for the reply.
Likewise, a synchronous transaction that contains a binder
object can cause a BC_ACQUIRE/BC_INCREFS to be returned to
userspace; since the caller must anyway hold a strong/weak
ref for the duration of the call, postponing these commands
until the reply comes in is not a problem.
Note that this flag is not used to determine whether a
thread can handle process work; a thread should never pick
up process work when thread work is still pending.
Before patch:
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_sendVec_binderize/4 45959 ns 20288 ns 34351
BM_sendVec_binderize/8 45603 ns 20080 ns 34909
BM_sendVec_binderize/16 45528 ns 20113 ns 34863
BM_sendVec_binderize/32 45551 ns 20122 ns 34881
BM_sendVec_binderize/64 45701 ns 20183 ns 34864
BM_sendVec_binderize/128 45824 ns 20250 ns 34576
BM_sendVec_binderize/256 45695 ns 20171 ns 34759
BM_sendVec_binderize/512 45743 ns 20211 ns 34489
BM_sendVec_binderize/1024 46169 ns 20430 ns 34081
After patch:
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_sendVec_binderize/4 42939 ns 17262 ns 40653
BM_sendVec_binderize/8 42823 ns 17243 ns 40671
BM_sendVec_binderize/16 42898 ns 17243 ns 40594
BM_sendVec_binderize/32 42838 ns 17267 ns 40527
BM_sendVec_binderize/64 42854 ns 17249 ns 40379
BM_sendVec_binderize/128 42881 ns 17288 ns 40427
BM_sendVec_binderize/256 42917 ns 17297 ns 40429
BM_sendVec_binderize/512 43184 ns 17395 ns 40411
BM_sendVec_binderize/1024 43119 ns 17357 ns 40432
Signed-off-by: Martijn Coenen <maco@android.com>
Change-Id: Ia70287066d62aba64e98ac44ff1214e37ca75693
Show the high watermark of the index into the alloc->pages
array, to facilitate sizing the buffer on a per-process
basis.
Change-Id: I2b40cd16628e0ee45216c51dc9b3c5b0c862032e
Signed-off-by: Martijn Coenen <maco@android.com>
(cherry picked from commit 82d24d114f)
Commit a1d5ebaf8c ("arm64: big-endian: don't treat code as data when
copying sigret code") moved the 32-bit sigreturn trampoline code from
the aarch32_sigret_code array to kuser32.S. The commit removed the
array definition from signal32.c, but not its declaration in
signal32.h. Remove the leftover declaration.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 20045882
Bug: 63737556
Change-Id: Ic8a5f0e367f0ecd5c5ddd9e3885d0285f91cf89e
The util returned from group_max_util is not capped at the max util
present in the group, so it can be larger than the capacity stored in
the array. Ensure that when this happens, we always use the last entry
in the array to fetch energy from.
Tested with synthetics on Juno board.
Bug: 38159576
Change-Id: I89fb52fb7e68fa3e682e308acc232596672d03f7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
The code is to get the CPU util by accumulate different scheduling
classes and when the total util value is larger than CPU capacity
then it clamps util to CPU maximum capacity. So we can get correct util
value when use PELT signal but if with WALT signal it misses to clamp
util value.
On the other hand, WALT doesn't accumulate different class utilization
but it needs to applying boost margin for WALT signal the CPU util
value is possible to be larger than CPU capacity; so this patch is to
always clamp util to CPU maximum capacity.
Change-Id: I05481ddbf20246bb9be15b6bd21b6ec039015ea8
Signed-off-by: Leo Yan <leo.yan@linaro.org>
When we convert capacity into frequency, we used policy->max to get
the max freq of the cpu. Since this can be changed by userspace policy
or thermal events, we are potentially asking for a lower frequency
than the utilization demands.
Change over to using cpuinfo.max which is the max freq supported by
that cpu rather than the currently-chosen max. Frequency granted still
honours the max policy.
Tested by setting a userspace policy and observing the relevant vars
in a trace. In this instance, we ask for around 1ghz instead of 620MHz.
freq_new=1013512
unfixed_freq_new=624487
capacity=546
cpuinfo_max=1900800
policy_max=1171200
Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
cumulative_runnable_avg was introduced in commit ee4cebd75e ("sched:
EAS/WALT: use cr_avg instead of prev_runnable_sum") in cpu_util() for
task placement, which is used to replace prev_runnable_sum.
Fix util_avg_walt in sched_load_avg_cpu trace, which use prev_runnable_sum
for cpu_util().
Moreover, fix potential overflow due to cumulative_runnable_avg is in u64.
Change-Id: I1220477bf2ff32a6e34a34b6280b15a8178203a8
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
In order to set rq->misfit_task in time, call update_task_ravg() prior
to task_tick. This reduces upmigration delay by 1 scheduler window.
Change-Id: I7cc80badd423f2e7684125fbfd853b0a3610f0e8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
At present, sched_freq_tick() skips updating of capacity update when
current frequency is fmax. This can cause incorrect frequency drop
when a CPU bound task goes into sleep for example :
1) A task (A) enqueues onto CPU 0 and executes for long time.
2) A new task (B) which has low task demand enqueues onto CPU 1 and
executes long so becomes a CPU bound task.
3) Both CPU 0 and 1 gets scheduler tick but skip sched_freq_tick()
since current frequency is fmax.
4) Task (A) sleeps and lower the CPU 0's capacity request.
5) Because task (B) voted CPU capacity at step 2 with low demand and
skipped to request afterwards, cluster frequency for both CPU 0
and 1 drops to match capacity voted by CPU 1 at step 2 even though
task (B) on CPU 1 requires max capacity.
Fix such incorrectness by not skipping CPU capacity voting at tick
path.
Change-Id: Ieb46af1ac96ffce7a5532c58c7f07bf1ada06b86
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
At present need_active_balance() determines whether an active
upmigration is needed by using capacity_of(). A CPU's capacity
may be reduced by RT pressure, and therefore distinguishing
capability differences with capacity_of() may lead to suboptimal
active migrations to less capable CPUs. Use capacity_orig_of
to distinguish differently capable CPUs in addition to
capacity_of(), thus avoiding placing tasks on less capable CPUs
due to instantaneous RT pressure.
Change-Id: I3e1435246a8edc3ad618ef98a34866cfbd8c16a5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[markivx: Reworked the commit text a bit]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
There's no need for a separate hierarchy of notifiers, APIs
and variables in walt.c for the purpose of applying frequency
and IPC invariance. Let's just use capacity_curr_of and get
rid of a lot of the infrastructure relating to capacity,
load_scale_factor etc.
Change-Id: Ia220e2c896373fa535db05bff60f9aa33aefc978
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
LLVM bug 30792 causes clang's AArch64 backend to crash compiling
arch/arm64/crypto/aes-ce-cipher.c. Replacing -mgeneral-regs-only with
-mno-implicit-float is the suggested workaround.
Drop this patch once the clang bug has been fixed.
Change-Id: I7c7bb9315a281970698120a6d2a9fcd126aad65e
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Android has an unusual setup where the kernel needs to target
[arch]-linux-gnu to avoid Android userspace-specific flags and
optimizations, but AOSP doesn't ship a matching binutils.
Add a new variable CLANG_TRIPLE which can override the "-target" triple
used to compile the kernel, while using a different CROSS_COMPILE to
pick the binutils/gcc installation. For Android you'd do something
like:
export CLANG_TRIPLE=aarch64-linux-gnu-
export CROSS_COMPILE=aarch64-linux-android-
If you don't need something like this, leave CLANG_TRIPLE unset and it
will default to CROSS_COMPILE.
Change-Id: Ib544c37f4ee4ed005437471b2984486a3e7c0da7
Signed-off-by: Greg Hackmann <ghackmann@google.com>
clang raises 'asm-operand-widths' warnings in inline assembly code when
the size of an operand is < 64 bits and the operand width is unspecified.
Most warnings are raised in macros, i.e. the datatype of the operand may
vary. Most of these warnings are fixed in upstream, however we consider it
isn't worth the effort/risk to backport all the necessary changes. On
future CrOS kernels >= v4.13 the warning should be re-enabled.
Change-Id: Ia331bc83d44b8c1499450aefb45c576cd29ebf55
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
clang generates plenty of these warnings in different parts of the code.
They are mostly caused by container_of() and other macros which declare
a "const <type> *" variable for their internal use which triggers a
"duplicate 'const' specifier" warning if the <type> is already const
qualified.
Change-Id: I85ffb201003d3a04fe8b8ff94478344250d2db68
Wording-mostly-from: Michael Davidson <md@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Commit:
d77698df39 ("x86/build: Specify stack alignment for clang")
intended to use the same stack alignment for clang as with gcc.
The two compilers use different options to configure the stack alignment
(gcc: -mpreferred-stack-boundary=n, clang: -mstack-alignment=n).
The above commit assumes that the clang option uses the same parameter
type as gcc, i.e. that the alignment is specified as 2^n. However clang
interprets the value of this option literally to use an alignment of n,
in consequence the stack remains misaligned.
Change the values used with -mstack-alignment to be the actual alignment
instead of a power of two.
cc-option isn't used here with the typical pattern of KBUILD_CFLAGS +=
$(call cc-option ...). The reason is that older gcc versions don't
support the -mpreferred-stack-boundary option, since cc-option doesn't
verify whether the alternative option is valid it would incorrectly
select the clang option -mstack-alignment..
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bernhard.Rosenkranzer@linaro.org
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Davidson <md@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hines <srhines@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dianders@chromium.org
Link: http://lkml.kernel.org/r/20170817004740.170588-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8f91869766)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I7991bfed754f5ac10ac8b383c20ec89d56b2afc0
Clang may emit absolute symbol references when building in non-PIC mode,
even when using the default 'small' code model, which is already mostly
position independent to begin with, due to its use of adrp/add pairs
that have a relative range of +/- 4 GB. The remedy is to pass the -fpie
flag, which can be done safely now that the code has been updated to avoid
GOT indirections (which may be emitted due to the compiler assuming that
the PIC/PIE code may end up in a shared library that is subject to ELF
symbol preemption)
Passing -fpie when building code that needs to execute at an a priori
unknown offset is arguably an improvement in any case, and given that
the recent visibility changes allow the PIC build to pass with GCC as
well, let's add -fpie for all arm64 builds rather than only for Clang.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170818194947.19347-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 91ee5b21ee)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I0a011945239d39a2d1eb04c20bf1b9ceb7d2b91d
The motivation for commit abb2ea7dfd ("compiler, clang: suppress
warning for unused static inline functions") was to suppress clang's
warnings about unused static inline functions.
For configs without CONFIG_OPTIMIZE_INLINING enabled, such as any non-x86
architecture, `inline' in the kernel implies that
__attribute__((always_inline)) is used.
Some code depends on that behavior, see
https://lkml.org/lkml/2017/6/13/918:
net/built-in.o: In function `__xchg_mb':
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99'
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99
The full fix would be to identify these breakages and annotate the
functions with __always_inline instead of `inline'. But since we are
late in the 4.12-rc cycle, simply carry forward the forced inlining
behavior and work toward moving arm64, and other architectures, toward
CONFIG_OPTIMIZE_INLINING behavior.
(cherry picked from commit 9a04dbcfb3)
Change-Id: I13891c2f1e588d8c7febe5d2d57134abb31d6ecd
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706261552200.1075@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Sodagudi Prasad <psodagud@codeaurora.org>
Tested-by: Sodagudi Prasad <psodagud@codeaurora.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Replace the inline asm which exports struct offsets as ELF symbols
with proper const variables exposing the same values. This works
around an issue with Clang which does not interpret the "i" (or "I")
constraints in the same way as GCC.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f4857f4c2e)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I1f882de15bd447d6fc41858dfc0cbfd3f6e2466c
clang generates plenty of these warnings in different parts of the code,
to an extent that the warnings are little more than noise. Disable the
'address-of-packed-member' warning.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from commit bfb38988c5)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I35ecf1b35a908d41ee791a8a651e3cfb4edd081b
For gcc stack alignment is configured with -mpreferred-stack-boundary=N,
clang has the option -mstack-alignment=N for that purpose. Use the same
alignment as with gcc.
If the alignment is not specified clang assumes an alignment of
16 bytes, as required by the standard ABI. However as mentioned in
d9b0cde91c ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if
supported") the standard kernel entry on x86-64 leaves the stack
on an 8-byte boundary, as a consequence clang will keep the stack
misaligned.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked commit d77698df39)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I4283d10c6fe31cf194b35adc5371732b89eb3ae3
cc-option is used to enable compiler options for the boot code if they
are available. The macro uses KBUILD_CFLAGS and KBUILD_CPPFLAGS for the
check, however these flags aren't used to build the boot code, in
consequence cc-option can yield wrong results. For example
-mpreferred-stack-boundary=2 is never set with a 64-bit compiler,
since the setting is only valid for 16 and 32-bit binaries. This
is also the case for 32-bit kernel builds, because the option -m32 is
added to KBUILD_CFLAGS after the assignment of REALMODE_CFLAGS.
Use __cc-option instead of cc-option for the boot mode options.
The macro receives the compiler options as parameter instead of using
KBUILD_C*FLAGS, for the boot code we pass REALMODE_CFLAGS.
Also use separate statements for the __cc-option checks instead
of performing them in the initial assignment of REALMODE_CFLAGS since
the variable is an input of the macro.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked commit 032a2c4f65)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I7756f875771edb00238eb770be912f713407681a
cc-option uses KBUILD_CFLAGS and KBUILD_CPPFLAGS when it determines
whether an option is supported or not. This is fine for options used to
build the kernel itself, however some components like the x86 boot code
use a different set of flags.
Add the new macro __cc-option which is a more generic version of
cc-option with additional parameters. One parameter is the compiler
with which the check should be performed, the other the compiler options
to be used instead KBUILD_C*FLAGS.
Refactor cc-option and hostcc-option to use __cc-option and move
hostcc-option to scripts/Kbuild.include.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from commit 9f3f1fd299)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Conflicts:
scripts/Kbuild.include
Change-Id: I4c8288b9c74bd6b9199307a0e04b78a27e28361d
People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
into kcov, lto, etc, experimentations.
Add asm versions for __sw_hweight{32,64}() and do explicit saving and
restoring of clobbered registers. This gets rid of the special calling
convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs.
We still need to hardcode POPCNT and register operands as some old gas
versions which we support, do not know about POPCNT.
Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives
can do padding now.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit f5967101e9)
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Conflicts:
lib/Makefile
Change-Id: Ie7e6dce51c7093b1162337ec8bfc5abde0d79688
The constraint "rm" allows the compiler to put mix_const into memory.
When the input operand is a memory location then MUL needs an operand
size suffix, since Clang can't infer the multiplication width from the
operand.
Add and use the _ASM_MUL macro which determines the operand size and
resolves to the NUL instruction with the corresponding suffix.
This fixes the following error when building with clang:
CC arch/x86/lib/kaslr.o
/tmp/kaslr-dfe1ad.s: Assembler messages:
/tmp/kaslr-dfe1ad.s:182: Error: no instruction mnemonic suffix given and no register operands; can't size instruction
Upstream commit: 121843eb02
Change-Id: I53f51839705dabeb6c950d1def3a45881294129c
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Davidson <md@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170501224741.133938-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
aes_ctrby8_avx-x86_64.S uses the C preprocessor for token pasting
of character sequences that are not valid preprocessor tokens.
While this is allowed when preprocessing assembler files it exposes
an incompatibilty between the clang and gcc preprocessors where
clang does not strip leading white space from macro parameters,
leading to the CONCAT(%xmm, i) macro expansion on line 96 resulting
in a token with a space character embedded in it.
While this could be resolved by deleting the offending space character,
the assembler is perfectly capable of doing the token pasting correctly
for itself so we can just get rid of the preprocessor macros.
Signed-off-by: Michael Davidson <md@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit fdb2726f4e)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I087414d3575ea7b8703f39d429ccbf0361b314ae
Commit abb2ea7dfd ("compiler, clang: suppress warning for unused
static inline functions") just caused more warnings due to re-defining
the 'inline' macro.
So undef it before re-defining it, and also add the 'notrace' attribute
like the gcc version that this is overriding does.
Maybe this makes clang happier.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6d53cefb18)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: Ie01b45583954c6104c854a3810e35c1171764e78
GCC explicitly does not warn for unused static inline functions for
-Wunused-function. The manual states:
Warn whenever a static function is declared but not defined or
a non-inline static function is unused.
Clang does warn for static inline functions that are unused.
It turns out that suppressing the warnings avoids potentially complex
Suppress the warning for clang.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit abb2ea7dfd)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I68e6246b03c962cc87b9d0bf4b7fefeda27068c0
The default __UNIQUE_ID macro in compiler.h fails to work for some drivers:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:615:1: error: redefinition of
'__UNIQUE_ID_firmware615'
BRCMF_FW_NVRAM_DEF(4354, "brcmfmac4354-sdio.bin", "brcmfmac4354-sdio.txt");
This adds a copy of the version we use for gcc-4.3 and higher, as the same
one works with all versions of clang that I could find in svn (2.6 and higher).
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
(cherry picked from commit b41c29b052)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I161dfa3ccb6b226966c3c87bba6b2fff1561bc61
clang warns about unused inline functions by default:
arch/arm/crypto/aes-cipher-glue.c:68:1: warning: unused function '__inittest' [-Wunused-function]
arch/arm/crypto/aes-cipher-glue.c:69:1: warning: unused function '__exittest' [-Wunused-function]
As these appear in every single module, let's just disable the warnings by marking the
two functions as __maybe_unused.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jessica Yu <jeyu@redhat.com>
(cherry picked from commit 1f318a8baf)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I39c75bdb61834020320d41a678dfcc9442f07e4b
Add rules to kbuild in order to generate LLVM assembly files with the .ll
extension when using clang.
# from c code
make CC=clang kernel/pid.ll
Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from commit 433db3e260)
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Change-Id: I1fcc7ec14357e19e46cc2dd1772c5c258aec91d1