818 Commits

Author SHA1 Message Date
Mark Brown
43b74bf7c6 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2015-02-02 11:29:26 +00:00
Mark Brown
692d96aa22 Merge tag 'v3.10.67' into linux-linaro-lsk
This is the 3.10.67 stable release
2015-02-02 11:29:07 +00:00
Sasha Levin
2497402c9a time: adjtimex: Validate the ADJ_FREQUENCY values
commit 5e5aeb4367 upstream.

Verify that the frequency value from userspace is valid and makes sense.

Unverified values can cause overflows later on.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: Fix up bug for negative values and drop redundant cap check]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29 17:40:56 -08:00
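The validation described in the commit above can be sketched in user space as a bounds check made before the frequency is scaled. This is a rough model only: the 64-bit limits, the `PPM_SCALE` value and the function name `validate_adj_frequency` are assumptions for illustration, not the kernel source.

```python
# Hypothetical constants modelled on a 64-bit kernel; treat them as assumptions.
LONG_MAX = 2**63 - 1
LONG_MIN = -(2**63)
PPM_SCALE = 65_536_000  # assumed factor applied to the user-supplied frequency
EINVAL = 22


def validate_adj_frequency(freq):
    """Return 0 if freq * PPM_SCALE fits in a signed 64-bit long, else -EINVAL."""
    scaled = freq * PPM_SCALE  # Python ints are unbounded, so this is exact
    if scaled < LONG_MIN or scaled > LONG_MAX:
        return -EINVAL  # scaling would overflow later on
    return 0
```

Negative values are checked against the lower bound as well, which is the case the bracketed note in the commit message refers to.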
Mark Brown
17d1e25c88 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2015-01-22 11:00:04 +00:00
Mark Brown
709d228228 Merge remote-tracking branch 'lsk/v3.10/topic/hrtimer' into linux-linaro-lsk
Conflicts:
	kernel/time/clockevents.c
	kernel/time/tick-broadcast.c
	kernel/time/tick-common.c
	kernel/time/tick-internal.h
2015-01-21 22:27:49 +00:00
Preeti U Murthy
eab6f41c54 tick: Fixup more fallout from hrtimer broadcast mode
The hrtimer mode of broadcast is supported only when
GENERIC_CLOCKEVENTS_BROADCAST and TICK_ONESHOT config options
are enabled. Hence compile in the functions for hrtimer mode
of broadcast only when these options are selected.
Also fix max_delta_ticks value for the pseudo clock device.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F719EE.9010304@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 849401b66d)
Signed-off-by: Mark Brown <broonie@kernel.org>

Conflicts:
	kernel/time/Makefile
2015-01-21 22:08:58 +00:00
Thomas Gleixner
a8db4f03c3 time: Fixup fallout from recent clockevent/tick changes
Make the stub function static inline instead of static and move the
clockevents related function into the proper ifdeffed section.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Soren Brinkmann <soren.brinkmann@xilinx.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
(cherry picked from commit f1689bb7ab)
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-01-21 22:08:21 +00:00
Preeti U Murthy
fe20b8ebfd tick: Introduce hrtimer based broadcast
On some architectures, in certain CPU deep idle states the local timers stop.
An external clock device is used to wakeup these CPUs. The kernel support for the
wakeup of these CPUs is provided by the tick broadcast framework by using the
external clock device as the wakeup source.

However not all implementations of architectures provide such an external
clock device. This patch includes support in the broadcast framework to handle
the wakeup of the CPUs in deep idle states on such systems by queuing a hrtimer
on one of the CPUs, which is meant to handle the wakeup of CPUs in deep idle states.

This patchset introduces a pseudo clock device which can be registered by the
archs as tick_broadcast_device in the absence of a real external clock
device. Once registered, the broadcast framework will work as is for these
architectures as long as the archs take care of the BROADCAST_ENTER
notification failing for one of the CPUs. This CPU is made the stand by CPU to
handle wakeup of the CPUs in deep idle and it *must not enter deep idle states*.

The CPU with the earliest wakeup is chosen to be this CPU. Hence this way the
stand by CPU dynamically moves around and so does the hrtimer which is queued
to trigger at the next earliest wakeup time. This is consistent with the case where
an external clock device is present. The smp affinity of this clock device is
set to the CPU with the earliest wakeup. This patchset handles the hotplug of
the stand by CPU as well by moving the hrtimer on to the CPU handling the CPU_DEAD
notification.

Originally-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: deepthi@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com
Cc: fweisbec@gmail.com
Cc: paulus@samba.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: svaidy@linux.vnet.ibm.com
Cc: peterz@infradead.org
Cc: benh@kernel.crashing.org
Cc: rafael.j.wysocki@intel.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140207080632.17187.80532.stgit@preeti.in.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 5d1638acb9)
Signed-off-by: Mark Brown <broonie@kernel.org>

Conflicts:
	kernel/time/Makefile
2015-01-21 22:08:01 +00:00
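A simplified model of the standby-CPU selection described in the commit above: the CPU with the earliest wakeup takes over the broadcast hrtimer, and its BROADCAST_ENTER request fails so that it stays out of deep idle. The class and method names are hypothetical, and the real framework handles many more cases (hotplug, expiry, exit).

```python
EBUSY = 16  # errno used here to mean "this CPU must keep its local timer"


class HrtimerBroadcastModel:
    """Tracks which CPU currently carries the broadcast hrtimer."""

    def __init__(self):
        self.standby_cpu = None
        self.expires = None

    def enter(self, cpu, next_wakeup):
        """Model of BROADCAST_ENTER for `cpu` with its next wakeup time."""
        # The earliest wakeup wins: the hrtimer moves to that CPU.
        if self.expires is None or next_wakeup < self.expires:
            self.standby_cpu = cpu
            self.expires = next_wakeup
        # The standby CPU is refused; everyone else may enter deep idle.
        if cpu == self.standby_cpu:
            return -EBUSY
        return 0
```

In this model the standby role moves whenever a CPU with an earlier wakeup shows up, which mirrors the "dynamically moves around" behaviour the message describes.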
Preeti U Murthy
05c2671739 time: Change the return type of clockevents_notify() to integer
The broadcast framework can potentially also be used by archs which do not have an
external clock device. In that case, one of the CPUs needs
to handle the broadcasting of wakeup IPIs to the CPUs in deep idle. As a
result, its local timers should remain functional all the time. For such
a CPU, the BROADCAST_ENTER notification has to fail, indicating that its clock
device cannot be shut down. To make way for this support, change the return
type of tick_broadcast_oneshot_control() and hence clockevents_notify() to
indicate such scenarios.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: deepthi@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com
Cc: fweisbec@gmail.com
Cc: paulus@samba.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: svaidy@linux.vnet.ibm.com
Cc: peterz@infradead.org
Cc: benh@kernel.crashing.org
Cc: rafael.j.wysocki@intel.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140207080606.17187.78306.stgit@preeti.in.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit da7e6f45c3)
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-01-21 21:57:00 +00:00
Thomas Gleixner
32f45d57f4 clockevents: Move the tick_notify() switch case to clockevents_notify()
No need to call another function and have duplicated cases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.235746557@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 8c53daf63f)
Signed-off-by: Mark Brown <broonie@kernel.org>

Conflicts:
	kernel/time/tick-common.c
2015-01-21 21:54:18 +00:00
Thomas Gleixner
6ad33af5f2 clockevents: Get rid of the notifier chain
7+ years and still a single user. Kill it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.098520211@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 7172a286ce)
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-01-21 21:50:31 +00:00
Mark Brown
4a105b2f85 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-10-06 17:33:43 +01:00
Richard Larocque
3c47864204 alarmtimer: Lock k_itimer during timer callback
commit 474e941bed upstream.

Locks the k_itimer's it_lock member when handling the alarm timer's
expiry callback.

The regular posix timers defined in posix-timers.c have this lock held
during timeout processing because their callbacks are routed through
posix_timer_fn().  The alarm timers follow a different path, so they
ought to grab the lock somewhere else.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:14 -07:00
Richard Larocque
5cebda5d05 alarmtimer: Do not signal SIGEV_NONE timers
commit 265b81d23a upstream.

Avoids sending a signal to alarm timers created with sigev_notify set to
SIGEV_NONE by checking for that special case in the timeout callback.

The regular posix timers avoid sending signals to SIGEV_NONE timers by
not scheduling any callbacks for them in the first place.  Although it
would be possible to do something similar for alarm timers, it's simpler
to handle this as a special case in the timeout.

Prior to this patch, the alarm timer would ignore the sigev_notify value
and try to deliver signals to the process anyway.  Even worse, the
sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
specified, so the signal number could be bogus.  If sigev_signo was an
uninitialized value (as it often would be if SIGEV_NONE is used), then
it's hard to predict which signal will be sent.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:14 -07:00
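The two alarmtimer fixes above (take the timer's lock in the expiry callback, and skip signal delivery for SIGEV_NONE) can be sketched together. `AlarmTimerModel` and its fields are hypothetical stand-ins for the k_itimer structure, and the constant values are assumed to match Linux.

```python
import threading

SIGEV_SIGNAL = 0  # assumed to match the Linux values
SIGEV_NONE = 1


class AlarmTimerModel:
    def __init__(self, sigev_notify, sigev_signo):
        self.it_lock = threading.Lock()  # stands in for k_itimer's it_lock
        self.sigev_notify = sigev_notify
        self.sigev_signo = sigev_signo
        self.delivered = []              # signals "sent" by this model

    def on_expiry(self):
        """Expiry callback: hold the lock, and never signal SIGEV_NONE timers."""
        with self.it_lock:
            if self.sigev_notify == SIGEV_NONE:
                return False             # sigev_signo may be garbage; ignore it
            self.delivered.append(self.sigev_signo)
            return True
```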
Alex Shi
c482787d23 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-08-08 13:38:04 +08:00
Jan Kara
562eebeb9c timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
commit 504d58745c upstream.

clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
       [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [<81080792>] task_clock_event_start+0x3a/0x3f
       [<810807a4>] task_clock_event_add+0xd/0x14
       [<8108259a>] event_sched_in+0xb6/0x17a
       [<810826a2>] group_sched_in+0x44/0x122
       [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
       [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
       [<81082bf6>] __perf_install_in_context+0x8b/0xa3
       [<8107eb8e>] remote_function+0x12/0x2a
       [<8105f5af>] smp_call_function_single+0x2d/0x53
       [<8107e17d>] task_function_call+0x30/0x36
       [<8107fb82>] perf_install_in_context+0x87/0xbb
       [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
       [<810856f9>] SyS_perf_event_open+0x17/0x19
       [<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81040873>] __task_rq_lock+0x33/0x3a
       [<8104184c>] wake_up_new_task+0x25/0xc2
       [<8102474b>] do_fork+0x15c/0x2a0
       [<810248a9>] kernel_thread+0x1a/0x1f
       [<814232a2>] rest_init+0x1a/0x10e
       [<817af949>] start_kernel+0x303/0x308
       [<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<810413dd>] try_to_wake_up+0x1d/0xd6
       [<810414cd>] default_wake_function+0xb/0xd
       [<810461f3>] __wake_up_common+0x39/0x59
       [<81046346>] __wake_up+0x29/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<81046332>] __wake_up+0x15/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105c548>] clockevents_program_event+0xe7/0xf3
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&ctx->lock);
                               lock(hrtimer_bases.lock);
  lock(&port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
 #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [<81426f69>] dump_stack+0x16/0x18
 [<81425a99>] print_circular_bug+0x18f/0x19c
 [<8104a62d>] __lock_acquire+0x9ea/0xc6d
 [<8104a942>] lock_acquire+0x92/0x101
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c60be>] serial8250_console_write+0x8c/0x10c
 [<8104af87>] ? lock_release+0x191/0x223
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
 [<8104f5d5>] console_unlock+0x1d7/0x398
 [<8104fb70>] vprintk_emit+0x3da/0x3e4
 [<81425f76>] printk+0x17/0x19
 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
 [<8105cc1c>] tick_program_event+0x1e/0x23
 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
 [<8103c49e>] __remove_hrtimer+0x5b/0x79
 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
 [<8103cb4b>] hrtimer_cancel+0xd/0x18
 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [<81080705>] task_clock_event_stop+0x20/0x64
 [<81080756>] task_clock_event_del+0xd/0xf
 [<81081350>] event_sched_out+0xab/0x11e
 [<810813e0>] group_sched_out+0x1d/0x66
 [<81081682>] ctx_sched_out+0xaf/0xbf
 [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
 [<8104416d>] ? __dequeue_entity+0x23/0x27
 [<81044505>] ? pick_next_task_fair+0xb1/0x120
 [<8142cacc>] __schedule+0x4c6/0x4cb
 [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
 [<810475b0>] ? trace_hardirqs_off+0xb/0xd
 [<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the
scheduler.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07 14:30:26 -07:00
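The idea behind the fix above, which is not to emit console output while a timer-base lock is held, can be illustrated with a user-space analogy. This is only a sketch: the queue, the function names and the doubling policy are assumptions, and the kernel's printk_deferred() works differently internally.

```python
import threading
from collections import deque

base_lock = threading.Lock()  # stands in for hrtimer_bases.lock
_deferred = deque()           # messages waiting to be printed


def log_deferred(message):
    """Queue a message instead of printing it while a lock is held."""
    _deferred.append(message)


def flush_deferred(sink):
    """Emit queued messages; call only after the lock has been dropped."""
    while _deferred:
        sink(_deferred.popleft())


def increase_min_delta(min_delta_ns, sink=print):
    """Hypothetical reprogramming step that wants to log under base_lock."""
    with base_lock:
        new_delta = min_delta_ns * 2  # illustrative policy only
        log_deferred(f"min_delta_ns raised to {new_delta}")
    flush_deferred(sink)              # the lock is no longer held here
    return new_delta
```

Because the sink runs only after `base_lock` is released, whatever locks the sink takes can no longer nest inside it.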
Alex Shi
7c18f677c1 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-07-29 13:55:15 +08:00
John Stultz
c933192733 alarmtimer: Fix bug where relative alarm timers were treated as absolute
commit 16927776ae upstream.

Sharvil noticed with the posix timer_settime interface, using the
CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the users
tried to specify a relative time timer, it would incorrectly be
treated as absolute regardless of the state of the flags argument.

This patch corrects this, properly checking the absolute/relative flag,
as well as adds further error checking that no invalid flag bits are set.

Reported-by: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Link: http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28 08:00:07 -07:00
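A minimal sketch of the flag handling the commit above describes: reject unknown flag bits, and treat the value as relative unless TIMER_ABSTIME is set. The function name and the use of `ValueError` are illustrative choices, not the kernel interface.

```python
TIMER_ABSTIME = 0x01  # flag value as defined for timer_settime() on Linux


def alarm_expiry(flags, value, now):
    """Return the absolute expiry time for a timer_settime()-style request."""
    if flags & ~TIMER_ABSTIME:
        raise ValueError("invalid flag bits set")
    if flags & TIMER_ABSTIME:
        return value        # caller already supplied an absolute time
    return now + value      # relative request: offset from the current time
```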
Mark Brown
471dd6b43f Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-07-01 11:20:33 +01:00
Thomas Gleixner
ec804bd9e1 nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off
commit 0e576acbc1 upstream.

If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.

If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
command line option "nohz=off" or not enabled due to missing hardware
support, then tick_nohz_get_sleep_length() returns 0. That happens
because ts->sleep_length is never set in that case.

Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive.

Reported-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:12:41 -04:00
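The behaviour after the fix above can be summarised in a few lines: when NOHZ is inactive, the reported sleep length falls back to one tick period. The function and parameter names are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def sleep_length_ns(hz, nohz_active, predicted_ns):
    """Model of the value an idle governor would read as the sleep length."""
    if not nohz_active:
        return NSEC_PER_SEC // hz  # one tick, same as with CONFIG_NO_HZ=n
    return predicted_ns            # NOHZ active: use the computed prediction
```

With HZ=250 and NOHZ inactive this yields 4,000,000 ns rather than 0.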
Alex Shi
696e897344 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-03-28 11:07:37 +08:00
Thomas Gleixner
a191212af8 tick: Make oneshot broadcast robust vs. CPU offlining
commit c9b5a266b1 upstream.

In periodic mode we remove offline cpus from the broadcast propagation
mask. In oneshot mode we fail to do so. This was not a problem so far,
but the recent changes to the broadcast propagation introduced a
constellation which can result in a NULL pointer dereference.

What happens is:

CPU0			CPU1
			idle()
			  arch_idle()
			    tick_broadcast_oneshot_control(OFF);
			      set cpu1 in tick_broadcast_force_mask
			  if (cpu_offline())
			     arch_cpu_dead()

cpu_dead_cleanup(cpu1)
 cpu1 tickdevice pointer = NULL

broadcast interrupt
  dereference cpu1 tickdevice pointer -> OOPS

We dereference the pointer because cpu1 is still set in
tick_broadcast_force_mask and tick_do_broadcast() expects a valid
cpumask and therefore lacks any further checks.

Remove the cpu from the tick_broadcast_force_mask before we set the
tick device pointer to NULL. Also add a sanity check to the oneshot
broadcast function, so we can detect such issues w/o crashing the
machine.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: athorlton@sgi.com
Cc: CAI Qian <caiqian@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1306261303260.4013@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:21 -07:00
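The ordering fix in the commit above (clear the CPU from the broadcast masks before its tick device pointer becomes NULL, and check defensively in the broadcast path) can be modelled with sets and a dict. All names here are hypothetical, and the sanity check is a simplified stand-in for the one the patch adds.

```python
def cpu_dead_cleanup(cpu, masks, tick_devices):
    """Remove `cpu` from every broadcast mask, then drop its tick device."""
    for mask in masks.values():
        mask.discard(cpu)        # must happen first
    tick_devices[cpu] = None     # only now is the pointer cleared


def do_broadcast(mask, tick_devices):
    """Return the CPUs whose handlers ran, skipping any without a device."""
    handled = []
    for cpu in sorted(mask):
        if tick_devices.get(cpu) is None:
            continue             # sanity check instead of a NULL dereference
        handled.append(cpu)
    return handled
```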
Alex Shi
a0692dda2a Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-02-27 09:39:31 +08:00
Mikulas Patocka
ea7dfc423c time: Fix overflow when HZ is smaller than 60
commit 80d767d770 upstream.

When compiling for the IA-64 ski emulator, HZ is set to 32 because the
emulation is slow and we don't want to waste too many cycles processing
timers. Alpha also has an option to set HZ to 32.

This causes integer overflow in
kernel/time/jiffies.c:
kernel/time/jiffies.c:66:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
  .mult  = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
  ^

This patch reduces the JIFFIES_SHIFT value to avoid the overflow.

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1401241639100.23871@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22 12:41:29 -08:00
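The arithmetic behind the warning above can be checked directly. The sketch assumes a 32-bit `mult` field, a shift of 8, and approximates NSEC_PER_JIFFY as NSEC_PER_SEC // HZ; the helper names are made up for illustration.

```python
NSEC_PER_SEC = 1_000_000_000
U32_MAX = 2**32 - 1


def jiffies_mult(hz, shift):
    """Approximate NSEC_PER_JIFFY << shift for a given HZ."""
    return (NSEC_PER_SEC // hz) << shift


def fits_u32(hz, shift):
    return jiffies_mult(hz, shift) <= U32_MAX


def largest_safe_shift(hz, max_shift=8):
    """Largest shift (at most max_shift) whose result still fits in 32 bits."""
    for shift in range(max_shift, -1, -1):
        if fits_u32(hz, shift):
            return shift
    return None
```

Under these assumptions HZ=32 gives 31,250,000 << 8 = 8,000,000,000, which exceeds 2^32, while HZ=60 still fits. That is consistent with the "smaller than 60" in the subject line.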
Thomas Gleixner
dbd515879a tick: Clear broadcast pending bit when switching to oneshot
commit dd5fd9b91a upstream.

AMD systems which use the C1E workaround in the amd_e400_idle routine
trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.

The reason is that the idle routine of those AMD systems switches the
cpu into forced broadcast mode early on before the newly brought up
CPU can switch over to high resolution / NOHZ mode. The timer related
CPU1 bringup looks like this:

  clockevent_register_device(local_apic);
  tick_setup(local_apic);
  ...
  idle()
	tick_broadcast_on_off(FORCE);
	tick_broadcast_oneshot_control(ENTER)
	  cpumask_set(cpu, broadcast_oneshot_mask);
	halt();

Now the broadcast interrupt on CPU0 sets CPU1 in the
broadcast_pending_mask and wakes CPU1. So CPU1 continues:

	local_apic_timer_interrupt()
	   tick_handle_periodic();
	   softirq()
	     tick_init_highres();
	       cpumask_clr(cpu, broadcast_oneshot_mask);

	tick_broadcast_oneshot_control(ENTER)
	   WARN_ON(cpumask_test(cpu, broadcast_pending_mask));

So while we remove CPU1 from the broadcast_oneshot_mask when we switch
over to highres mode, we do not clear the pending bit, which then
triggers the warning when we go back to idle.

The reason why this is only visible on C1E affected AMD systems is
that the other machines enter the deep sleep states via
acpi_idle/intel_idle and exit the broadcast mode before executing the
remote triggered local_apic_timer_interrupt. So the pending bit is
already cleared when the switch over to highres mode is clearing the
oneshot mask.

The solution is simple: Clear the pending bit together with the mask
bit when we switch over to highres mode.

Stanislaw came up independently with the same patch by enforcing the
C1E workaround and debugging the fallout. I picked mine, because mine
has a changelog :)

Reported-by: poma <pomidorabelisima@gmail.com>
Debugged-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Justin M. Forbes <jforbes@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22 12:41:29 -08:00
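The sequence in the commit above can be replayed with two sets standing in for the oneshot and pending masks. The function names and the `clear_pending` switch are hypothetical; the switch exists only to contrast the behaviour before and after the fix.

```python
def switch_to_highres(cpu, oneshot_mask, pending_mask, clear_pending=True):
    """Leave broadcast mode; the fix also clears the pending bit."""
    oneshot_mask.discard(cpu)
    if clear_pending:
        pending_mask.discard(cpu)


def broadcast_enter(cpu, oneshot_mask, pending_mask):
    """Return True if the WARN_ON condition (stale pending bit) would fire."""
    would_warn = cpu in pending_mask
    oneshot_mask.add(cpu)
    return would_warn
```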
Mark Brown
6a7fe00e1f Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2014-02-14 12:15:57 +00:00
John Stultz
d9e8fada0c timekeeping: Avoid possible deadlock from clock_was_set_delayed
commit 6fdda9a9c5 upstream.

As part of normal operations, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
  hrtimer locks -> timekeeping locks

clock_was_set_delayed() was supposed to allow us to avoid deadlocks
between the timekeeping and the hrtimer subsystems, so that we could
notify the hrtimer subsystem that the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work
that would run later once we were out of the timekeeping code.

But unfortunately the lock chains are complex enough that in
scheduling delayed work, we end up eventually trying to grab
an hrtimer lock.

Sasha Levin noticed this in testing when the new seqlock lockdep
enablement triggered the following (somewhat abbreviated) message:

[  251.100221] ======================================================
[  251.100221] [ INFO: possible circular locking dependency detected ]
[  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[  251.101967] -------------------------------------------------------
[  251.101967] kworker/10:1/4506 is trying to acquire lock:
[  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
[  251.101967]
[  251.101967] but task is already holding lock:
[  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] which lock already depends on the new lock.
[  251.101967]
[  251.101967]
[  251.101967] the existing dependency chain (in reverse order) is:
[  251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
[  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
[  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
[  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
[  251.101967]
-> #0 (timekeeper_seq){----..}:
[snipped]
[  251.101967] other info that might help us debug this:
[  251.101967]
[  251.101967] Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11

[  251.101967]  Possible unsafe locking scenario:
[  251.101967]
[  251.101967]        CPU0                    CPU1
[  251.101967]        ----                    ----
[  251.101967]   lock(hrtimer_bases.lock#11);
[  251.101967]                                lock(&rt_b->rt_runtime_lock);
[  251.101967]                                lock(hrtimer_bases.lock#11);
[  251.101967]   lock(timekeeper_seq);
[  251.101967]
[  251.101967]  *** DEADLOCK ***
[  251.101967]
[  251.101967] 3 locks held by kworker/10:1/4506:
[  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] stack backtrace:
[  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[  251.101967] Workqueue: events clock_was_set_work

So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead use a flag variable to
decide if we should call clock_was_set() once we've released the locks.

This works for the case here, where the do_adjtimex() was the deadlock
trigger point. Unfortunately, in update_wall_time() we still hold
the jiffies lock, which would deadlock with the ipi triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:04 -08:00
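The flag-variable pattern from the commit above, sketched in user space: record under the lock that a notification is needed, and make the call only after the lock is released. `do_adjtimex_model` and its parameters are hypothetical.

```python
import threading

timekeeper_lock = threading.Lock()  # stands in for the timekeeping locks


def do_adjtimex_model(tai_changed, clock_was_set):
    """Apply an adjustment and notify outside the lock if the time changed."""
    clock_set = False
    with timekeeper_lock:
        if tai_changed:
            clock_set = True   # remember; do not call out while locked
    if clock_set:
        clock_was_set()        # runs with timekeeper_lock already dropped
    return clock_set
```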
John Stultz
226e0f713f timekeeping: Fix missing timekeeping_update in suspend path
commit 330a1617b0 upstream.

Since 48cdc135d4 (Implement a shadow timekeeper), we have to
call timekeeping_update() after any adjustment to the timekeeping
structure in order to make sure that any adjustments to the structure
persist.

In the timekeeping suspend path, we update the timekeeper
structure, so we should be sure to update the shadow-timekeeper
before releasing the timekeeping locks. Currently this isn't done.

In most cases, the next time related code to run would be
timekeeping_resume, which does update the shadow-timekeeper, but
in an abundance of caution, this patch adds the call to
timekeeping_update() in the suspend path.

Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:03 -08:00
John Stultz
a8ad6b6772 timekeeping: Fix CLOCK_TAI timer/nanosleep delays
commit 04005f6011 upstream.

A think-o in the calculation of the monotonic -> tai time offset
results in CLOCK_TAI timers and nanosleeps to expire late (the
latency is ~2x the tai offset).

Fix this by adding the tai offset to the realtime offset instead
of subtracting it.
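
As a rough illustration of the arithmetic (a sketch, not the kernel code):
assume each offset is the value added to CLOCK_MONOTONIC to reach the target
clock, so a timer set for a TAI expiry fires at "expiry - offset" in monotonic
terms. All function names below are hypothetical.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Fixed calculation: TAI runs ahead of realtime, so the offset is added. */
static int64_t offs_tai_fixed(int64_t offs_real_ns, int32_t tai_offset_sec)
{
    return offs_real_ns + (int64_t)tai_offset_sec * NSEC_PER_SEC;
}

/* The think-o: subtracting leaves the result short by 2x the tai offset. */
static int64_t offs_tai_buggy(int64_t offs_real_ns, int32_t tai_offset_sec)
{
    return offs_real_ns - (int64_t)tai_offset_sec * NSEC_PER_SEC;
}

/* Convert an absolute TAI expiry into monotonic time (assumed model). */
static int64_t tai_expiry_to_mono(int64_t tai_expiry_ns, int64_t offs_tai_ns)
{
    return tai_expiry_ns - offs_tai_ns;
}
```

With a too-small offset the monotonic expiry lands later than intended, which
matches the "~2x the tai offset" latency described above.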

Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:03 -08:00
John Stultz
77535a0a16 timekeeping: Fix lost updates to tai adjustment
commit f55c07607a upstream.

Since 48cdc135d4 (Implement a shadow timekeeper), we have to
call timekeeping_update() after any adjustment to the timekeeping
structure in order to make sure that any adjustments to the structure
persist.

Unfortunately, the updates to the tai offset via adjtimex do not
trigger this update, causing adjustments to the tai offset to be
made and then over-written by the previous value at the next
update_wall_time() call.

This patch resolves the issue by calling timekeeping_update()
right after setting the tai offset.

Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:03 -08:00
Mark Brown
78157f5f8c Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2013-12-12 10:22:32 +00:00
Martin Schwidefsky
78f8d9b564 time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD
commit 4be77398ac upstream.

Since commit 1e75fa8be9 (time: Condense timekeeper.xtime
into xtime_sec - merged in v3.6), there has been a problem
with the error accounting in the timekeeping code, such that
when truncating to nanoseconds, we round up to the next nsec,
but the balancing adjustment to the ntp_error value was dropped.

This causes 1ns per tick drift forward of the clock.

In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD
architectures (s390, ia64, powerpc).

The fix is simply to balance the accounting and to subtract the
added nanosecond from ntp_error. This allows the internal long-term
clock steering to keep the clock accurate.
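
A minimal sketch of the balanced accounting, assuming a toy timekeeper whose
xtime_nsec holds nanoseconds left-shifted by "shift" and whose ntp_error is
kept in the same shifted units (the real structure is more involved). The
struct and function names are hypothetical.

```c
#include <stdint.h>

struct toy_timekeeper {
    uint64_t xtime_nsec; /* nanoseconds, left-shifted by 'shift' */
    int64_t  ntp_error;  /* accumulated error, same shifted units (assumed) */
    uint32_t shift;
};

/* Move xtime_nsec up to the next whole nanosecond and record in ntp_error
 * exactly how much time was added, so the two adjustments cancel. */
static void round_up_to_next_nsec(struct toy_timekeeper *tk)
{
    uint64_t one_nsec = 1ULL << tk->shift;
    uint64_t remainder = tk->xtime_nsec & (one_nsec - 1);

    /* Truncate the sub-nanosecond part, then add one full nanosecond. */
    tk->xtime_nsec -= remainder;
    tk->xtime_nsec += one_nsec;

    /* Balance the books: credit what was dropped, debit what was added. */
    tk->ntp_error += (int64_t)remainder;
    tk->ntp_error -= (int64_t)one_nsec;
}
```

Dropping the final subtraction reproduces the regression: every call would
leave an unaccounted surplus, and the clock would creep forward.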

While this fix removes the regression added in 1e75fa8be9, the
ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD
and use the new VSYSCALL method, which avoids entirely the
nanosecond granular rounding, and the resulting short-term clock
adjustment oscillation needed to keep long term accurate time.

[ jstultz: Many thanks to Martin for his efforts identifying this
  	   subtle bug, and providing the fix. ]

Originally-from: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-11 22:36:27 -08:00
Mark Brown
3ba8f67bac Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2013-12-08 21:51:36 +00:00
Miroslav Lichvar
9baca2ff10 ntp: Make periodic RTC update more reliable
commit a97ad0c4b4 upstream.

The current code requires that the scheduled update of the RTC happens
in the closest tick to the half of the second. This seems to be
difficult to achieve reliably. The scheduled work may be missing the
target time by a tick or two and be constantly rescheduled every second.

Relax the limit to 10 ticks. As a typical RTC drifts in the 11-minute
update interval by several milliseconds, this shouldn't affect the
overall accuracy of the RTC much.
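
The effect of the relaxed limit can be sketched as follows. This assumes the
window is centered on the half second and that "10 ticks" means five ticks on
either side; both readings, and the helper names, are assumptions rather than
the actual code.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NSEC_PER_SEC 1000000000L

/* True if tv_nsec lies within max_dist_nsec of the half-second mark. */
static bool near_half_second(long tv_nsec, long max_dist_nsec)
{
    return labs(tv_nsec - NSEC_PER_SEC / 2) <= max_dist_nsec;
}

/* Old behaviour: only the tick closest to the half second qualifies. */
static bool strict_window(long tv_nsec, long tick_nsec)
{
    return near_half_second(tv_nsec, tick_nsec / 2);
}

/* Relaxed behaviour: a 10-tick window (assumed +/- 5 ticks). */
static bool relaxed_window(long tv_nsec, long tick_nsec)
{
    return near_half_second(tv_nsec, tick_nsec * 5);
}
```

With a 1 ms tick, work that runs two ticks late misses the strict window and
would be rescheduled, but is accepted by the relaxed one.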

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 07:29:28 -08:00
Stephen Boyd
7281bb5614 clockevents: Prefer CPU local devices over global devices
commit 70e5975d3a upstream.

On an SMP system with only one global clockevent and a dummy
clockevent per CPU we run into problems. We want the dummy
clockevents to be registered as the per CPU tick devices, but
we can only achieve that if we register the dummy clockevents
before the global clockevent or if we artificially inflate the
rating of the dummy clockevents to be higher than the rating
of the global clockevent. Failure to do so leads to boot
hangs when the dummy timers are registered on all other CPUs
besides the CPU that accepted the global clockevent as its tick
device and there is no broadcast timer to poke the dummy
devices.

If we're registering multiple clockevents and one clockevent is
global and the other is local to a particular CPU we should
choose to use the local clockevent regardless of the rating of
the device. This way, if the clockevent is a dummy it will take
the tick device duty as long as there isn't a higher rated tick
device and any global clockevent will be bumped out into
broadcast mode, fixing the problem described above.

Reported-and-tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: soren.brinkmann@xilinx.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20130613183950.GA32061@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 07:29:27 -08:00
Thomas Gleixner
9bae8ea054 clockevents: Split out selection logic
commit 45cb8e01b2 upstream.

Split out the clockevent device selection logic. Preparatory patch to
allow unbinding active clockevent devices.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.431796247@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 07:29:27 -08:00
Thomas Gleixner
409d4ffaf0 clockevents: Add module refcount
commit ccf33d6880 upstream.

We want to be able to remove clockevent modules as well. Add a
refcount so we don't remove a module with an active clock event
device.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.307435149@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 07:29:27 -08:00
Thomas Gleixner
e8d630331d clockevents: Get rid of the notifier chain
commit 7172a286ce upstream.

7+ years and still a single user. Kill it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.098520211@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 07:29:27 -08:00
Mark Brown
a8eb5f1e4b Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android
Conflicts (add/add):
	drivers/input/evdev.c
2013-12-05 10:16:50 +00:00
KOSAKI Motohiro
4501cfd0e3 alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
commit 98d6f4dd84 upstream.

The Fedora Ruby maintainer reported that the latest Ruby doesn't work on
Fedora Rawhide on ARM. (http://bugs.ruby-lang.org/issues/9008)

This is because commit 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no
RTC device is present) made clock_get{time,res} return ENOTSUPP when
they can't find an RTC device. However, this is incorrect.

First, ENOTSUPP isn't exported to userland (ENOTSUP or EOPNOTSUPP are the
closest userland equivalents).

Second, POSIX and Linux man pages agree that clock_gettime and
clock_getres should return EINVAL if the clk_id argument is invalid.
While the argument could be made that the clockid is valid but just not
supported on this hardware, this is just a technicality that
doesn't help userspace applications, and only complicates error
handling.

Thus, this patch changes the code to use EINVAL.
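
From the userspace side, the simpler error handling might look like the
sketch below. probe_clock() is a hypothetical helper, not part of any
library; it only relies on clock_gettime() reporting EINVAL for a clock id
it cannot serve.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Returns 1 if the clock is usable, 0 if the kernel reports EINVAL
 * (unknown or unavailable clock id), and -1 for any other error. */
static int probe_clock(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) == 0)
        return 1;
    return errno == EINVAL ? 0 : -1;
}
```

An application can probe an alarm clock id this way and fall back to
CLOCK_REALTIME on 0, without needing a special case for an errno value that
userland headers do not define.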

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: Vit Ondruch <v.ondruch@tiscali.cz>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[jstultz: Tweaks to commit message to include full rationale]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04 10:55:49 -08:00
Mark Brown
4cb518ab3d Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2013-11-13 12:06:53 +00:00
Thomas Gleixner
b5b02b1406 clockevents: Sanitize ticks to nsec conversion
commit 97b9410643 upstream.

Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.

The reason is, that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.

The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer division in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.

Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:

      u64 clc = (latch << evt->shift) / evt->mult;

So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the division is:

       u64 clc = (latch << evt->shift) + evt->mult - 1;

So we need to make sure, that neither the shift nor the rounding add
is overflowing the u64 boundary.
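
Putting the pieces above together, a userspace sketch of the conversion
could look like this. It is an illustration under stated assumptions, not
the kernel function: the names are hypothetical, mult must be non-zero, and
shift must be below 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert hardware ticks to nanoseconds: ns = (latch << shift) / mult.
 * is_max selects the upper-bound rule described above. */
static uint64_t delta_to_ns(uint64_t latch, uint32_t mult, uint32_t shift,
                            bool is_max)
{
    uint64_t clc = latch << shift;
    uint64_t rnd = (uint64_t)mult - 1;

    /* If shifting back does not give latch, the shift overflowed:
     * saturate instead of using the wrapped value. */
    if ((clc >> shift) != latch)
        clc = UINT64_MAX;

    /* Apply the rounding add only if it cannot overflow, and skip it for
     * the upper bound when mult > 1 << shift. */
    if ((UINT64_MAX - clc > rnd) &&
        (!is_max || mult <= (1ULL << shift)))
        clc += rnd;

    return clc / mult;
}

/* Backwards conversion: ticks = (ns * mult) >> shift. */
static uint64_t ns_to_delta(uint64_t ns, uint32_t mult, uint32_t shift)
{
    return (ns * mult) >> shift;
}
```

For example, with mult = 3 and shift = 4 a latch of 5 converts to 27 ns, and
27 ns converts back to 5 ticks; without the rounding add the round trip
yields 4 ticks, which is below the hardware minimum.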

[ukl: move assignment to rnd after eventually changing mult, fix build
 issue and correct comment with the right math]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: nicolas.ferre@atmel.com
Cc: Marc Pignat <marc.pignat@hevs.ch>
Cc: john.stultz@linaro.org
Cc: kernel@pengutronix.de
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-13 12:05:32 +09:00
Mark Brown
7d89516d24 Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2013-10-04 00:30:46 +01:00
John Stultz
63946e8616 timekeeping: Fix HRTICK related deadlock from ntp lock changes
commit 7bd3601446 upstream.

Gerlando Falauto reported that when HRTICK is enabled, it is
possible to trigger system deadlocks. These were hard to
reproduce, as HRTICK has been broken in the past, but seemed
to be connected to the timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added
some extra spinlock based locking and triggered the following
lockdep output:

[   15.849182] ntpd/4062 is trying to acquire lock:
[   15.849765]  (&(&pool->lock)->rlock){..-...}, at: [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[   15.850051]
[   15.850051] but task is already holding lock:
[   15.850051]  (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>] do_adjtimex+0x7f/0x100

<snip>

[   15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[   15.850051]  Possible unsafe locking scenario:
[   15.850051]
[   15.850051]        CPU0                    CPU1
[   15.850051]        ----                    ----
[   15.850051]   lock(timekeeper_lock);
[   15.850051]                                lock(&p->pi_lock);
[   15.850051] lock(timekeeper_lock);
[   15.850051] lock(&(&pool->lock)->rlock);
[   15.850051]
[   15.850051]  *** DEADLOCK ***

The deadlock was introduced by 06c017fdd4 ("timekeeping:
Hold timekeepering locks in do_adjtimex and hardpps") in 3.10

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock
critical section.
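
The shape of the change can be modelled in a few lines. This is a toy,
single-threaded model with a flag standing in for the lock; every name in it
is hypothetical and it only demonstrates the ordering, not real locking.

```c
#include <stdbool.h>

/* Toy stand-in for a lock: just records whether it is "held". */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

static struct toy_lock timekeeper_lock_model;
static bool work_queued_under_lock;
static int work_queued_count;

/* Stand-in for queueing delayed work; it takes other locks in real life,
 * so it records whether the timekeeper lock was held when it ran. */
static void queue_sync_work_model(void)
{
    work_queued_under_lock = timekeeper_lock_model.held;
    work_queued_count++;
}

static void adjust_time_model(bool wants_rtc_sync)
{
    bool need_sync;

    toy_lock_acquire(&timekeeper_lock_model);
    /* ... time state would be updated here ... */
    need_sync = wants_rtc_sync; /* only remember the decision */
    toy_lock_release(&timekeeper_lock_model);

    /* Queue the work after the critical section has ended. */
    if (need_sync)
        queue_sync_work_model();
}
```

Recording the decision under the lock and acting on it afterwards keeps the
workqueue's locks out of the timekeeper lock's dependency chain.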

Reported-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Tested-by: Lin Ming <minggr@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-01 09:17:45 -07:00
Mark Brown
ed9d23700e Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2013-09-08 13:24:43 +01:00
Nathan Zimmer
b3772c81e3 timer_list: correct the iterator for timer_list
commit 84a78a6504 upstream.

Correct an issue with /proc/timer_list reported by Holger.

When reading from the proc file with a sufficiently small buffer (2k, so
not really that small), one could get hung trying to read the
file a chunk at a time.

The timer_list_start function failed to account for the possibility that
the offset was adjusted outside the timer_list_next.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Reported-by: Holger Hans Peter Freyther <holger@freyther.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Berke Durak <berke.durak@xiphos.com>
Cc: Jeff Layton <jlayton@redhat.com>
Tested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-07 22:09:58 -07:00
Mark Brown
0a2d5c259e Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android
Conflicts (the ARM security fixes vs the read only kernel changes from
Google):
	arch/arm/mm/mmu.c
2013-08-15 20:29:53 +01:00
Rafael J. Wysocki
d201a0b94d Revert "cpuidle: Quickly notice prediction failure for repeat mode"
commit 148519120c upstream.

Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
repeat mode), because it has been identified as the source of a
significant performance regression in v3.8 and later as explained by
Jeremy Eder:

  We believe we've identified a particular commit to the cpuidle code
  that seems to be impacting performance of a variety of workloads.
  The simplest way to reproduce is using netperf TCP_RR test, so
  we're using that, on a pair of Sandy Bridge based servers.  We also
  have data from a large database setup where performance is also
  measurably/positively impacted, though that test data isn't easily
  share-able.

  Included below are test results from 3 test kernels:

  kernel       reverts
  -----------------------------------------------------------
  1) vanilla   upstream (no reverts)

  2) perfteam2 reverts e11538d1f0

  3) test      reverts 69a37beabf
                       e11538d1f0

  In summary, netperf TCP_RR numbers improve by approximately 4%
  after reverting 69a37beabf.  When 69a37beabf is included, C0
  residency never seems to get above 40%.  Taking that patch out gets
  C0 near 100% quite often, and performance increases.

  The below data are histograms representing the %c0 residency @
  1-second sample rates (using turbostat), while under netperf test.

  - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
  - The last pair, which reverts 69a37beabf,
    shows %c0 in the 80,90,100% bins.

  Below each kernel name are netperf TCP_RR trans/s numbers for the
  particular kernel that can be disclosed publicly, comparing the 3
  test kernels.  We ran a 4th test with the vanilla kernel where
  we've also set /dev/cpu_dma_latency=0 to show overall impact
  boosting single-threaded TCP_RR performance over 11% above
  baseline.

  3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
  TCP_RR trans/s 54323.78

  -----------------------------------------------------------
  3.10-rc2 vanilla RX (no reverts)
  TCP_RR trans/s 48192.47

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]: ***********************************************************
     40.0000 -    50.0000 [     1]: *
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [    49]: *************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 perfteam2 RX (reverts commit e11538d1f0)
  TCP_RR trans/s 49698.69

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]: ***********************************************************
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [     2]: **
     40.0000 -    50.0000 [    58]: **********************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 test RX (reverts 69a37beabf and e11538d1f0)
  TCP_RR trans/s 47766.95

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    27]: ***************************
     40.0000 -    50.0000 [     2]: **
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     2]: **
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [    28]: ****************************

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     1]: *
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     3]: ***
     80.0000 -    90.0000 [     7]: *******
     90.0000 -   100.0000 [    38]: **************************************

  These results demonstrate gaining back the tendency of the CPU to
  stay in more responsive, performant C-states (and thus yield
  measurably better performance), by reverting commit 69a37beabf.

Requested-by: Jeremy Eder <jeder@redhat.com>
Tested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-11 18:35:24 -07:00
Mark Brown
535ad2c06c Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android 2013-07-26 00:01:00 +01:00
Thomas Gleixner
084c895d3c tick: Prevent uncontrolled switch to oneshot mode
commit 1f73a9806b upstream.

When the system switches from periodic to oneshot mode, the broadcast
logic causes a possibility that a CPU which has not yet switched to
oneshot mode puts its own clock event device into oneshot mode without
updating the state and the timer handler.

CPU0				CPU1
				per cpu tickdev is in periodic mode
				and switched to broadcast

Switch to oneshot mode
 tick_broadcast_switch_to_oneshot()
  cpumask_copy(tick_oneshot_broacast_mask,
	       tick_broadcast_mask);

  broadcast device mode = oneshot

				Timer interrupt

				irq_enter()
				 tick_check_oneshot_broadcast()
				  dev->set_mode(ONESHOT);

				tick_handle_periodic()
				 if (dev->mode == ONESHOT)
				   dev->next_event += period;
				   FAIL.

We fail, because dev->next_event contains KTIME_MAX, if the device was
in periodic mode before the uncontrolled switch to oneshot happened.

We must copy the broadcast bits over to the oneshot mask, because
otherwise a CPU which relies on the broadcast would not be woken up
anymore after the broadcast device switched to oneshot mode.

So we need to verify in tick_check_oneshot_broadcast() whether the CPU
has already switched to oneshot mode. If not, leave the device
untouched and let the CPU switch controlled into oneshot mode.

This is a long standing bug, which was never noticed, because the main
user of the broadcast, x86, cannot run into that scenario, AFAICT. The
nonarchitected timer mess of ARM creates a gazillion of differently
broken abominations which trigger the shortcomings of that broadcast
code, which better had never been necessary in the first place.

Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25 14:07:29 -07:00