Commit Graph

8487 Commits

Author SHA1 Message Date
吴庆棋
ebd9913f06 reboot function 2010-06-21 13:35:26 +08:00
San Mehat
9e6cb2f4db sched: Add a generic notifier when a task struct is about to be freed
This patch adds a notifier which can be used by subsystems that may
be interested in when a task has completely died and is about to
have it's last resource freed.

  The Android lowmemory killer uses this to determine when a task
it has killed has finally given up its goods.

Signed-off-by: San Mehat <san@google.com>
2010-05-06 15:51:00 -07:00
Arve Hjønnevåg
67078ecae3 Merge commit 'v2.6.32.9' into android-2.6.32 2010-03-10 16:38:33 -08:00
Jason Wang
1c63c20663 Export the symbol of getboottime and mmonotonic_to_bootbased
commit c93d89f3db upstream.

Export getboottime and monotonic_to_bootbased in order to let them
could be used by following patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:52 -08:00
Thomas Gleixner
22240ab64b futex: Handle futex value corruption gracefully
commit 59647b6ac3 upstream.

The WARN_ON in lookup_pi_state which complains about a mismatch
between pi_state->owner->pid and the pid which we retrieved from the
user space futex is completely bogus.

The code just emits the warning and then continues despite the fact
that it detected an inconsistent state of the futex. A conveniant way
for user space to spam the syslog.

Replace the WARN_ON by a consistency check. If the values do not match
return -EINVAL and let user space deal with the mess it created.

This also fixes the missing task_pid_vnr() when we compare the
pi_state->owner pid with the futex value.

Reported-by: Jermome Marchand <jmarchan@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:43 -08:00
Thomas Gleixner
c03d9d422d futex: Handle user space corruption gracefully
commit 51246bfd18 upstream.

If the owner of a PI futex dies we fix up the pi_state and set
pi_state->owner to NULL. When a malicious or just sloppy programmed
user space application sets the futex value to 0 e.g. by calling
pthread_mutex_init(), then the futex can be acquired again. A new
waiter manages to enqueue itself on the pi_state w/o damage, but on
unlock the kernel dereferences pi_state->owner and oopses.

Prevent this by checking pi_state->owner in the unlock path. If
pi_state->owner is not current we know that user space manipulated the
futex value. Ignore the mess and return -EINVAL.

This catches the above case and also the case where a task hijacks the
futex by setting the tid value and then tries to unlock it.

Reported-by: Jermome Marchand <jmarchan@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:43 -08:00
Mikael Pettersson
5f6af116c2 futex_lock_pi() key refcnt fix
commit 5ecb01cfdf upstream.

This fixes a futex key reference count bug in futex_lock_pi(),
where a key's reference count is incremented twice but decremented
only once, causing the backing object to not be released.

If the futex is created in a temporary file in an ext3 file system,
this bug causes the file's inode to become an "undead" orphan,
which causes an oops from a BUG_ON() in ext3_put_super() when the
file system is unmounted. glibc's test suite is known to trigger this,
see <http://bugzilla.kernel.org/show_bug.cgi?id=14256>.

The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's
38d47c1b70 "[PATCH] futex: rely on
get_user_pages() for shared futexes". That commit made get_futex_key()
also increment the reference count of the futex key, and updated its
callers to decrement the key's reference count before returning.
Unfortunately the normal exit path in futex_lock_pi() wasn't corrected:
the reference count is incremented by get_futex_key() and queue_lock(),
but the normal exit path only decrements once, via unqueue_me_pi().
The fix is to put_futex_key() after unqueue_me_pi(), since 2.6.31
this is easily done by 'goto out_put_key' rather than 'goto out'.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:43 -08:00
Mike Chan
7175b7585f power: wakelock: Print active wakelocks when has_wake_lock() is called
When DEBUG_SUSPEND is enabled print active wakelocks when we check
if there are any active wakelocks.

In print_active_locks(), print expired wakelocks if DEBUG_EXPIRE is enabled

Change-Id: Ib1cb795555e71ff23143a2bac7c8a58cbce16547
Signed-off-by: Mike Chan <mike@android.com>
2010-02-19 17:03:09 -08:00
jamal
6117db7678 NET: fix oops at bootime in sysctl code
This fixes the boot time oops on the 2.6.32-stable tree.  It is needed
only in this tree due to the divergance from upstream.

From: jamal <hadi@cyberus.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09 04:51:02 -08:00
Julia Lawall
e06fbe9a40 kernel/cred.c: use kmem_cache_free
commit b8a1d37c5f upstream.

Free memory allocated using kmem_cache_zalloc using kmem_cache_free rather
than kfree.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,E,c;
@@

 x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
 ... when != x = E
     when != &x
?-kfree(x)
+kmem_cache_free(c,x)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Steve Dickson <steved@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09 04:51:01 -08:00
Aaro Koskinen
359e2f2722 clocksource: fix compilation if no GENERIC_TIME
commit a362c638bd upstream

Commit a9238ce3bb broke compilation on
platforms that do not implement GENERIC_TIME (e.g. iop32x):

  kernel/time/clocksource.c: In function 'clocksource_register':
  kernel/time/clocksource.c:556: error: implicit declaration of function 'clocksource_max_deferment'

Provide the implementation of clocksource_max_deferment() also for
such platforms.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09 04:50:54 -08:00
San Mehat
90bb75ece8 kernel: printk: Add non exported function for clearing the log ring buffer
Signed-off-by: San Mehat <san@google.com>
2010-02-08 15:36:03 -08:00
Arve Hjønnevåg
8a3025c963 printk: Fix log_buf_copy termination.
If idx was non-zero and the log had wrapped, len did not get truncated
to stop at the last byte written to the log.
2010-02-08 15:35:53 -08:00
Arve Hjønnevåg
afa82e25d4 Revert "printk: remove unused code from kernel/printk.c"
This reverts commit acff181d35.
2010-02-08 15:35:53 -08:00
San Mehat
5038d42d2e cgroup: Add generic cgroup subsystem permission checks.
Rather than using explicit euid == 0 checks when trying to move
tasks into a cgroup via CFS, move permission checks into each
specific cgroup subsystem. If a subsystem does not specify a
'can_attach' handler, then we fall back to doing our checks the old way.

    This way non-root processes can add arbitrary processes to
a cgroup if all the registered subsystems on that cgroup agree.

    Also change explicit euid == 0 check to CAP_SYS_ADMIN

Signed-off-by: San Mehat <san@google.com>
2010-02-08 15:09:13 -08:00
Rebecca Schultz
cc2ae286f8 PM: earlysuspend: Removing dependence on console.
Rather than signaling a full update of the display from userspace via a
console switch, this patch introduces 2 files int /sys/power,
wait_for_fb_sleep and wait_for_fb_wake.  Reading these files will block
until the requested state has been entered.  When a read from
wait_for_fb_sleep returns userspace should stop drawing.  When
wait_for_fb_wake returns, it should do a full update.  If either are called
when the fb driver is already in the requested state, they will return
immediately.

Signed-off-by: Rebecca Schultz <rschultz@google.com>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:27:08 -08:00
Arve Hjønnevåg
7fc8c1c281 consoleearlysuspend: Fix for 2.6.32
vt_waitactive now needs a 1 based console number

Change-Id: I07ab9a3773c93d67c09d928c8d5494ce823ffa2e
2010-02-03 21:27:08 -08:00
Arve Hjønnevåg
849d91c2b3 PM: earlysuspend: Add console switch when user requested sleep state changes.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:27:07 -08:00
Arve Hjønnevåg
ddbad8da31 PM: wakelock: Don't dump unfrozen task list when aborting try_to_freeze_tasks after less than one second
Change-Id: Ib2976e5b97a5ee4ec9abd4d4443584d9257d0941
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:27:07 -08:00
Arve Hjønnevåg
9a4999395d PM: wakelock: Abort task freezing if a wake lock is held.
Avoids a problem where the device sometimes hangs for 20 seconds
before the screen is turned on.
2010-02-03 21:27:06 -08:00
Arve Hjønnevåg
f5c158e396 PM: Add user-space wake lock api.
This adds /sys/power/wake_lock and /sys/power/wake_unlock.
Writing a string to wake_lock creates a wake lock the
first time is sees a string and locks it. Optionally, the
string can be followed by a timeout.
To unlock the wake lock, write the same string to wake_unlock.
2010-02-03 21:27:05 -08:00
Arve Hjønnevåg
5f36bdf159 PM: Enable early suspend through /sys/power/state
If EARLYSUSPEND is enabled then writes to /sys/power/state no longer
blocks, and the kernel will try to enter the requested state every
time no wakelocks are held. Write "on" to resume normal operation.
2010-02-03 21:27:05 -08:00
Arve Hjønnevåg
74643fdcc3 PM: Implement early suspend api 2010-02-03 21:27:04 -08:00
Arve Hjønnevåg
d38f947b14 PM: wakelocks: Use seq_file for /proc/wakelocks so we can get more than 3K of stats.
Change-Id: I42ed8bea639684f7a8a95b2057516764075c6b01
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:27:04 -08:00
Erik Gilling
0f8a3256cb power: wakelocks: fix buffer overflow in print_wake_locks
Change-Id: Ic944e3b3d3bc53eddc6fd0963565fd072cac373c
Signed-off-by: Erik Gilling <konkers@android.com>
2010-02-03 21:27:03 -08:00
Mike Chan
368bb8b6ee power: Prevent spinlock recursion when wake_unlock() is called
Signed-off-by: Mike Chan <mike@android.com>
2010-02-03 21:27:03 -08:00
Arve Hjønnevåg
1ea6dbb506 PM: Implement wakelock api.
PM: wakelock: Replace expire work with a timer

The expire work function did not work in the normal case.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:27:02 -08:00
Erik Gilling
604ad7390d sched: make task dump print all 15 chars of proc comm
Change-Id: I1a5c9676baa06c9f9b4424bbcab01b9b2fbfcd99
Signed-off-by: Erik Gilling <konkers@android.com>
2010-02-03 21:26:58 -08:00
Arve Hjønnevåg
5630369405 sched: Enable might_sleep before initializing drivers.
This allows detection of init bugs in built-in drivers.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:26:58 -08:00
Arve Hjønnevåg
207f66e23a Add build option to to set the default panic timeout. 2010-02-03 21:26:57 -08:00
Arve Hjønnevåg
8d4c42afa5 futex: Restore one of the fast paths eliminated by 38d47c1b70
This improves futex performance until our user-space code is fixed to use
FUTEX_PRIVATE_FLAG for non-shared futexes.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:26:56 -08:00
Arve Hjønnevåg
c177d04116 mm: Add min_free_order_shift tunable.
By default the kernel tries to keep half as much memory free at each
order as it does for one order below. This can be too agressive when
running without swap.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 20:48:13 -08:00
Tony Lindgren
39078ee4d1 ARM: Make low-level printk work
Makes low-level printk work.

Signed-off-by: Tony Lindgren <tony@atomide.com>
2010-02-03 20:48:10 -08:00
Peter Zijlstra
18ed2ed460 sched: Fix task priority bug
83f9ac removed a call to effective_prio() in wake_up_new_task(), which
leads to tasks running at MAX_PRIO.

This is caused by the idle thread being set to MAX_PRIO before forking
off init. O(1) used that to make sure idle was always preempted, CFS
uses check_preempt_curr_idle() for that so we can savely remove this bit
of legacy code.

Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259754383.4003.610.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-03 20:48:07 -08:00
Mike Travis
5cf92e9a8f timers, init: Limit the number of per cpu calibration bootup messages
commit feae3203d7 upstream.

Limit the number of per cpu calibration messages by only
printing out results for the first cpu to boot.

Also, don't print "CPUx is down" as this is expected, and we
don't need 4096 reminders... ;-)

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:14 -08:00
Jon Hunter
a9238ce3bb nohz: Prevent clocksource wrapping during idle
commit 98962465ed upstream.

The dynamic tick allows the kernel to sleep for periods longer than a
single tick, but it does not limit the sleep time currently. In the
worst case the kernel could sleep longer than the wrap around time of
the time keeping clock source which would result in losing track of
time.

Prevent this by limiting it to the safe maximum sleep time of the
current time keeping clock source. The value is calculated when the
clock source is registered.

[ tglx: simplified the code a bit and massaged the commit msg ]

Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:12 -08:00
Christian Ehrhardt
db47a1671a sched: Fix missing sched tunable recalculation on cpu add/remove
commit 0bcdcf28c9 upstream.

Based on Peter Zijlstras patch suggestion this enables recalculation of
the scheduler tunables in response of a change in the number of cpus. It
also adds a max of eight cpus that are considered in that scaling.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:11 -08:00
Rusty Russell
08b84be9e9 sched: Fix isolcpus boot option
commit bdddd2963c upstream.

Anton Blanchard wrote:

> We allocate and zero cpu_isolated_map after the isolcpus
> __setup option has run. This means cpu_isolated_map always
> ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
> cpumask that hasn't been allocated.

I introduced this regression in 49557e6203 (sched: Fix
boot crash by zalloc()ing most of the cpu masks).

Use the bootmem allocator if they set isolcpus=, otherwise
allocate and zero like normal.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: peterz@infradead.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Anton Blanchard <anton@samba.org>
2010-01-28 15:01:09 -08:00
H Hartley Sweeten
ce946bce17 clockevents: Add missing include to pacify sparse
commit 8e1a928a2e upstream.

Include "tick-internal.h" in order to pick up the extern function
prototype for clockevents_shutdown(). This quiets the following sparse
build noise:

  warning: symbol 'clockevents_shutdown' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
LKML-Reference: <BD79186B4FD85F4B8E60E381CAEE190901E24550@mi8nycmail19.Mi8.com>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: johnstul@us.ibm.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:00:24 -08:00
Xiaotian Feng
08b8ff4435 clockevent: Don't remove broadcast device when cpu is dead
commit ea9d8e3f45 upstream.

Marc reported that the BUG_ON in clockevents_notify() triggers on his
system. This happens because the kernel tries to remove an active
clock event device (used for broadcasting) from the device list.

The handling of devices which can be used as per cpu device and as a
global broadcast device is suboptimal.

The simplest solution for now (and for stable) is to check whether the
device is used as global broadcast device, but this needs to be
revisited.

[ tglx: restored the cpuweight check and massaged the changelog ]

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1262834564-13033-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:00:22 -08:00
Peter Zijlstra
9607f0688f perf: Honour event state for aux stream data
commit 22e190851f upstream.

Anton reported that perf record kept receiving events even after calling
ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK,COMM and MMAP
events didn't respect the disabled state and kept flowing in.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Anton Blanchard <anton@samba.org>
LKML-Reference: <1263459187.4244.265.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25 10:49:46 -08:00
Peter Zijlstra
b0a93920c4 perf events: Dont report side-band events on each cpu for per-task-per-cpu events
commit 5d27c23df0 upstream.

Acme noticed that his FORK/MMAP numbers were inflated by about
the same factor as his cpu-count.

This led to the discovery of a few more sites that need to
respect the event->cpu filter.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20091217121830.215333434@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25 10:49:45 -08:00
Peter Zijlstra
26931397cc sched: Fix task priority bug
commit 57785df5ac upstream.

83f9ac removed a call to effective_prio() in wake_up_new_task(), which
leads to tasks running at MAX_PRIO.

This is caused by the idle thread being set to MAX_PRIO before forking
off init. O(1) used that to make sure idle was always preempted, CFS
uses check_preempt_curr_idle() for that so we can savely remove this bit
of legacy code.

Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259754383.4003.610.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 15:18:40 -08:00
David Miller
896fb0d2fb sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
commit b9f8fcd55b upstream.

Relax stable-sched-clock architectures to not save/disable/restore
hardirqs in cpu_clock().

The background is that I was trying to resolve a sparc64 perf
issue when I discovered this problem.

On sparc64 I implement pseudo NMIs by simply running the kernel
at IRQ level 14 when local_irq_disable() is called, this allows
performance counter events to still come in at IRQ level 15.

This doesn't work if any code in an NMI handler does
local_irq_save() or local_irq_disable() since the "disable" will
kick us back to cpu IRQ level 14 thus letting NMIs back in and
we recurse.

The only path which that does that in the perf event IRQ
handling path is the code supporting frequency based events.  It
uses cpu_clock().

cpu_clock() simply invokes sched_clock() with IRQs disabled.

And that's a fundamental bug all on it's own, particularly for
the HAVE_UNSTABLE_SCHED_CLOCK case.  NMIs can thus get into the
sched_clock() code interrupting the local IRQ disable code
sections of it.

Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ
disabling done by cpu_clock() is just pure overhead and
completely unnecessary.

So the core problem is that sched_clock() is not NMI safe, but
we are invoking it from NMI contexts in the perf events code
(via cpu_clock()).

A less important issue is the overhead of IRQ disabling when it
isn't necessary in cpu_clock().

CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not
affected by this patch.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091213.182502.215092085.davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 15:18:30 -08:00
KOSAKI Motohiro
d4c893f207 futexes: Remove rw parameter from get_futex_key()
commit 7485d0d375 upstream.

Currently, futexes have two problem:

A) The current futex code doesn't handle private file mappings properly.

get_futex_key() uses PageAnon() to distinguish file and
anon, which can cause the following bad scenario:

  1) thread-A call futex(private-mapping, FUTEX_WAIT), it
     sleeps on file mapping object.
  2) thread-B writes a variable and it makes it cow.
  3) thread-B calls futex(private-mapping, FUTEX_WAKE), it
     wakes up blocked thread on the anonymous page. (but it's nothing)

B) Current futex code doesn't handle zero page properly.

Read mode get_user_pages() can return zero page, but current
futex code doesn't handle it at all. Then, zero page makes
infinite loop internally.

The solution is to use write mode get_user_page() always for
page lookup. It prevents the lookup of both file page of private
mappings and zero page.

Performance concerns:

Probaly very little, because glibc always initialize variables
for futex before to call futex(). It means glibc users never see
the overhead of this patch.

Compatibility concerns:

This patch has few compatibility issues. After this patch,
FUTEX_WAIT require writable access to futex variables (read-only
mappings makes EFAULT). But practically it's not a problem,
glibc always initalizes variables for futexes explicitly - nobody
uses read-only mappings.

Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Ulrich Drepper <drepper@gmail.com>
LKML-Reference: <20100105162633.45A2.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 15:18:11 -08:00
Rusty Russell
54f1b39ce0 module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y
commit d4703aefdb upstream.

powerpc applies relocations to the kcrctab.  They're absolute symbols,
but it's not completely unreasonable: other archs may too, but the
relocation is often 0.

http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-November/077972.html

Inspired-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:51 -08:00
Al Viro
9ef9a7c717 fix more leaks in audit_tree.c tag_chunk()
commit b4c30aad39 upstream.

Several leaks in audit_tree didn't get caught by commit
318b6d3d7d, including the leak on normal
exit in case of multiple rules refering to the same chunk.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:50 -08:00
Al Viro
dffaea5bd7 fix braindamage in audit_tree.c untag_chunk()
commit 6f5d511489 upstream.

... aka "Al had badly fscked up when writing that thing and nobody
noticed until Eric had fixed leaks that used to mask the breakage".

The function essentially creates a copy of old array sans one element
and replaces the references to elements of original (they are on cyclic
lists) with those to corresponding elements of new one.  After that the
old one is fair game for freeing.

First of all, there's a dumb braino: when we get to list_replace_init we
use indices for wrong arrays - position in new one with the old array
and vice versa.

Another bug is more subtle - termination condition is wrong if the
element to be excluded happens to be the last one.  We shouldn't go
until we fill the new array, we should go until we'd finished the old
one.  Otherwise the element we are trying to kill will remain on the
cyclic lists...

That crap used to be masked by several leaks, so it was not quite
trivial to hit.  Eric had fixed some of those leaks a while ago and the
shit had hit the fan...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:50 -08:00
Mike Frysinger
71c77079a7 kernel/sysctl.c: fix stable merge error in NOMMU mmap_min_addr
Stable commit 0399123f3d didn't match the
original upstream commit.  The CONFIG_MMU check was added much too early
in the list disabling a lot of proc entries in the process.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:49 -08:00
Andi Kleen
0696a3b5e0 kernel/signal.c: fix kernel information leak with print-fatal-signals=1
commit b45c6e76bc upstream.

When print-fatal-signals is enabled it's possible to dump any memory
reachable by the kernel to the log by simply jumping to that address from
user space.

Or crash the system if there's some hardware with read side effects.

The fatal signals handler will dump 16 bytes at the execution address,
which is fully controlled by ring 3.

In addition when something jumps to a unmapped address there will be up to
16 additional useless page faults, which might be potentially slow (and at
least is not very efficient)

Fortunately this option is off by default and only there on i386.

But fix it by checking for kernel addresses and also stopping when there's
a page fault.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:33 -08:00