linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-19 12:00:46 +09:00

Author	SHA1	Message	Date
Colin Cross	3f29a88349	Merge branch 'linux-tegra-2.6.36' into android-tegra-2.6.36 Conflicts: drivers/usb/gadget/composite.c Change-Id: I1a332ec21da62aea98912df9a01cf0282ed50ee1	2010-12-21 18:38:13 -08:00
Colin Cross	05946a1e0f	cgroup: Remove call to synchronize_rcu in cgroup_attach_task synchronize_rcu can be very expensive, averaging 100 ms in some cases. In cgroup_attach_task, it is used to prevent a task->cgroups pointer dereferenced in an RCU read side critical section from being invalidated, by delaying the call to put_css_set until after an RCU grace period. To avoid the call to synchronize_rcu, make the put_css_set call rcu-safe by moving the deletion of the css_set links into free_css_set_work, scheduled by the rcu callback free_css_set_rcu. The decrement of the cgroup refcount is no longer synchronous with the call to put_css_set, which can result in the cgroup refcount staying positive after the last call to cgroup_attach_task returns. To allow the cgroup to be deleted with cgroup_rmdir synchronously after cgroup_attach_task, have rmdir check the refcount of all associated css_sets. If cgroup_rmdir is called on a cgroup for which the css_sets all have refcount zero but the cgroup refcount is nonzero, reuse the rmdir waitqueue to block the rmdir until free_css_set_work is called. Signed-off-by: Colin Cross <ccross@android.com>	2010-12-17 17:01:51 -08:00
Colin Cross	60cdbd1f33	cgroup: Set CGRP_RELEASABLE when adding to a cgroup Changes the meaning of CGRP_RELEASABLE to be set on any cgroup that has ever had a task or cgroup in it, or had css_get called on it. The bit is set in cgroup_attach_task, cgroup_create, and __css_get. It is not necessary to set the bit in cgroup_fork, as the task is either in the root cgroup, in which can never be released, or the task it was forked from already set the bit in croup_attach_task. Signed-off-by: Colin Cross <ccross@android.com>	2010-12-17 17:01:49 -08:00
Kenji Kaneshige	2184192578	genirq: Fix incorrect proc spurious output commit `25c9170ed6` upstream. Since commit a1afb637(switch /proc/irq/*/spurious to seq_file) all /proc/irq/XX/spurious files show the information of irq 0. Current irq_spurious_proc_open() passes on NULL as the 3rd argument, which is used as an IRQ number in irq_spurious_proc_show(), to the single_open(). Because of this, all the /proc/irq/XX/spurious file shows IRQ 0 information regardless of the IRQ number. To fix the problem, irq_spurious_proc_open() must pass on the appropreate data (IRQ number) to single_open(). Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> LKML-Reference: <4CF4B778.90604@jp.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:27 -08:00
Rafael J. Wysocki	53e87163a1	PM / Hibernate: Fix memory corruption related to swap commit `c9e664f1fd` upstream. There is a problem that swap pages allocated before the creation of a hibernation image can be released and used for storing the contents of different memory pages while the image is being saved. Since the kernel stored in the image doesn't know of that, it causes memory corruption to occur after resume from hibernation, especially on systems with relatively small RAM that need to swap often. This issue can be addressed by keeping the GFP_IOFS bits clear in gfp_allowed_mask during the entire hibernation, including the saving of the image, until the system is finally turned off or the hibernation is aborted. Unfortunately, for this purpose it's necessary to rework the way in which the hibernate and suspend code manipulates gfp_allowed_mask. This change is based on an earlier patch from Hugh Dickins. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:26 -08:00
Thomas Gleixner	03aff981d0	perf: Fix inherit vs. context rotation bug commit `dddd3379a6` upstream. It was found that sometimes children of tasks with inherited events had one extra event. Eventually it turned out to be due to the list rotation no being exclusive with the list iteration in the inheritance code. Cure this by temporarily disabling the rotation while we inherit the events. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:25 -08:00
Colin Cross	d81b749c97	PM / PM QoS: Fix reversed min and max commit `00fafcda17` upstream. pm_qos_get_value had min and max reversed, causing all pm_qos requests to have no effect. Signed-off-by: Colin Cross <ccross@android.com> Acked-by: mark <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:18 -08:00
Nelson Elhage	9b504ae860	do_exit(): make sure that we run with get_fs() == USER_DS commit `33dd94ae1c` upstream. If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not otherwise reset before do_exit(). do_exit may later (via mm_release in fork.c) do a put_user to a user-controlled address, potentially allowing a user to leverage an oops into a controlled write into kernel memory. This is only triggerable in the presence of another bug, but this potentially turns a lot of DoS bugs into privilege escalations, so it's worth fixing. I have proof-of-concept code which uses this bug along with CVE-2010-3849 to write a zero to an arbitrary kernel address, so I've tested that this is not theoretical. A more logical place to put this fix might be when we know an oops has occurred, before we call do_exit(), but that would involve changing every architecture, in multiple places. Let's just stick it in do_exit instead. [akpm@linux-foundation.org: update code comment] Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:14 -08:00
Peter Zijlstra	e23d50e43c	sched: fix RCU lockdep splat from task_group() commit `6506cf6ce6` upstream. This addresses the following RCU lockdep splat: [0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03 [0.052999] lockdep: fixing up alternatives. [0.054105] [0.054106] =================================================== [0.054999] [ INFO: suspicious rcu_dereference_check() usage. ] [0.054999] --------------------------------------------------- [0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection! [0.054999] [0.054999] other info that might help us debug this: [0.054999] [0.054999] [0.054999] rcu_scheduler_active = 1, debug_locks = 1 [0.054999] 3 locks held by swapper/1: [0.054999] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a [0.054999] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51 [0.054999] #2: (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113 [0.054999] [0.054999] stack backtrace: [0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1 [0.054999] Call Trace: [0.054999] [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3 [0.054999] [<ffffffff810325c3>] task_group+0x7b/0x8a [0.054999] [<ffffffff810325e5>] set_task_rq+0x13/0x40 [0.054999] [<ffffffff814be39a>] init_idle+0xd2/0x113 [0.054999] [<ffffffff814be78a>] fork_idle+0xb8/0xc7 [0.054999] [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b [0.054999] [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b [0.054999] [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724 [0.054999] [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b [0.054999] [<ffffffff814be876>] _cpu_up+0xac/0x127 [0.054999] [<ffffffff814be946>] cpu_up+0x55/0x6a [0.054999] [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff [0.054999] [<ffffffff81003854>] kernel_thread_helper+0x4/0x10 [0.054999] [<ffffffff814c353c>] ? restore_args+0x0/0x30 [0.054999] [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff [0.054999] [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10 [0.056074] Booting Node 0, Processors #1lockdep: fixing up alternatives. [0.130045] #2lockdep: fixing up alternatives. [0.203089] #3 Ok. [0.275286] Brought up 4 CPUs [0.276005] Total of 4 processors activated (16017.17 BogoMIPS). The cgroup_subsys_state structures referenced by idle tasks are never freed, because the idle tasks should be part of the root cgroup, which is not removable. The problem is that while we do in-fact hold rq->lock, the newly spawned idle thread's cpu is not yet set to the correct cpu so the lockdep check in task_group(): lockdep_is_held(&task_rq(p)->lock) will fail. But this is a chicken and egg problem. Setting the CPU's runqueue requires that the CPU's runqueue already be set. ;-) So insert an RCU read-side critical section to avoid the complaint. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:32:59 -08:00
Ken Chen	a7a001eb00	latencytop: fix per task accumulator commit `38715258aa` upstream. Per task latencytop accumulator prematurely terminates due to erroneous placement of latency_record_count. It should be incremented whenever a new record is allocated instead of increment on every latencytop event. Also fix search iterator to only search known record events instead of blindly searching all pre-allocated space. Signed-off-by: Ken Chen <kenchen@google.com> Reviewed-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:32:53 -08:00
Dima Zavin	415c429577	Revert "futex: Restore one of the fast paths eliminated by 38d47c1b7075bd7ec3881141bb3629da58f88dab" This reverts commit `cb93471ed5`. Change-Id: I7f0b45c29b3b91ba5282c51eb6b315d70ac6c813 Signed-off-by: Dima Zavin <dima@android.com>	2010-12-08 16:27:01 -08:00
Darren Hart	380c7ff30f	futex: Fix errors in nested key ref-counting commit `7ada876a87` upstream. futex_wait() is leaking key references due to futex_wait_setup() acquiring an additional reference via the queue_lock() routine. The nested key ref-counting has been masking bugs and complicating code analysis. queue_lock() is only called with a previously ref-counted key, so remove the additional ref-counting from the queue_(un)lock() functions. Also futex_wait_requeue_pi() drops one key reference too many in unqueue_me_pi(). Remove the key reference handling from unqueue_me_pi(). This was paired with a queue_lock() in futex_lock_pi(), so the count remains unchanged. Document remaining nested key ref-counting sites. Signed-off-by: Darren Hart <dvhart@linux.intel.com> Reported-and-tested-by: Matthieu Fertré<matthieu.fertre@kerlabs.com> Reported-by: Louis Rilling<louis.rilling@kerlabs.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <4CBB17A8.70401@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:02 -08:00
Mathieu Desnoyers	781ec62a52	sched: Fix string comparison in /proc/sched_features commit `7740191cd9` upstream. Fix incorrect handling of the following case: INTERACTIVE INTERACTIVE_SOMETHING_ELSE The comparison only checks up to each element's length. Changelog since v1: - Embellish using some Rostedtisms. [ mingo: ^^ == smaller and cleaner ] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Lindgren <tony@atomide.com> LKML-Reference: <20100913214700.GB16118@Krystal> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:01 -08:00
Linus Walleij	0c1a8b4384	sched: Drop all load weight manipulation for RT tasks commit `17bdcf949d` upstream. Load weights are for the CFS, they do not belong in the RT task. This makes all RT scheduling classes leave the CFS weights alone. This fixes a real bug as well: I noticed the following phonomena: a process elevated to SCHED_RR forks with SCHED_RESET_ON_FORK set, and the child is indeed SCHED_OTHER, and the niceval is indeed reset to 0. However the weight inserted by set_load_weight() remains at 0, giving the task insignificat priority. With this fix, the weight is reset to what the task had before being elevated to SCHED_RR/SCHED_FIFO. Cc: Lennart Poettering <lennart@poettering.net> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286807811-10568-1-git-send-email-linus.walleij@stericsson.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:01 -08:00
Stephane Eranian	603cb849c8	perf_events: Fix bogus context time tracking commit `c530ccd9a1` upstream. You can only call update_context_time() when the context is active, i.e., the thread it is attached to is still running. However, perf_event_read() can be called even when the context is inactive, e.g., user read() the counters. The call to update_context_time() must be conditioned on the status of the context, otherwise, bogus time_enabled, time_running may be returned. Here is an example on AMD64. The task program is an example from libpfm4. The -p prints deltas every 1s. $ task -p -e cpu_clk_unhalted sleep 5 2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270) Whereas if you don't read deltas, e.g., no call to perf_event_read() until the process terminates: $ task -e cpu_clk_unhalted sleep 5 2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899) Notice that time_enable, time_running are bogus in the first example causing bogus scaling. This patch fixes the problem, by conditionally calling update_context_time() in perf_event_read(). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cb856dc.51edd80a.5ae0.38fb@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:00 -08:00
Colin Cross	56ee878646	pm_qos: Fix reversed min and max pm_qos_get_value had min and max reversed, causing all pm_qos requests to have no effect. Broken by the plist conversion, sha `5f279845f9`. Change-Id: I252c764821eac8d94de57eb482c05bf6afcea15b Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: mark gross <markgross@thegnar.org> Cc: James Bottomley <James.Bottomley@suse.de> Cc: stable <stable@kernel.org> Signed-off-by: Colin Cross <ccross@android.com>	2010-11-04 12:10:08 -07:00
Dima Zavin	5ff1b9cdb1	Merge commit 'v2.6.36' into android-2.6.36	2010-10-21 15:00:17 -07:00
Eric Dumazet	a9febbb4bd	sysctl: min/max bounds are optional sysctl check complains with a WARN() when proc_doulongvec_minmax() or proc_doulongvec_ms_jiffies_minmax() are used by a vector of longs (with more than one element), with no min or max value specified. This is unexpected, given we had a bug on this min/max handling :) Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Miller <davem@davemloft.net> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-15 14:42:24 -07:00
Salman Qazi	f13d4f979c	hrtimer: Preserve timer state in remove_hrtimer() The race is described as follows: CPU X CPU Y remove_hrtimer // state & QUEUED == 0 timer->state = CALLBACK unlock timer base timer->f(n) //very long hrtimer_start lock timer base remove_hrtimer // no effect hrtimer_enqueue timer->state = CALLBACK \| QUEUED unlock timer base hrtimer_start lock timer base remove_hrtimer mode = INACTIVE // CALLBACK bit lost! switch_hrtimer_base CALLBACK bit not set: timer->base changes to a different CPU. lock this CPU's timer base The bug was introduced with commit `ca109491f` (hrtimer: removing all ur callback modes) in 2.6.29 [ tglx: Feed new state via local variable and add a comment. ] Signed-off-by: Salman Qazi <sqazi@google.com> Cc: akpm@linux-foundation.org Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20101012142351.8485.21823.stgit@dungbeetle.mtv.corp.google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org	2010-10-14 13:29:59 +02:00
Steven Rostedt	d01343244a	ring-buffer: Fix typo of time extends per page Time stamps for the ring buffer are created by the difference between two events. Each page of the ring buffer holds a full 64 bit timestamp. Each event has a 27 bit delta stamp from the last event. The unit of time is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events happen more than 134 milliseconds apart, a time extend is inserted to add more bits for the delta. The time extend has 59 bits, which is good for ~18 years. Currently the time extend is committed separately from the event. If an event is discarded before it is committed, due to filtering, the time extend still exists. If all events are being filtered, then after ~134 milliseconds a new time extend will be added to the buffer. This can only happen till the end of the page. Since each page holds a full timestamp, there is no reason to add a time extend to the beginning of a page. Time extends can only fill a page that has actual data at the beginning, so there is no fear that time extends will fill more than a page without any data. When reading an event, a loop is made to skip over time extends since they are only used to maintain the time stamp and are never given to the caller. As a paranoid check to prevent the loop running forever, with the knowledge that time extends may only fill a page, a check is made that tests the iteration of the loop, and if the iteration is more than the number of time extends that can fit in a page a warning is printed and the ring buffer is disabled (all of ftrace is also disabled with it). There is another event type that is called a TIMESTAMP which can hold 64 bits of data in the theoretical case that two events happen 18 years apart. This code has not been implemented, but the name of this event exists, as well as the structure for it. The size of a TIMESTAMP is 16 bytes, where as a time extend is only 8 bytes. The macro used to calculate how many time extends can fit on a page used the TIMESTAMP size instead of the time extend size cutting the amount in half. The following test case can easily trigger the warning since we only need to have half the page filled with time extends to trigger the warning: # cd /sys/kernel/debug/tracing/ # echo function > current_tracer # echo 'common_pid < 0' > events/ftrace/function/filter # echo > trace # echo 1 > trace_marker # sleep 120 # cat trace Enabling the function tracer and then setting the filter to only trace functions where the process id is negative (no events), then clearing the trace buffer to ensure that we have nothing in the buffer, then write to trace_marker to add an event to the beginning of a page, sleep for 2 minutes (only 35 seconds is probably needed, but this guarantees the bug), and then finally reading the trace which will trigger the bug. This patch fixes the typo and prevents the false positive of that warning. Reported-by: Hans J. Koch <hjk@linutronix.de> Tested-by: Hans J. Koch <hjk@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-12 12:06:43 -04:00
John Blackwood	ad0cf3478d	perf: Fix incorrect copy_from_user() usage perf events: repair incorrect use of copy_from_user This makes the perf_event_period() return 0 instead of -EFAULT on success. Signed-off-by: John Blackwood<john.blackwood@ccur.com> Signed-off-by: Joe Korty <joe.korty@ccur.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100928220311.GA18145@tsunami.ccur.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 11:45:01 +02:00
Linus Torvalds	6b0cd00bc3	Merge branch 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: HWPOISON: Stop shrinking at right page count HWPOISON: Report correct address granuality for AO huge page errors HWPOISON: Copy si_addr_lsb to user page-types.c: fix name of unpoison interface	2010-10-07 13:59:32 -07:00
Eric Dumazet	27b3d80a7b	sysctl: fix min/max handling in __do_proc_doulongvec_minmax() When proc_doulongvec_minmax() is used with an array of longs, and no min/max check requested (.extra1 or .extra2 being NULL), we dereference a NULL pointer for the second element of the array. Noticed while doing some changes in network stack for the "16TB problem" Fix is to not change min & max pointers in __do_proc_doulongvec_minmax(), so that all elements of the vector share an unique min/max limit, like proc_dointvec_minmax(). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-07 13:31:21 -07:00
Andi Kleen	a337fdac7a	HWPOISON: Copy si_addr_lsb to user The original hwpoison code added a new siginfo field si_addr_lsb to pass the granuality of the fault address to user space. Unfortunately this field was never copied to user space. Fix this here. I added explicit checks for the MCEERR codes to avoid having to patch all potential callers to initialize the field. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-07 09:41:25 +02:00
Linus Torvalds	e1d9694cae	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh generic-ipi: Fix deadlock in __smp_call_function_single	2010-10-05 13:07:43 -07:00
Linus Torvalds	5336377d62	modules: Fix module_bug_list list corruption race With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-05 11:29:27 -07:00
Dima Zavin	776dd2c874	sched: use the old min_vruntime when normalizing on dequeue After pulling the thread off the run-queue during a cgroup change, the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime then gets normalized to this new value. This can then lead to the thread getting an unfair boost in the new group if the vruntime of the next task in the old run-queue was way further ahead. Cc: Arve Hjønnevåg <arve@android.com> Signed-off-by: Dima Zavin <dima@android.com>	2010-10-04 12:08:05 -07:00
Dima Zavin	cbb6589904	sched: normalize sleeper's vruntime during group change If you switch the cgroup of a sleeping thread, its vruntime does not get adjusted correctly for the difference between the min_vruntime values of the two groups. This patch adds a new callback, prep_move_task, to struct sched_class to give sched_fair the opportunity to adjust the task's vruntime just before setting its new group. This allows us to properly normalize a sleeping task's vruntime when moving it between different cgroups. More details about the problem: http://lkml.org/lkml/2010/9/28/24 Cc: Arve Hjønnevåg <arve@android.com> Signed-off-by: Dima Zavin <dima@android.com>	2010-10-04 12:08:04 -07:00
Ira W. Snyder	399f1e30ac	kfifo: fix scatterlist usage The kfifo_dma family of functions use sg_mark_end() on the last element in their scatterlist. This forces use of a fresh scatterlist for each DMA operation, which makes recycling a single scatterlist impossible. Change the behavior of the kfifo_dma functions to match the usage of the dma_map_sg function. This means that users must respect the returned nents value. The sample code is updated to reflect the change. This bug is trivial to cause: call kfifo_dma_in_prepare() such that it prepares a scatterlist with a single entry comprising the whole fifo. This is the case when you map the entirety of a newly created empty fifo. This causes the setup_sgl() function to mark the first scatterlist entry as the end of the chain, no matter what comes after it. Afterwards, add and remove some data from the fifo such that another call to kfifo_dma_in_prepare() will create two scatterlist entries. It returns nents=2. However, due to the previous sg_mark_end() call, sg_is_last() will now return true for the first scatterlist element. This causes the sample code to print a single scatterlist element when it should print two. By removing the call to sg_mark_end(), we make the API as similar as possible to the DMA mapping API. All users are required to respect the returned nents. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:58 -07:00
Erik Gilling	43e1b91e2f	power: wakelock: call __get_wall_to_monotonic() instead of using wall_to_monotonic Change-Id: I9e9c3b923bf9a22ffd48f80a72050289496e57d8	2010-09-29 17:49:47 -07:00
Colin Cross	eb31fe8d3c	wakelock: Fix operator precedence bug Change-Id: I21366ace371d1b8f4684ddbe4ea8d555a926ac21 Signed-off-by: Colin Cross <ccross@google.com>	2010-09-29 17:49:45 -07:00
Mike Chan	9237de98d1	scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking Platform must register cpu power function that return power in milliWatt seconds. Change-Id: I1caa0335e316c352eee3b1ddf326fcd4942bcbe8 Signed-off-by: Mike Chan <mike@android.com>	2010-09-29 17:49:41 -07:00
Mike Chan	8d8e56b794	scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies Introduce new platform callback hooks for cpuacct for tracking CPU frequencies Not all platforms / architectures have a set CPU_FREQ_TABLE defined for CPU transition speeds. In order to track time spent in at various CPU frequencies, we enable platform callbacks from cpuacct for this accounting. Architectures that support overclock boosting, or don't have pre-defined frequency tables can implement their own bucketing system that makes sense given their cpufreq scaling abilities. New file: cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU frequency. Change-Id: I10a80b3162e6fff3a8a2f74dd6bb37e88b12ba96 Signed-off-by: Mike Chan <mike@android.com>	2010-09-29 17:49:41 -07:00
San Mehat	e79ef44016	sched: Add a generic notifier when a task struct is about to be freed This patch adds a notifier which can be used by subsystems that may be interested in when a task has completely died and is about to have it's last resource freed. The Android lowmemory killer uses this to determine when a task it has killed has finally given up its goods. Signed-off-by: San Mehat <san@google.com>	2010-09-29 17:49:34 -07:00
Mike Chan	4006c25811	power: wakelock: Print active wakelocks when has_wake_lock() is called When DEBUG_SUSPEND is enabled print active wakelocks when we check if there are any active wakelocks. In print_active_locks(), print expired wakelocks if DEBUG_EXPIRE is enabled Change-Id: Ib1cb795555e71ff23143a2bac7c8a58cbce16547 Signed-off-by: Mike Chan <mike@android.com>	2010-09-29 17:49:28 -07:00
San Mehat	755d3f1c9b	kernel: printk: Add non exported function for clearing the log ring buffer Signed-off-by: San Mehat <san@google.com>	2010-09-29 17:49:24 -07:00
Arve Hjønnevåg	703a8808cc	printk: Fix log_buf_copy termination. If idx was non-zero and the log had wrapped, len did not get truncated to stop at the last byte written to the log.	2010-09-29 17:49:21 -07:00
Arve Hjønnevåg	916040d8b9	Revert "printk: remove unused code from kernel/printk.c" This reverts commit `acff181d35`.	2010-09-29 17:49:19 -07:00
Colin Cross	8dc808da52	Fix cgroup: Add generic cgroup subsystem permission checks. Change-Id: Ib82f6a716686a3ebb4592112400fc5e4a2ce066c Signed-off-by: Colin Cross <ccross@android.com>	2010-09-29 17:49:18 -07:00
San Mehat	c316a1d4a0	cgroup: Add generic cgroup subsystem permission checks. Rather than using explicit euid == 0 checks when trying to move tasks into a cgroup via CFS, move permission checks into each specific cgroup subsystem. If a subsystem does not specify a 'can_attach' handler, then we fall back to doing our checks the old way. This way non-root processes can add arbitrary processes to a cgroup if all the registered subsystems on that cgroup agree. Also change explicit euid == 0 check to CAP_SYS_ADMIN Signed-off-by: San Mehat <san@google.com>	2010-09-29 17:49:18 -07:00
Rebecca Schultz	229b3e82d7	PM: earlysuspend: Removing dependence on console. Rather than signaling a full update of the display from userspace via a console switch, this patch introduces 2 files int /sys/power, wait_for_fb_sleep and wait_for_fb_wake. Reading these files will block until the requested state has been entered. When a read from wait_for_fb_sleep returns userspace should stop drawing. When wait_for_fb_wake returns, it should do a full update. If either are called when the fb driver is already in the requested state, they will return immediately. Signed-off-by: Rebecca Schultz <rschultz@google.com> Signed-off-by: Arve Hjønnevåg <arve@android.com>	2010-09-29 17:49:08 -07:00
Arve Hjønnevåg	e1310abde5	consoleearlysuspend: Fix for 2.6.32 vt_waitactive now needs a 1 based console number Change-Id: I07ab9a3773c93d67c09d928c8d5494ce823ffa2e	2010-09-29 17:49:08 -07:00
Arve Hjønnevåg	8dcec0d65d	PM: earlysuspend: Add console switch when user requested sleep state changes. Signed-off-by: Arve Hjønnevåg <arve@android.com>	2010-09-29 17:49:07 -07:00
Arve Hjønnevåg	a0902564d1	PM: wakelock: Don't dump unfrozen task list when aborting try_to_freeze_tasks after less than one second Change-Id: Ib2976e5b97a5ee4ec9abd4d4443584d9257d0941 Signed-off-by: Arve Hjønnevåg <arve@android.com>	2010-09-29 17:49:07 -07:00
Arve Hjønnevåg	003cfdd042	PM: wakelock: Abort task freezing if a wake lock is held. Avoids a problem where the device sometimes hangs for 20 seconds before the screen is turned on.	2010-09-29 17:49:07 -07:00
Arve Hjønnevåg	507fedc715	PM: Add user-space wake lock api. This adds /sys/power/wake_lock and /sys/power/wake_unlock. Writing a string to wake_lock creates a wake lock the first time is sees a string and locks it. Optionally, the string can be followed by a timeout. To unlock the wake lock, write the same string to wake_unlock. Change-Id: I66c6e3fe6487d17f9c2fafde1174042e57d15cd7	2010-09-29 17:49:07 -07:00
Arve Hjønnevåg	f104aa02b8	PM: Enable early suspend through /sys/power/state If EARLYSUSPEND is enabled then writes to /sys/power/state no longer blocks, and the kernel will try to enter the requested state every time no wakelocks are held. Write "on" to resume normal operation.	2010-09-29 17:49:07 -07:00
Arve Hjønnevåg	adc899ef78	PM: Implement early suspend api	2010-09-29 17:49:07 -07:00
Arve Hjønnevåg	501a7506d8	PM: wakelocks: Use seq_file for /proc/wakelocks so we can get more than 3K of stats. Change-Id: I42ed8bea639684f7a8a95b2057516764075c6b01 Signed-off-by: Arve Hjønnevåg <arve@android.com>	2010-09-29 17:49:06 -07:00
Erik Gilling	b19163eeb4	power: wakelocks: fix buffer overflow in print_wake_locks Change-Id: Ic944e3b3d3bc53eddc6fd0963565fd072cac373c Signed-off-by: Erik Gilling <konkers@android.com>	2010-09-29 17:49:06 -07:00

1 2 3 4 5 ...

10309 Commits