linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-19 20:10:43 +09:00

Author	SHA1	Message	Date
Colin Cross	3515161181	Merge branch 'linux-tegra-2.6.36' into android-tegra-2.6.36 Conflicts: arch/arm/mm/cache-v6.S Change-Id: I1a2063218dd705a762a40f4a9dfe504ce1a1d491	2011-01-07 17:20:53 -08:00
Oleg Nesterov	2a71e215aa	posix-cpu-timers: workaround to suppress the problems with mt exec commit `e0a7021710` upstream. posix-cpu-timers.c correctly assumes that the dying process does posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD timers from signal->cpu_timers list. But, it also assumes that timer->it.cpu.task is always the group leader, and thus the dead ->task means the dead thread group. This is obviously not true after de_thread() changes the leader. After that almost every posix_cpu_timer_ method has problems. It is not simple to fix this bug correctly. First of all, I think that timer->it.cpu should use struct pid instead of task_struct. Also, the locking should be reworked completely. In particular, tasklist_lock should not be used at all. This all needs a lot of nontrivial and hard-to-test changes. Change __exit_signal() to do posix_cpu_timers_exit_group() when the old leader dies during exec. This is not the fix, just the temporary hack to hide the problem for 2.6.37 and stable. IOW, this is obviously wrong but this is what we currently have anyway: cpu timers do not work after mt exec. In theory this change adds another race. The exiting leader can detach the timers which were attached to the new leader. However, the window between de_thread() and release_task() is small, we can pretend that sys_timer_create() was called before de_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:52 -08:00
Mike Galbraith	9f4bd59e7b	Sched: fix skip_clock_update optimization commit `f26f9aff6a` upstream. idle_balance() drops/retakes rq->lock, leaving the previous task vulnerable to set_tsk_need_resched(). Clear it after we return from balancing instead, and in setup_thread_stack() as well, so no successfully descheduled or never scheduled task has it set. Need resched confused the skip_clock_update logic, which assumes that the next call to update_rq_clock() will come nearly immediately after being set. Make the optimization robust against the waking a sleeper before it sucessfully deschedules case by checking that the current task has not been dequeued before setting the flag, since it is that useless clock update we're trying to save, and clear unconditionally in schedule() proper instead of conditionally in put_prev_task(). Signed-off-by: Mike Galbraith <efault@gmx.de> Reported-by: Bjoern B. Brandenburg <bbb.lst@gmail.com> Tested-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1291802742.1417.9.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:51 -08:00
Ben Hutchings	36bd931c61	watchdog: Improve initialisation error message and documentation commit `551423748a` upstream. The error message 'NMI watchdog failed to create perf event...' does not make it clear that this is a fatal error for the watchdog. It also currently prints the error value as a pointer, rather than extracting the error code with PTR_ERR(). Fix that. Add a note to the description of the 'nowatchdog' kernel parameter to associate it with this message. Reported-by: Cesare Leonardi <celeonar@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: 599368@bugs.debian.org Cc: 608138@bugs.debian.org Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1294009362.3167.126.camel@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:43 -08:00
Hillf Danton	c9009e7208	fix freeing user_struct in user cache commit `4ef9e11d68` upstream. When racing on adding into user cache, the new allocated from mm slab is freed without putting user namespace. Since the user namespace is already operated by getting, putting has to be issued. Signed-off-by: Hillf Danton <dhillf@gmail.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:42 -08:00
Slava Pestov	3ee6c4cf57	tracing: Fix panic when lseek() called on "trace" opened for writing commit `364829b126` upstream. The file_ops struct for the "trace" special file defined llseek as seq_lseek(). However, if the file was opened for writing only, seq_open() was not called, and the seek would dereference a null pointer, file->private_data. This patch introduces a new wrapper for seq_lseek() which checks if the file descriptor is opened for reading first. If not, it does nothing. Signed-off-by: Slava Pestov <slavapestov@google.com> LKML-Reference: <1290640396-24179-1-git-send-email-slavapestov@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:33 -08:00
Peter Zijlstra	2ebd26f64a	sched: Cure more NO_HZ load average woes commit `0f004f5a69` upstream. There's a long-running regression that proved difficult to fix and which is hitting certain people and is rather annoying in its effects. Damien reported that after `74f5187ac8` (sched: Cure load average vs NO_HZ woes) his load average is unnaturally high, he also noted that even with that patch reverted the load avgerage numbers are not correct. The problem is that the previous patch only solved half the NO_HZ problem, it addressed the part of going into NO_HZ mode, not of comming out of NO_HZ mode. This patch implements that missing half. When comming out of NO_HZ mode there are two important things to take care of: - Folding the pending idle delta into the global active count. - Correctly aging the averages for the idle-duration. So with this patch the NO_HZ interaction should be complete and behaviour between CONFIG_NO_HZ=[yn] should be equivalent. Furthermore, this patch slightly changes the load average computation by adding a rounding term to the fixed point multiplication. Reported-by: Damien Wyart <damien.wyart@free.fr> Reported-by: Tim McGrath <tmhikaru@gmail.com> Tested-by: Damien Wyart <damien.wyart@free.fr> Tested-by: Orion Poplawski <orion@cora.nwra.com> Tested-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Chase Douglas <chase.douglas@canonical.com> LKML-Reference: <1291129145.32004.874.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:31 -08:00
Heiko Carstens	b356878782	printk: Fix wake_up_klogd() vs cpu hotplug commit `49f4138346` upstream. wake_up_klogd() may get called from preemptible context but uses __raw_get_cpu_var() to write to a per cpu variable. If it gets preempted between getting the address and writing to it, the cpu in question could be offline if the process gets scheduled back and hence writes to the per cpu data of an offline cpu. This buggy behaviour was introduced with `fa33507a` "printk: robustify printk, fix #2" which was supposed to fix a "using smp_processor_id() in preemptible" warning. Let's use this_cpu_write() instead which disables preemption and makes sure that the outlined scenario cannot happen. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101126124247.GC7023@osiris.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:30 -08:00
Takashi Iwai	91364cfb2f	PM / Hibernate: Fix PM_POST_* notification with user-space suspend commit `1497dd1d29` upstream. The user-space hibernation sends a wrong notification after the image restoration because of thinko for the file flag check. RDONLY corresponds to hibernation and WRONLY to restoration, confusingly. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:19 -08:00
Heiko Carstens	de25afbd8a	nohz: Fix get_next_timer_interrupt() vs cpu hotplug commit `dbd87b5af0` upstream. This fixes a bug as seen on 2.6.32 based kernels where timers got enqueued on offline cpus. If a cpu goes offline it might still have pending timers. These will be migrated during CPU_DEAD handling after the cpu is offline. However while the cpu is going offline it will schedule the idle task which will then call tick_nohz_stop_sched_tick(). That function in turn will call get_next_timer_intterupt() to figure out if the tick of the cpu can be stopped or not. If it turns out that the next tick is just one jiffy off (delta_jiffies == 1) tick_nohz_stop_sched_tick() incorrectly assumes that the tick should not stop and takes an early exit and thus it won't update the load balancer cpu. Just afterwards the cpu will be killed and the load balancer cpu could be the offline cpu. On 2.6.32 based kernel get_nohz_load_balancer() gets called to decide on which cpu a timer should be enqueued (see __mod_timer()). Which leads to the possibility that timers get enqueued on an offline cpu. These will never expire and can cause a system hang. This has been observed 2.6.32 kernels. On current kernels __mod_timer() uses get_nohz_timer_target() which doesn't have that problem. However there might be other problems because of the too early exit tick_nohz_stop_sched_tick() in case a cpu goes offline. The easiest and probably safest fix seems to be to let get_next_timer_interrupt() just lie and let it say there isn't any pending timer if the current cpu is offline. I also thought of moving migrate_[hr]timers() from CPU_DEAD to CPU_DYING, but seeing that there already have been fixes at least in the hrtimer code in this area I'm afraid that this could add new subtle bugs. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:15 -08:00
Heiko Carstens	a32a0b1102	nohz: Fix printk_needs_cpu() return value on offline cpus commit `61ab25447a` upstream. This patch fixes a hang observed with 2.6.32 kernels where timers got enqueued on offline cpus. printk_needs_cpu() may return 1 if called on offline cpus. When a cpu gets offlined it schedules the idle process which, before killing its own cpu, will call tick_nohz_stop_sched_tick(). That function in turn will call printk_needs_cpu() in order to check if the local tick can be disabled. On offline cpus this function should naturally return 0 since regardless if the tick gets disabled or not the cpu will be dead short after. That is besides the fact that __cpu_disable() should already have made sure that no interrupts on the offlined cpu will be delivered anyway. In this case it prevents tick_nohz_stop_sched_tick() to call select_nohz_load_balancer(). No idea if that really is a problem. However what made me debug this is that on 2.6.32 the function get_nohz_load_balancer() is used within __mod_timer() to select a cpu on which a timer gets enqueued. If printk_needs_cpu() returns 1 then the nohz_load_balancer cpu doesn't get updated when a cpu gets offlined. It may contain the cpu number of an offline cpu. In turn timers get enqueued on an offline cpu and not very surprisingly they never expire and cause system hangs. This has been observed 2.6.32 kernels. On current kernels __mod_timer() uses get_nohz_timer_target() which doesn't have that problem. However there might be other problems because of the too early exit tick_nohz_stop_sched_tick() in case a cpu goes offline. Easiest way to fix this is just to test if the current cpu is offline and call printk_tick() directly which clears the condition. Alternatively I tried a cpu hotplug notifier which would clear the condition, however between calling the notifier function and printk_needs_cpu() something could have called printk() again and the problem is back again. This seems to be the safest fix. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101126120235.406766476@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-01-07 13:58:15 -08:00
Colin Cross	3f29a88349	Merge branch 'linux-tegra-2.6.36' into android-tegra-2.6.36 Conflicts: drivers/usb/gadget/composite.c Change-Id: I1a332ec21da62aea98912df9a01cf0282ed50ee1	2010-12-21 18:38:13 -08:00
Colin Cross	05946a1e0f	cgroup: Remove call to synchronize_rcu in cgroup_attach_task synchronize_rcu can be very expensive, averaging 100 ms in some cases. In cgroup_attach_task, it is used to prevent a task->cgroups pointer dereferenced in an RCU read side critical section from being invalidated, by delaying the call to put_css_set until after an RCU grace period. To avoid the call to synchronize_rcu, make the put_css_set call rcu-safe by moving the deletion of the css_set links into free_css_set_work, scheduled by the rcu callback free_css_set_rcu. The decrement of the cgroup refcount is no longer synchronous with the call to put_css_set, which can result in the cgroup refcount staying positive after the last call to cgroup_attach_task returns. To allow the cgroup to be deleted with cgroup_rmdir synchronously after cgroup_attach_task, have rmdir check the refcount of all associated css_sets. If cgroup_rmdir is called on a cgroup for which the css_sets all have refcount zero but the cgroup refcount is nonzero, reuse the rmdir waitqueue to block the rmdir until free_css_set_work is called. Signed-off-by: Colin Cross <ccross@android.com>	2010-12-17 17:01:51 -08:00
Colin Cross	60cdbd1f33	cgroup: Set CGRP_RELEASABLE when adding to a cgroup Changes the meaning of CGRP_RELEASABLE to be set on any cgroup that has ever had a task or cgroup in it, or had css_get called on it. The bit is set in cgroup_attach_task, cgroup_create, and __css_get. It is not necessary to set the bit in cgroup_fork, as the task is either in the root cgroup, in which can never be released, or the task it was forked from already set the bit in croup_attach_task. Signed-off-by: Colin Cross <ccross@android.com>	2010-12-17 17:01:49 -08:00
Kenji Kaneshige	2184192578	genirq: Fix incorrect proc spurious output commit `25c9170ed6` upstream. Since commit a1afb637(switch /proc/irq/*/spurious to seq_file) all /proc/irq/XX/spurious files show the information of irq 0. Current irq_spurious_proc_open() passes on NULL as the 3rd argument, which is used as an IRQ number in irq_spurious_proc_show(), to the single_open(). Because of this, all the /proc/irq/XX/spurious file shows IRQ 0 information regardless of the IRQ number. To fix the problem, irq_spurious_proc_open() must pass on the appropreate data (IRQ number) to single_open(). Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> LKML-Reference: <4CF4B778.90604@jp.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:27 -08:00
Rafael J. Wysocki	53e87163a1	PM / Hibernate: Fix memory corruption related to swap commit `c9e664f1fd` upstream. There is a problem that swap pages allocated before the creation of a hibernation image can be released and used for storing the contents of different memory pages while the image is being saved. Since the kernel stored in the image doesn't know of that, it causes memory corruption to occur after resume from hibernation, especially on systems with relatively small RAM that need to swap often. This issue can be addressed by keeping the GFP_IOFS bits clear in gfp_allowed_mask during the entire hibernation, including the saving of the image, until the system is finally turned off or the hibernation is aborted. Unfortunately, for this purpose it's necessary to rework the way in which the hibernate and suspend code manipulates gfp_allowed_mask. This change is based on an earlier patch from Hugh Dickins. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:26 -08:00
Thomas Gleixner	03aff981d0	perf: Fix inherit vs. context rotation bug commit `dddd3379a6` upstream. It was found that sometimes children of tasks with inherited events had one extra event. Eventually it turned out to be due to the list rotation no being exclusive with the list iteration in the inheritance code. Cure this by temporarily disabling the rotation while we inherit the events. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:25 -08:00
Colin Cross	d81b749c97	PM / PM QoS: Fix reversed min and max commit `00fafcda17` upstream. pm_qos_get_value had min and max reversed, causing all pm_qos requests to have no effect. Signed-off-by: Colin Cross <ccross@android.com> Acked-by: mark <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:18 -08:00
Nelson Elhage	9b504ae860	do_exit(): make sure that we run with get_fs() == USER_DS commit `33dd94ae1c` upstream. If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not otherwise reset before do_exit(). do_exit may later (via mm_release in fork.c) do a put_user to a user-controlled address, potentially allowing a user to leverage an oops into a controlled write into kernel memory. This is only triggerable in the presence of another bug, but this potentially turns a lot of DoS bugs into privilege escalations, so it's worth fixing. I have proof-of-concept code which uses this bug along with CVE-2010-3849 to write a zero to an arbitrary kernel address, so I've tested that this is not theoretical. A more logical place to put this fix might be when we know an oops has occurred, before we call do_exit(), but that would involve changing every architecture, in multiple places. Let's just stick it in do_exit instead. [akpm@linux-foundation.org: update code comment] Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:33:14 -08:00
Peter Zijlstra	e23d50e43c	sched: fix RCU lockdep splat from task_group() commit `6506cf6ce6` upstream. This addresses the following RCU lockdep splat: [0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03 [0.052999] lockdep: fixing up alternatives. [0.054105] [0.054106] =================================================== [0.054999] [ INFO: suspicious rcu_dereference_check() usage. ] [0.054999] --------------------------------------------------- [0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection! [0.054999] [0.054999] other info that might help us debug this: [0.054999] [0.054999] [0.054999] rcu_scheduler_active = 1, debug_locks = 1 [0.054999] 3 locks held by swapper/1: [0.054999] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a [0.054999] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51 [0.054999] #2: (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113 [0.054999] [0.054999] stack backtrace: [0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1 [0.054999] Call Trace: [0.054999] [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3 [0.054999] [<ffffffff810325c3>] task_group+0x7b/0x8a [0.054999] [<ffffffff810325e5>] set_task_rq+0x13/0x40 [0.054999] [<ffffffff814be39a>] init_idle+0xd2/0x113 [0.054999] [<ffffffff814be78a>] fork_idle+0xb8/0xc7 [0.054999] [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b [0.054999] [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b [0.054999] [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724 [0.054999] [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b [0.054999] [<ffffffff814be876>] _cpu_up+0xac/0x127 [0.054999] [<ffffffff814be946>] cpu_up+0x55/0x6a [0.054999] [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff [0.054999] [<ffffffff81003854>] kernel_thread_helper+0x4/0x10 [0.054999] [<ffffffff814c353c>] ? restore_args+0x0/0x30 [0.054999] [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff [0.054999] [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10 [0.056074] Booting Node 0, Processors #1lockdep: fixing up alternatives. [0.130045] #2lockdep: fixing up alternatives. [0.203089] #3 Ok. [0.275286] Brought up 4 CPUs [0.276005] Total of 4 processors activated (16017.17 BogoMIPS). The cgroup_subsys_state structures referenced by idle tasks are never freed, because the idle tasks should be part of the root cgroup, which is not removable. The problem is that while we do in-fact hold rq->lock, the newly spawned idle thread's cpu is not yet set to the correct cpu so the lockdep check in task_group(): lockdep_is_held(&task_rq(p)->lock) will fail. But this is a chicken and egg problem. Setting the CPU's runqueue requires that the CPU's runqueue already be set. ;-) So insert an RCU read-side critical section to avoid the complaint. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:32:59 -08:00
Ken Chen	a7a001eb00	latencytop: fix per task accumulator commit `38715258aa` upstream. Per task latencytop accumulator prematurely terminates due to erroneous placement of latency_record_count. It should be incremented whenever a new record is allocated instead of increment on every latencytop event. Also fix search iterator to only search known record events instead of blindly searching all pre-allocated space. Signed-off-by: Ken Chen <kenchen@google.com> Reviewed-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-12-09 13:32:53 -08:00
Dima Zavin	415c429577	Revert "futex: Restore one of the fast paths eliminated by 38d47c1b7075bd7ec3881141bb3629da58f88dab" This reverts commit `cb93471ed5`. Change-Id: I7f0b45c29b3b91ba5282c51eb6b315d70ac6c813 Signed-off-by: Dima Zavin <dima@android.com>	2010-12-08 16:27:01 -08:00
Darren Hart	380c7ff30f	futex: Fix errors in nested key ref-counting commit `7ada876a87` upstream. futex_wait() is leaking key references due to futex_wait_setup() acquiring an additional reference via the queue_lock() routine. The nested key ref-counting has been masking bugs and complicating code analysis. queue_lock() is only called with a previously ref-counted key, so remove the additional ref-counting from the queue_(un)lock() functions. Also futex_wait_requeue_pi() drops one key reference too many in unqueue_me_pi(). Remove the key reference handling from unqueue_me_pi(). This was paired with a queue_lock() in futex_lock_pi(), so the count remains unchanged. Document remaining nested key ref-counting sites. Signed-off-by: Darren Hart <dvhart@linux.intel.com> Reported-and-tested-by: Matthieu Fertré<matthieu.fertre@kerlabs.com> Reported-by: Louis Rilling<louis.rilling@kerlabs.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <4CBB17A8.70401@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:02 -08:00
Mathieu Desnoyers	781ec62a52	sched: Fix string comparison in /proc/sched_features commit `7740191cd9` upstream. Fix incorrect handling of the following case: INTERACTIVE INTERACTIVE_SOMETHING_ELSE The comparison only checks up to each element's length. Changelog since v1: - Embellish using some Rostedtisms. [ mingo: ^^ == smaller and cleaner ] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Lindgren <tony@atomide.com> LKML-Reference: <20100913214700.GB16118@Krystal> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:01 -08:00
Linus Walleij	0c1a8b4384	sched: Drop all load weight manipulation for RT tasks commit `17bdcf949d` upstream. Load weights are for the CFS, they do not belong in the RT task. This makes all RT scheduling classes leave the CFS weights alone. This fixes a real bug as well: I noticed the following phonomena: a process elevated to SCHED_RR forks with SCHED_RESET_ON_FORK set, and the child is indeed SCHED_OTHER, and the niceval is indeed reset to 0. However the weight inserted by set_load_weight() remains at 0, giving the task insignificat priority. With this fix, the weight is reset to what the task had before being elevated to SCHED_RR/SCHED_FIFO. Cc: Lennart Poettering <lennart@poettering.net> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286807811-10568-1-git-send-email-linus.walleij@stericsson.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:01 -08:00
Stephane Eranian	603cb849c8	perf_events: Fix bogus context time tracking commit `c530ccd9a1` upstream. You can only call update_context_time() when the context is active, i.e., the thread it is attached to is still running. However, perf_event_read() can be called even when the context is inactive, e.g., user read() the counters. The call to update_context_time() must be conditioned on the status of the context, otherwise, bogus time_enabled, time_running may be returned. Here is an example on AMD64. The task program is an example from libpfm4. The -p prints deltas every 1s. $ task -p -e cpu_clk_unhalted sleep 5 2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270) Whereas if you don't read deltas, e.g., no call to perf_event_read() until the process terminates: $ task -e cpu_clk_unhalted sleep 5 2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899) Notice that time_enable, time_running are bogus in the first example causing bogus scaling. This patch fixes the problem, by conditionally calling update_context_time() in perf_event_read(). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cb856dc.51edd80a.5ae0.38fb@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-11-22 11:03:00 -08:00
Colin Cross	56ee878646	pm_qos: Fix reversed min and max pm_qos_get_value had min and max reversed, causing all pm_qos requests to have no effect. Broken by the plist conversion, sha `5f279845f9`. Change-Id: I252c764821eac8d94de57eb482c05bf6afcea15b Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: mark gross <markgross@thegnar.org> Cc: James Bottomley <James.Bottomley@suse.de> Cc: stable <stable@kernel.org> Signed-off-by: Colin Cross <ccross@android.com>	2010-11-04 12:10:08 -07:00
Dima Zavin	5ff1b9cdb1	Merge commit 'v2.6.36' into android-2.6.36	2010-10-21 15:00:17 -07:00
Eric Dumazet	a9febbb4bd	sysctl: min/max bounds are optional sysctl check complains with a WARN() when proc_doulongvec_minmax() or proc_doulongvec_ms_jiffies_minmax() are used by a vector of longs (with more than one element), with no min or max value specified. This is unexpected, given we had a bug on this min/max handling :) Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Miller <davem@davemloft.net> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-15 14:42:24 -07:00
Salman Qazi	f13d4f979c	hrtimer: Preserve timer state in remove_hrtimer() The race is described as follows: CPU X CPU Y remove_hrtimer // state & QUEUED == 0 timer->state = CALLBACK unlock timer base timer->f(n) //very long hrtimer_start lock timer base remove_hrtimer // no effect hrtimer_enqueue timer->state = CALLBACK \| QUEUED unlock timer base hrtimer_start lock timer base remove_hrtimer mode = INACTIVE // CALLBACK bit lost! switch_hrtimer_base CALLBACK bit not set: timer->base changes to a different CPU. lock this CPU's timer base The bug was introduced with commit `ca109491f` (hrtimer: removing all ur callback modes) in 2.6.29 [ tglx: Feed new state via local variable and add a comment. ] Signed-off-by: Salman Qazi <sqazi@google.com> Cc: akpm@linux-foundation.org Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20101012142351.8485.21823.stgit@dungbeetle.mtv.corp.google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org	2010-10-14 13:29:59 +02:00
Steven Rostedt	d01343244a	ring-buffer: Fix typo of time extends per page Time stamps for the ring buffer are created by the difference between two events. Each page of the ring buffer holds a full 64 bit timestamp. Each event has a 27 bit delta stamp from the last event. The unit of time is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events happen more than 134 milliseconds apart, a time extend is inserted to add more bits for the delta. The time extend has 59 bits, which is good for ~18 years. Currently the time extend is committed separately from the event. If an event is discarded before it is committed, due to filtering, the time extend still exists. If all events are being filtered, then after ~134 milliseconds a new time extend will be added to the buffer. This can only happen till the end of the page. Since each page holds a full timestamp, there is no reason to add a time extend to the beginning of a page. Time extends can only fill a page that has actual data at the beginning, so there is no fear that time extends will fill more than a page without any data. When reading an event, a loop is made to skip over time extends since they are only used to maintain the time stamp and are never given to the caller. As a paranoid check to prevent the loop running forever, with the knowledge that time extends may only fill a page, a check is made that tests the iteration of the loop, and if the iteration is more than the number of time extends that can fit in a page a warning is printed and the ring buffer is disabled (all of ftrace is also disabled with it). There is another event type that is called a TIMESTAMP which can hold 64 bits of data in the theoretical case that two events happen 18 years apart. This code has not been implemented, but the name of this event exists, as well as the structure for it. The size of a TIMESTAMP is 16 bytes, where as a time extend is only 8 bytes. The macro used to calculate how many time extends can fit on a page used the TIMESTAMP size instead of the time extend size cutting the amount in half. The following test case can easily trigger the warning since we only need to have half the page filled with time extends to trigger the warning: # cd /sys/kernel/debug/tracing/ # echo function > current_tracer # echo 'common_pid < 0' > events/ftrace/function/filter # echo > trace # echo 1 > trace_marker # sleep 120 # cat trace Enabling the function tracer and then setting the filter to only trace functions where the process id is negative (no events), then clearing the trace buffer to ensure that we have nothing in the buffer, then write to trace_marker to add an event to the beginning of a page, sleep for 2 minutes (only 35 seconds is probably needed, but this guarantees the bug), and then finally reading the trace which will trigger the bug. This patch fixes the typo and prevents the false positive of that warning. Reported-by: Hans J. Koch <hjk@linutronix.de> Tested-by: Hans J. Koch <hjk@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-12 12:06:43 -04:00
John Blackwood	ad0cf3478d	perf: Fix incorrect copy_from_user() usage perf events: repair incorrect use of copy_from_user This makes the perf_event_period() return 0 instead of -EFAULT on success. Signed-off-by: John Blackwood<john.blackwood@ccur.com> Signed-off-by: Joe Korty <joe.korty@ccur.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100928220311.GA18145@tsunami.ccur.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 11:45:01 +02:00
Linus Torvalds	6b0cd00bc3	Merge branch 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: HWPOISON: Stop shrinking at right page count HWPOISON: Report correct address granuality for AO huge page errors HWPOISON: Copy si_addr_lsb to user page-types.c: fix name of unpoison interface	2010-10-07 13:59:32 -07:00
Eric Dumazet	27b3d80a7b	sysctl: fix min/max handling in __do_proc_doulongvec_minmax() When proc_doulongvec_minmax() is used with an array of longs, and no min/max check requested (.extra1 or .extra2 being NULL), we dereference a NULL pointer for the second element of the array. Noticed while doing some changes in network stack for the "16TB problem" Fix is to not change min & max pointers in __do_proc_doulongvec_minmax(), so that all elements of the vector share an unique min/max limit, like proc_dointvec_minmax(). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-07 13:31:21 -07:00
Andi Kleen	a337fdac7a	HWPOISON: Copy si_addr_lsb to user The original hwpoison code added a new siginfo field si_addr_lsb to pass the granuality of the fault address to user space. Unfortunately this field was never copied to user space. Fix this here. I added explicit checks for the MCEERR codes to avoid having to patch all potential callers to initialize the field. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-07 09:41:25 +02:00
Linus Torvalds	e1d9694cae	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh generic-ipi: Fix deadlock in __smp_call_function_single	2010-10-05 13:07:43 -07:00
Linus Torvalds	5336377d62	modules: Fix module_bug_list list corruption race With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-05 11:29:27 -07:00
Dima Zavin	776dd2c874	sched: use the old min_vruntime when normalizing on dequeue After pulling the thread off the run-queue during a cgroup change, the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime then gets normalized to this new value. This can then lead to the thread getting an unfair boost in the new group if the vruntime of the next task in the old run-queue was way further ahead. Cc: Arve Hjønnevåg <arve@android.com> Signed-off-by: Dima Zavin <dima@android.com>	2010-10-04 12:08:05 -07:00
Dima Zavin	cbb6589904	sched: normalize sleeper's vruntime during group change If you switch the cgroup of a sleeping thread, its vruntime does not get adjusted correctly for the difference between the min_vruntime values of the two groups. This patch adds a new callback, prep_move_task, to struct sched_class to give sched_fair the opportunity to adjust the task's vruntime just before setting its new group. This allows us to properly normalize a sleeping task's vruntime when moving it between different cgroups. More details about the problem: http://lkml.org/lkml/2010/9/28/24 Cc: Arve Hjønnevåg <arve@android.com> Signed-off-by: Dima Zavin <dima@android.com>	2010-10-04 12:08:04 -07:00
Ira W. Snyder	399f1e30ac	kfifo: fix scatterlist usage The kfifo_dma family of functions use sg_mark_end() on the last element in their scatterlist. This forces use of a fresh scatterlist for each DMA operation, which makes recycling a single scatterlist impossible. Change the behavior of the kfifo_dma functions to match the usage of the dma_map_sg function. This means that users must respect the returned nents value. The sample code is updated to reflect the change. This bug is trivial to cause: call kfifo_dma_in_prepare() such that it prepares a scatterlist with a single entry comprising the whole fifo. This is the case when you map the entirety of a newly created empty fifo. This causes the setup_sgl() function to mark the first scatterlist entry as the end of the chain, no matter what comes after it. Afterwards, add and remove some data from the fifo such that another call to kfifo_dma_in_prepare() will create two scatterlist entries. It returns nents=2. However, due to the previous sg_mark_end() call, sg_is_last() will now return true for the first scatterlist element. This causes the sample code to print a single scatterlist element when it should print two. By removing the call to sg_mark_end(), we make the API as similar as possible to the DMA mapping API. All users are required to respect the returned nents. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:58 -07:00
Erik Gilling	43e1b91e2f	power: wakelock: call __get_wall_to_monotonic() instead of using wall_to_monotonic Change-Id: I9e9c3b923bf9a22ffd48f80a72050289496e57d8	2010-09-29 17:49:47 -07:00
Colin Cross	eb31fe8d3c	wakelock: Fix operator precedence bug Change-Id: I21366ace371d1b8f4684ddbe4ea8d555a926ac21 Signed-off-by: Colin Cross <ccross@google.com>	2010-09-29 17:49:45 -07:00
Mike Chan	9237de98d1	scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking Platform must register cpu power function that return power in milliWatt seconds. Change-Id: I1caa0335e316c352eee3b1ddf326fcd4942bcbe8 Signed-off-by: Mike Chan <mike@android.com>	2010-09-29 17:49:41 -07:00
Mike Chan	8d8e56b794	scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies Introduce new platform callback hooks for cpuacct for tracking CPU frequencies Not all platforms / architectures have a set CPU_FREQ_TABLE defined for CPU transition speeds. In order to track time spent in at various CPU frequencies, we enable platform callbacks from cpuacct for this accounting. Architectures that support overclock boosting, or don't have pre-defined frequency tables can implement their own bucketing system that makes sense given their cpufreq scaling abilities. New file: cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU frequency. Change-Id: I10a80b3162e6fff3a8a2f74dd6bb37e88b12ba96 Signed-off-by: Mike Chan <mike@android.com>	2010-09-29 17:49:41 -07:00
San Mehat	e79ef44016	sched: Add a generic notifier when a task struct is about to be freed This patch adds a notifier which can be used by subsystems that may be interested in when a task has completely died and is about to have it's last resource freed. The Android lowmemory killer uses this to determine when a task it has killed has finally given up its goods. Signed-off-by: San Mehat <san@google.com>	2010-09-29 17:49:34 -07:00
Mike Chan	4006c25811	power: wakelock: Print active wakelocks when has_wake_lock() is called When DEBUG_SUSPEND is enabled print active wakelocks when we check if there are any active wakelocks. In print_active_locks(), print expired wakelocks if DEBUG_EXPIRE is enabled Change-Id: Ib1cb795555e71ff23143a2bac7c8a58cbce16547 Signed-off-by: Mike Chan <mike@android.com>	2010-09-29 17:49:28 -07:00
San Mehat	755d3f1c9b	kernel: printk: Add non exported function for clearing the log ring buffer Signed-off-by: San Mehat <san@google.com>	2010-09-29 17:49:24 -07:00
Arve Hjønnevåg	703a8808cc	printk: Fix log_buf_copy termination. If idx was non-zero and the log had wrapped, len did not get truncated to stop at the last byte written to the log.	2010-09-29 17:49:21 -07:00
Arve Hjønnevåg	916040d8b9	Revert "printk: remove unused code from kernel/printk.c" This reverts commit `acff181d35`.	2010-09-29 17:49:19 -07:00
Colin Cross	8dc808da52	Fix cgroup: Add generic cgroup subsystem permission checks. Change-Id: Ib82f6a716686a3ebb4592112400fc5e4a2ce066c Signed-off-by: Colin Cross <ccross@android.com>	2010-09-29 17:49:18 -07:00

1 2 3 4 5 ...

10320 Commits