The context switch tracer was made before tracepoints were mature, and
the original version used markers. This is no longer true and this
patch removes the select.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The root cause is a duplicate section name (.text); is this legal?
[ Amerigo Wang: "AFAIK, yes." ]
However, there's a problem with commit
6d76013381 in that if you fail to allocate
a mod->sect_attrs (in this case it's null because of the duplication),
it still gets used without checking in add_notes_attrs()
This should fix it
[ This patch leaves other problems, particularly the sections directory,
but recent parisc toolchains seem to produce these modules and this
prevents a crash and is a minimal change -- RR ]
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: Ensure we alwasy write the terminating NULL.
inotify: fix locking around inotify watching in the idr
inotify: do not BUG on idr entries at inotify destruction
inotify: seperate new watch creation updating existing watches
We call lmb_end_of_DRAM() to test whether a DMA mask is ok on a machine
without IOMMU, but this function is marked as __init.
I don't think there's a clean way to get the top of RAM max_pfn doesn't
appear to include highmem or I missed (or we have a bug :-) so for now,
let's just avoid having a broken 2.6.31 by making this function
non-__init and we can revisit later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: update documentation pointers
9p: remove unnecessary v9fses->options which duplicates the mount string
net/9p: insulate the client against an invalid error code sent by a 9p server
9p: Add missing cast for the error return value in v9fs_get_inode
9p: Remove redundant inode uid/gid assignment
9p: Fix possible regressions when ->get_sb fails.
9p: Fix v9fs show_options
9p: Fix possible memleak in v9fs_inode_from fid.
9p: minor comment fixes
9p: Fix possible inode leak in v9fs_get_inode.
9p: Check for error in return value of v9fs_fid_add
kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.
Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The recent commit:
tracing/events: fix the include file dependencies
fixed a file dependency problem while including more than
one trace event header file.
This fix undefined TRACE_EVENT after an event header macro
preprocessing in order to make tracepoint.h able to correctly declare
the tracepoints necessary for the next event header file.
But now we also need to undefine TRACE_EVENT_FN at the end of an event
header file preprocessing for the same reason.
This fixes the following build error:
In file included from include/trace/events/napi.h:5,
from net/core/net-traces.c:28:
include/linux/tracepoint.h:285:1: warning: "TRACE_EVENT_FN" redefined
In file included from include/trace/define_trace.h:61,
from include/trace/events/skb.h:40,
from net/core/net-traces.c:27:
include/trace/ftrace.h:50:1: warning: this is the location of the previous definition
In file included from include/trace/events/napi.h:5,
from net/core/net-traces.c:28:
include/linux/tracepoint.h:285:1: warning: "TRACE_EVENT_FN" redefined
In file included from include/trace/define_trace.h:61,
from include/trace/events/skb.h:40,
from net/core/net-traces.c:27:
include/trace/ftrace.h:50:1: warning: this is the location of the previous definition
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <20090827161732.GA7618@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add debug module option to snd core.
This controls the debug print level. When CONFIG_SND_DEBUG_VERBOSE
is set, you can suppress the debug messages by giving or changing this
parameter to a lower value. debug=0 means no debug messsages.
As default, it's set to the verbose level 2.
Since this option can be changed dynamically via sysfs file, you can
suppress the verbose debug messages on the fly, which wasn't possible
before.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The inum structure used throughout GFS2 has two fields. One
no_addr is the disk block number of the inode in question and
is used everywhere as the inode number. The other, no_formal_ino,
is used only as the generation number for NFS.
Historically the no_formal_ino field was set using a complicated
system of one global and one per-node file containing inode numbers
in order to ensure that each no_formal_ino was unique. Also this
code made no provision for what would happen when eventually the
(64 bit) numbers ran out. Now I know that is pretty unlikely to
happen given the large space of numbers, but it is possible
nevertheless.
The only guarantee required for no_formal_ino is that, for any
single inode, the same number doesn't get reused too quickly.
We already have a generation number which is kept in the inode
and initialised from a counter in the resource group (almost
no overhead, since we have to touch the resource group anyway
in order to allocate an inode in the first place). Aside from
ensuring that we never use the value 0 in the no_formal_ino
field, we can use that counter directly.
As a result of that change, we lose about 200 lines of code and
also gain about 10 creates/sec on the postmark benchmark (on
my test machine).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Some architectures initialize clocks and timers in late_time_init and
x86 wants to do the same to avoid FIXMAP hackery for calibrating the
TSC. That would result in undefined sched_clock readout and wreckaged
printk timestamps again. We probably have those already on archs which
do all their time/clock setup in late_time_init.
There is no harm to move that after late_time_init except that a few
more boot timestamps are stale. The scheduler is not active at that
point so no real wreckage is expected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: linux-arch@vger.kernel.org
Introducing printing of the objects hex dump to the seq file.
The number of lines to be printed is limited to HEX_MAX_LINES
to prevent seq file spamming. The actual number of printed
bytes is less than or equal to (HEX_MAX_LINES * HEX_ROW_SIZE).
(slight adjustments by Catalin Marinas)
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch sets the min_count for alloc_bootmem objects to 0 so that
they are never reported as leaks. This is because many of these blocks
are only referred via the physical address which is not looked up by
kmemleak.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Before slab is initialised, kmemleak save the allocations in an early
log buffer. They are later recorded as normal memory allocations. This
patch adds the stack trace saving to the early log buffer, otherwise the
information shown for such objects only refers to the kmemleak_init()
function.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This buffer isn't needed after kmemleak was initialised so it can be
freed together with the .init.data section. This patch also marks
functions conditionally accessing the early log variables with __ref.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
By writing dump=<addr> to the kmemleak file, kmemleak will look up an
object with that address and dump the information it has about it to
syslog. This is useful in debugging memory leaks.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If the object size is bigger than a predefined value (4K in this case),
release the object lock during scanning and call cond_resched().
Re-acquire the lock after rescheduling and test whether the object is
still valid.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Before the rewrite copy_event_to_user always wrote a terqminating '\0'
byte to user space after the filename. Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size. Ouch!
So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event. I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The are races around the idr storage of inotify watches. It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does it's removal. Move the
locking and the refcnt'ing so that these have to happen atomically.
Signed-off-by: Eric Paris <eparis@redhat.com>
If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG. This is not a dangerous situation and really
indicates a programming bug and leak of memory. This patch changes it to
use a WARN and a printk rather than killing people's boxes.
Signed-off-by: Eric Paris <eparis@redhat.com>
There is nothing known wrong with the inotify watch addition/modification
but this patch seperates the two code paths to make them each easy to
verify as correct.
Signed-off-by: Eric Paris <eparis@redhat.com>
When modules are built with M= option, they pass long file paths to
__FILE__. This results in ugly outputs of snd_print*() when
CONFIG_SND_VERBOSE_PRINTK is set.
This patch adds a check of the path and strips the leading path dirs
if the file name is an absolute path to improve the readability of logs.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Masking oneshot edge type interrupts is wrong as we might lose an
interrupt which is issued when the threaded handler is handling the
device. We can keep the irq unmasked safely as with edge type
interrupts there is no danger of interrupt floods. If the threaded
handler has not yet finished then IRQTF_RUNTHREAD is set which will
keep the handler thread active.
Debugged and verified in preempt-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Update ps3_defconfig.
o Refresh for 2.6.31.
o Remove MTD support.
o Add more HID drivers.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On non-PS3, we get:
| kernel BUG at drivers/rtc/rtc-ps3.c:36!
because the rtc-ps3 platform device is registered unconditionally in a kernel
with builtin support for PS3.
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k,m68knommu: Wire up rt_tgsigqueueinfo and perf_counter_open
m68k: Fix redefinition of pgprot_noncached
arch/m68k/include/asm/motorola_pgalloc.h: fix kunmap arg
m68k: cnt reaches -1, not 0
m68k: count can reach 51, not 50
Time time taken for a single cpu online operation on a pseries machine
is as follows:
Dedicated LPAR (POWER6): ~220ms.
Shared LPAR (POWER5) : ~240ms.
Of this time, approximately 200ms is taken up by __cpu_up(). This is because
we poll every 200ms to check if the new cpu has notified it's presence
through the cpu_callin_map. We repeat this operation until the new cpu sets
the value in cpu_callin_map or 5 seconds elapse, whichever comes earlier.
However, using completion_structs instead of polling loops,
the time taken by the new processor to indicate it's presence has
found to be less than 1ms on pseries. This method however may not
work on all powerpc platforms due to the time-base synchronization code.
Keeping this in mind, we could reduce msleep polling interval from
200ms to 1ms while retaining the 5 second timeout.
With this, the time taken for a cpu online operation changes as follows:
Dedicated LPAR (POWER6): 20-25ms.
Shared LPAR (POWER5) : 60-80ms.
In both these cases, it was found that the code polls through the loop
only once indicating that 1ms is a reasonable value, atleast on pseries.
The code needs testing on other powerpc platforms.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On Thu, Aug 13, 2009 at 04:14:58PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-08-11 at 11:39 +0200, Bastian Blank wrote:
> > This patch just disables this driver on SMP kernels, as it is obviously
> > not supported.
> Why not remove the #error instead ? :-) I don't think it's still
> meaningful, especially since we use the timebase for delays nowadays
> which doesn't depend on the CPU frequency...
Your call. Take this one:
The build of a PowerMac 32bit kernel currently fails with
error: #warning "WARNING, CPUFREQ not recommended on SMP kernels"
Thie patch removes the not longer applicable SMP warning from the
PowerMac cpufreq code.
Signed-off-by: Bastian Blank <waldi@debian.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ptrace POKETEXT interface allows a process to modify the text pages of
a child process being ptraced, usually to insert breakpoints via trap
instructions. The kernel eventually calls copy_to_user_page, which in turn
calls __flush_icache_range to invalidate the icache lines for the child
process.
However, this function does not work on 44x due to the icache being virtually
indexed. This was noticed by a breakpoint being triggered after it had been
cleared by ltrace on a 440EPx board. The convenient solution is to do a
flash invalidate of the icache in the __flush_icache_range function.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is an attempt at cleaning up a bit the way we handle execute
permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only
defined by CPUs that can do something with it, and the myriad of
#ifdef's in the I$/D$ coherency code is reduced to 2 cases that
hopefully should cover everything.
The logic on BookE is a little bit different than what it was though
not by much. Since now, _PAGE_EXEC will be set by the generic code
for executable pages, we need to filter out if they are unclean and
recover it. However, I don't expect the code to be more bloated than
it already was in that area due to that change.
I could boast that this brings proper enforcing of per-page execute
permissions to all BookE and 40x but in fact, we've had that now for
some time as a side effect of my previous rework in that area (and
I didn't even know it :-) We would only enable execute permission if
the page was cache clean and we would only cache clean it if we took
and exec fault. Since we now enforce that the later only work if
VM_EXEC is part of the VMA flags, we de-fact already enforce per-page
execute permissions... Unless I missed something
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When setting the same GPIO number, multiple IRQ shared requests will be
done without freing the previous request. It will also try to free a
failed request or an already freed IRQ if 0 was written to the gpio file.
All these oops and leaks were fixed with the following solution: keep the
previous allocated GPIO (if any) still allocated in case the new request
fails. The alternative solution would desallocate the previous allocated
GPIO and set gpio as 0.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Samuel R. C. Vale <srcvale@holoscopio.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the BIOS reports an invalid throttling state (which seems to be
fairly common after system boot), a reset is done to state T0.
Because of a check in acpi_processor_get_throttling_ptc(), the reset
never actually gets executed, which results in the error reoccurring
on every access of for example /proc/acpi/processor/CPU0/throttling.
Add a 'force' option to acpi_processor_set_throttling() to ensure
the reset really takes effect.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13389
This patch, together with the next one, fixes a regression introduced in
2.6.30, listed on the regression list. They have been available for 2.5
months now in bugzilla, but have not been picked up, despite various
reminders and without any reason given.
Google shows that numerous people are hitting this issue. The issue is in
itself relatively minor, but the bug in the code is clear.
The patches have been in all my kernels and today testing has shown that
throttling works correctly with the patches applied when the system
overheats (http://bugzilla.kernel.org/show_bug.cgi?id=13918#c14).
Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Summary:
Kernel panic arise when stack protection is enabled, since strncat will
add a null terminating byte '\0'; So in functions
like this one (wmi_query_block):
char wc[4]="WC";
....
strncat(method, block->object_id, 2);
...
the length of wc should be n+1 (wc[5]) or stack protection
fault will arise. This is not noticeable when stack protection is
disabled,but , isn't good either.
Config used: [CONFIG_CC_STACKPROTECTOR_ALL=y,
CONFIG_CC_STACKPROTECTOR=y]
Panic Trace
------------
.... stack-protector: kernel stack corrupted in : fa7b182c
2.6.30-rc8-obelisco-generic
call_trace:
[<c04a6c40>] ? panic+0x45/0xd9
[<c012925d>] ? __stack_chk_fail+0x1c/0x40
[<fa7b182c>] ? wmi_query_block+0x15a/0x162 [wmi]
[<fa7b182c>] ? wmi_query_block+0x15a/0x162 [wmi]
[<fa7e7000>] ? acer_wmi_init+0x00/0x61a [acer_wmi]
[<fa7e7135>] ? acer_wmi_init+0x135/0x61a [acer_wmi]
[<c0101159>] ? do_one_initcall+0x50+0x126
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13514
Signed-off-by: Costantino Leandro <lcostantino@gmail.com>
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Len Brown <len.brown@intel.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The return value of the get_temp function is not checked when doing a
thermal zone update. This may lead to a critical shutdown if get_temp
fails and the content of the temp variable is incorrectly set higher than
the critical trip point.
This has been observed on a system with incorrect ACPI implementation
where the corresponding methods were not serialized and therefore
sometimes triggered ACPI errors (AE_ALREADY_EXISTS). The following
critical shutdowns indicated a temperature of 2097 C, which was obviously
wrong.
The patch adds a return value check that jumps over all trip point
evaluations printing a warning if get_temp fails. The trip points are
evaluated again on the next polling interval with successful get_temp
execution.
Signed-off-by: Michael Brunner <mibru@gmx.de>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Spotted by Hiroshi Shimamoto who also provided the test-case below.
copy_process() uses signal->count as a reference counter, but it is not.
This test case
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
void *null_thread(void *p)
{
for (;;)
sleep(1);
return NULL;
}
void *exec_thread(void *p)
{
execl("/bin/true", "/bin/true", NULL);
return null_thread(p);
}
int main(int argc, char **argv)
{
for (;;) {
pid_t pid;
int ret, status;
pid = fork();
if (pid < 0)
break;
if (!pid) {
pthread_t tid;
pthread_create(&tid, NULL, exec_thread, NULL);
for (;;)
pthread_create(&tid, NULL, null_thread, NULL);
}
do {
ret = waitpid(pid, &status, 0);
} while (ret == -1 && errno == EINTR);
}
return 0;
}
quickly creates an unkillable task.
If copy_process(CLONE_THREAD) races with de_thread()
copy_signal()->atomic(signal->count) breaks the signal->notify_count
logic, and the execing thread can hang forever in kernel space.
Change copy_process() to increment count/live only when we know for sure
we can't fail. In this case the forked thread will take care of its
reference to signal correctly.
If copy_process() fails, check CLONE_THREAD flag. If it it set - do
nothing, the counters were not changed and current belongs to the same
thread group. If it is not set, ->signal must be released in any case
(and ->count must be == 1), the forked child is the only thread in the
thread group.
We need more cleanups here, in particular signal->count should not be used
by de_thread/__exit_signal at all. This patch only fixes the bug.
Reported-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Tested-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An mlocked page might lose the isolatation race. This causes the page to
clear PG_mlocked while it remains in a VM_LOCKED vma. This means it can
be put onto the [in]active list. We can rescue it by using try_to_unmap()
in shrink_page_list().
But now, As Wu Fengguang pointed out, vmscan has a bug. If the page has
PG_referenced, it can't reach try_to_unmap() in shrink_page_list() but is
put into the active list. If the page is referenced repeatedly, it can
remain on the [in]active list without being moving to the unevictable
list.
This patch fixes it.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <<kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's problematic to allow signed element_nr's or total's to be passed as
part of the flex array API.
flex_array_alloc() allows total_nr_elements to be set to a negative
quantity, which is obviously erroneous.
flex_array_get() and flex_array_put() allows negative array indices in
dereferencing an array part, which could address memory mapped before
struct flex_array.
The fix is to convert all existing element_nr formals to be qualified as
unsigned. Existing checks to compare it to total_nr_elements or the max
array size based on element_size need not be changed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The `parts' member of struct flex_array should evaluate to an incomplete
type so that sizeof() cannot be used and C99 does not require the
zero-length specification.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>