Commit Graph

19484 Commits

Author SHA1 Message Date
Linus Torvalds
2df3be967d Un-inline get_pipe_info() helper function
commit 7208364652 upstream.

This avoids some include-file hell, and the function isn't really
important enough to be inlined anyway.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:36 -08:00
Linus Torvalds
de6b162361 Export 'get_pipe_info()' to other users
commit c66fb34794 upstream.

And in particular, use it in 'pipe_fcntl()'.

The other pipe functions do not need to use the 'careful' version, since
they are only ever called for things that are already known to be pipes.

The normal read/write/ioctl functions are called through the file
operations structures, so if a file isn't a pipe, they'd never get
called.  But pipe_fcntl() is special, and called directly from the
generic fcntl code, and needs to use the same careful function that the
splice code is using.

Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:36 -08:00
Linus Torvalds
68fadbe6fc Rename 'pipe_info()' to 'get_pipe_info()'
commit 71993e62a4 upstream.

.. and change it to take the 'file' pointer instead of an inode, since
that's what all users want anyway.

The renaming is preparatory to exporting it to other users.  The old
'pipe_info()' name was too generic and is already used elsewhere, so
before making the function public we need to use a more specific name.

Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:36 -08:00
Sergey Senozhatsky
6e551814bc ext4: fix NULL pointer dereference in print_daily_error_info()
commit a1c6c5698d upstream.

Fix NULL pointer dereference in print_daily_error_info, when
called on unmounted fs (EXT4_SB(sb) returns NULL), by removing error
reporting timer in ext4_put_super.

Google-Bug-Id: 3017663

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:31 -08:00
Eric W. Biederman
f6b88b337b Revert "vfs: show unreachable paths in getcwd and proc"
commit 7b2a69ba70 upstream.

Because it caused a chroot ttyname regression in 2.6.36.

As of 2.6.36 ttyname does not work in a chroot.  It has already been
reported that screen breaks, and for me this breaks an automated
distribution testsuite, that I need to preserve the ability to run the
existing binaries on for several more years.  glibc 2.11.3 which has a
fix for this is not an option.

The root cause of this breakage is:

    commit 8df9d1a414
    Author: Miklos Szeredi <mszeredi@suse.cz>
    Date:   Tue Aug 10 11:41:41 2010 +0200

    vfs: show unreachable paths in getcwd and proc

    Prepend "(unreachable)" to path strings if the path is not reachable
    from the current root.

    Two places updated are
     - the return string from getcwd()
     - and symlinks under /proc/$PID.

    Other uses of d_path() are left unchanged (we know that some old
    software crashes if /proc/mounts is changed).

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

So remove the nice sounding, but ultimately ill advised change to how
/proc/fd symlinks work.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:22 -08:00
Jeff Layton
8dac975b17 cifs: fix parsing of hostname in dfs referrals
commit ba03864872 upstream.

The DFS referral parsing code does a memchr() call to find the '\\'
delimiter that separates the hostname in the referral UNC from the
sharename. It then uses that value to set the length of the hostname via
pointer subtraction.  Instead of subtracting the start of the hostname
however, it subtracts the start of the UNC, which causes the code to
pass in a hostname length that is 2 bytes too long.

Regression introduced in commit 1a4240f4.

Reported-and-Tested-by: Robbert Kouprie <robbert@exx.nl>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Wang Lei <wang840925@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:15 -08:00
Oskar Schirmer
1fdd366b9d cifs: fix another memleak, in cifs_root_iget
commit a7851ce73b upstream.

cifs_root_iget allocates full_path through
cifs_build_path_to_root, but fails to kfree it upon
cifs_get_inode_info* failure.

Make all failure exit paths traverse clean up
handling at the end of the function.

Signed-off-by: Oskar Schirmer <oskar@scara.com>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:15 -08:00
Ken Sumrall
2f240adb90 fuse: fix attributes after open(O_TRUNC)
commit a0822c5577 upstream.

The attribute cache for a file was not being cleared when a file is opened
with O_TRUNC.

If the filesystem's open operation truncates the file ("atomic_o_trunc"
feature flag is set) then the kernel should invalidate the cached st_mtime
and st_ctime attributes.

Also i_size should be explicitly be set to zero as it is used sometimes
without refreshing the cache.

Signed-off-by: Ken Sumrall <ksumrall@android.com>
Cc: Anfei <anfei.zhou@gmail.com>
Cc: "Anand V. Avati" <avati@gluster.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:33:13 -08:00
Oleg Nesterov
b7d121bbfa exec: copy-and-paste the fixes into compat_do_execve() paths
commit 114279be21 upstream.

Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.

compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:57 -08:00
Oleg Nesterov
35cc70fc58 exec: make argv/envp memory visible to oom-killer
commit 3c77f84572 upstream.

Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()->copy_strings() can allocate a lot of memory, but
this is not visible to oom-killer, nobody can see the nascent
bprm->mm and take it into account.

With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate the new page for argv/envp. When
do_execve() succeds or fails, we change this counter back.

Technically this is not 100% correct, we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters and everything becomes correct
once exec changes ->mm or fails.

Reported-by: Brad Spengler <spender@grsecurity.net>
Reviewed-and-discussed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:57 -08:00
Jens Axboe
89dc7fe165 bio: take care not overflow page count when mapping/copying user data
commit cb4644cac4 upstream.

If the iovec is being set up in a way that causes uaddr + PAGE_SIZE
to overflow, we could end up attempting to map a huge number of
pages. Check for this invalid input type.

Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:53 -08:00
Jeff Layton
f0b7bc25f0 nfs: handle lock context allocation failures in nfs_create_request
commit 015f0212d5 upstream.

nfs_get_lock_context can return NULL on an allocation failure.
Regression introduced by commit f11ac8db.

Reported-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:44 -08:00
Roberto Sassu
9e49bc3724 ecryptfs: call vfs_setxattr() in ecryptfs_setxattr()
commit 48b512e685 upstream.

Ecryptfs is a stackable filesystem which relies on lower filesystems the
ability of setting/getting extended attributes.

If there is a security module enabled on the system it updates the
'security' field of inodes according to the owned extended attribute set
with the function vfs_setxattr().  When this function is performed on a
ecryptfs filesystem the 'security' field is not updated for the lower
filesystem since the call security_inode_post_setxattr() is missing for
the lower inode.
Further, the call security_inode_setxattr() is missing for the lower inode,
leading to policy violations in the security module because specific
checks for this hook are not performed (i. e. filesystem
'associate' permission on SELinux is not checked for the lower filesystem).

This patch replaces the call of the setxattr() method of the lower inode
in the function ecryptfs_setxattr() with vfs_setxattr().

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: Dustin Kirkland <kirkland@canonical.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:39 -08:00
Tyler Hicks
d42809b6fa eCryptfs: Clear LOOKUP_OPEN flag when creating lower file
commit 2e21b3f124 upstream.

eCryptfs was passing the LOOKUP_OPEN flag through to the lower file
system, even though ecryptfs_create() doesn't support the flag. A valid
filp for the lower filesystem could be returned in the nameidata if the
lower file system's create() function supported LOOKUP_OPEN, possibly
resulting in unencrypted writes to the lower file.

However, this is only a potential problem in filesystems (FUSE, NFS,
CIFS, CEPH, 9p) that eCryptfs isn't known to support today.

https://bugs.launchpad.net/ecryptfs/+bug/641703

Reported-by: Kevin Buhr
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:38 -08:00
Richard Weinberger
b470e76854 hostfs: fix UML crash: remove f_spare from hostfs
commit 1b627d5771 upstream.

365b1818 ("add f_flags to struct statfs(64)") resized f_spare within
struct statfs which caused a UML crash.  There is no need to copy f_spare.

Signed-off-by: Richard Weinberger <richard@nod.at>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:31 -08:00
Frederic Weisbecker
3b8ccb80d9 reiserfs: don't acquire lock recursively in reiserfs_acl_chmod
commit 238af8751f upstream.

reiserfs_acl_chmod() can be called by reiserfs_set_attr() and then take
the reiserfs lock a second time.  Thereafter it may call journal_begin()
that definitely requires the lock not to be nested in order to release
it before taking the journal mutex because the reiserfs lock depends on
the journal mutex already.

So, aviod nesting the lock in reiserfs_acl_chmod().

Reported-by: Pawel Zawora <pzawora@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Pawel Zawora <pzawora@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:26 -08:00
Frederic Weisbecker
fec32bc579 reiserfs: fix inode mutex - reiserfs lock misordering
commit da905873ef upstream.

reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe()
to protect against reiserfs lock dependency.  However this protection
requires to have the reiserfs lock to be locked.

This is the case if reiserfs_unpack() is called by reiserfs_ioctl but
not from reiserfs_quota_on() when it tries to unpack tails of quota
files.

Fix the ordering of the two locks in reiserfs_unpack() to fix this
issue.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: Markus Gapp <markus.gapp@gmx.net>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:26 -08:00
Trond Myklebust
34e1c8500c NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidation
commit bc4866b6e0 upstream.

In the case where we lock the page, and then find out that the page has
been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
This is what block_page_mkwrite() does in these situations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:21 -08:00
Trond Myklebust
a6b346485b NFSv4: Fix open recovery
commit b0ed9dbc24 upstream.

NFSv4 open recovery is currently broken: since we do not clear the
state->flags states before attempting recovery, we end up with the
'can_open_cached()' function triggering. This again leads to no OPEN call
being put on the wire.

Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:20 -08:00
Trond Myklebust
bd863c343d NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers
commit ae1007d37e upstream.

In the case of a server reboot, the state recovery thread starts by calling
nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
the server reboots while the client is in the middle of recovery.

However, if the client has already marked the nfs4_state as requiring
reboot recovery, then the above behaviour will cause the recovery thread to
treat the open as if it was part of such an edge condition: the open will
be recovered as if it was part of a lease expiration (and all the locks
will be lost).
Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it
to the recovery thread to do this for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:20 -08:00
Trond Myklebust
4e16957a4c NFSv4: Don't call nfs4_reclaim_complete() on receiving NFS4ERR_STALE_CLIENTID
commit 6eaa61496f upstream.

If the server sends us an NFS4ERR_STALE_CLIENTID while the state management
thread is busy reclaiming state, we do want to treat all state that wasn't
reclaimed before the STALE_CLIENTID as if a network partition occurred (see
the edge conditions described in RFC3530 and RFC5661).
What we do not want to do is to send an nfs4_reclaim_complete(), since we
haven't yet even started reclaiming state after the server rebooted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:32:20 -08:00
Jens Axboe
e085dd9e93 block: limit vec count in bio_kmalloc() and bio_alloc_map_data()
commit f3f63c1c28 upstream.

Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 13:31:54 -08:00
Steve Dickson
9c1560611d Fixed Regression in NFS Direct I/O path
commit 568a810d7e upstream.

A typo, introduced by commit f11ac8db, in the nfs_direct_write()
routine causes writes with O_DIRECT set to fail with a ENOMEM error.

Found-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 11:03:17 -08:00
Nicolas Kaiser
6021775fb6 pipe: fix failure to return error code on ->confirm()
commit e5953cbdff upstream.

The arguments were transposed, we want to assign the error code to
'ret', which is being returned.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 11:03:06 -08:00
Suresh Jayaraman
30d01aacb2 cifs: fix broken oplock handling
commit aa91c7e4ab upstream.

cifs_new_fileinfo() does not use the 'oplock' value from the callers. Instead,
it sets it to REQ_OPLOCK which seems wrong. We should be using the oplock value
obtained from the Server to set the inode's clientCanCacheAll or
clientCanCacheRead flags. Fix this by passing oplock from the callers to
cifs_new_fileinfo().

This change dates back to commit a6ce4932 (2.6.30-rc3). So, all the affected
versions will need this fix. Please Cc stable once reviewed and accepted.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 11:03:02 -08:00
Linus Torvalds
8fd01d6cfb Export dump_{write,seek} to binary loader modules
If you build aout support as a module, you'll want these exported.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14 19:15:28 -07:00
Linus Torvalds
3aa0ce825a Un-inline the core-dump helper functions
Tony Luck reports that the addition of the access_ok() check in commit
0eead9ab41 ("Don't dump task struct in a.out core-dumps") broke the
ia64 compile due to missing the necessary header file includes.

Rather than add yet another include (<asm/unistd.h>) to make everything
happy, just uninline the silly core dump helper functions and move the
bodies to fs/exec.c where they make a lot more sense.

dump_seek() in particular was too big to be an inline function anyway,
and none of them are in any way performance-critical.  And we really
don't need to mess up our include file headers more than they already
are.

Reported-and-tested-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14 14:32:06 -07:00
Linus Torvalds
0eead9ab41 Don't dump task struct in a.out core-dumps
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code).  Just remove it.

Also do the access_ok() check on dump_write().  It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...

[ I suspect that we should possibly do "vfs_write()" instead of
  calling ->write directly.  That also does the whole fsnotify and write
  statistics thing, which may or may not be a good idea. ]

And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)

Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14 10:57:40 -07:00
Linus Torvalds
8c35bf368c Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink
2010-10-13 16:51:29 -07:00
J. Bruce Fields
b1e86db1de nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink
As of commit 43a9aa64a2 "NFSD:
Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call
fh_unlock on a filehandle that isn't fully initialized.

We should fix up the callers, but as a quick fix it is also sufficient
just to remove this assertion.

Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-13 15:48:55 -04:00
Eric Paris
7c5347733d fanotify: disable fanotify syscalls
This patch disables the fanotify syscalls by just not building them and
letting the cond_syscall() statements in kernel/sys_ni.c redirect them
to sys_ni_syscall().

It was pointed out by Tvrtko Ursulin that the fanotify interface did not
include an explicit prioritization between groups.  This is necessary
for fanotify to be usable for hierarchical storage management software,
as they must get first access to the file, before inotify-like notifiers
see the file.

This feature can be added in an ABI compatible way in the next release
(by using a number of bits in the flags field to carry the info) but it
was suggested by Alan that maybe we should just hold off and do it in
the next cycle, likely with an (new) explicit argument to the syscall.
I don't like this approach best as I know people are already starting to
use the current interface, but Alan is all wise and noone on list backed
me up with just using what we have.  I feel this is needlessly ripping
the rug out from under people at the last minute, but if others think it
needs to be a new argument it might be the best way forward.

Three choices:
Go with what we got (and implement the new feature next cycle).  Add a
new field right now (and implement the new feature next cycle).  Wait
till next cycle to release the ABI (and implement the new feature next
cycle).  This is number 3.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-11 18:15:28 -07:00
Linus Torvalds
8dc54e49ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: update issue_seq on cap grant
  ceph: send cap release message early on failed revoke.
  ceph: Update max_len with minimum required size
  ceph: Fix return value of encode_fh function
  ceph: avoid null deref in osd request error path
  ceph: fix list_add usage on unsafe_writes list
2010-10-09 12:03:46 -07:00
Linus Torvalds
267aeb6c14 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Fix double page_unlock BUG in write_begin/end
2010-10-09 12:03:23 -07:00
Boaz Harrosh
f17b1f9f1a exofs: Fix double page_unlock BUG in write_begin/end
This BUG is there since the first submit of the code, but only triggered
in last Kernel. It's timing related do to the asynchronous object-creation
behaviour of exofs. (Which should be investigated farther)

The bug is obvious hence the fixed.

Signed-off-by: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>
2010-10-08 11:26:54 -04:00
Linus Torvalds
5710c2b275 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: properly account for reclaimed inodes
2010-10-07 13:45:26 -07:00
Sage Weil
d91f2438d8 ceph: update issue_seq on cap grant
We need to update the issue_seq on any grant operation, be it via an MDS
reply or a separate grant message.  The update in the grant path was
missing.  This broke cap release for inodes in which the MDS sent an
explicit grant message that was not soon after followed by a successful
MDS reply on the same inode.

Also fix the signedness on seq locals.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07 08:01:50 -07:00
Greg Farnum
21b559de56 ceph: send cap release message early on failed revoke.
If an MDS tries to revoke caps that we don't have, we want to send
releases early since they probably contain the caps message the MDS
is looking for.

Previously, we only sent the messages if we didn't have the inode either. But
in a multi-mds system we can retain the inode after dropping all caps for
a single MDS.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07 08:00:24 -07:00
Aneesh Kumar K.V
bba0cd0e3d ceph: Update max_len with minimum required size
encode_fh on error should update max_len with minimum required
size, so that caller can redo the call with the reallocated buffer.
This is required with open by handle patch series

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07 08:00:24 -07:00
Aneesh Kumar K.V
92923dcbfc ceph: Fix return value of encode_fh function
encode_fh function should return 255 on error as done by other file
system to indicate EOVERFLOW. Also max_len is in sizeof(u32) units
and not in bytes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07 08:00:23 -07:00
Sage Weil
6bc18876ba ceph: avoid null deref in osd request error path
If we interrupt an osd request, we call __cancel_request, but it wasn't
verifying that req->r_osd was non-NULL before dereferencing it.  This could
cause a crash if osds were flapping and we aborted a request on said osd.

Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07 08:00:23 -07:00
Henry C Chang
936aeb5c4a ceph: fix list_add usage on unsafe_writes list
Fix argument order.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07 08:00:23 -07:00
Johannes Weiner
081003fff4 xfs: properly account for reclaimed inodes
When marking an inode reclaimable, a per-AG counter is increased, the
inode is tagged reclaimable in its per-AG tree, and, when this is the
first reclaimable inode in the AG, the AG entry in the per-mount tree
is also tagged.

When an inode is finally reclaimed, however, it is only deleted from
the per-AG tree.  Neither the counter is decreased, nor is the parent
tree's AG entry untagged properly.

Since the tags in the per-mount tree are not cleared, the inode
shrinker iterates over all AGs that have had reclaimable inodes at one
point in time.

The counters on the other hand signal an increasing amount of slab
objects to reclaim.  Since "70e60ce xfs: convert inode shrinker to
per-filesystem context" this is not a real issue anymore because the
shrinker bails out after one iteration.

But the problem was observable on a machine running v2.6.34, where the
reclaimable work increased and each process going into direct reclaim
eventually got stuck on the xfs inode shrinking path, trying to scan
several million objects.

Fix this by properly unwinding the reclaimable-state tracking of an
inode when it is reclaimed.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@kernel.org
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-06 22:35:48 -05:00
Linus Torvalds
089eed29b4 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  writeback: always use sb->s_bdi for writeback purposes
2010-10-06 11:11:18 -07:00
Linus Torvalds
8fe9793af0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: Initialize total_len in fuse_retrieve()
2010-10-06 09:50:41 -07:00
Christoph Hellwig
aaead25b95 writeback: always use sb->s_bdi for writeback purposes
We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags.  We're also using for tracking dirty inodes for
writeback.

To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointeded to by
inode->i_mapping->backing_dev which can be overriden by special devices
or might not be set at all by some filesystems.

Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overriden by devices.

The one exception for now is the block device filesystem which really
wants different writeback contexts for it's different (internal) inodes
to handle the writeout more efficiently.  For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-04 14:25:33 +02:00
Geert Uytterhoeven
0157443c56 fuse: Initialize total_len in fuse_retrieve()
fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this
function

Initialize total_len to zero, else its value will be undefined.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-10-04 10:45:32 +02:00
Linus Torvalds
c6ea21e35b Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: prevent infinite recursion in cifs_reconnect_tcon
  cifs: set backing_dev_info on new S_ISREG inodes
2010-10-01 15:03:37 -07:00
Frederic Weisbecker
9d8117e72b reiserfs: fix unwanted reiserfs lock recursion
Prevent from recursively locking the reiserfs lock in reiserfs_unpack()
because we may call journal_begin() that requires the lock to be taken
only once, otherwise it won't be able to release the lock while taking
other mutexes, ending up in inverted dependencies between the journal
mutex and the reiserfs lock for example.

This fixes:

  =======================================================
  [ INFO: possible circular locking dependency detected ]
  2.6.35.4.4a #3
  -------------------------------------------------------
  lilo/1620 is trying to acquire lock:
   (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]

  but task is already holding lock:
   (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
         [<c10562b7>] lock_acquire+0x67/0x80
         [<c12facad>] __mutex_lock_common+0x4d/0x410
         [<c12fb0c8>] mutex_lock_nested+0x18/0x20
         [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
         [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs]
         [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
         [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs]
         [<c10b6a20>] do_remount_sb+0x60/0x110
         [<c10cee25>] do_mount+0x625/0x790
         [<c10cf014>] sys_mount+0x84/0xb0
         [<c12fca3d>] syscall_call+0x7/0xb

  -> #0 (&journal->j_mutex){+.+...}:
         [<c10560f6>] __lock_acquire+0x1026/0x1180
         [<c10562b7>] lock_acquire+0x67/0x80
         [<c12facad>] __mutex_lock_common+0x4d/0x410
         [<c12fb0c8>] mutex_lock_nested+0x18/0x20
         [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
         [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
         [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
         [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
         [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
         [<c10dbbe6>] block_prepare_write+0x26/0x40
         [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
         [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
         [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
         [<c10c3188>] vfs_ioctl+0x28/0xa0
         [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
         [<c10c3eb3>] sys_ioctl+0x63/0x70
         [<c12fca3d>] syscall_call+0x7/0xb

  other info that might help us debug this:

  2 locks held by lilo/1620:
   #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs]
   #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]

  stack backtrace:
  Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3
  Call Trace:
   [<c10560f6>] __lock_acquire+0x1026/0x1180
   [<c10562b7>] lock_acquire+0x67/0x80
   [<c12facad>] __mutex_lock_common+0x4d/0x410
   [<c12fb0c8>] mutex_lock_nested+0x18/0x20
   [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
   [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
   [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
   [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
   [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
   [<c10dbbe6>] block_prepare_write+0x26/0x40
   [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
   [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
   [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
   [<c10c3188>] vfs_ioctl+0x28/0xa0
   [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
   [<c10c3eb3>] sys_ioctl+0x63/0x70
   [<c12fca3d>] syscall_call+0x7/0xb

Reported-by: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: All since 2.6.32 <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:59 -07:00
Frederic Weisbecker
3f259d092c reiserfs: fix dependency inversion between inode and reiserfs mutexes
The reiserfs mutex already depends on the inode mutex, so we can't lock
the inode mutex in reiserfs_unpack() without using the safe locking API,
because reiserfs_unpack() is always called with the reiserfs mutex locked.

This fixes:

  =======================================================
  [ INFO: possible circular locking dependency detected ]
  2.6.35c #13
  -------------------------------------------------------
  lilo/1606 is trying to acquire lock:
   (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]

  but task is already holding lock:
   (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
         [<c1056347>] lock_acquire+0x67/0x80
         [<c12f083d>] __mutex_lock_common+0x4d/0x410
         [<c12f0c58>] mutex_lock_nested+0x18/0x20
         [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]
         [<d0329e9a>] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs]
         [<d0316b81>] reiserfs_fill_super+0x941/0xe60 [reiserfs]
         [<c10b7d17>] get_sb_bdev+0x117/0x170
         [<d0313e21>] get_super_block+0x21/0x30 [reiserfs]
         [<c10b74ba>] vfs_kern_mount+0x6a/0x1b0
         [<c10b7659>] do_kern_mount+0x39/0xe0
         [<c10cebe0>] do_mount+0x340/0x790
         [<c10cf0b4>] sys_mount+0x84/0xb0
         [<c12f25cd>] syscall_call+0x7/0xb

  -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
         [<c1056186>] __lock_acquire+0x1026/0x1180
         [<c1056347>] lock_acquire+0x67/0x80
         [<c12f083d>] __mutex_lock_common+0x4d/0x410
         [<c12f0c58>] mutex_lock_nested+0x18/0x20
         [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
         [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs]
         [<c10c3228>] vfs_ioctl+0x28/0xa0
         [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0
         [<c10c3f53>] sys_ioctl+0x63/0x70
         [<c12f25cd>] syscall_call+0x7/0xb

  other info that might help us debug this:

  1 lock held by lilo/1606:
   #0:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]

  stack backtrace:
  Pid: 1606, comm: lilo Not tainted 2.6.35c #13
  Call Trace:
   [<c1056186>] __lock_acquire+0x1026/0x1180
   [<c1056347>] lock_acquire+0x67/0x80
   [<c12f083d>] __mutex_lock_common+0x4d/0x410
   [<c12f0c58>] mutex_lock_nested+0x18/0x20
   [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
   [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs]
   [<c10c3228>] vfs_ioctl+0x28/0xa0
   [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0
   [<c10c3f53>] sys_ioctl+0x63/0x70
   [<c12f25cd>] syscall_call+0x7/0xb

Reported-by: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: <stable@kernel.org>		[2.6.32 and later]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:59 -07:00
Jiri Olsa
3036e7b490 proc: make /proc/pid/limits world readable
Having the limits file world readable will ease the task of system
management on systems where root privileges might be restricted.

Having admin restricted with root priviledges, he/she could not check
other users process' limits.

Also it'd align with most of the /proc stat files.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Eugene Teo <eugene@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:59 -07:00