Commit Graph

15827 Commits

Author SHA1 Message Date
David Woodhouse
3a03456dc3 jffs2: Fix long-standing bug with symlink garbage collection.
commit 2e16cfca6e upstream.

Ever since jffs2_garbage_collect_metadata() was first half-written in
February 2001, it's been broken on architectures where 'char' is signed.
When garbage collecting a symlink with target length above 127, the payload
length would end up negative, causing interesting and bad things to happen.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:52 -08:00
Jan Kara
62965d8347 ext3: Fix data / filesystem corruption when write fails to copy data
commit 68eb3db083 upstream.

When ext3_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks already
instantiated beyond i_size. Although these blocks were never inside i_size, we
have to truncate pagecache of these blocks so that corresponding buffers get
unmapped. Otherwise subsequent __block_prepare_write (called because we are
retrying the write) will find the buffers mapped, not call ->get_block, and
thus the page will be backed by already freed blocks leading to filesystem and
data corruption.

Reported-by: James Y Knight <foom@fuhm.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:39 -08:00
Mathieu Desnoyers
9f2813ec1f debugfs: fix create mutex racy fops and private data
commit d3a3b0adad upstream.

Setting fops and private data outside of the mutex at debugfs file
creation introduces a race where the files can be opened with the wrong
file operations and private data.  It is easy to trigger with a process
waiting on file creation notification.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:04:16 -08:00
Sukadev Bhattiprolu
5451f13711 devpts_get_tty() should validate inode
commit edfacdd6f8 upstream.

devpts_get_tty() assumes that the inode passed in is associated with a valid
pty.  But if the only reference to the pty is via a bind-mount, the inode
passed to devpts_get_tty() while valid, would refer to a pty that no longer
exists.

With a lot of debug effort, Grzegorz Nosek developed a small program (see
below) to reproduce a crash on recent kernels. This crash is a regression
introduced by the commit:

	commit 527b3e4773
	Author: Sukadev Bhattiprolu <sukadev@us.ibm.com>
	Date:   Mon Oct 13 10:43:08 2008 +0100

To fix, ensure that the dentry associated with the inode has not yet been
deleted/unhashed by devpts_pty_kill().

See also:
https://lists.linux-foundation.org/pipermail/containers/2009-July/019273.html

tty-bug.c:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/signal.h>
#include <unistd.h>
#include <stdio.h>

#include <linux/fs.h>

void dummy(int sig)
{
}

static int child(void *unused)
{
	int fd;

	signal(SIGINT, dummy); signal(SIGHUP, dummy);
	pause(); /* cheesy synchronisation to wait for /dev/pts/0 to appear */

	mount("/dev/pts/0", "/dev/console", NULL, MS_BIND, NULL);
	sleep(2);

	fd = open("/dev/console", O_RDWR);
	dup(0); dup(0);
	write(1, "Hello world!\n", sizeof("Hello world!\n")-1);
	return 0;
}

int main(void)
{
	pid_t pid;
	char *stack;

	stack = malloc(16384);
	pid = clone(child, stack+16384, CLONE_NEWNS|SIGCHLD, NULL);

	open("/dev/ptmx", O_RDWR|O_NOCTTY|O_NONBLOCK);

	unlockpt(fd); grantpt(fd);

	sleep(2);
	kill(pid, SIGHUP);
	sleep(1);
	return 0; /* exit before child opens /dev/console */
}

Reported-by: Grzegorz Nosek <root@localdomain.pl>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:04:15 -08:00
Trond Myklebust
e4f676cea7 NFS: Fix nfs_migrate_page()
commit 190f38e5ce upstream.

The call to migrate_page() will cause the page->private field to be
cleared.
Also fix up the locking around the page->private transfer, so that we ensure
that calls to nfs_page_find_request() don't end up racing.

Finally, fix up a double free bug: nfs_unlock_request() already calls
nfs_release_request() for us...

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Tested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:04:09 -08:00
Amerigo Wang
037b7867ec hfs: fix a potential buffer overflow
commit ec81aecb29 upstream.

A specially-crafted Hierarchical File System (HFS) filesystem could cause
a buffer overflow to occur in a process's kernel stack during a memcpy()
call within the hfs_bnode_read() function (at fs/hfs/bnode.c:24).  The
attacker can provide the source buffer and length, and the destination
buffer is a local variable of a fixed length.  This local variable (passed
as "&entry" from fs/hfs/dir.c:112 and allocated on line 60) is stored in
the stack frame of hfs_bnode_read()'s caller, which is hfs_readdir().
Because the hfs_readdir() function executes upon any attempt to read a
directory on the filesystem, it gets called whenever a user attempts to
inspect any filesystem contents.

[amwang@redhat.com: modify this patch and fix coding style problems]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Anderson <anderson@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:04:08 -08:00
Theodore Ts'o
b71a005bac jbd2: don't wipe the journal on a failed journal checksum
commit e6a47428de upstream.

If there is a failed journal checksum, don't reset the journal.  This
allows for userspace programs to decide how to recover from this
situation.  It may be that ignoring the journal checksum failure might
be a better way of recovering the file system.  Once we add per-block
checksums, we can definitely do better.  Until then, a system
administrator can try backing up the file system image (or taking a
snapshot) and and trying to determine experimentally whether ignoring
the checksum failure or aborting the journal replay results in less
data loss.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:56 -08:00
Theodore Ts'o
abb247066f ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)
(cherry picked from commit fab3a549e2)

Fix the following potential circular locking dependency between
mm->mmap_sem and ei->i_data_sem:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.32-04115-gec044c5 #37
    -------------------------------------------------------
    ureadahead/1855 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac

    but task is already holding lock:
     (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&ei->i_data_sem){++++..}:
           [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f
           [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
           [<ffffffff81516633>] down_read+0x51/0x84
           [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5
           [<ffffffff811a3453>] ext4_get_block+0xab/0xef
           [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d
           [<ffffffff81155360>] mpage_readpages+0xd0/0x114
           [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f
           [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc
           [<ffffffff810f86f2>] ra_submit+0x21/0x25
           [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c
           [<ffffffff81107b97>] __do_fault+0x55/0x3a2
           [<ffffffff81109db0>] handle_mm_fault+0x327/0x734
           [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa
           [<ffffffff81518205>] page_fault+0x25/0x30
           [<ffffffff812a34d8>] clear_user+0x38/0x3c
           [<ffffffff81167e16>] padzero+0x20/0x31
           [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed
           [<ffffffff81130e95>] search_binary_handler+0xc2/0x259
           [<ffffffff81166d64>] load_script+0x1b8/0x1cc
           [<ffffffff81130e95>] search_binary_handler+0xc2/0x259
           [<ffffffff8113255f>] do_execve+0x1ce/0x2cf
           [<ffffffff81027494>] sys_execve+0x43/0x5a
           [<ffffffff8102918a>] stub_execve+0x6a/0xc0

    -> #0 (&mm->mmap_sem){++++++}:
           [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
           [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
           [<ffffffff81107251>] might_fault+0x89/0xac
           [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
           [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
           [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
           [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
           [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
           [<ffffffff811392ca>] sys_ioctl+0x56/0x79
           [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b

    other info that might help us debug this:

    1 lock held by ureadahead/1855:
     #0:  (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159

    stack backtrace:
    Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37
    Call Trace:
     [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7
     [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
     [<ffffffff8102f229>] ? sched_clock+0x9/0xd
     [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
     [<ffffffff81107224>] ? might_fault+0x5c/0xac
     [<ffffffff81107251>] might_fault+0x89/0xac
     [<ffffffff81107224>] ? might_fault+0x5c/0xac
     [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c
     [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
     [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
     [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157
     [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
     [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
     [<ffffffff81107224>] ? might_fault+0x5c/0xac
     [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
     [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95
     [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b
     [<ffffffff811392ca>] sys_ioctl+0x56/0x79
     [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:28 -08:00
Akira Fujita
0fd023ecf1 ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
(cherry picked from commit 4a58579b9e)

This patch fixes three problems in the handling of the
EXT4_IOC_MOVE_EXT ioctl:

1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
original and donor files, but they allow the illegal write access to
donor file, since donor file is overwritten by original file data.  To
fix this problem, change access mode checks of original (r->r/w) and
donor (r->w) files.

2.  Disallow the use of donor files that have a setuid or setgid bits.

3.  Call mnt_want_write() and mnt_drop_write() before and after
ext4_move_extents() calling to get write access to a mount.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:26 -08:00
Jan Kara
eebb744d30 ext4: Wait for proper transaction commit on fsync
(cherry picked from commit b436b9bef8)

We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:25 -08:00
Dmitry Monakhov
caa305aa34 ext4: fix incorrect block reservation on quota transfer.
(cherry picked from commit 194074acac)

Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid
This means that we may end-up with transferring all quotas. Add
we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in
case of QUOTA_INIT_BLOCKS.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:23 -08:00
Dmitry Monakhov
da2068b384 ext4: quota macros cleanup
(cherry picked from commit 5aca07eb7d)

Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:22 -08:00
Dmitry Monakhov
6798788a72 ext4: ext4_get_reserved_space() must return bytes instead of blocks
(cherry picked from commit 8aa6790f87)

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:21 -08:00
Curt Wohlgemuth
637b13106b ext4: remove blocks from inode prealloc list on failure
(cherry picked from commit b844167edc)

This fixes a leak of blocks in an inode prealloc list if device failures
cause ext4_mb_mark_diskspace_used() to fail.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:16 -08:00
Josef Bacik
1cd3f1980c ext4: wait for log to commit when umounting
(cherry picked from commit d4edac314e)

There is a potential race when a transaction is committing right when
the file system is being umounting.  This could reduce in a race
because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
before the commit code calls a callback so the mballoc code can
release freed blocks in the transaction, resulting in a panic trying
to access the freed s_group_info.

The fix is to wait for the transaction to finish committing before we
shutdown the multiblock allocator.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:14 -08:00
Jan Kara
35a6f78249 ext4: Avoid data / filesystem corruption when write fails to copy data
(cherry picked from commit b9a4207d5e)

When ext4_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks
already instantiated beyond i_size.  Although these blocks were never
inside i_size, we have to truncate the pagecache of these blocks so
that corresponding buffers get unmapped.  Otherwise subsequent
__block_prepare_write (called because we are retrying the write) will
find the buffers mapped, not call ->get_block, and thus the page will
be backed by already freed blocks leading to filesystem and data
corruption.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:13 -08:00
Roel Kluin
66c3a71833 ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()
(cherry picked from commit c09eef305d)

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:12 -08:00
Theodore Ts'o
86d291a396 jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer()
(cherry picked from commit e6ec116b67)

OOM happens.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:10 -08:00
Akira Fujita
ce5cf38ef1 ext4: move_extent_per_page() cleanup
(cherry picked from commit ac48b0a1d0)

Integrate duplicate lines (acquire/release semaphore and invalidate
extent cache in move_extent_per_page()) into mext_replace_branches(),
to reduce source and object code size.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:09 -08:00
Kazuya Mio
3369cbb6be ext4: initialize moved_len before calling ext4_move_extents()
(cherry picked from commit 446aaa6e7e)

The move_extent.moved_len is used to pass back the number of exchanged
blocks count to user space.  Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.

Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:08 -08:00
Akira Fujita
74920c74ad ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
(cherry picked from commit 94d7c16cbb)

At the beginning of ext4_move_extent(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes.  But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().

1. Discard inode PAs of orig and donor inodes with
   ext4_discard_preallocations() in ext4_move_extents().

   orig : [ DATA1 ]
   donor: [ DATA2 ]

2. While data blocks are exchanging between orig and donor inodes, new
   inode PAs is created to orig by other process's block allocation.
   (Since there are semaphore gaps in ext4_move_extents().)  And new
   inode PAs is used partially (2-1).

   2-1 Create new inode PAs to orig inode
   orig : [ DATA1 | used PA1 | free PA1 ]
   donor: [ DATA2 ]

3. Donor inode which has old orig inode's blocks is deleted after
   EXT4_IOC_MOVE_EXT finished (3-1, 3-2).  So the block bitmap
   corresponds to old orig inode's blocks are freed.

   3-1 After EXT4_IOC_MOVE_EXT finished
   orig : [ DATA2 |  free PA1 ]
   donor: [ DATA1 |  used PA1 ]

   3-2 Delete donor inode
   orig : [ DATA2 |  free PA1 ]
   donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]

4. The double-free of blocks is occurred, when close() is called to
   orig inode.  Because ext4_discard_preallocations() for orig inode
   frees used PA1 and free PA1, though used PA1 is already freed in 3.

   4-1 Double-free of blocks is occurred
   orig : [ DATA2 |  FREE SPACE(free PA1) ]
   donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:07 -08:00
Eric Sandeen
6011d0baad ext4: make "norecovery" an alias for "noload"
(cherry picked from commit e3bb52ae2b)

Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.

In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.

Also show this status in /proc/mounts

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:06 -08:00
Eric Sandeen
feac39ba7f ext4: make trim/discard optional (and off by default)
(cherry picked from commit 5328e63531)

It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues.  Make
this mount-time optional, and default it to off until we know
that things are working out OK.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:04 -08:00
Jan Kara
cd9c823a68 ext4: fix error handling in ext4_ind_get_blocks()
(cherry picked from commit 2bba702d4f)

When an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
by checking for error properly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:02 -08:00
Theodore Ts'o
43e932d311 ext4: avoid issuing unnecessary barriers
(cherry picked from commit 6b17d902fd)

We don't to issue an I/O barrier on an error or if we force commit
because we are doing data journaling.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:02 -08:00
Theodore Ts'o
31299e22be ext4: fix block validity checks so they work correctly with meta_bg
(cherry picked from commit 1032988c71)

The block validity checks used by ext4_data_block_valid() wasn't
correctly written to check file systems with the meta_bg feature.  Fix
this.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:45:01 -08:00
Theodore Ts'o
58ffefbe7b ext4: fix uninit block bitmap initialization when s_meta_first_bg is non-zero
(cherry picked from commit 8dadb198cb)

The number of old-style block group descriptor blocks is
s_meta_first_bg when the meta_bg feature flag is set.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:59 -08:00
Theodore Ts'o
9e9ddddfe7 ext4: don't update the superblock in ext4_statfs()
(cherry picked from commit 3f8fb9490e)

commit a71ce8c6c9 updated ext4_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change.  This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:58 -08:00
Eric Sandeen
023c1d20ed ext4: journal all modifications in ext4_xattr_set_handle
(cherry picked from commit 86ebfd08a1)

ext4_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.

Reviewed-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:57 -08:00
Julia Lawall
70095d96ce ext4: fix i_flags access in ext4_da_writepages_trans_blocks()
(cherry picked from commit 30c6e07a92)

We need to be testing the i_flags field in the ext4 specific portion
of the inode, instead of the (confusingly aliased) i_flags field in
the generic struct inode.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:55 -08:00
Theodore Ts'o
0f9036c7ee ext4: make sure directory and symlink blocks are revoked
(cherry picked from commit 5068969686)

When an inode gets unlinked, the functions ext4_clear_blocks() and
ext4_remove_blocks() call ext4_forget() for all the buffer heads
corresponding to the deleted inode's data blocks.  If the inode is a
directory or a symlink, the is_metadata parameter must be non-zero so
ext4_forget() will revoke them via jbd2_journal_revoke().  Otherwise,
if these blocks are reused for a data file, and the system crashes
before a journal checkpoint, the journal replay could end up
corrupting these data blocks.

Thanks to Curt Wohlgemuth for pointing out potential problems in this
area.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:54 -08:00
Theodore Ts'o
21a4b3aaa2 ext4: plug a buffer_head leak in an error path of ext4_iget()
(cherry picked from commit 567f3e9a70)

One of the invalid error paths in ext4_iget() forgot to brelse() the
inode buffer head.  Fix it by adding a brelse() in the common error
return path, which also simplifies function.

Thanks to Andi Kleen <ak@linux.intel.com> reporting the problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:53 -08:00
Akira Fujita
a752b960d4 ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT
(cherry picked from commit 49bd22bc4d)

If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
will trigger a false-positive warning of a recursive lock.  Since we
take i_data_sem for the two inodes ordered by their inode numbers,
this isn't a problem.  Use of down_write_nested() will notify the lock
dependency checker machinery that there is no problem here.

This problem was reported by Brian Rogers:

	http://marc.info/?l=linux-ext4&m=125115356928011&w=1

Reported-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:52 -08:00
Akira Fujita
52a4345d3d ext4: fix lock order problem in ext4_move_extents()
(cherry picked from commit fc04cb49a8)

ext4_move_extents() checks the logical block contiguousness
of original file with ext4_find_extent() and mext_next_extent().
Therefore the extent which ext4_ext_path structure indicates
must not be changed between above functions.

But in current implementation, there is no i_data_sem protection
between ext4_ext_find_extent() and mext_next_extent().  So the extent
which ext4_ext_path structure indicates may be overwritten by
delalloc.  As a result, ext4_move_extents() will exchange wrong blocks
between original and donor files.  I change the place where
acquire/release i_data_sem to solve this problem.

Moreover, I changed move_extent_per_page() to start transaction first,
and then acquire i_data_sem.  Without this change, there is a
possibility of the deadlock between mmap() and ext4_move_extents():

* NOTE: "A", "B" and "C" mean different processes

A-1: ext4_ext_move_extents() acquires i_data_sem of two inodes.

B:   do_page_fault() starts the transaction (T),
     and then tries to acquire i_data_sem.
     But process "A" is already holding it, so it is kept waiting.

C:   While "A" and "B" running, kjournald2 tries to commit transaction (T)
     but it is under updating, so kjournald2 waits for it.

A-2: Call ext4_journal_start with holding i_data_sem,
     but transaction (T) is locked.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:51 -08:00
Akira Fujita
a4a87a7f39 ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails
(cherry picked from commit f868a48d06)

If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
exchanged before the failure should be returned to the userspace
caller.  Unfortunately, currently if the block size is not the same as
the page size, the returned block count that is returned is the
page-aligned block count instead of the actual block count.  This
commit addresses this bug.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:49 -08:00
Theodore Ts'o
8ed33ff520 ext4: avoid divide by zero when trying to mount a corrupted file system
(cherry picked from commit 503358ae01)

If s_log_groups_per_flex is greater than 31, then groups_per_flex will
will overflow and cause a divide by zero error.  This can cause kernel
BUG if such a file system is mounted.

Thanks to Nageswara R Sastry for analyzing the failure and providing
an initial patch.

http://bugzilla.kernel.org/show_bug.cgi?id=14287

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:49 -08:00
Theodore Ts'o
6662a8d031 ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC
(cherry picked from commit 2de770a406)

Previously add_dirent_to_buf() did not free its passed-in buffer head
in the case of ENOSPC, since in some cases the caller still needed it.
However, this led to potential buffer head leaks since not all callers
dealt with this correctly.  Fix this by making simplifying the freeing
convention; now add_dirent_to_buf() *never* frees the passed-in buffer
head, and leaves that to the responsibility of its caller.  This makes
things cleaner and easier to prove that the code is neither leaking
buffer heads or calling brelse() one time too many.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:47 -08:00
Marc Dionne
3350b2acdd CacheFiles: Update IMA counters when using dentry_open
When IMA is active, using dentry_open without updating the
IMA counters will result in free/open imbalance errors when
fput is eventually called.

Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01 07:35:11 -08:00
David Howells
6f05416432 9p: fix build breakage introduced by FS-Cache
While building 2.6.32-rc8-git2 for Fedora I noticed the following thinko
in commit 201a15428b ("FS-Cache: Handle
pages pending storage that get evicted under OOM conditions"):

  fs/9p/cache.c: In function '__v9fs_fscache_release_page':
  fs/9p/cache.c:346: error: 'vnode' undeclared (first use in this function)
  fs/9p/cache.c:346: error: (Each undeclared identifier is reported only once
  fs/9p/cache.c:346: error: for each function it appears in.)
  make[2]: *** [fs/9p/cache.o] Error 1

Fix the 9P filesystem to correctly construct the argument to
fscache_maybe_release_page().

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com> [from identical patch]
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> [from identical patch]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01 07:35:11 -08:00
Linus Torvalds
99d7832c0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Fix sparse warning
  [CIFS] Duplicate data on appending to some Samba servers
  [CIFS] fix oops in cifs_lookup during net boot
2009-11-30 14:51:01 -08:00
Linus Torvalds
d0964c37b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: reject O_DIRECT flag also in fuse_create
2009-11-30 14:00:09 -08:00
David Woodhouse
199bc9ff5c jffs2: Fix memory corruption in jffs2_read_inode_range()
In 2.6.23 kernel, commit a32ea1e1f9
("Fix read/truncate race") fixed a race in the generic code, and as a
side effect, now do_generic_file_read() can ask us to readpage() past
the i_size. This seems to be correctly handled by the block routines
(e.g. block_read_full_page() fills the page with zeroes in case if
somebody is trying to read past the last inode's block).

JFFS2 doesn't handle this; it assumes that it won't be asked to read
pages which don't exist -- and thus that there will be at least _one_
valid 'frag' on the page it's being asked to read. It will fill any
holes with the following memset:

  memset(buf, 0, min(end, frag->ofs + frag->size) - offset);

When the 'closest smaller match' returned by jffs2_lookup_node_frag() is
actually on a previous page and ends before 'offset', that results in:

  memset(buf, 0, <huge unsigned negative>);

Hopefully, in most cases the corruption is fatal, and quickly causing
random oopses, like this:

  root@10.0.0.4:~/ltp-fs-20090531# ./testcases/kernel/fs/ftest/ftest01
  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc01cd980
  Oops: Kernel access of bad area, sig: 11 [#1]
  [...]
  NIP [c01cd980] rb_insert_color+0x38/0x184
  LR [c0043978] enqueue_hrtimer+0x88/0xc4
  Call Trace:
  [c6c63b60] [c004f9a8] tick_sched_timer+0xa0/0xe4 (unreliable)
  [c6c63b80] [c0043978] enqueue_hrtimer+0x88/0xc4
  [c6c63b90] [c0043a48] __run_hrtimer+0x94/0xbc
  [c6c63bb0] [c0044628] hrtimer_interrupt+0x140/0x2b8
  [c6c63c10] [c000f8e8] timer_interrupt+0x13c/0x254
  [c6c63c30] [c001352c] ret_from_except+0x0/0x14
  --- Exception: 901 at memset+0x38/0x5c
      LR = jffs2_read_inode_range+0x144/0x17c
  [c6c63cf0] [00000000] (null) (unreliable)

This patch fixes the issue, plus fixes all LTP tests on NAND/UBI with
JFFS2 filesystem that were failing since 2.6.23 (seems like the bug
above also broke the truncation).

Reported-By: Anton Vorontsov <avorontsov@ru.mvista.com>
Tested-By: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-30 13:52:40 -08:00
Linus Torvalds
6e80133f7f Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
  FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
  SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
  SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
  CacheFiles: Don't log lookup/create failing with ENOBUFS
  CacheFiles: Catch an overly long wait for an old active object
  CacheFiles: Better showing of debugging information in active object problems
  CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
  CacheFiles: Handle truncate unlocking the page we're reading
  CacheFiles: Don't write a full page if there's only a partial page to cache
  FS-Cache: Actually requeue an object when requested
  FS-Cache: Start processing an object's operations on that object's death
  FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
  FS-Cache: Add a retirement stat counter
  FS-Cache: Handle pages pending storage that get evicted under OOM conditions
  FS-Cache: Handle read request vs lookup, creation or other cache failure
  FS-Cache: Don't delete pending pages from the page-store tracking tree
  FS-Cache: Fix lock misorder in fscache_write_op()
  FS-Cache: The object-available state can't rely on the cookie to be available
  FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
  FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
  ...
2009-11-30 13:33:48 -08:00
Csaba Henk
1b7323965a fuse: reject O_DIRECT flag also in fuse_create
The comment in fuse_open about O_DIRECT:

  "VFS checks this, but only _after_ ->open()"

also holds for fuse_create, however, the same kind of check was missing there.

As an impact of this bug, open(newfile, O_RDWR|O_CREAT|O_DIRECT) fails, but a
stub newfile will remain if the fuse server handled the implied FUSE_CREATE
request appropriately.

Other impact: in the above situation ima_file_free() will complain to open/free
imbalance if CONFIG_IMA is set.

Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Harshavardhana <harsha@gluster.com>
Cc: stable@kernel.org
2009-11-27 16:37:13 +01:00
Steve French
2f81e752da [CIFS] Fix sparse warning
Also update CHANGES file

Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-25 00:11:31 +00:00
Steve French
cea6234395 [CIFS] Duplicate data on appending to some Samba servers
SMB writes are sent with a starting offset and length. When the server
supports the newer SMB trans2 posix open (rather than using the SMB
NTCreateX) a file can be opened with SMB_O_APPEND flag, and for that
case Samba server assumes that the offset sent in SMBWriteX is unneeded
since the write should go to the end of the file - which can cause
problems if the write was cached (since the beginning part of a
page could be written twice by the client mm).  Jeff suggested that
masking the flag on posix open on the client is easiest for the time
being. Note that recent Samba server also had an unrelated problem with
SMB NTCreateX and append (see samba bugzilla bug number 6898) which
should not affect current Linux clients (unless cifs Unix Extensions
are disabled).

The cifs client did not send the O_APPEND flag on posix open
before 2.6.29 so the fix is unneeded on early kernels.

In the future, for the non-cached case (O_DIRECT, and forcedirectio mounts)
it would be possible and useful to send O_APPEND on posix open (for Windows
case: FILE_APPEND_DATA but not FILE_WRITE_DATA on SMB NTCreateX) but for
cached writes although the vfs sets the offset to end of file it
may fragment a write across pages - so we can't send O_APPEND on
open (could result in sending part of a page twice).

CC: Stable <stable@kernel.org>
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-24 22:52:13 +00:00
Steve French
8e6c0332d5 [CIFS] fix oops in cifs_lookup during net boot
Fixes bugzilla.kernel.org bug number 14641

Lookup called during network boot (network root filesystem
for diskless workstation) has case where nd is null in
lookup.  This patch fixes that in cifs_lookup.

(Shirish noted that 2.6.30 and 2.6.31 stable need the same check)

Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Tested-by:  Vladimir Stavrinov <vs@inist.ru>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-24 22:17:59 +00:00
David Howells
4fa9f4ede8 FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
Provide nop fscache_stat_d() macro if CONFIG_FSCACHE_STATS=n lest errors like
the following occur:

	fs/fscache/cache.c: In function 'fscache_withdraw_cache':
	fs/fscache/cache.c:386: error: implicit declaration of function 'fscache_stat_d'
	fs/fscache/cache.c:386: error: 'fscache_n_cop_sync_cache' undeclared (first use in this function)
	fs/fscache/cache.c:386: error: (Each undeclared identifier is reported only once
	fs/fscache/cache.c:386: error: for each function it appears in.)
	fs/fscache/cache.c:392: error: 'fscache_n_cop_dissociate_pages' undeclared (first use in this function)

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20 21:50:44 +00:00
David Howells
1c2ea8a2c0 SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
GFS2 has been altered to pass THIS_MODULE to slow_work_register_user(), but
hasn't been altered to #include <linux/module.h> to provide it, resulting in
the following error:

	fs/gfs2/recovery.c:596: error: 'THIS_MODULE' undeclared here (not in a function)

Add the missing #include.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20 21:50:40 +00:00
David Howells
0109d7e614 SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
As of the patch:

	SLOW_WORK: Wait for outstanding work items belonging to a module to clear

	Wait for outstanding slow work items belonging to a module to clear
	when unregistering that module as a user of the facility.  This
	prevents the put_ref code of a work item from being taken away before
	it returns.

slow_work_register_user() takes a module pointer as an argument.  CIFS must now
pass THIS_MODULE as that argument, lest the following error be observed:

	fs/cifs/cifsfs.c: In function 'init_cifs':
	fs/cifs/cifsfs.c:1040: error: too few arguments to function 'slow_work_register_user'

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20 21:50:36 +00:00