Commit Graph

32081 Commits

Author SHA1 Message Date
Christoph Hellwig
3d3da408c3 posix_acl: handle NULL ACL in posix_acl_equiv_mode
commit 50c6e282bd upstream.

Various filesystems don't bother checking for a NULL ACL in
posix_acl_equiv_mode, and thus can dereference a NULL pointer when it
gets passed one. This usually happens from the NFS server, as the ACL tools
never pass a NULL ACL, but instead of one representing the mode bits.

Instead of adding boilerplat to all filesystems put this check into one place,
which will allow us to remove the check from other filesystems as well later
on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>,
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07 13:25:33 -07:00
Al Viro
68ad89c73d don't bother with {get,put}_write_access() on non-regular files
commit dd20908a8a upstream.

it's pointless and actually leads to wrong behaviour in at least one
moderately convoluted case (pipe(), close one end, try to get to
another via /proc/*/fd and run into ETXTBUSY).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30 21:52:12 -07:00
Jeff Layton
72a6e59449 lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
commit 679b033df4 upstream.

We had a Fedora ABRT report with a stack trace like this:

kernel BUG at net/sunrpc/svc.c:550!
invalid opcode: 0000 [#1] SMP
[...]
CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
RIP: 0010:[<ffffffffa0305fa8>]  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
RSP: 0018:ffff88003f9b9de0  EFLAGS: 00010206
RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
FS:  00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
Stack:
 ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
 ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
 ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
Call Trace:
 [<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd]
 [<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd]
 [<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50
 [<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd]
 [<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd]
 [<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50
 [<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20
 [<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0
 [<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd]
 [<ffffffff811b8b34>] vfs_write+0xb4/0x1f0
 [<ffffffff811c3f99>] ? putname+0x29/0x40
 [<ffffffff811b9569>] SyS_write+0x49/0xa0
 [<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0
 [<ffffffff816962e9>] system_call_fastpath+0x16/0x1b
Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
RIP  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
 RSP <ffff88003f9b9de0>

Evidently, we created some lockd sockets and then failed to create
others. make_socks then returned an error and we tried to tear down the
svc, but svc->sv_permsocks was not empty so we ended up tripping over
the BUG() in svc_destroy().

Fix this by ensuring that we tear down any live sockets we created when
socket creation is going to return an error.

Fixes: 786185b5f8 (SUNRPC: move per-net operations from...)
Reported-by: Raphos <raphoszap@laposte.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-13 13:59:46 +02:00
Jeff Layton
8771da18d2 locks: allow __break_lease to sleep even when break_time is 0
commit 4991a628a7 upstream.

A fl->fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.

Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This causes __break_lease to spin in a tight loop and
causes soft lockups.

Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.

Cc: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Terry Barnaby <terry1@beam.ltd.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-13 13:59:44 +02:00
Theodore Ts'o
fcc742c294 ext4: use i_size_read in ext4_unaligned_aio()
commit 6e6358fc3c upstream.

We haven't taken i_mutex yet, so we need to use i_size_read().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:32 -07:00
Jan Kara
f78e2d2db6 ext4: fix jbd2 warning under heavy xattr load
commit ec4cb1aa2b upstream.

When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:

WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()

CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
 ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
 ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
 [<ffffffff816311b0>] dump_stack+0x19/0x1b
 [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
 [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
 [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
 [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
 [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
 [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
 [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
 [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
 [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
 [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
 [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
 [<ffffffff8119421e>] setxattr+0x13e/0x1e0
 [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
 [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
 [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b

The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:

CPU1                                CPU2
ext4_xattr_release_block()          ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
  le32_add_cpu(&BHDR(bh)->h_refcount, -1);
  unlock_buffer(bh);
                                    lock_buffer(bh);
                                    /* True */
                                    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                      get_bh(bh);
                                      ext4_free_blocks()
                                        ...
                                        jbd2_journal_forget()
                                          jbd2_journal_unfile_buffer()
                                          -> JH is gone
  error = ext4_handle_dirty_xattr_block(handle, inode, bh);
  -> triggers the warning

We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.

Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:32 -07:00
alex chen
5da43e877e ocfs2: do not put bh when buffer_uptodate failed
commit f7cf4f5bfe upstream.

Do not put bh when buffer_uptodate failed in ocfs2_write_block and
ocfs2_write_super_or_backup, because it will put bh in b_end_io.
Otherwise it will hit a warning "VFS: brelse: Trying to free free
buffer".

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:32 -07:00
Junxiao Bi
a093bceda6 ocfs2: dlm: fix recovery hung
commit ded2cf7141 upstream.

There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node.  After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value.  This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.

old recovery master:                                 new recovery master:
dlm_remaster_locks()
                                                  become recovery master for
                                                  another dead node.
                                                  dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
 if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
  return -EAGAIN;
 }
 dlm_set_reco_master(dlm, br->node_idx);
 dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
 dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
 dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
                                                  will hang in dlm_remaster_locks() for
                                                  request dlm locks info

Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.

A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:32 -07:00
Junxiao Bi
261985c439 ocfs2: dlm: fix lock migration crash
commit 34aa8dac48 upstream.

This issue was introduced by commit 800deef3f6 ("ocfs2: use
list_for_each_entry where benefical") in 2007 where it replaced
list_for_each with list_for_each_entry.  The variable "lock" will point
to invalid data if "tmpq" list is empty and a panic will be triggered
due to this.  Sunil advised reverting it back, but the old version was
also not right.  At the end of the outer for loop, that
list_for_each_entry will also set "lock" to an invalid data, then in the
next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
data and cause the panic.  So reverting the list_for_each back and reset
"lock" to NULL to fix this issue.

Another concern is that this seemes can not happen because the "tmpq"
list should not be empty.  Let me describe how.

old lock resource owner(node 1):                                  migratation target(node 2):
image there's lockres with a EX lock from node 2 in
granted list, a NR lock from node x with convert_type
EX in converting list.
dlm_empty_lockres() {
 dlm_pick_migration_target() {
   pick node 2 as target as its lock is the first one
   in granted list.
 }
 dlm_migrate_lockres() {
   dlm_mark_lockres_migrating() {
     res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
     wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
	 //after the above code, we can not dirty lockres any more,
     // so dlm_thread shuffle list will not run
                                                                   downconvert lock from EX to NR
                                                                   upconvert lock from NR to EX
<<< migration may schedule out here, then
<<< node 2 send down convert request to convert type from EX to
<<< NR, then send up convert request to convert type from NR to
<<< EX, at this time, lockres granted list is empty, and two locks
<<< in the converting list, node x up convert lock followed by
<<< node 2 up convert lock.

	 // will set lockres RES_MIGRATING flag, the following
	 // lock/unlock can not run
     dlm_lockres_release_ast(dlm, res);
   }

   dlm_send_one_lockres()
                                                                 dlm_process_recovery_data()
                                                                   for (i=0; i<mres->num_locks; i++)
                                                                     if (ml->node == dlm->node_num)
                                                                       for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                                        list_for_each_entry(lock, tmpq, list)
                                                                        if (lock) break; <<< lock is invalid as grant list is empty.
                                                                       }
                                                                       if (lock->ml.node != ml->node)
                                                                         BUG() >>> crash here
 }

I see the above locks status from a vmcore of our internal bug.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:32 -07:00
Jeff Mahoney
1d06be05aa reiserfs: fix race in readdir
commit 01d8885785 upstream.

jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)

The -ENOENT is due to readdir calling dir_emit on the same entry twice.

If the dir_emit callback sleeps and the tree is changed underneath us,
we won't be able to trust deh_offset(deh) anymore. We need to save
next_pos before we might sleep so we can find the next entry.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:30 -07:00
Jeff Layton
da05ff98dc nfsd: set timeparms.to_maxval in setup_callback_client
commit 3758cf7e14 upstream.

...otherwise the logic in the timeout handling doesn't work correctly.

Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
Kinglong Mee
d8639f2438 NFSD: Traverse unconfirmed client through hash-table
commit 2b90563598 upstream.

When stopping nfsd, I got BUG messages, and soft lockup messages,
The problem is cuased by double rb_erase() in nfs4_state_destroy_net()
and destroy_client().

This patch just let nfsd traversing unconfirmed client through
hash-table instead of rbtree.

[ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at
          (null)
[ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0
[ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp
llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw
e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc
snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi
[last unloaded: nfsd]
[ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF          O
3.14.0-rc8+ #2
[ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti:
ffff8800797f6000
[ 2325.022982] RIP: 0010:[<ffffffff8133c18c>]  [<ffffffff8133c18c>]
rb_erase+0x14c/0x390
[ 2325.022982] RSP: 0018:ffff8800797f7d98  EFLAGS: 00010246
[ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX:
0000000000000000
[ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI:
ffff880079f4c810
[ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09:
ffff88007964fc70
[ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12:
ffff880079f4c800
[ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15:
ffff880079f4c860
[ 2325.022982] FS:  0000000000000000(0000) GS:ffff88007f900000(0000)
knlGS:0000000000000000
[ 2325.022982] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4:
00000000000006e0
[ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 2325.022982] Stack:
[ 2325.022982]  ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8
ffff8800797f7da8
[ 2325.022982]  ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0
ffff880079c1f010
[ 2325.022982]  ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2
ffff8800797f7df0
[ 2325.022982] Call Trace:
[ 2325.022982]  [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd]
[ 2325.022982]  [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220
[nfsd]
[ 2325.022982]  [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[ 2325.022982]  [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[ 2325.022982]  [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[ 2325.022982]  [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd]
[ 2325.022982]  [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd]
[ 2325.022982]  [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 2325.022982]  [<ffffffff810a8232>] kthread+0xd2/0xf0
[ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10
0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b
50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01
[ 2325.022982] RIP  [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982]  RSP <ffff8800797f7d98>
[ 2325.022982] CR2: 0000000000000000
[ 2325.022982] ---[ end trace 28c27ed011655e57 ]---

[  228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558]
[  228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211
xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0
snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev
parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore
i2c_core ata_generic pata_acpi
[  228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF          O
3.14.0-rc8+ #2
[  228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[  228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti:
ffff880074616000
[  228.064539] RIP: 0010:[<ffffffff8133ba17>]  [<ffffffff8133ba17>]
rb_next+0x27/0x50
[  228.064539] RSP: 0018:ffff880074617de0  EFLAGS: 00000282
[  228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX:
0000000000000014
[  228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI:
ffff880074478010
[  228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09:
0000000000000012
[  228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12:
ffffea0001d11a00
[  228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15:
ffff880074617d50
[  228.064539] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
knlGS:0000000000000000
[  228.064539] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4:
00000000000006f0
[  228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  228.064539] Stack:
[  228.064539]  ffff880074617e28 ffffffffa01ab7db ffff880074617df0
ffff880074617df0
[  228.064539]  ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0
0000000000000000
[  228.064539]  0000000000000000 ffff880074617e48 ffffffffa01840b8
ffffffff81cc26c0
[  228.064539] Call Trace:
[  228.064539]  [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220
[nfsd]
[  228.064539]  [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[  228.064539]  [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[  228.064539]  [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[  228.064539]  [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd]
[  228.064539]  [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd]
[  228.064539]  [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[  228.064539]  [<ffffffff810a8232>] kthread+0xd2/0xf0
[  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[  228.064539]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[  228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48
8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b
50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7

Fixes: ac55fdc408 (nfsd: move the confirmed and unconfirmed hlists...)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
J. Bruce Fields
83caff8d17 nfsd4: fix setclientid encode size
commit 480efaee08 upstream.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
Stanislav Kinsbursky
344f3a6bce nfsd: check passed socket's net matches NFSd superblock's one
commit 3064639423 upstream.

There could be a case, when NFSd file system is mounted in network, different
to socket's one, like below:

"ip netns exec" creates new network and mount namespace, which duplicates NFSd
mount point, created in init_net context. And thus NFS server stop in nested
network context leads to RPCBIND client destruction in init_net.
Then, on NFSd start in nested network context, rpc.nfsd process creates socket
in nested net and passes it into "write_ports", which leads to RPCBIND sockets
creation in init_net context because of the same reason (NFSd monut point was
created in init_net context). An attempt to register passed socket in nested
net leads to panic, because no RPCBIND client present in nexted network
namespace.

This patch add check that passed socket's net matches NFSd superblock's one.
And returns -EINVAL error to user psace otherwise.

v2: Put socket on exit.

Reported-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
J. Bruce Fields
b57d9c3fe3 nfsd: notify_change needs elevated write count
commit 9f67f18993 upstream.

Looks like this bug has been here since these write counts were
introduced, not sure why it was just noticed now.

Thanks also to Jan Kara for pointing out the problem.

Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
J. Bruce Fields
5aad7571c2 nfsd4: fix test_stateid error reply encoding
commit a11fcce154 upstream.

If the entire operation fails then there's nothing to encode.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
J. Bruce Fields
f4a0a32b20 nfsd4: buffer-length check for SUPPATTR_EXCLCREAT
commit de3997a7ee upstream.

This was an omission from 8c18f2052e
"nfsd41: SUPPATTR_EXCLCREAT attribute".

Cc: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
J. Bruce Fields
c5393b70f9 nfsd4: session needs room for following op to error out
commit 4c69d5855a upstream.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:29 -07:00
Mark Tinguely
0d267970ad xfs: fix directory hash ordering bug
commit c88547a811 upstream.

Commit f5ea1100 ("xfs: add CRCs to dir2/da node blocks") introduced
in 3.10 incorrectly converted the btree hash index array pointer in
xfs_da3_fixhashpath(). It resulted in the the current hash always
being compared against the first entry in the btree rather than the
current block index into the btree block's hash entry array. As a
result, it was comparing the wrong hashes, and so could misorder the
entries in the btree.

For most cases, this doesn't cause any problems as it requires hash
collisions to expose the ordering problem. However, when there are
hash collisions within a directory there is a very good probability
that the entries will be ordered incorrectly and that actually
matters when duplicate hashes are placed into or removed from the
btree block hash entry array.

This bug results in an on-disk directory corruption and that results
in directory verifier functions throwing corruption warnings into
the logs. While no data or directory entries are lost, access to
them may be compromised, and attempts to remove entries from a
directory that has suffered from this corruption may result in a
filesystem shutdown.  xfs_repair will fix the directory hash
ordering without data loss occuring.

[dchinner: wrote useful a commit message]

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:28 -07:00
Li Zefan
cc8ece834b jffs2: remove from wait queue after schedule()
commit 3ead957844 upstream.

@wait is a local variable, so if we don't remove it from the wait queue
list, later wake_up() may end up accessing invalid memory.

This was spotted by eyes.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:36 -07:00
Li Zefan
3a196f46fc jffs2: avoid soft-lockup in jffs2_reserve_space_gc()
commit 13b546d962 upstream.

We triggered soft-lockup under stress test on 2.6.34 kernel.

BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488]
...
[<bf09a4d4>] (jffs2_do_reserve_space+0x420/0x440 [jffs2])
[<bf09a528>] (jffs2_reserve_space_gc+0x34/0x78 [jffs2])
[<bf0a1350>] (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2])
[<bf0a2078>] (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2])
[<bf09a670>] (jffs2_reserve_space+0x104/0x2a8 [jffs2])
[<bf09dc48>] (jffs2_write_inode_range+0x5c/0x4d4 [jffs2])
[<bf097d8c>] (jffs2_write_end+0x198/0x2c0 [jffs2])
[<c00e00a4>] (generic_file_buffered_write+0x158/0x200)
[<c00e14f4>] (__generic_file_aio_write+0x3a4/0x414)
[<c00e15c0>] (generic_file_aio_write+0x5c/0xbc)
[<c012334c>] (do_sync_write+0x98/0xd4)
[<c0123a84>] (vfs_write+0xa8/0x150)
[<c0123d74>] (sys_write+0x3c/0xc0)]

Fix this by adding a cond_resched() in the while loop.

[akpm@linux-foundation.org: don't initialize `ret']
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
Ajesh Kunhipurayil Vijayan
16668c2b1a jffs2: Fix crash due to truncation of csize
commit 41bf1a24c1 upstream.

mounting JFFS2 partition sometimes crashes with this call trace:

[ 1322.240000] Kernel bug detected[#1]:
[ 1322.244000] Cpu 2
[ 1322.244000] $ 0   : 0000000000000000 0000000000000018 000000003ff00070 0000000000000001
[ 1322.252000] $ 4   : 0000000000000000 c0000000f3980150 0000000000000000 0000000000010000
[ 1322.260000] $ 8   : ffffffffc09cd5f8 0000000000000001 0000000000000088 c0000000ed300de8
[ 1322.268000] $12   : e5e19d9c5f613a45 ffffffffc046d464 0000000000000000 66227ba5ea67b74e
[ 1322.276000] $16   : c0000000f1769c00 c0000000ed1e0200 c0000000f3980150 0000000000000000
[ 1322.284000] $20   : c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 c0000000f39818f0
[ 1322.292000] $24   : 0000000000000004 0000000000000000
[ 1322.300000] $28   : c0000000ed2c0000 c0000000ed2cfab8 0000000000010000 ffffffffc039c0b0
[ 1322.308000] Hi    : 000000000000023c
[ 1322.312000] Lo    : 000000000003f802
[ 1322.316000] epc   : ffffffffc039a9f8 check_tn_node+0x88/0x3b0
[ 1322.320000]     Not tainted
[ 1322.324000] ra    : ffffffffc039c0b0 jffs2_do_read_inode_internal+0x1250/0x1e48
[ 1322.332000] Status: 5400f8e3    KX SX UX KERNEL EXL IE
[ 1322.336000] Cause : 00800034
[ 1322.340000] PrId  : 000c1004 (Netlogic XLP)
[ 1322.344000] Modules linked in:
[ 1322.348000] Process jffs2_gcd_mtd7 (pid: 264, threadinfo=c0000000ed2c0000, task=c0000000f0e68dd8, tls=0000000000000000)
[ 1322.356000] Stack : c0000000f1769e30 c0000000ed010780 c0000000ed010780 c0000000ed300000
        c0000000f1769c00 c0000000f3980150 c0000000f3a80000 00000000fffffffc
        c0000000ed2cfbd8 ffffffffc039c0b0 ffffffffc09c6340 0000000000001000
        0000000000000dec ffffffffc016c9d8 c0000000f39805a0 c0000000f3980180
        0000008600000000 0000000000000000 0000000000000000 0000000000000000
        0001000000000dec c0000000f1769d98 c0000000ed2cfb18 0000000000010000
        0000000000010000 0000000000000044 c0000000f3a80000 c0000000f1769c00
        c0000000f3d207a8 c0000000f1769d98 c0000000f1769de0 ffffffffc076f9c0
        0000000000000009 0000000000000000 0000000000000000 ffffffffc039cf90
        0000000000000017 ffffffffc013fbdc 0000000000000001 000000010003e61c
        ...
[ 1322.424000] Call Trace:
[ 1322.428000] [<ffffffffc039a9f8>] check_tn_node+0x88/0x3b0
[ 1322.432000] [<ffffffffc039c0b0>] jffs2_do_read_inode_internal+0x1250/0x1e48
[ 1322.440000] [<ffffffffc039cf90>] jffs2_do_crccheck_inode+0x70/0xd0
[ 1322.448000] [<ffffffffc03a1b80>] jffs2_garbage_collect_pass+0x160/0x870
[ 1322.452000] [<ffffffffc03a392c>] jffs2_garbage_collect_thread+0xdc/0x1f0
[ 1322.460000] [<ffffffffc01541c8>] kthread+0xb8/0xc0
[ 1322.464000] [<ffffffffc0106d18>] kernel_thread_helper+0x10/0x18
[ 1322.472000]
[ 1322.472000]
Code: 67bd0050  94a4002c  2c830001 <00038036> de050218  2403fffc  0080a82d  00431824  24630044
[ 1322.480000] ---[ end trace b052bb90e97dfbf5 ]---

The variable csize in structure jffs2_tmp_dnode_info is of type uint16_t, but it
is used to hold the compressed data length(csize) which is declared as uint32_t.
So, when the value of csize exceeds 16bits, it gets truncated when assigned to
tn->csize. This is causing a kernel BUG.
Changing the definition of csize in jffs2_tmp_dnode_info to uint32_t fixes the issue.

Signed-off-by: Ajesh Kunhipurayil Vijayan <ajesh@broadcom.com>
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
Kamlakant Patel
5734144cd1 jffs2: Fix segmentation fault found in stress test
commit 3367da5610 upstream.

Creating a large file on a JFFS2 partition sometimes crashes with this call
trace:

[  306.476000] CPU 13 Unable to handle kernel paging request at virtual address c0000000dfff8002, epc == ffffffffc03a80a8, ra == ffffffffc03a8044
[  306.488000] Oops[#1]:
[  306.488000] Cpu 13
[  306.492000] $ 0   : 0000000000000000 0000000000000000 0000000000008008 0000000000008007
[  306.500000] $ 4   : c0000000dfff8002 000000000000009f c0000000e0007cde c0000000ee95fa58
[  306.508000] $ 8   : 0000000000000001 0000000000008008 0000000000010000 ffffffffffff8002
[  306.516000] $12   : 0000000000007fa9 000000000000ff0e 000000000000ff0f 80e55930aebb92bb
[  306.524000] $16   : c0000000e0000000 c0000000ee95fa5c c0000000efc80000 ffffffffc09edd70
[  306.532000] $20   : ffffffffc2b60000 c0000000ee95fa58 0000000000000000 c0000000efc80000
[  306.540000] $24   : 0000000000000000 0000000000000004
[  306.548000] $28   : c0000000ee950000 c0000000ee95f738 0000000000000000 ffffffffc03a8044
[  306.556000] Hi    : 00000000000574a5
[  306.560000] Lo    : 6193b7a7e903d8c9
[  306.564000] epc   : ffffffffc03a80a8 jffs2_rtime_compress+0x98/0x198
[  306.568000]     Tainted: G        W
[  306.572000] ra    : ffffffffc03a8044 jffs2_rtime_compress+0x34/0x198
[  306.580000] Status: 5000f8e3    KX SX UX KERNEL EXL IE
[  306.584000] Cause : 00800008
[  306.588000] BadVA : c0000000dfff8002
[  306.592000] PrId  : 000c1100 (Netlogic XLP)
[  306.596000] Modules linked in:
[  306.596000] Process dd (pid: 170, threadinfo=c0000000ee950000, task=c0000000ee6e0858, tls=0000000000c47490)
[  306.608000] Stack : 7c547f377ddc7ee4 7ffc7f967f5d7fae 7f617f507fc37ff4 7e7d7f817f487f5f
        7d8e7fec7ee87eb3 7e977ff27eec7f9e 7d677ec67f917f67 7f3d7e457f017ed7
        7fd37f517f867eb2 7fed7fd17ca57e1d 7e5f7fe87f257f77 7fd77f0d7ede7fdb
        7fba7fef7e197f99 7fde7fe07ee37eb5 7f5c7f8c7fc67f65 7f457fb87f847e93
        7f737f3e7d137cd9 7f8e7e9c7fc47d25 7dbb7fac7fb67e52 7ff17f627da97f64
        7f6b7df77ffa7ec5 80057ef17f357fb3 7f767fa27dfc7fd5 7fe37e8e7fd07e53
        7e227fcf7efb7fa1 7f547e787fa87fcc 7fcb7fc57f5a7ffb 7fc07f6c7ea97e80
        7e2d7ed17e587ee0 7fb17f9d7feb7f31 7f607e797e887faa 7f757fdd7c607ff3
        7e877e657ef37fbd 7ec17fd67fe67ff7 7ff67f797ff87dc4 7eef7f3a7c337fa6
        7fe57fc97ed87f4b 7ebe7f097f0b8003 7fe97e2a7d997cba 7f587f987f3c7fa9
        ...
[  306.676000] Call Trace:
[  306.680000] [<ffffffffc03a80a8>] jffs2_rtime_compress+0x98/0x198
[  306.684000] [<ffffffffc0394f10>] jffs2_selected_compress+0x110/0x230
[  306.692000] [<ffffffffc039508c>] jffs2_compress+0x5c/0x388
[  306.696000] [<ffffffffc039dc58>] jffs2_write_inode_range+0xd8/0x388
[  306.704000] [<ffffffffc03971bc>] jffs2_write_end+0x16c/0x2d0
[  306.708000] [<ffffffffc01d3d90>] generic_file_buffered_write+0xf8/0x2b8
[  306.716000] [<ffffffffc01d4e7c>] __generic_file_aio_write+0x1ac/0x350
[  306.720000] [<ffffffffc01d50a0>] generic_file_aio_write+0x80/0x168
[  306.728000] [<ffffffffc021f7dc>] do_sync_write+0x94/0xf8
[  306.732000] [<ffffffffc021ff6c>] vfs_write+0xa4/0x1a0
[  306.736000] [<ffffffffc02202e8>] SyS_write+0x50/0x90
[  306.744000] [<ffffffffc0116cc0>] handle_sys+0x180/0x1a0
[  306.748000]
[  306.748000]
Code: 020b202d  0205282d  90a50000 <90840000> 14a40038  00000000  0060602d  0000282d  016c5823
[  306.760000] ---[ end trace 79dd088435be02d0 ]---
Segmentation fault

This crash is caused because the 'positions' is declared as an array of signed
short. The value of position is in the range 0..65535, and will be converted
to a negative number when the position is greater than 32767 and causes a
corruption and crash. Changing the definition to 'unsigned short' fixes this
issue

Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
Eric Whitney
8459d9c808 ext4: fix partial cluster handling for bigalloc file systems
commit c063449394 upstream.

Commit 9cb00419fa, which enables hole punching for bigalloc file
systems, exposed a bug introduced by commit 6ae06ff51e in an earlier
release.  When run on a bigalloc file system, xfstests generic/013, 068,
075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors
or cause kernel error messages indicating that previously freed blocks
are being freed again.

The latter commit optimizes the selection of the starting extent in
ext4_ext_rm_leaf() when hole punching by beginning with the extent
supplied in the path argument rather than with the last extent in the
leaf node (as is still done when truncating).  However, the code in
rm_leaf that initially sets partial_cluster to track cluster sharing on
extent boundaries is only guaranteed to run if rm_leaf starts with the
last node in the leaf.  Consequently, partial_cluster is not correctly
initialized when hole punching, and a cluster on the boundary of a
punched region that should be retained may instead be deallocated.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
Eric Whitney
cbe099ed98 ext4: fix error return from ext4_ext_handle_uninitialized_extents()
commit ce37c42919 upstream.

Commit 3779473246 breaks the return of error codes from
ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks().  A
portion of the patch assigns that function's signed integer return
value to an unsigned int.  Consequently, negatively valued error codes
are lost and can be treated as a bogus allocated block count.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
Hidetoshi Seto
19fc37ed7b Btrfs: skip submitting barrier for missing device
commit f88ba6a2a4 upstream.

I got an error on v3.13:
 BTRFS error (device sdf1) in write_all_supers:3378: errno=-5 IO failure (errors while submitting device barriers.)

how to reproduce:
  > mkfs.btrfs -f -d raid1 /dev/sdf1 /dev/sdf2
  > wipefs -a /dev/sdf2
  > mount -o degraded /dev/sdf1 /mnt
  > btrfs balance start -f -sconvert=single -mconvert=single -dconvert=single /mnt

The reason of the error is that barrier_all_devices() failed to submit
barrier to the missing device.  However it is clear that we cannot do
anything on missing device, and also it is not necessary to care chunks
on the missing device.

This patch stops sending/waiting barrier if device is missing.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
Jan Kara
bf0972039d bdi: avoid oops on device removal
commit 5acda9d12d upstream.

After commit 839a8e8660 ("writeback: replace custom worker pool
implementation with unbound workqueue") when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL.

This can happen because even though bdi_unregister() cancels all pending
flushing work, nothing really prevents new ones from being queued from
balance_dirty_pages() or other places.

Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.

Fixes: 839a8e8660

Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Derek Basehore <dbasehore@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
Derek Basehore
39305a6ac7 backing_dev: fix hung task on sync
commit 6ca738d60c upstream.

bdi_wakeup_thread_delayed() used the mod_delayed_work() function to
schedule work to writeback dirty inodes.  The problem with this is that
it can delay work that is scheduled for immediate execution, such as the
work from sync_inodes_sb().  This can happen since mod_delayed_work()
can now steal work from a work_queue.  This fixes the problem by using
queue_delayed_work() instead.  This is a regression caused by commit
839a8e8660 ("writeback: replace custom worker pool implementation with
unbound workqueue").

The reason that this causes a problem is that laptop-mode will change
the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default.
In the case that bdi_wakeup_thread_delayed() races with
sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung
task.  Even if dirty_writeback_centisecs is not long enough to cause a
hung task, we still don't want to delay sync for that long.

We fix the problem by using queue_delayed_work() when we want to
schedule writeback sometime in future.  This function doesn't change the
timer if it is already armed.

For the same reason, we also change bdi_writeback_workfn() to
immediately queue the work again in the case that the work_list is not
empty.  The same problem can happen if the sync work is run on the
rescue worker.

[jack@suse.cz: update changelog, add comment, use bdi_wakeup_thread_delayed()]
Signed-off-by: Derek Basehore <dbasehore@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zento.linux.org.uk>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Derek Basehore <dbasehore@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Luigi Semenzato <semenzato@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:35 -07:00
David Rientjes
def52acc90 mm: close PageTail race
commit 668f9abbd4 upstream.

Commit bf6bddf192 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-03 12:01:05 -07:00
Theodore Ts'o
0a0ae7b3fb ext4: atomically set inode->i_flags in ext4_set_inode_flags()
commit 00a1a053eb upstream.

Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.

Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-03 12:01:04 -07:00
Nicholas Bellinger
68c52c3ef8 bio-integrity: Fix bio_integrity_verify segment start bug
commit 5837c80e87 upstream.

This patch addresses a bug in bio_integrity_verify() code that has
been causing DIF READ verify operations to be silently skipped.

The issue is that bio->bi_idx will have been incremented within
bio_advance() code in the normal blk_update_request() ->
req_bio_endio() completion path, and bio_integrity_verify() is
using bio_for_each_segment() which starts the bio segment walk
at the current bio->bi_idx.

So instead use bio_for_each_segment_all() to always start the bio
segment walk from zero, regardless of the current bio->bi_idx
value after bio_advance() has been called.

(Context change for v3.10.y -> v3.13.y code - nab)

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:21 -07:00
Oleg Drokin
f1352fb2a3 Fix mountpoint reference leakage in linkat
commit d22e6338db upstream.

Recent changes to retry on ESTALE in linkat
(commit 442e31ca5a)
introduced a mountpoint reference leak and a small memory
leak in case a filesystem link operation returns ESTALE
which is pretty normal for distributed filesystems like
lustre, nfs and so on.
Free old_path in such a case.

[AV: there was another missing path_put() nearby - on the previous
goto retry]

Signed-off-by: Oleg Drokin: <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:20 -07:00
Filipe David Borba Manana
0e48f06cf0 Btrfs: fix data corruption when reading/updating compressed extents
commit a2aa75e18a upstream.

When using a mix of compressed file extents and prealloc extents, it
is possible to fill a page of a file with random, garbage data from
some unrelated previous use of the page, instead of a sequence of zeroes.

A simple sequence of steps to get into such case, taken from the test
case I made for xfstests, is:

   _scratch_mkfs
   _scratch_mount "-o compress-force=lzo"
   $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

This results in the following file items in the fs tree:

   item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
       inode generation 6 transid 6 size 542872 block group 0 mode 100600
   item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
       inode ref index 2 namelen 6 name: foobar
   item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
       extent data disk byte 0 nr 0 gen 6
       extent data offset 0 nr 24576 ram 266240
       extent compression 0
   item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
       prealloc data disk byte 12849152 nr 241664 gen 6
       prealloc data offset 0 nr 241664
   item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
       extent data disk byte 12845056 nr 4096 gen 6
       extent data offset 0 nr 20480 ram 20480
       extent compression 2
   item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
       prealloc data disk byte 13090816 nr 405504 gen 6
       prealloc data offset 0 nr 258048

The on disk extent at offset 266240 (which corresponds to 1 single disk block),
contains 5 compressed chunks of file data. Each of the first 4 compress 4096
bytes of file data, while the last one only compresses 3024 bytes of file data.
Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
1072 bytes) should always return zeroes (our next extent is a prealloc one).

The solution here is the compression code path to zero the remaining (untouched)
bytes of the last page it uncompressed data into, as the information about how
much space the file data consumes in the last page is not known in the upper layer
fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
the remainder of the page but only if it corresponds to the last page of the inode
and if the inode's size is not a multiple of the page size.

This would cause not only returning random data on reads, but also permanently
storing random data when updating parts of the region that should be zeroed.
For the example above, it means updating a single byte in the region [285648 ; 286720[
would store that byte correctly but also store random data on disk.

A test case for xfstests follows soon.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:20 -07:00
Artem Fetishev
63490f40e0 fs/proc/base.c: fix GPF in /proc/$PID/map_files
commit 70335abb26 upstream.

The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.

By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone.  In this case the path is not initialized but the
return value is still 0.  This results in 'general protection fault'
inside d_path().

Steps to reproduce:

  CONFIG_CHECKPOINT_RESTORE=y

    fd = open(...);
    while (1) {
        mmap(fd, ...);
        munmap(fd, ...);
    }

  ls -la /proc/$PID/map_files

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991

Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Signed-off-by: Aleksandr Terekhov <aleksandr_terekhov@epam.com>
Reported-by: <wiebittewas@gmail.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:17 -07:00
Trond Myklebust
c8dd8fdf0b NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid
commit e1253be0ec upstream.

When nfs4_set_rw_stateid() can fails by returning EIO to indicate that
the stateid is completely invalid, then it makes no sense to have it
trigger a retry of the READ or WRITE operation. Instead, we should just
have it fall through and attempt a recovery.

This fixes an infinite loop in which the client keeps replaying the same
bad stateid back to the server.

Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:17 -07:00
Trond Myklebust
40cb674a5a NFS: Fix a delegation callback race
commit 755a48a7a4 upstream.

The clean-up in commit 36281caa83 ended up removing a NULL pointer check
that is needed in order to prevent an Oops in
nfs_async_inode_return_delegation().

Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com
Fixes: 36281caa83 (NFSv4: Further clean-ups of delegation stateid validation)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:16 -07:00
Al Viro
52ed96fc71 ocfs2 syncs the wrong range...
commit 1b56e98990 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:09 -07:00
Jan Kara
3e8e77367e ocfs2: fix quota file corruption
commit 15c34a7606 upstream.

Global quota files are accessed from different nodes.  Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.

Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:09 -07:00
Jan Kara
4480120a6c quota: Fix race between dqput() and dquot_scan_active()
commit 1362f4ea20 upstream.

Currently last dqput() can race with dquot_scan_active() causing it to
call callback for an already deactivated dquot. The race is as follows:

CPU1					CPU2
  dqput()
    spin_lock(&dq_list_lock);
    if (atomic_read(&dquot->dq_count) > 1) {
     - not taken
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
      spin_unlock(&dq_list_lock);
      ->release_dquot(dquot);
        if (atomic_read(&dquot->dq_count) > 1)
         - not taken
					  dquot_scan_active()
					    spin_lock(&dq_list_lock);
					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
					     - not taken
					    atomic_inc(&dquot->dq_count);
					    spin_unlock(&dq_list_lock);
        - proceeds to release dquot
					    ret = fn(dquot, priv);
					     - called for inactive dquot

Fix the problem by making sure possible ->release_dquot() is finished by
the time we call the callback and new calls to it will notice reference
dquot_scan_active() has taken and bail out.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:12 -08:00
Christoph Hellwig
d5685be133 fs: fix iversion handling
commit dff6efc326 upstream.

Currently notify_change directly updates i_version for size updates,
which not only is counter to how all other fields are updated through
struct iattr, but also breaks XFS, which need inode updates to happen
under its own lock, and synchronized to the structure that gets written
to the log.

Remove the update in the common code, and it to btrfs and ext4,
XFS already does a proper updaste internally and currently gets a
double update with the existing code.

IMHO this is 3.13 and -stable material and should go in through the XFS
tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Pavel Shilovsky
32c36d8888 CIFS: Fix too big maxBuf size for SMB3 mounts
commit 2365c4eaf0 upstream.

SMB3 servers can respond with MaxTransactSize of more than 4M
that can cause a memory allocation error returned from kmalloc
in a lock codepath. Also the client doesn't support multicredit
requests now and allows buffer sizes of 65536 bytes only. Set
MaxTransactSize to this maximum supported value.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:02 -08:00
Jeff Layton
9f0afafeb7 cifs: ensure that uncached writes handle unmapped areas correctly
commit 5d81de8e86 upstream.

It's possible for userland to pass down an iovec via writev() that has a
bogus user pointer in it. If that happens and we're doing an uncached
write, then we can end up getting less bytes than we expect from the
call to iov_iter_copy_from_user. This is CVE-2014-0069

cifs_iovec_write isn't set up to handle that situation however. It'll
blindly keep chugging through the page array and not filling those pages
with anything useful. Worse yet, we'll later end up with a negative
number in wdata->tailsz, which will confuse the sending routines and
cause an oops at the very least.

Fix this by having the copy phase of cifs_iovec_write stop copying data
in this situation and send the last write as a short one. At the same
time, we want to avoid sending a zero-length write to the server, so
break out of the loop and set rc to -EFAULT if that happens. This also
allows us to handle the case where no address in the iovec is valid.

[Note: Marking this for stable on v3.4+ kernels, but kernels as old as
       v2.6.38 may have a similar problem and may need similar fix]

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:02 -08:00
Theodore Ts'o
0c9f219834 ext4: don't leave i_crtime.tv_sec uninitialized
commit 19ea806037 upstream.

If the i_crtime field is not present in the inode, don't leave the
field uninitialized.

Fixes: ef7f38359 ("ext4: Add nanosecond timestamps")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:00 -08:00
Theodore Ts'o
d6b55805ac ext4: fix online resize with a non-standard blocks per group setting
commit 3d2660d0c9 upstream.

The set_flexbg_block_bitmap() function assumed that the number of
blocks in a blockgroup was sb->blocksize * 8, which is normally true,
but not always!  Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
bitmap corruption after:

mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jon Bernard <jbernard@tuxion.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:00 -08:00
Theodore Ts'o
94a197c6bc ext4: fix online resize with very large inode tables
commit b93c953534 upstream.

If a file system has a large number of inodes per block group, all of
the metadata blocks in a flex_bg may be larger than what can fit in a
single block group.  Unfortunately, ext4_alloc_group_tables() in
resize.c was never tested to see if it would handle this case
correctly, and there were a large number of bugs which caused the
following sequence to result in a BUG_ON:

kernel bug at fs/ext4/resize.c:409!
   ...
call trace:
 [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
 [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
 [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
 [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
 [<ffffffff811b9df2>] ? final_putname+0x22/0x50
 [<ffffffff811c1371>] sys_ioctl+0x81/0xa0
 [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
rip  [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180


This can be reproduced with the following command sequence:

   mke2fs -t ext4 -i 4096 /dev/vdd 1G
   mount -t ext4 /dev/vdd /vdd
   resize2fs /dev/vdd 8G

To fix this, we need to make sure the right thing happens when a block
group's inode table straddles two block groups, which means the
following bugs had to be fixed:

1) Not clearing the BLOCK_UNINIT flag in the second block group in
   ext4_alloc_group_tables --- the was proximate cause of the BUG_ON.

2) Incorrectly determining how many block groups contained contiguous
   free blocks in ext4_alloc_group_tables().

3) Incorrectly setting the start of the next block range to be marked
   in use after a discontinuity in setup_new_flex_group_blocks().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:00 -08:00
Theodore Ts'o
f13ea13288 ext4: don't try to modify s_flags if the the file system is read-only
commit 2330141097 upstream.

If an ext4 file system is created by some tool other than mke2fs
(perhaps by someone who has a pathalogical fear of the GPL) that
doesn't set one or the other of the EXT2_FLAGS_{UN}SIGNED_HASH flags,
and that file system is then mounted read-only, don't try to modify
the s_flags field.  Otherwise, if dm_verity is in use, the superblock
will change, causing an dm_verity failure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:00 -08:00
Zheng Liu
4687eb1ebf ext4: fix error paths in swap_inode_boot_loader()
commit 30d29b119e upstream.

In swap_inode_boot_loader() we forgot to release ->i_mutex and resume
unlocked dio for inode and inode_bl if there is an error starting the
journal handle.  This commit fixes this issue.

Reported-by: Ahmed Tamrawi <ahmedtamrawi@gmail.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:00 -08:00
Martin K. Petersen
caaeac3554 block: Fix nr_vecs for inline integrity vectors
commit 087787959c upstream.

Commit 9f060e2231 changed the way we handle allocations for the
integrity vectors. When the vectors are inline there is no associated
slab and consequently bvec_nr_vecs() returns 0. Ensure that we check
against BIP_INLINE_VECS in that case.

Reported-by: David Milburn <dmilburn@redhat.com>
Tested-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22 12:41:28 -08:00
NeilBrown
5de6426083 lockd: send correct lock when granting a delayed lock.
commit 2ec197db1a upstream.

If an NFS client attempts to get a lock (using NLM) and the lock is
not available, the server will remember the request and when the lock
becomes available it will send a GRANT request to the client to
provide the lock.

If the client already held an adjacent lock, the GRANT callback will
report the union of the existing and new locks, which can confuse the
client.

This happens because __posix_lock_file (called by vfs_lock_file)
updates the passed-in file_lock structure when adjacent or
over-lapping locks are found.

To avoid this problem we take a copy of the two fields that can
be changed (fl_start and fl_end) before the call and restore them
afterwards.
An alternate would be to allocate a 'struct file_lock', initialise it,
use locks_copy_lock() to take a copy, then locks_release_private()
after the vfs_lock_file() call.  But that is a lot more work.

Reported-by: Olaf Kirch <okir@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

--
v1 had a couple of issues (large on-stack struct and didn't really work properly).
This version is much better tested.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-02-22 12:41:27 -08:00
Steve French
6bd8c85082 retrieving CIFS ACLs when mounted with SMB2 fails dropping session
commit 83e3bc23ef upstream.

The get/set ACL xattr support for CIFS ACLs attempts to send old
cifs dialect protocol requests even when mounted with SMB2 or later
dialects. Sending cifs requests on an smb2 session causes problems -
the server drops the session due to the illegal request.

This patch makes CIFS ACL operations protocol specific to fix that.

Attempting to query/set CIFS ACLs for SMB2 will now return
EOPNOTSUPP (until we add worker routines for sending query
ACL requests via SMB2) instead of sending invalid (cifs)
requests.

A separate followon patch will be needed to fix cifs_acl_to_fattr
(which takes a cifs specific u16 fid so can't be abstracted
to work with SMB2 until that is changed) and will be needed
to fix mount problems when "cifsacl" is specified on mount
with e.g. vers=2.1

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22 12:41:26 -08:00