Commit Graph

1350 Commits

Author SHA1 Message Date
Jeff Layton
304902e1f7 ceph: flush dirty inodes before proceeding with remount
commit 00abf69dd2 upstream.

xfstest generic/452 was triggering a "Busy inodes after umount" warning.
ceph was allowing the mount to go read-only without first flushing out
dirty inodes in the cache. Ensure we sync out the filesystem before
allowing a remount to proceed.

Cc: stable@vger.kernel.org
Link: http://tracker.ceph.com/issues/39571
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25 18:26:49 +02:00
Al Viro
46b53c29f0 ceph: fix use-after-free on symlink traversal
[ Upstream commit daf5cc27ee ]

free the symlink body after the same RCU delay we have for freeing the
struct inode itself, so that traversal during RCU pathwalk wouldn't step
into freed memory.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04 08:49:09 +02:00
Yan, Zheng
99055c03d2 ceph: fix ci->i_head_snapc leak
commit 37659182bf upstream.

We missed two places that i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps
and i_flushing_caps may change. When they are all zeros, we should free
i_head_snapc.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/38224
Reported-and-tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 09:32:01 +02:00
Jeff Layton
a9adfd9439 ceph: ensure d_name stability in ceph_dentry_hash()
commit 76a495d666 upstream.

Take the d_lock here to ensure that d_name doesn't change.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 09:32:01 +02:00
Yan, Zheng
79e3959b68 ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
commit 04242ff3ac upstream.

Otherwise, mdsc->snap_flush_list may get corrupted.

Cc: stable@vger.kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 10:06:58 +01:00
Yan, Zheng
6fa22a7d5e ceph: don't update importing cap's mseq when handing cap export
commit 3c1392d4c4 upstream.

Updating mseq makes client think importer mds has accepted all prior
cap messages and importer mds knows what caps client wants. Actually
some cap messages may have been dropped because of mseq mismatch.

If mseq is left untouched, importing cap's mds_wanted later will get
reset by cap import message.

Cc: stable@vger.kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 10:03:54 +01:00
Ilya Dryomov
06e925920d libceph: add authorizer challenge
commit 6daca13d2e upstream.

When a client authenticates with a service, an authorizer is sent with
a nonce to the service (ceph_x_authorize_[ab]) and the service responds
with a mutation of that nonce (ceph_x_authorize_reply).  This lets the
client verify the service is who it says it is but it doesn't protect
against a replay: someone can trivially capture the exchange and reuse
the same authorizer to authenticate themselves.

Allow the service to reject an initial authorizer with a random
challenge (ceph_x_authorize_challenge).  The client then has to respond
with an updated authorizer proving they are able to decrypt the
service's challenge and that the new authorizer was produced for this
specific connection instance.

The accepting side requires this challenge and response unconditionally
if the client side advertises they have CEPHX_V2 feature bit.

This addresses CVE-2018-1128.

Link: http://tracker.ceph.com/issues/24836
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:10 +01:00
Ilya Dryomov
9041d1e200 libceph: drop len argument of *verify_authorizer_reply()
commit 0dde584882 upstream.

The length of the reply is protocol-dependent - for cephx it's
ceph_x_authorize_reply.  Nothing sensible can be passed from the
messenger layer anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:09 +01:00
Yan, Zheng
43e38372f9 Revert "ceph: fix dentry leak in splice_dentry()"
commit efe328230d upstream.

This reverts commit 8b8f53af1e.

splice_dentry() is used by three places. For two places, req->r_dentry
is passed to splice_dentry(). In the case of error, req->r_dentry does
not get updated. So splice_dentry() should not drop reference.

Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21 09:25:59 +01:00
Yan, Zheng
01d012fe08 ceph: fix dentry leak in splice_dentry()
[ Upstream commit 8b8f53af1e ]

In any case, d_splice_alias() does not drop reference of original
dentry.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24 13:12:33 +02:00
Chengguang Xu
087d268b76 ceph: fix dentry leak when failing to init debugfs
[ Upstream commit 18106734b5 ]

When failing from ceph_fs_debugfs_init() in ceph_real_mount(),
there is lack of dput of root_dentry and it causes slab errors,
so change the calling order of ceph_fs_debugfs_init() and
open_root_dentry() and do some cleanups to avoid this issue.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:32 +02:00
Yan, Zheng
ff1c42b0c6 ceph: only dirty ITER_IOVEC pages for direct read
commit 85784f9395 upstream.

If a page is already locked, attempting to dirty it leads to a deadlock
in lock_page().  This is what currently happens to ITER_BVEC pages when
a dio-enabled loop device is backed by ceph:

  $ losetup --direct-io /dev/loop0 /mnt/cephfs/img
  $ xfs_io -c 'pread 0 4k' /dev/loop0

Follow other file systems and only dirty ITER_IOVEC pages.

Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08 12:12:42 +02:00
Yan, Zheng
e081bd0d70 ceph: drop negative child dentries before try pruning inode's alias
commit 040d786032 upstream.

Negative child dentry holds reference on inode's alias, it makes
d_prune_aliases() do nothing.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:16 +01:00
Jeff Layton
a703da486e ceph: unlock dangling spinlock in try_flush_caps()
commit 6c2838fbde upstream.

sparse warns:

  fs/ceph/caps.c:2042:9: warning: context imbalance in 'try_flush_caps' - wrong count at exit

We need to exit this function with the lock unlocked, but a couple of
cases leave it locked.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 09:49:13 +01:00
Jeff Layton
bd53085221 ceph: clean up unsafe d_parent accesses in build_dentry_path
[ Upstream commit c6b0b656ca ]

While we hold a reference to the dentry when build_dentry_path is
called, we could end up racing with a rename that changes d_parent.
Handle that situation correctly, by using the rcu_read_lock to
ensure that the parent dentry and inode stick around long enough
to safely check ceph_snap and ceph_ino.

Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:21:37 +02:00
Jeff Layton
6839ad59f9 ceph: fix bogus endianness change in ceph_ioctl_set_layout
[ Upstream commit 24c149ad69 ]

sparse says:

    fs/ceph/ioctl.c:100:28: warning: cast to restricted __le64

preferred_osd is a __s64 so we don't need to do any conversion. Also,
just remove the cast in ceph_ioctl_get_layout as it's not needed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:21:37 +02:00
Jeff Layton
df37e8fadf ceph: don't update_dentry_lease unless we actually got one
[ Upstream commit 80d025ffed ]

This if block updates the dentry lease even in the case where
the MDS didn't grant one.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:21:37 +02:00
Yan, Zheng
715849268b ceph: fix readpage from fscache
commit dd2bc47348 upstream.

ceph_readpage() unlocks page prematurely prematurely in the case
that page is reading from fscache. Caller of readpage expects that
page is uptodate when it get unlocked. So page shoule get locked
by completion callback of fscache_read_or_alloc_pages()

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07 08:35:40 +02:00
Yan, Zheng
acccf01a80 ceph: fix race in concurrent readdir
commit 84583cfb97 upstream.

For a large directory, program needs to issue multiple readdir
syscalls to get all dentries. When there are multiple programs
read the directory concurrently. Following sequence of events
can happen.

 - program calls readdir with pos = 2. ceph sends readdir request
   to mds. The reply contains N1 entries. ceph adds these N1 entries
   to readdir cache.
 - program calls readdir with pos = N1+2. The readdir is satisfied
   by the readdir cache, N2 entries are returned. (Other program
   calls readdir in the middle, which fills the cache)
 - program calls readdir with pos = N1+N2+2. ceph sends readdir
   request to mds. The reply contains N3 entries and it reaches
   directory end. ceph adds these N3 entries to the readdir cache
   and marks directory complete.

The second readdir call does not update fi->readdir_cache_idx.
ceph add the last N3 entries to wrong places.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:08:07 -07:00
Yan, Zheng
9403514ba1 ceph: choose readdir frag based on previous readdir reply
commit b50c2de51e upstream.

The dirfragtree is lazily updated, it's not always accurate. Infinite
loops happens in following circumstance.

- client send request to read frag A
- frag A has been fragmented into frag B and C. So mds fills the reply
  with contents of frag B
- client wants to read next frag C. ceph_choose_frag(frag value of C)
  return frag A.

The fix is using previous readdir reply to calculate next readdir frag
when possible.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 15:01:02 +02:00
Fabian Frederick
61604a2626 fs: add i_blocksize()
commit 93407472a2 upstream.

Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.

[geliangtang@gmail.com: truncate: use i_blocksize()]
  Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:06:00 +02:00
Luis Henriques
091784ae97 ceph: fix memory leak in __ceph_setxattr()
commit eeca958dce upstream.

The ceph_inode_xattr needs to be released when removing an xattr.  Easily
reproducible running the 'generic/020' test from xfstests or simply by
doing:

  attr -s attr0 -V 0 /mnt/test && attr -r attr0 /mnt/test

While there, also fix the error path.

Here's the kmemleak splat:

unreferenced object 0xffff88001f86fbc0 (size 64):
  comm "attr", pid 244, jiffies 4294904246 (age 98.464s)
  hex dump (first 32 bytes):
    40 fa 86 1f 00 88 ff ff 80 32 38 1f 00 88 ff ff  @........28.....
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
  backtrace:
    [<ffffffff81560199>] kmemleak_alloc+0x49/0xa0
    [<ffffffff810f3e5b>] kmem_cache_alloc+0x9b/0xf0
    [<ffffffff812b157e>] __ceph_setxattr+0x17e/0x820
    [<ffffffff812b1c57>] ceph_set_xattr_handler+0x37/0x40
    [<ffffffff8111fb4b>] __vfs_removexattr+0x4b/0x60
    [<ffffffff8111fd37>] vfs_removexattr+0x77/0xd0
    [<ffffffff8111fdd1>] removexattr+0x41/0x60
    [<ffffffff8111fe65>] path_removexattr+0x75/0xa0
    [<ffffffff81120aeb>] SyS_lremovexattr+0xb/0x10
    [<ffffffff81564b20>] entry_SYSCALL_64_fastpath+0x13/0x94
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:28:39 +02:00
Yan, Zheng
e99b0ea393 ceph: try getting buffer capability for readahead/fadvise
commit 2b1ac852eb upstream.

For readahead/fadvise cases, caller of ceph_readpages does not
hold buffer capability. Pages can be added to page cache while
there is no buffer capability. This can cause data integrity
issue.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-08 07:47:55 +02:00
Yan, Zheng
36e0be3187 ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
commit 8179a101eb upstream.

ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
then calls posix_acl_chmod().

The problem is that __ceph_setattr() calls posix_acl_chmod() before
sending the setattr request. The get_acl() call in posix_acl_chmod()
can trigger a getxattr request. The reply of the getxattr request
can restore inode's i_mode to its old value. The set_acl() call in
posix_acl_chmod() sees old value of inode's i_mode, so it calls
__ceph_setattr() again.

Link: http://tracker.ceph.com/issues/19688
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-03 08:36:39 -07:00
Jeff Layton
5da90d0018 ceph: remove req from unsafe list when unregistering it
commit df963ea8a0 upstream.

There's no reason a request should ever be on a s_unsafe list but not
in the request tree.

Link: http://tracker.ceph.com/issues/18474
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:45 +08:00
Yan, Zheng
dc8470f3c8 ceph: update readpages osd request according to size of pages
commit d641df819d upstream.

add_to_page_cache_lru() can fails, so the actual pages to read
can be smaller than the initial size of osd request. We need to
update osd request size in that case.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Jeff Layton
6e9fa67c58 ceph: fix endianness bug in frag_tree_split_cmp
commit fe2ed42517 upstream.

sparse says:

    fs/ceph/inode.c:308:36: warning: incorrect type in argument 1 (different base types)
    fs/ceph/inode.c:308:36:    expected unsigned int [unsigned] [usertype] a
    fs/ceph/inode.c:308:36:    got restricted __le32 [usertype] frag
    fs/ceph/inode.c:308:46: warning: incorrect type in argument 2 (different base types)
    fs/ceph/inode.c:308:46:    expected unsigned int [unsigned] [usertype] b
    fs/ceph/inode.c:308:46:    got restricted __le32 [usertype] frag

We need to convert these values to host-endian before calling the
comparator.

Fixes: a407846ef7 ("ceph: don't assume frag tree splits in mds reply are sorted")
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:43 +01:00
Jeff Layton
2e4f2131b6 ceph: fix endianness of getattr mask in ceph_d_revalidate
commit 1097680d75 upstream.

sparse says:

    fs/ceph/dir.c:1248:50: warning: incorrect type in assignment (different base types)
    fs/ceph/dir.c:1248:50:    expected restricted __le32 [usertype] mask
    fs/ceph/dir.c:1248:50:    got int [signed] [assigned] mask

Fixes: 200fd27c8f ("ceph: use lookup request to revalidate dentry")
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:43 +01:00
Yan, Zheng
8934e06967 ceph: fix ceph_get_caps() interruption
commit 6e09d0fb64 upstream.

Commit 5c341ee328 ("ceph: fix scheduler warning due to nested
blocking") causes infinite loop when process is interrupted.  Fix it.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:43 +01:00
Nikolay Borisov
48baa92410 ceph: fix scheduler warning due to nested blocking
commit 5c341ee328 upstream.

try_get_cap_refs can be used as a condition in a wait_event* calls.
This is all fine until it has to call __ceph_do_pending_vmtruncate,
which in turn acquires the i_truncate_mutex. This leads to a situation
in which a task's state is !TASK_RUNNING and at the same time it's
trying to acquire a sleeping primitive. In essence a nested sleeping
primitives are being used. This causes the following warning:

WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110
 ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G           O    4.4.20-clouder2 #6
Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
 0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
 ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
 0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
Call Trace:
 [<ffffffff812f4409>] dump_stack+0x67/0x9e
 [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0
 [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
 [<ffffffff8107767f>] __might_sleep+0x9f/0xb0
 [<ffffffff81612d30>] mutex_lock+0x20/0x40
 [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
 [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph]
 [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph]
 [<ffffffff81094370>] ? wait_woken+0xb0/0xb0
 [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph]
 [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260
 [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200
 [<ffffffff811b46ce>] ? iput+0x9e/0x230
 [<ffffffff81077632>] ? __might_sleep+0x52/0xb0
 [<ffffffff81156147>] ? __might_fault+0x37/0x40
 [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170
 [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0
 [<ffffffff81199369>] vfs_write+0xa9/0x190
 [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70
 [<ffffffff8119a056>] SyS_write+0x46/0xa0

This happens since wait_event_interruptible can interfere with the
mutex locking code, since they both fiddle with the task state.

Fix the issue by using the newly-added nested blocking infrastructure
in 61ada528de ("sched/wait: Provide infrastructure to deal with
nested blocking")

Link: https://lwn.net/Articles/628628/
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:43 +01:00
Jeff Layton
1f75575aca ceph: fix bad endianness handling in parse_reply_info_extra
commit 6df8c9d80a upstream.

sparse says:

    fs/ceph/mds_client.c:291:23: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:293:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:294:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:296:28: warning: restricted __le32 degrades to integer

The op value is __le32, so we need to convert it before comparing it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:40 +01:00
Jeff Layton
c3f4688a08 ceph: don't set req->r_locked_dir in ceph_d_revalidate
This function sets req->r_locked_dir which is supposed to indicate to
ceph_fill_trace that the parent's i_rwsem is locked for write.
Unfortunately, there is no guarantee that the dir will be locked when
d_revalidate is called, so we really don't want ceph_fill_trace to do
any dcache manipulation from this context. Clear req->r_locked_dir since
it's clearly not safe to do that.

What we really want to know with d_revalidate is whether the dentry
still points to the same inode. ceph_fill_trace installs a pointer to
the inode in req->r_target_inode, so we can just compare that to
d_inode(dentry) to see if it's the same one after the lookup.

Also, since we aren't generally interested in the parent here, we can
switch to using a GETATTR to hint that to the MDS, which also means that
we only need to reserve one cap.

Finally, just remove the d_unhashed check. That's really outside the
purview of a filesystem's d_revalidate. If the thing became unhashed
while we're checking it, then that's up to the VFS to handle anyway.

Fixes: 200fd27c8f ("ceph: use lookup request to revalidate dentry")
Link: http://tracker.ceph.com/issues/18041
Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-08 14:32:16 +01:00
Yan, Zheng
8a8d561766 ceph: use default file splice read callback
Splice read/write implementation changed recently. When using
generic_file_splice_read(), iov_iter with type == ITER_PIPE is
passed to filesystem's read_iter callback. But ceph_sync_read()
can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter
expects pages from page cache).

Fixing ceph_sync_read() requires a big patch. So use default
splice read callback for now.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-11-10 20:13:04 +01:00
Wei Yongjun
5130ccea7c ceph: fix non static symbol warning
Fixes the following sparse warning:

fs/ceph/xattr.c:19:28: warning:
 symbol 'ceph_other_xattr_handler' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-18 12:30:32 +02:00
Geert Uytterhoeven
31ca587810 ceph: fix uninitialized dentry pointer in ceph_real_mount()
fs/ceph/super.c: In function ‘ceph_real_mount’:
    fs/ceph/super.c:818: warning: ‘root’ may be used uninitialized in this function

If s_root is already valid, dentry pointer root is never initialized,
and returned by ceph_real_mount(). This will cause a crash later when
the caller dereferences the pointer.

Fixes: ce2728aaa8 ("ceph: avoid accessing / when mounting a subpath")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-18 12:10:59 +02:00
Yan, Zheng
f72f94555a ceph: fix readdir vs fragmentation race
following sequence of events tigger the race

- client readdir frag 0* -> got item 'A'
- MDS merges frag 0* and frag 1*
- client send readdir request (frag 1*, offset 2, readdir_start 'A')
- MDS reply items (that are after item 'A') in frag *

Link: http://tracker.ceph.com/issues/17286
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-18 12:09:58 +02:00
Nikolay Borisov
0d7718f666 ceph: fix error handling in ceph_read_iter
In case __ceph_do_getattr returns an error and the retry_op in
ceph_read_iter is not READ_INLINE, then it's possible to invoke
__free_page on a page which is NULL, this naturally leads to a crash.
This can happen when, for example, a process waiting on a MDS reply
receives sigterm.

Fix this by explicitly checking whether the page is set or not.

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-15 23:28:07 +02:00
Linus Torvalds
101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Al Viro
3873691e5a Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
Linus Torvalds
97d2116708 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
 "xattr stuff from Andreas

  This completes the switch to xattr_handler ->get()/->set() from
  ->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Remove {get,set,remove}xattr inode operations
  xattr: Stop calling {get,set,remove}xattr inode operations
  vfs: Check for the IOP_XATTR flag in listxattr
  xattr: Add __vfs_{get,set,remove}xattr helpers
  libfs: Use IOP_XATTR flag for empty directory handling
  vfs: Use IOP_XATTR flag for bad-inode handling
  vfs: Add IOP_XATTR inode operations flag
  vfs: Move xattr_resolve_name to the front of fs/xattr.c
  ecryptfs: Switch to generic xattr handlers
  sockfs: Get rid of getxattr iop
  sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
  kernfs: Switch to generic xattr handlers
  hfs: Switch to generic xattr handlers
  jffs2: Remove jffs2_{get,set,remove}xattr macros
  xattr: Remove unnecessary NULL attribute name check
2016-10-10 17:11:50 -07:00
Linus Torvalds
8dfb790b15 Merge tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client
Pull Ceph updates from Ilya Dryomov:
 "The big ticket item here is support for rbd exclusive-lock feature,
  with maintenance operations offloaded to userspace (Douglas Fuller,
  Mike Christie and myself). Another block device bullet is a series
  fixing up layering error paths (myself).

  On the filesystem side, we've got patches that improve our handling of
  buffered vs dio write races (Neil Brown) and a few assorted fixes from
  Zheng. Also included a couple of random cleanups and a minor CRUSH
  update"

* tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits)
  crush: remove redundant local variable
  crush: don't normalize input of crush_ln iteratively
  libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()
  libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()
  ceph: fix description for rsize and rasize mount options
  rbd: use kmalloc_array() in rbd_header_from_disk()
  ceph: use list_move instead of list_del/list_add
  ceph: handle CEPH_SESSION_REJECT message
  ceph: avoid accessing / when mounting a subpath
  ceph: fix mandatory flock check
  ceph: remove warning when ceph_releasepage() is called on dirty page
  ceph: ignore error from invalidate_inode_pages2_range() in direct write
  ceph: fix error handling of start_read()
  rbd: add rbd_obj_request_error() helper
  rbd: img_data requests don't own their page array
  rbd: don't call rbd_osd_req_format_read() for !img_data requests
  rbd: rework rbd_img_obj_exists_submit() error paths
  rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()
  rbd: move bumping img_request refcount into rbd_obj_request_submit()
  rbd: mark the original request as done if stat request fails
  ...
2016-10-10 13:52:05 -07:00
Al Viro
e55f1d1d13 Merge remote-tracking branch 'jk/vfs' into work.misc 2016-10-08 11:06:08 -04:00
Andreas Gruenbacher
fd50ecaddf vfs: Remove {get,set,remove}xattr inode operations
These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-07 21:48:36 -04:00
Wei Yongjun
8cdcc07dde ceph: use list_move instead of list_del/list_add
Using list_move() instead of list_del() + list_add().

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03 16:13:50 +02:00
Yan, Zheng
fcff415c94 ceph: handle CEPH_SESSION_REJECT message
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 16:13:50 +02:00
Yan, Zheng
ce2728aaa8 ceph: avoid accessing / when mounting a subpath
Accessing / causes failuire if the client has caps that restrict path

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 16:13:50 +02:00
Yan, Zheng
db4a63aab4 ceph: fix mandatory flock check
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 16:13:49 +02:00
NeilBrown
e55f1a1871 ceph: remove warning when ceph_releasepage() is called on dirty page
If O_DIRECT writes are racing with buffered writes, then
the call to invalidate_inode_pages2_range() can call ceph_releasepage()
on dirty pages.

Most filesystems hold inode_lock() across O_DIRECT writes so they do not
suffer this race, but cephfs deliberately drops the lock, and opens a window
for the race.

This race can be triggered with the generic/036 test from the xfstests
test suite.  It doesn't happen every time, but it does happen often.

As the possibilty is expected, remove the warning, and instead include
the PageDirty() status in the debug message.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 16:13:49 +02:00
NeilBrown
5d7eb1a322 ceph: ignore error from invalidate_inode_pages2_range() in direct write
This call can fail if there are dirty pages.  The preceding call to
filemap_write_and_wait_range() will normally remove dirty pages, but
as inode_lock() is not held over calls to ceph_direct_read_write(), it
could race with non-direct writes and pages could be dirtied
immediately after filemap_write_and_wait_range() returns

If there are dirty pages, they will be removed by the subsequent call
to truncate_inode_pages_range(), so having them here is not a problem.

If the 'ret' value is left holding an error, then in the async IO case
(aio_req is not NULL) the loop that would normally call
ceph_osdc_start_request() will see the error in 'ret' and abort all
requests.  This doesn't seem like correct behaviour.

So use separate 'ret2' instead of overloading 'ret'.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 16:13:49 +02:00
Yan, Zheng
1afe478569 ceph: fix error handling of start_read()
If start_page() fails to add a page to page cache or fails to send
OSD request. It should cal put_page() (instead of free_page()) for
relevant pages.

Besides, start_page() need to cancel fscache readpage if it fails
to send OSD request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reported-by: Zhi Zhang <zhang.david2011@gmail.com>
2016-10-03 16:13:49 +02:00