Commit Graph

65089 Commits

Author SHA1 Message Date
Christoph Hellwig
5781464bd1 xfs: move the ioerror check out of xlog_state_clean_iclog
Use the shutdown flag in the log to bypass xlog_state_clean_iclog
entirely in case of a shut down log.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:27:59 -07:00
Christoph Hellwig
c814b4f24e xfs: refactor xlog_state_clean_iclog
Factor out a few self-contained helpers from xlog_state_clean_iclog, and
update the documentation so it primarily documents why things happens
instead of how.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:27:59 -07:00
Christoph Hellwig
12e6a0f449 xfs: remove the aborted parameter to xlog_state_done_syncing
We can just check for a shut down log all the way down in
xlog_cil_committed instead of passing the parameter.  This means a
slight behavior change in that we now also abort log items if the
shutdown came in halfway into the I/O completion processing, which
actually is the right thing to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:27:59 -07:00
Christoph Hellwig
a582f32fad xfs: simplify log shutdown checking in xfs_log_release_iclog
There is no need to check for the ioerror state before the lock, as
the shutdown case is not a fast path.  Also remove the call to force
shutdown the file system, as it must have been shut down already
for an iclog to be in the ioerror state.  Also clean up the flow of
the function a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:27:59 -07:00
Christoph Hellwig
f97a43e436 xfs: simplify the xfs_log_release_iclog calling convention
The only caller of xfs_log_release_iclog doesn't care about the return
value, so remove it.  Also don't bother passing the mount pointer,
given that we can trivially derive it from the iclog.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:27:58 -07:00
Christoph Hellwig
81e5b50a8f xfs: factor out a xlog_wait_on_iclog helper
Factor out the shared code to wait for a log force into a new helper.
This helper uses the XLOG_FORCED_SHUTDOWN check previous only used
by the unmount code over the equivalent iclog ioerror state used by
the other two functions.

There is a slight behavior change in that the force of the unmount
record is now accounted in the log force statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:27:58 -07:00
Christoph Hellwig
c7cc296ddd xfs: merge xlog_cil_push into xlog_cil_push_work
xlog_cil_push is only called by xlog_cil_push_work, so merge the two
functions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:27:58 -07:00
Domenico Andreoli
56939e014a hibernate: Allow uswsusp to write to swap
It turns out that there is one use case for programs being able to
write to swap devices, and that is the userspace hibernation code.

Quick fix: disable the S_SWAPFILE check if hibernation is configured.

Fixes: dc617f29db ("vfs: don't allow writes to swap files")
Reported-by: Domenico Andreoli <domenico.andreoli@linux.com>
Reported-by: Marian Klein <mkleinsoft@gmail.com>
Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-23 08:22:15 -07:00
Hillf Danton
a5318d3cdf io-uring: drop 'free_pfile' in struct io_file_put
Sync removal of file is only used in case of a GFP_KERNEL kmalloc
failure at the cost of io_file_put::done and work flush, while a
glich like it can be handled at the call site without too much pain.

That said, what is proposed is to drop sync removing of file, and
the kink in neck as well.

Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-23 09:22:15 -06:00
Hillf Danton
4afdb733b1 io-uring: drop completion when removing file
A case of task hung was reported by syzbot,

INFO: task syz-executor975:9880 blocked for more than 143 seconds.
      Not tainted 5.6.0-rc6-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor975 D27576  9880   9878 0x80004000
Call Trace:
 schedule+0xd0/0x2a0 kernel/sched/core.c:4154
 schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
 io_queue_file_removal+0x1af/0x1e0 fs/io_uring.c:5826
 __io_sqe_files_update.isra.0+0x3a1/0xb00 fs/io_uring.c:5867
 io_sqe_files_update fs/io_uring.c:5918 [inline]
 __io_uring_register+0x377/0x2c00 fs/io_uring.c:7131
 __do_sys_io_uring_register fs/io_uring.c:7202 [inline]
 __se_sys_io_uring_register fs/io_uring.c:7184 [inline]
 __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:7184
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

and bisect pointed to 05f3fb3c53 ("io_uring: avoid ring quiesce for
fixed file set unregister and update").

It is down to the order that we wait for work done before flushing it
while nobody is likely going to wake us up.

We can drop that completion on stack as flushing work itself is a sync
operation we need and no more is left behind it.

To that end, io_file_put::done is re-used for indicating if it can be
freed in the workqueue worker context.

Reported-and-Inspired-by: syzbot <syzbot+538d1957ce178382a394@syzkaller.appspotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>

Rename ->done to ->free_pfile

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-23 09:21:06 -06:00
Luis Henriques
c8d6ee0144 ceph: fix memory leak in ceph_cleanup_snapid_map()
kmemleak reports the following memory leak:

unreferenced object 0xffff88821feac8a0 (size 96):
  comm "kworker/1:0", pid 17, jiffies 4294896362 (age 20.512s)
  hex dump (first 32 bytes):
    a0 c8 ea 1f 82 88 ff ff 00 c9 ea 1f 82 88 ff ff  ................
    00 00 00 00 00 00 00 00 00 01 00 00 00 00 ad de  ................
  backtrace:
    [<00000000b3ea77fb>] ceph_get_snapid_map+0x75/0x2a0
    [<00000000d4060942>] fill_inode+0xb26/0x1010
    [<0000000049da6206>] ceph_readdir_prepopulate+0x389/0xc40
    [<00000000e2fe2549>] dispatch+0x11ab/0x1521
    [<000000007700b894>] ceph_con_workfn+0xf3d/0x3240
    [<0000000039138a41>] process_one_work+0x24d/0x590
    [<00000000eb751f34>] worker_thread+0x4a/0x3d0
    [<000000007e8f0d42>] kthread+0xfb/0x130
    [<00000000d49bd1fa>] ret_from_fork+0x3a/0x50

A kfree is missing while looping the 'to_free' list of ceph_snapid_map
objects.

Cc: stable@vger.kernel.org
Fixes: 75c9627efb ("ceph: map snapid to anonymous bdev ID")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-23 13:07:08 +01:00
Ilya Dryomov
7614209736 ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
per-pool flags as well.  Unfortunately the backwards compatibility here
is lacking:

- the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
  was guarded by require_osd_release >= RELEASE_LUMINOUS
- it was subsequently backported to luminous in v12.2.2, but that makes
  no difference to clients that only check OSDMAP_FULL/NEARFULL because
  require_osd_release is not client-facing -- it is for OSDs

Since all kernels are affected, the best we can do here is just start
checking both map flags and pool flags and send that to stable.

These checks are best effort, so take osdc->lock and look up pool flags
just once.  Remove the FIXME, since filesystem quotas are checked above
and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.

Cc: stable@vger.kernel.org
Reported-by: Yanhu Cao <gmayyyha@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Sage Weil <sage@redhat.com>
2020-03-23 13:07:08 +01:00
Randy Dunlap
44a52022e7 ext2: fix empty body warnings when -Wextra is used
When EXT2_ATTR_DEBUG is not defined, modify the 2 debug macros
to use the no_printk() macro instead of <nothing>.
This fixes gcc warnings when -Wextra is used:

../fs/ext2/xattr.c:252:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:258:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:330:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:872:45: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]

I have verified that the only object code change (with gcc 7.5.0) is
the reversal of some instructions from 'cmp a,b' to 'cmp b,a'.

Link: https://lore.kernel.org/r/e18a7395-61fb-2093-18e8-ed4f8cf56248@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-23 13:01:37 +01:00
Chao Yu
1a67cbe141 f2fs: fix to account compressed blocks in f2fs_compressed_blocks()
por_fsstress reports inconsistent status in orphan inode, the root cause
of this is in f2fs_write_raw_pages() we decrease i_compr_blocks incorrectly
due to wrong calculation in f2fs_compressed_blocks().

So this patch exposes below two functions based on __f2fs_cluster_blocks:
- f2fs_compressed_blocks: get count of compressed blocks in compressed cluster
- f2fs_cluster_blocks: get count of valid blocks (including reserved blocks)
in compressed cluster.

Then use f2fs_compress_blocks() to get correct compressed blocks count in
f2fs_write_raw_pages().

sanity_check_inode: inode (ino=ad80) hash inconsistent i_compr_blocks:2, i_blocks:1, run fsck to fix

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22 21:16:29 -07:00
Gustavo A. R. Silva
50b1203d8c f2fs: xattr.h: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22 21:16:29 -07:00
Chao Yu
a4ba5dfc5c f2fs: fix to update f2fs_super_block fields under sb_lock
Fields in struct f2fs_super_block should be updated under coverage
of sb_lock, fix to adjust update_sb_metadata() for that rule.

Fixes: 04f0b2eaa3 ("f2fs: ioctl for removing a range from F2FS")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22 21:16:29 -07:00
Sahitya Tummala
c84ef3c5e6 f2fs: Add a new CP flag to help fsck fix resize SPO issues
Add and set a new CP flag CP_RESIZEFS_FLAG during
online resize FS to help fsck fix the metadata mismatch
that may happen due to SPO during resize, where SB
got updated but CP data couldn't be written yet.

fsck errors -
Info: CKPT version = 6ed7bccb
        Wrong user_block_count(2233856)
[f2fs_do_mount:3365] Checkpoint is polluted

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22 21:16:28 -07:00
Sahitya Tummala
6827568275 f2fs: Fix mount failure due to SPO after a successful online resize FS
Even though online resize is successfully done, a SPO immediately
after resize, still causes below error in the next mount.

[   11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
[   11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint

This is because after FS metadata is updated in update_fs_metadata()
if the SBI_IS_DIRTY is not dirty, then CP will not be done to reflect
the new user_block_count.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22 21:16:28 -07:00
Chao Yu
a999150f4f f2fs: use kmem_cache pool during inline xattr lookups
It's been observed that kzalloc() on lookup_all_xattrs() are called millions
of times on Android, quickly becoming the top abuser of slub memory allocator.

Use a dedicated kmem cache pool for xattr lookups to mitigate this.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22 21:16:27 -07:00
Yilu Lin
97adda8b3a CIFS: Fix bug which the return value by asynchronous read is error
This patch is used to fix the bug in collect_uncached_read_data()
that rc is automatically converted from a signed number to an
unsigned number when the CIFS asynchronous read fails.
It will cause ctx->rc is error.

Example:
Share a directory and create a file on the Windows OS.
Mount the directory to the Linux OS using CIFS.
On the CIFS client of the Linux OS, invoke the pread interface to
deliver the read request.

The size of the read length plus offset of the read request is greater
than the maximum file size.

In this case, the CIFS server on the Windows OS returns a failure
message (for example, the return value of
smb2.nt_status is STATUS_INVALID_PARAMETER).

After receiving the response message, the CIFS client parses
smb2.nt_status to STATUS_INVALID_PARAMETER
and converts it to the Linux error code (rdata->result=-22).

Then the CIFS client invokes the collect_uncached_read_data function to
assign the value of rdata->result to rc, that is, rc=rdata->result=-22.

The type of the ctx->total_len variable is unsigned integer,
the type of the rc variable is integer, and the type of
the ctx->rc variable is ssize_t.

Therefore, during the ternary operation, the value of rc is
automatically converted to an unsigned number. The final result is
ctx->rc=4294967274. However, the expected result is ctx->rc=-22.

Signed-off-by: Yilu Lin <linyilu@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-03-22 22:49:10 -05:00
Murphy Zhou
ef4a632ccc CIFS: check new file size when extending file by fallocate
xfstests generic/228 checks if fallocate respect RLIMIT_FSIZE.
After fallocate mode 0 extending enabled, we can hit this failure.
Fix this by check the new file size with vfs helper, return
error if file size is larger then RLIMIT_FSIZE(ulimit -f).

This patch has been tested by LTP/xfstests aginst samba and
Windows server.

Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2020-03-22 22:49:10 -05:00
Steve French
8895c66f2b SMB3: Minor cleanup of protocol definitions
And add one missing define (COMPRESSION_TRANSFORM_ID) and
flag (TRANSFORM_FLAG_ENCRYPTED)

Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:10 -05:00
Steve French
8f23343131 SMB3: Additional compression structures
New transform header structures. See recent updates
to MS-SMB2 adding section 2.2.42.1 and 2.2.42.2

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-03-22 22:49:10 -05:00
Steve French
2fe4f62de4 SMB3: Add new compression flags
Additional compression capabilities can now be negotiated and a
new compression algorithm.  Add the flags for these.

See newly updated MS-SMB2 sections 3.1.4.4.1 and 2.2.3.1.3

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-03-22 22:49:10 -05:00
Gustavo A. R. Silva
cff2def598 cifs: smb2pdu.h: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:10 -05:00
Eric Biggers
dc920277f1 cifs: clear PF_MEMALLOC before exiting demultiplex thread
Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  For example, if BSD process
accounting is enabled and the accounting file has FS_SYNC_FL set and is
located on an ext4 filesystem without a journal, then do_exit() can end
up calling ext4_write_inode().  That triggers the
WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
(appropriately) that inodes aren't written when allocating memory.

This was originally reported for another kernel thread, xfsaild() [1].
cifs_demultiplex_thread() also exits with PF_MEMALLOC set, so it's
potentially subject to this same class of issue -- though I haven't been
able to reproduce the WARN_ON_ONCE() via CIFS, since unlike xfsaild(),
cifs_demultiplex_thread() is sent SIGKILL before exiting, and that
interrupts the write to the BSD process accounting file.

Either way, leaving PF_MEMALLOC set is potentially problematic.  Let's
clean this up by properly saving and restoring PF_MEMALLOC.

[1] https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:10 -05:00
Gustavo A. R. Silva
266b9fecc5 cifs: cifspdu.h: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:10 -05:00
Steve French
ba55344f36 CIFS: Warn less noisily on default mount
The warning we print on mount about how to use less secure dialects
(when the user does not specify a version on mount) is useful
but is noisy to print on every default mount, and can be changed
to a warn_once.  Slightly updated the warning text as well to note
SMB3.1.1 which has been the default which is typically negotiated
(for a few years now) by most servers.

      "No dialect specified on mount. Default has changed to a more
       secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS
       (SMB1). To use the less secure SMB1 dialect to access old
       servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1)
       specify vers=1.0 on mount."

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-03-22 22:49:09 -05:00
Qiujun Huang
f2d67931fd fs/cifs: fix gcc warning in sid_to_id
fix warning [-Wunused-but-set-variable] at variable 'rc',
keeping the code readable.

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Murphy Zhou
0667059d0b cifs: allow unlock flock and OFD lock across fork
Since commit d0677992d2 ("cifs: add support for flock") added
support for flock, LTP/flock03[1] testcase started to fail.

This testcase is testing flock lock and unlock across fork.
The parent locks file and starts the child process, in which
it unlock the same fd and lock the same file with another fd
again. All the lock and unlock operation should succeed.

Now the child process does not actually unlock the file, so
the following lock fails. Fix this by allowing flock and OFD
lock go through the unlock routine, not skipping if the unlock
request comes from another process.

Patch has been tested by LTP/xfstests on samba and Windows
server, v3.11, with or without cache=none mount option.

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/flock/flock03.c
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-03-22 22:49:09 -05:00
Steve French
c7e9f78f7b cifs: do d_move in rename
See commit 349457ccf2
"Allow file systems to manually d_move() inside of ->rename()"

Lessens possibility of race conditions in rename

Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Aurelien Aptel
69dda3059e cifs: add SMB2_open() arg to return POSIX data
allows SMB2_open() callers to pass down a POSIX data buffer that will
trigger requesting POSIX create context and parsing the response into
the provided buffer.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2020-03-22 22:49:09 -05:00
Aurelien Aptel
3d519bd126 cifs: plumb smb2 POSIX dir enumeration
* add code to request POSIX info level
* parse dir entries and fill cifs_fattr to get correct inode data

since the POSIX payload is variable size the number of entries in a
FIND response needs to be computed differently.

Dirs and regular files are properly reported along with mode bits,
hardlink number, c/m/atime. No special files yet (see below).

Current experimental version of Samba with the extension unfortunately
has issues with wildcards and needs the following patch:

> --- i/source3/smbd/smb2_query_directory.c
> +++ w/source3/smbd/smb2_query_directory.c
> @@ -397,9 +397,7 @@ smbd_smb2_query_directory_send(TALLOC_CTX
> *mem_ctx,
> 		}
> 	}
>
> -       if (!state->smbreq->posix_pathnames) {
> 		wcard_has_wild = ms_has_wild(state->in_file_name);
> -       }
>
> 	/* Ensure we've canonicalized any search path if not a wildcard. */
> 	if (!wcard_has_wild) {
>

Also for special files despite reporting them as reparse point samba
doesn't set the reparse tag field. This patch will mark them as needing
re-evaluation but the re-evaluate code doesn't deal with it yet.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Aurelien Aptel
349e13ad30 cifs: add smb2 POSIX info level
* add new info level and structs for SMB2 posix extension
* add functions to parse and validate it

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Aurelien Aptel
2e8af978d9 cifs: rename posix create rsp
little progress on the posix create response.

* rename struct to create_posix_rsp to match with the request
  create_posix context
* make struct packed
* pass smb info struct for parse_posix_ctxt to fill
* use smb info struct as param
* update TODO

What needs to be done:

SMB2_open() has an optional smb info out argument that it will fill.
Callers making use of this are:

- smb3_query_mf_symlink (need to investigate)
- smb2_open_file

Callers of smb2_open_file (via server->ops->open) are passing an
smbinfo struct but that struct cannot hold POSIX information. All the
call stack needs to be changed for a different info type. Maybe pass
SMB generic struct like cifs_fattr instead.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Steve French
8fe0c2c2cb cifs: print warning mounting with vers=1.0
We really, really don't want people using insecure dialects
unless they realize what they are doing ...

Add mount warning if mounting with vers=1.0 (older SMB1/CIFS
dialect) instead of the default (SMB2.1 or later, typically
SMB3.1.1).

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-03-22 22:49:09 -05:00
Steve French
cf5371ae46 smb3: fix performance regression with setting mtime
There are cases when we don't want to send the SMB2 flush operation
(e.g. when user specifies mount parm "nostrictsync") and it can be
a very expensive operation on the server.  In most cases in order
to set mtime, we simply need to flush (write) the dirtry pages from
the client and send the writes to the server not also send a flush
protocol operation to the server.

Fixes: aa081859b1 ("cifs: flush before set-info if we have writeable handles")
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Stefan Metzmacher
864138cb31 cifs: make use of cap_unix(ses) in cifs_reconnect_tcon()
cap_unix(ses) defaults to false for SMB2.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Stefan Metzmacher
b08484d715 cifs: use mod_delayed_work() for &server->reconnect if already queued
mod_delayed_work() is safer than queue_delayed_work() if there's a
chance that the work is already in the queue.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Stefan Metzmacher
e2e87519bd cifs: call wake_up(&server->response_q) inside of cifs_reconnect()
This means it's consistently called and the callers don't need to
care about it.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Paulo Alcantara (SUSE)
bacd704a95 cifs: handle prefix paths in reconnect
For the case where we have a DFS path like below and we're currently
connected to targetA:

    //dfsroot/link -> //targetA/share/foo, //targetB/share/bar

after failover, we should make sure to update cifs_sb->prepath so the
next operations will use the new prefix path "/bar".

Besides, in order to simplify the use of different prefix paths,
enforce CIFS_MOUNT_USE_PREFIX_PATH for DFS mounts so we don't have to
revalidate the root dentry every time we set a new prefix path.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Steve French
ffdec8d642 cifs: do not ignore the SYNC flags in getattr
Check the AT_STATX_FORCE_SYNC flag and force an attribute
revalidation if requested by the caller, and if the caller
specificies AT_STATX_DONT_SYNC only revalidate cached attributes
if required.  In addition do not flush writes in getattr (which
can be expensive) if size or timestamps not requested by the
caller.

Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-03-22 22:49:09 -05:00
Pavel Begunkov
18a542ff19 io_uring: Fix ->data corruption on re-enqueue
work->data and work->list are shared in union. io_wq_assign_next() sets
->data if a req having a linked_timeout, but then io-wq may want to use
work->list, e.g. to do re-enqueue of a request, so corrupting ->data.

->data is not necessary, just remove it and extract linked_timeout
through @link_list.

Fixes: 60cf46ae60 ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-22 19:31:27 -06:00
Misono Tomohiro
8605cf0e85 NFS: direct.c: Fix memory leak of dreq when nfs_get_lock_context fails
When dreq is allocated by nfs_direct_req_alloc(), dreq->kref is
initialized to 2. Therefore we need to call nfs_direct_req_release()
twice to release the allocated dreq. Usually it is called in
nfs_file_direct_{read, write}() and nfs_direct_complete().

However, current code only calls nfs_direct_req_relese() once if
nfs_get_lock_context() fails in nfs_file_direct_{read, write}().
So, that case would result in memory leak.

Fix this by adding the missing call.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-22 16:47:58 -04:00
Linus Torvalds
67d584e33e Merge tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "Two fixes.

  The first is a regression: when dropping some incompat bits the
  conditions were reversed. The other is a fix for rename whiteout
  potentially leaving stack memory linked to a list"

* tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix removal of raid[56|1c34} incompat flags after removing block group
  btrfs: fix log context list corruption after rename whiteout error
2020-03-22 11:35:33 -07:00
Linus Torvalds
b3c03db67e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  x86/mm: split vmalloc_sync_all()
  mm, slub: prevent kmalloc_node crashes and memory leaks
  mm/mmu_notifier: silence PROVE_RCU_LIST warnings
  epoll: fix possible lost wakeup on epoll_ctl() path
  mm: do not allow MADV_PAGEOUT for CoW pages
  mm, memcg: throttle allocators based on ancestral memory.high
  mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
  page-flags: fix a crash at SetPageError(THP_SWAP)
  mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
2020-03-22 10:46:50 -07:00
Pavel Begunkov
f2cf11492b io-wq: close cancel gap for hashed linked work
After io_assign_current_work() of a linked work, it can be decided to
offloaded to another thread so doing io_wqe_enqueue(). However, until
next io_assign_current_work() it can be cancelled, that isn't handled.

Don't assign it, if it's not going to be executed.

Fixes: 60cf46ae60 ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-22 11:33:58 -06:00
Roman Penyaev
1b53734bd0 epoll: fix possible lost wakeup on epoll_ctl() path
This fixes possible lost wakeup introduced by commit a218cc4914.
Originally modifications to ep->wq were serialized by ep->wq.lock, but
in commit a218cc4914 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") a new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.

After the change ep_modify and ep_insert (both are called on epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was using
ep->wq.lock on wqueue list modification.

The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock internally,
but actual waitqueue_active() check prior wake_up() call can be
reordered with modifications of ep ready list, thus wake up can be lost.

And yes, can be healed by explicit smp_mb():

  list_add_tail(&epi->rdlink, &ep->rdllist);
  smp_mb();
  if (waitqueue_active(&ep->wq))
	wake_up(&ep->wp);

But let's make it simple, thus current patch replaces ep->wq.lock with
the ep->lock for wqueue modifications, thus wake up path always observes
activeness of the wqueue correcty.

Fixes: a218cc4914 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org>	[5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Linus Torvalds
1ab7ea1f83 Merge tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Two different fixes in here:

   - Fix for a potential NULL pointer deref for links with async or
     drain marked (Pavel)

   - Fix for not properly checking RLIMIT_NOFILE for async punted
     operations.

     This affects openat/openat2, which were added this cycle, and
     accept4. I did a full audit of other cases where we might check
     current->signal->rlim[] and found only RLIMIT_FSIZE for buffered
     writes and fallocate. That one is fixed and queued for 5.7 and
     marked stable"

* tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block:
  io_uring: make sure accept honor rlimit nofile
  io_uring: make sure openat/openat2 honor rlimit nofile
  io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}
2020-03-21 11:54:47 -07:00
Filipe Manana
d8e6fd5c79 btrfs: fix removal of raid[56|1c34} incompat flags after removing block group
We are incorrectly dropping the raid56 and raid1c34 incompat flags when
there are still raid56 and raid1c34 block groups, not when we do not any
of those anymore. The logic just got unintentionally broken after adding
the support for the raid1c34 modes.

Fix this by clear the flags only if we do not have block groups with the
respective profiles.

Fixes: 9c907446dc ("btrfs: drop incompat bit for raid1c34 after last block group is gone")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-20 21:31:32 +01:00