Commit Graph

65089 Commits

Author SHA1 Message Date
Linus Torvalds
f757165705 Merge tag 'fuse-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:

 - Fix a regression introduced in v5.1 that triggers WARNINGs for some
   fuse filesystems

 - Fix an xfstest failure

 - Allow overlayfs to be used on top of fuse/virtiofs

 - Code and documentation cleanups

* tag 'fuse-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: use true,false for bool variable
  Documentation: filesystems: convert fuse to RST
  fuse: Support RENAME_WHITEOUT flag
  fuse: don't overflow LLONG_MAX with end offset
  fix up iter on short count in fuse_direct_io()
2020-02-07 17:59:07 -08:00
Linus Torvalds
175787e011 Merge tag 'gfs2-for-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:

 - Fix a bug in Abhi Das's journal head lookup improvements that can
   cause a valid journal to be rejected.

 - Fix an O_SYNC write handling bug reported by Christoph Hellwig.

* tag 'gfs2-for-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: fix O_SYNC write handling
  gfs2: move setting current->backing_dev_info
  gfs2: fix gfs2_find_jhead that returns uninitialized jhead with seq 0
2020-02-07 17:54:46 -08:00
Linus Torvalds
60ea27e936 Merge tag 'for-linus-5.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fix from Mike Marshall:
 "Debugfs fix for orangefs.

  Vasliy Averin noticed that 'if seq_file .next function does not change
  position index, read after some lseek can generate unexpected output'
  and sent in this fix"

* tag 'for-linus-5.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  help_next should increase position index
2020-02-07 17:52:38 -08:00
Linus Torvalds
08dffcc7d9 Merge tag 'nfsd-5.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Highlights:

   - Server-to-server copy code from Olga.

     To use it, client and both servers must have support, the target
     server must be able to access the source server over NFSv4.2, and
     the target server must have the inter_copy_offload_enable module
     parameter set.

   - Improvements and bugfixes for the new filehandle cache, especially
     in the container case, from Trond

   - Also from Trond, better reporting of write errors.

   - Y2038 work from Arnd"

* tag 'nfsd-5.6' of git://linux-nfs.org/~bfields/linux: (55 commits)
  sunrpc: expiry_time should be seconds not timeval
  nfsd: make nfsd_filecache_wq variable static
  nfsd4: fix double free in nfsd4_do_async_copy()
  nfsd: convert file cache to use over/underflow safe refcount
  nfsd: Define the file access mode enum for tracing
  nfsd: Fix a perf warning
  nfsd: Ensure sampling of the write verifier is atomic with the write
  nfsd: Ensure sampling of the commit verifier is atomic with the commit
  sunrpc: clean up cache entry add/remove from hashtable
  sunrpc: Fix potential leaks in sunrpc_cache_unhash()
  nfsd: Ensure exclusion between CLONE and WRITE errors
  nfsd: Pass the nfsd_file as arguments to nfsd4_clone_file_range()
  nfsd: Update the boot verifier on stable writes too.
  nfsd: Fix stable writes
  nfsd: Allow nfsd_vfs_write() to take the nfsd_file as an argument
  nfsd: Fix a soft lockup race in nfsd_file_mark_find_or_create()
  nfsd: Reduce the number of calls to nfsd_file_gc()
  nfsd: Schedule the laundrette regularly irrespective of file errors
  nfsd: Remove unused constant NFSD_FILE_LRU_RESCAN
  nfsd: Containerise filecache laundrette
  ...
2020-02-07 17:50:21 -08:00
Linus Torvalds
f43574d0ac Merge tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Puyll NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Fix memory leaks and corruption in readdir # v2.6.37+
   - Directory page cache needs to be locked when read # v2.6.37+

  New features:
   - Convert NFS to use the new mount API
   - Add "softreval" mount option to let clients use cache if server goes down
   - Add a config option to compile without UDP support
   - Limit the number of inactive delegations the client can cache at once
   - Improved readdir concurrency using iterate_shared()

  Other bugfixes and cleanups:
   - More 64-bit time conversions
   - Add additional diagnostic tracepoints
   - Check for holes in swapfiles, and add dependency on CONFIG_SWAP
   - Various xprtrdma cleanups to prepare for 5.7's changes
   - Several fixes for NFS writeback and commit handling
   - Fix acls over krb5i/krb5p mounts
   - Recover from premature loss of openstateids
   - Fix NFS v3 chacl and chmod bug
   - Compare creds using cred_fscmp()
   - Use kmemdup_nul() in more places
   - Optimize readdir cache page invalidation
   - Lease renewal and recovery fixes"

* tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (93 commits)
  NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
  NFSv4: try lease recovery on NFS4ERR_EXPIRED
  NFS: Fix memory leaks
  nfs: optimise readdir cache page invalidation
  NFS: Switch readdir to using iterate_shared()
  NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
  NFS: Directory page cache pages need to be locked when read
  NFS: Fix memory leaks and corruption in readdir
  SUNRPC: Use kmemdup_nul() in rpc_parse_scope_id()
  NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
  NFSv4: Limit the total number of cached delegations
  NFSv4: Add accounting for the number of active delegations held
  NFSv4: Try to return the delegation immediately when marked for return on close
  NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
  NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
  NFS: nfs_find_open_context() should use cred_fscmp()
  NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
  NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
  NFS: remove unused macros
  nfs: Return EINVAL rather than ERANGE for mount parse errors
  ...
2020-02-07 17:39:56 -08:00
Al Viro
bf45f7fcc4 procfs: switch to use of invalfc()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:42 -05:00
Al Viro
b5db30cfb9 hugetlbfs: switch to use of invalfc()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:42 -05:00
Al Viro
e1ee7d8511 cramfs: switch to use of errofc() et.al.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:41 -05:00
Al Viro
77cb271e6a gfs2: switch to use of errorfc() et.al.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:41 -05:00
Al Viro
2e28c49ea6 fuse: switch to use errorfc() et.al.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:40 -05:00
Al Viro
d53d0f7461 ceph: use errorfc() and friends instead of spelling the prefix out
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:39 -05:00
Al Viro
328de5287b turn fs_param_is_... into functions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:38 -05:00
Al Viro
48ce73b1be fs_parse: handle optional arguments sanely
Don't bother with "mixed" options that would allow both the
form with and without argument (i.e. both -o foo and -o foo=bar).
Rather than trying to shove both into a single fs_parameter_spec,
allow having with-argument and no-argument specs with the same
name and teach fs_parse to handle that.

There are very few options of that sort, and they are actually
easier to handle that way - callers end up with less postprocessing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Al Viro
d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen
96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Al Viro
cc3c0b533a add prefix to fs_context->log
... turning it into struct p_log embedded into fs_context.  Initialize
the prefix with fs_type->name, turning fs_parse() into a trivial
inline wrapper for __fs_parse().

This makes fs_parameter_description->name completely unused.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:35 -05:00
Al Viro
c80c98f0dc ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
... and now errorf() et.al. are never called with NULL fs_context,
so we can get rid of conditional in those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:34 -05:00
Al Viro
7f5d38141e new primitive: __fs_parse()
fs_parse() analogue taking p_log instead of fs_context.
fs_parse() turned into a wrapper, callers in ceph_common and rbd
switched to __fs_parse().

As the result, fs_parse() never gets NULL fs_context and neither
do fs_context-based logging primitives

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:34 -05:00
Al Viro
9f09f649ca teach logfc() to handle prefices, give it saner calling conventions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:32 -05:00
Al Viro
aa1918f949 get rid of fs_value_is_filename_empty
Its behaviour is identical to that of fs_value_is_filename.
It makes no sense, anyway - LOOKUP_EMPTY affects nothing
whatsoever once the pathname has been imported from userland.
And both fs_value_is_filename and fs_value_is_filename_empty
carry an already imported pathname.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:30 -05:00
Al Viro
34264ae3fa don't bother with explicit length argument for __lookup_constant()
Have the arrays of constant_table self-terminated (by NULL ->name
in the final entry).  Simplifies lookup_constant() and allows to
reuse the search for enum params as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:47:52 -05:00
Chen Zhou
50d0def966 nfsd: make nfsd_filecache_wq variable static
Fix sparse warning:

fs/nfsd/filecache.c:55:25: warning:
	symbol 'nfsd_filecache_wq' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-02-07 13:30:41 -05:00
Damien Le Moal
8dcc1a9d90 fs: New zonefs file system
zonefs is a very simple file system exposing each zone of a zoned block
device as a file. Unlike a regular file system with zoned block device
support (e.g. f2fs), zonefs does not hide the sequential write
constraint of zoned block devices to the user. Files representing
sequential write zones of the device must be written sequentially
starting from the end of the file (append only writes).

As such, zonefs is in essence closer to a raw block device access
interface than to a full featured POSIX file system. The goal of zonefs
is to simplify the implementation of zoned block device support in
applications by replacing raw block device file accesses with a richer
file API, avoiding relying on direct block device file ioctls which may
be more obscure to developers. One example of this approach is the
implementation of LSM (log-structured merge) tree structures (such as
used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
to be stored in a zone file similarly to a regular file system rather
than as a range of sectors of a zoned device. The introduction of the
higher level construct "one file is one zone" can help reducing the
amount of changes needed in the application as well as introducing
support for different application programming languages.

Zonefs on-disk metadata is reduced to an immutable super block to
persistently store a magic number and optional feature flags and
values. On mount, zonefs uses blkdev_report_zones() to obtain the device
zone configuration and populates the mount point with a static file tree
solely based on this information. E.g. file sizes come from the device
zone type and write pointer offset managed by the device itself.

The zone files created on mount have the following characteristics.
1) Files representing zones of the same type are grouped together
   under a common sub-directory:
     * For conventional zones, the sub-directory "cnv" is used.
     * For sequential write zones, the sub-directory "seq" is used.
  These two directories are the only directories that exist in zonefs.
  Users cannot create other directories and cannot rename nor delete
  the "cnv" and "seq" sub-directories.
2) The name of zone files is the number of the file within the zone
   type sub-directory, in order of increasing zone start sector.
3) The size of conventional zone files is fixed to the device zone size.
   Conventional zone files cannot be truncated.
4) The size of sequential zone files represent the file's zone write
   pointer position relative to the zone start sector. Truncating these
   files is allowed only down to 0, in which case, the zone is reset to
   rewind the zone write pointer position to the start of the zone, or
   up to the zone size, in which case the file's zone is transitioned
   to the FULL state (finish zone operation).
5) All read and write operations to files are not allowed beyond the
   file zone size. Any access exceeding the zone size is failed with
   the -EFBIG error.
6) Creating, deleting, renaming or modifying any attribute of files and
   sub-directories is not allowed.
7) There are no restrictions on the type of read and write operations
   that can be issued to conventional zone files. Buffered, direct and
   mmap read & write operations are accepted. For sequential zone files,
   there are no restrictions on read operations, but all write
   operations must be direct IO append writes. mmap write of sequential
   files is not allowed.

Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: ranges of contiguous conventional
  zones can be aggregated into a single larger file instead of the
  default one file per zone.
* File ownership: The owner UID and GID of zone files is by default 0
  (root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be
  changed.

The mkzonefs tool is used to format zoned block devices for use with
zonefs. This tool is available on Github at:

git@github.com:damien-lemoal/zonefs-tools.git.

zonefs-tools also includes a test suite which can be run against any
zoned block device, including null_blk block device created with zoned
mode.

Example: the following formats a 15TB host-managed SMR HDD with 256 MB
zones with the conventional zones aggregation feature enabled.

$ sudo mkzonefs -o aggr_cnv /dev/sdX
$ sudo mount -t zonefs /dev/sdX /mnt
$ ls -l /mnt/
total 0
dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq

The size of the zone files sub-directories indicate the number of files
existing for each type of zones. In this example, there is only one
conventional zone file (all conventional zones are aggregated under a
single file).

$ ls -l /mnt/cnv
total 137101312
-rw-r----- 1 root root 140391743488 Nov 25 13:23 0

This aggregated conventional zone file can be used as a regular file.

$ sudo mkfs.ext4 /mnt/cnv/0
$ sudo mount -o loop /mnt/cnv/0 /data

The "seq" sub-directory grouping files for sequential write zones has
in this example 55356 zones.

$ ls -lv /mnt/seq
total 14511243264
-rw-r----- 1 root root 0 Nov 25 13:23 0
-rw-r----- 1 root root 0 Nov 25 13:23 1
-rw-r----- 1 root root 0 Nov 25 13:23 2
...
-rw-r----- 1 root root 0 Nov 25 13:23 55354
-rw-r----- 1 root root 0 Nov 25 13:23 55355

For sequential write zone files, the file size changes as data is
appended at the end of the file, similarly to any regular file system.

$ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s

$ ls -l /mnt/seq/0
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0

The written file can be truncated to the zone size, preventing any
further write operation.

$ truncate -s 268435456 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0

Truncation to 0 size allows freeing the file zone storage space and
restart append-writes to the file.

$ truncate -s 0 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0

Since files are statically mapped to zones on the disk, the number of
blocks of a file as reported by stat() and fstat() indicates the size
of the file zone.

$ stat /mnt/seq/0
  File: /mnt/seq/0
  Size: 0       Blocks: 524288     IO Block: 4096   regular empty file
Device: 870h/2160d      Inode: 50431       Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/  root)
Access: 2019-11-25 13:23:57.048971997 +0900
Modify: 2019-11-25 13:52:25.553805765 +0900
Change: 2019-11-25 13:52:25.553805765 +0900
 Birth: -

The number of blocks of the file ("Blocks") in units of 512B blocks
gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
to the device zone size in this example. Of note is that the "IO block"
field always indicates the minimum IO size for writes and corresponds
to the device physical sector size.

This code contains contributions from:
* Johannes Thumshirn <jthumshirn@suse.de>,
* Darrick J. Wong <darrick.wong@oracle.com>,
* Christoph Hellwig <hch@lst.de>,
* Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
* Ting Yao <tingyao@hust.edu.cn>.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-02-07 14:39:38 +09:00
Al Viro
5eede62529 fold struct fs_parameter_enum into struct constant_table
no real difference now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 00:12:50 -05:00
Al Viro
2710c957a8 fs_parse: get rid of ->enums
Don't do a single array; attach them to fsparam_enum() entry
instead.  And don't bother trying to embed the names into those -
it actually loses memory, with no real speedup worth mentioning.

Simplifies validation as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 00:12:50 -05:00
Al Viro
0f89589a8c Pass consistent param->type to fs_parse()
As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable;
both get fs_value_is_string for ->type and NULL for ->string.  To make
it even more unpleasant, that combination is impossible to produce with
fsconfig().

Much saner rules would be
        "foo"           => fs_value_is_flag, NULL
	"foo="          => fs_value_is_string, ""
	"foo=bar"       => fs_value_is_string, "bar"
All cases are distinguishable, all results are expressable by fsconfig(),
->has_value checks are much simpler that way (to the point of the field
being useless) and quite a few regressions go away (gfs2 has no business
accepting -o nodebug=, for example).

Partially based upon patches from Miklos.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 00:10:29 -05:00
Steve French
51d92d69f7 smb3: Add defines for new information level, FileIdInformation
See MS-FSCC 2.4.43.  Valid to be quried from most
Windows servers (among others).

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-02-06 17:32:24 -06:00
Steve French
ab3459d8f0 smb3: print warning once if posix context returned on open
SMB3.1.1 POSIX Context processing is not complete yet - so print warning
(once) if server returns it on open.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-02-06 17:31:56 -06:00
Steve French
2391ca41b4 smb3: add one more dynamic tracepoint missing from strict fsync path
We didn't have a dynamic trace point for catching errors in
file_write_and_wait_range error cases in cifs_strict_fsync.

Since not all apps check for write behind errors, it can be
important for debugging to be able to trace these error
paths.

Suggested-and-reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-02-06 17:21:23 -06:00
Aurelien Aptel
e3e056c351 cifs: fix mode bits from dir listing when mounted with modefromsid
When mounting with -o modefromsid, the mode bits are stored in an
ACE. Directory enumeration (e.g. ls -l /mnt) triggers an SMB Query Dir
which does not include ACEs in its response. The mode bits in this
case are silently set to a default value of 755 instead.

This patch marks the dentry created during the directory enumeration
as needing re-evaluation (i.e. additional Query Info with ACEs) so
that the mode bits can be properly extracted.

Quick repro:

$ mount.cifs //win19.test/data /mnt -o ...,modefromsid
$ touch /mnt/foo && chmod 751 /mnt/foo
$ stat /mnt/foo
  # reports 751 (OK)
$ sleep 2
  # dentry older than 1s by default get invalidated
$ ls -l /mnt
  # since dentry invalid, ls does a Query Dir
  # and reports foo as 755 (WRONG)

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-02-06 17:19:38 -06:00
Pavel Begunkov
1e95081cb5 io_uring: fix deferred req iovec leak
After defer, a request will be prepared, that includes allocating iovec
if needed, and then submitted through io_wq_submit_work() but not custom
handler (e.g. io_rw_async()/io_sendrecv_async()). However, it'll leak
iovec, as it's in io-wq and the code goes as follows:

io_read() {
	if (!io_wq_current_is_worker())
		kfree(iovec);
}

Put all deallocation logic in io_{read,write,send,recv}(), which will
leave the memory, if going async with -EAGAIN.

It also fixes a leak after failed io_alloc_async_ctx() in
io_{recv,send}_msg().

Cc: stable@vger.kernel.org # 5.5
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-06 13:58:57 -07:00
Randy Dunlap
e1d85334d6 io_uring: fix 1-bit bitfields to be unsigned
Make bitfields of size 1 bit be unsigned (since there is no room
for the sign bit).
This clears up the sparse warnings:

  CHECK   ../fs/io_uring.c
../fs/io_uring.c:207:50: error: dubious one-bit signed bitfield
../fs/io_uring.c:208:55: error: dubious one-bit signed bitfield
../fs/io_uring.c:209:63: error: dubious one-bit signed bitfield
../fs/io_uring.c:210:54: error: dubious one-bit signed bitfield
../fs/io_uring.c:211:57: error: dubious one-bit signed bitfield

Found by sight and then verified with sparse.

Fixes: 69b3e54613 ("io_uring: change io_ring_ctx bool fields into bit fields")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-06 13:41:00 -07:00
Pavel Begunkov
1cb1edb2f5 io_uring: get rid of delayed mm check
Fail fast if can't grab mm, so past that requests always have an mm
when required. This allows us to remove req->user altogether.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-06 12:53:10 -07:00
Aurelien Aptel
cc95b67727 cifs: fix channel signing
The server var was accidentally used as an iterator over the global
list of connections, thus overwritten the passed argument. This
resulted in the wrong signing key being returned for extra channels.

Fix this by using a separate var to iterate.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-02-06 12:42:36 -06:00
Andreas Gruenbacher
6e5e41e2dc gfs2: fix O_SYNC write handling
In gfs2_file_write_iter, for direct writes, the error checking in the buffered
write fallback case is incomplete.  This can cause inode write errors to go
undetected.  Fix and clean up gfs2_file_write_iter along the way.

Based on a proposed fix by Christoph Hellwig <hch@lst.de>.

Fixes: 967bcc91b0 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-06 18:49:41 +01:00
Christoph Hellwig
4c0e8dda60 gfs2: move setting current->backing_dev_info
Set current->backing_dev_info just around the buffered write calls to
prepare for the next fix.

Fixes: 967bcc91b0 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-06 17:35:23 +01:00
Abhi Das
7582026f6f gfs2: fix gfs2_find_jhead that returns uninitialized jhead with seq 0
When the first log header in a journal happens to have a sequence
number of 0, a bug in gfs2_find_jhead() causes it to prematurely exit,
and return an uninitialized jhead with seq 0. This can cause failures
in the caller. For instance, a mount fails in one test case.

The correct behavior is for it to continue searching through the journal
to find the correct journal head with the highest sequence number.

Fixes: f4686c26ec ("gfs2: read journal in large chunks")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-06 17:35:23 +01:00
Dan Carpenter
91fd3c3edc nfsd4: fix double free in nfsd4_do_async_copy()
This frees "copy->nf_src" before and again after the goto.

Fixes: ce0887ac96 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-02-06 11:22:55 -05:00
Trond Myklebust
689827cd5b nfsd: convert file cache to use over/underflow safe refcount
Use the 'refcount_t' type instead of 'atomic_t' for improved
refcounting safety.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-02-06 11:22:55 -05:00
Trond Myklebust
c19285596d nfsd: Define the file access mode enum for tracing
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-02-06 11:22:55 -05:00
Trond Myklebust
a9ceb060b3 nfsd: Fix a perf warning
perf does not know how to deal with a __builtin_bswap32() call, and
complains. All other functions just store the xid etc in host endian
form, so let's do that in the tracepoint for nfsd_file_acquire too.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-02-06 11:22:54 -05:00
zhengbin
cabdb4fa2f fuse: use true,false for bool variable
Fixes coccicheck warning:

fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06 16:39:28 +01:00
Vivek Goyal
519525fa47 fuse: Support RENAME_WHITEOUT flag
Allow fuse to pass RENAME_WHITEOUT to fuse server.  Overlayfs on top of
virtiofs uses RENAME_WHITEOUT.

Without this patch renaming a directory in overlayfs (dir is on lower)
fails with -EINVAL. With this patch it works.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06 16:39:28 +01:00
Miklos Szeredi
2f1398291b fuse: don't overflow LLONG_MAX with end offset
Handle the special case of fuse_readpages() wanting to read the last page
of a hugest file possible and overflowing the end offset in the process.

This is basically to unbreak xfstests:generic/525 and prevent filesystems
from doing bad things with an overflowing offset.

Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06 16:39:28 +01:00
Miklos Szeredi
f658adeea4 fix up iter on short count in fuse_direct_io()
fuse_direct_io() can end up advancing the iterator by more than the amount
of data read or written.  This case is handled by the generic code if going
through ->direct_IO(), but not in the FOPEN_DIRECT_IO case.

Fix by reverting the extra bytes from the iterator in case of error or a
short count.

To test: install lxcfs, then the following testcase
  int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY);
  sendfile(1, fd, NULL, 16777216);
  sendfile(1, fd, NULL, 16777216);
will spew WARN_ON() in iov_iter_pipe().

Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 3c3db095b6 ("fuse: use iov_iter based generic splice helpers")
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06 16:39:28 +01:00
Steve French
d26c2ddd33 cifs: add SMB3 change notification support
A commonly used SMB3 feature is change notification, allowing an
app to be notified about changes to a directory. The SMB3
Notify request blocks until the server detects a change to that
directory or its contents that matches the completion flags
that were passed in and the "watch_tree" flag (which indicates
whether subdirectories under this directory should be also
included).  See MS-SMB2 2.2.35 for additional detail.

To use this simply pass in the following structure to ioctl:

 struct __attribute__((__packed__)) smb3_notify {
        uint32_t completion_filter;
        bool    watch_tree;
 } __packed;

 using CIFS_IOC_NOTIFY  0x4005cf09
 or equivalently _IOW(CIFS_IOCTL_MAGIC, 9, struct smb3_notify)

SMB3 change notification is supported by all major servers.
The ioctl will block until the server detects a change to that
directory or its subdirectories (if watch_tree is set).

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2020-02-06 09:14:28 -06:00
Aurelien Aptel
343a1b777a cifs: make multichannel warning more visible
When no interfaces are returned by the server we cannot open multiple
channels. Make it more obvious by reporting that to the user at the
VFS log level.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-02-06 09:12:16 -06:00
Ronnie Sahlberg
09c40b1535 cifs: fix soft mounts hanging in the reconnect code
RHBZ: 1795423

This is the SMB1 version of a patch we already have for SMB2

In recent DFS updates we have a new variable controlling how many times we will
retry to reconnect the share.
If DFS is not used, then this variable is initialized to 0 in:

static inline int
dfs_cache_get_nr_tgts(const struct dfs_cache_tgt_list *tl)
{
        return tl ? tl->tl_numtgts : 0;
}

This means that in the reconnect loop in smb2_reconnect() we will immediately wrap retries to -1
and never actually get to pass this conditional:

                if (--retries)
                        continue;

The effect is that we no longer reach the point where we fail the commands with -EHOSTDOWN
and basically the kernel threads are virtually hung and unkillable.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2020-02-06 09:12:00 -06:00
Linus Torvalds
4c46bef2e9 Merge tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:

 - a set of patches that fixes various corner cases in mount and umount
   code (Xiubo Li). This has to do with choosing an MDS, distinguishing
   between laggy and down MDSes and parsing the server path.

 - inode initialization fixes (Jeff Layton). The one included here
   mostly concerns things like open_by_handle() and there is another one
   that will come through Al.

 - copy_file_range() now uses the new copy-from2 op (Luis Henriques).
   The existing copy-from op turned out to be infeasible for generic
   filesystem use; we disable the copy offload if OSDs don't support
   copy-from2.

 - a patch to link "rbd" and "block" devices together in sysfs (Hannes
   Reinecke)

... and a smattering of cleanups from Xiubo, Jeff and Chengguang.

* tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client: (25 commits)
  rbd: set the 'device' link in sysfs
  ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c
  ceph: print name of xattr in __ceph_{get,set}xattr() douts
  ceph: print r_direct_hash in hex in __choose_mds() dout
  ceph: use copy-from2 op in copy_file_range
  ceph: close holes in structs ceph_mds_session and ceph_mds_request
  rbd: work around -Wuninitialized warning
  ceph: allocate the correct amount of extra bytes for the session features
  ceph: rename get_session and switch to use ceph_get_mds_session
  ceph: remove the extra slashes in the server path
  ceph: add possible_max_rank and make the code more readable
  ceph: print dentry offset in hex and fix xattr_version type
  ceph: only touch the caps which have the subset mask requested
  ceph: don't clear I_NEW until inode metadata is fully populated
  ceph: retry the same mds later after the new session is opened
  ceph: check availability of mds cluster on mount after wait timeout
  ceph: keep the session state until it is released
  ceph: add __send_request helper
  ceph: ensure we have a new cap before continuing in fill_inode
  ceph: drop unused ttl_from parameter from fill_inode
  ...
2020-02-06 12:21:01 +00:00
Linus Torvalds
99be3f6098 Merge tag 'xfs-5.6-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull moar xfs updates from Darrick Wong:
 "This contains the buffer error code refactoring I mentioned last week,
  now that it has had extra time to complete the full xfs fuzz testing
  suite to make sure there aren't any obvious new bugs"

* tag 'xfs-5.6-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix xfs_buf_ioerror_alert location reporting
  xfs: remove unnecessary null pointer checks from _read_agf callers
  xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers
  xfs: remove the xfs_btree_get_buf[ls] functions
  xfs: make xfs_trans_get_buf return an error code
  xfs: make xfs_trans_get_buf_map return an error code
  xfs: make xfs_buf_read return an error code
  xfs: make xfs_buf_get_uncached return an error code
  xfs: make xfs_buf_get return an error code
  xfs: make xfs_buf_read_map return an error code
  xfs: make xfs_buf_get_map return an error code
  xfs: make xfs_buf_alloc return an error code
2020-02-06 07:58:38 +00:00