Commit Graph

48159 Commits

Author SHA1 Message Date
Mauro (mdrjr) Ribeiro
a79c7062da Merge tag 'v4.9.162' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.162 stable release

Change-Id: I8042a515f639d3857bb6aae03284aabaede1f7b9
2019-03-06 10:45:14 -03:00
Mauro (mdrjr) Ribeiro
2ab9e6444f Merge tag 'v4.9.161' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.161 stable release
2019-03-06 10:45:05 -03:00
Ernesto A. Fernández
93ab28b29c direct-io: allow direct writes to empty inodes
[ Upstream commit 8b9433eb4d ]

On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently
not allowed to create blocks for an empty inode.  This confusion comes
from trying to bit shift a negative number, so check the size of the
inode first.

The problem is most visible for hfsplus, because the fallback to
buffered I/O doesn't happen and the write fails with EIO.  This is in
part the fault of the module, because it gives a wrong return value on
->get_block(); that will be fixed in a separate patch.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-05 17:57:05 +01:00
Michal Hocko
2d182ba434 proc, oom: do not report alien mms when setting oom_score_adj
commit b2b469939e upstream.

Tetsuo has reported that creating a thousands of processes sharing MM
without SIGHAND (aka alien threads) and setting
/proc/<pid>/oom_score_adj will swamp the kernel log and takes ages [1]
to finish.  This is especially worrisome that all that printing is done
under RCU lock and this can potentially trigger RCU stall or softlockup
detector.

The primary reason for the printk was to catch potential users who might
depend on the behavior prior to 44a70adec9 ("mm, oom_adj: make sure
processes sharing mm have same view of oom_score_adj") but after more
than 2 years without a single report I guess it is safe to simply remove
the printk altogether.

The next step should be moving oom_score_adj over to the mm struct and
remove all the tasks crawling as suggested by [2]

[1] http://lkml.kernel.org/r/97fce864-6f75-bca5-14bc-12c9f890e740@i-love.sakura.ne.jp
[2] http://lkml.kernel.org/r/20190117155159.GA4087@dhcp22.suse.cz

Link: http://lkml.kernel.org/r/20190212102129.26288-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 10:06:58 +01:00
Yan, Zheng
79e3959b68 ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
commit 04242ff3ac upstream.

Otherwise, mdsc->snap_flush_list may get corrupted.

Cc: stable@vger.kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 10:06:58 +01:00
tao zeng
49061a2a4c mm: optimize stack usage for functions [1/1]
PD#SWPL-1773

Problem:
After adding optimization of vmap stack, we can found stack usage
of each functions when handle vmap fault. From test log we see some
functions using large stack size which over 256bytes. Especially
common call path from fs. We need to optimize stack usage of these
functions to reduce stack fault probability and save stack memory
usage.

Solution:
1. remove CONFIG_CC_STACKPROTECTOR_STRONG and set STACKPROTECTOR to
   NONE. This can save stack usage add by compiler for most functions.
   Kernel code size can also save over 1MB.
2. Add some noinline functions for android_fs_data rw trace calls. In
   these trace call it allcated a 256 bytes local buffer.
3. Add a wrap function for mem abort handler. By default, it defined a
   siginfo struct(size over 100 bytes) in local but only used when fault
   can't be handled.
4. reduce cached page size for vmap stack since probability of page
   fault caused by stack overflow is reduced after function stack usage
   optimized.
Monkey test show real stack usage ratio compared with 1st vmap
implementation reduced from 35% ~ 38% to 26 ~ 27%. Which is very
close to 25%, theory limit.

Verify:
P212

Change-Id: I5505cacc1cab51f88654052902852fd648b6a036
Signed-off-by: tao zeng <tao.zeng@amlogic.com>
2019-02-26 18:15:11 +09:00
tao zeng
76789cadf7 mm: optimize thread stack usage on arm64 [1/1]
PD#SWPL-1219

Problem:
On arm64, thread stack is 16KB for each task. If running task number
is large, this type of memory may over 40MB. It's a large amount on
small memory platform. But most case thread only use less 4KB stack.
It's waste of memory and we need optimize it.

Solution:
1. Pre-allocate a vmalloc address space for task stack;
2. Only map 1st page for stack and handle page fault in EL1
   when stack growth triggered exception;
3. handle stack switch for exception.

Verify:
p212

Change-Id: I47f511ccfa2868d982bc10a820ed6435b6d52ba9
Signed-off-by: tao zeng <tao.zeng@amlogic.com>
2019-02-26 18:13:09 +09:00
Mauro (mdrjr) Ribeiro
c96db883a1 Merge tag 'v4.9.160' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.160 stable release
2019-02-25 05:49:52 -03:00
Mauro (mdrjr) Ribeiro
b8fc2fa121 Merge tag 'v4.9.159' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.159 stable release
2019-02-25 05:49:30 -03:00
Mauro (mdrjr) Ribeiro
a71f18485f Merge tag 'v4.9.158' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.158 stable release
2019-02-25 05:49:05 -03:00
Mauro (mdrjr) Ribeiro
039a2ed13b Merge tag 'v4.9.157' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.157 stable release
2019-02-25 05:48:43 -03:00
Qu Wenruo
de5f88f888 btrfs: Remove false alert when fiemap range is smaller than on-disk extent
commit 848c23b78f upstream.

Commit 4751832da9 ("btrfs: fiemap: Cache and merge fiemap extent before
submit it to user") introduced a warning to catch unemitted cached
fiemap extent.

However such warning doesn't take the following case into consideration:

0			4K			8K
|<---- fiemap range --->|
|<----------- On-disk extent ------------------>|

In this case, the whole 0~8K is cached, and since it's larger than
fiemap range, it break the fiemap extent emit loop.
This leaves the fiemap extent cached but not emitted, and caught by the
final fiemap extent sanity check, causing kernel warning.

This patch removes the kernel warning and renames the sanity check to
emit_last_fiemap_cache() since it's possible and valid to have cached
fiemap extent.

Reported-by: David Sterba <dsterba@suse.cz>
Reported-by: Adam Borowski <kilobyte@angband.pl>
Fixes: 4751832da9 ("btrfs: fiemap: Cache and merge fiemap extent ...")
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-23 09:05:59 +01:00
Ross Lagerwall
04d7680c0f cifs: Limit memory used by lock request calls to a page
[ Upstream commit 92a8109e4d ]

The code tries to allocate a contiguous buffer with a size supplied by
the server (maxBuf). This could fail if memory is fragmented since it
results in high order allocations for commonly used server
implementations. It is also wasteful since there are probably
few locks in the usual case. Limit the buffer to be no larger than a
page to avoid memory allocation failures due to fragmentation.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-20 10:18:30 +01:00
Linus Torvalds
4a1802e367 Revert "exec: load_script: don't blindly truncate shebang string"
commit cb5b020a8d upstream.

This reverts commit 8099b047ec.

It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.

Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 09:07:33 +01:00
Greg Kroah-Hartman
80000148ea Revert "cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)"
This reverts commit 4cd376638c which is
commit 6e785302da upstream.

Yi writes:
	I notice that 4.4.169 merged 60da90b224 ("cifs: In Kconfig
	CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)") add
	a Kconfig dependency CIFS_ALLOW_INSECURE_LEGACY, which was not
	defined in 4.4 stable, so after this patch we are not able to
	enable CIFS_POSIX anymore. Linux 4.4 stable didn't merge the
	legacy dialects codes, so do we really need this patch for 4.4?

So revert this patch in 4.9 as well.

Reported-by: "zhangyi (F)" <yi.zhang@huawei.com>
Cc: Steve French <stfrench@microsoft.com>
Cc: Pavel Shilovsky <pshilov@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:07:39 +01:00
J. Bruce Fields
877362fd15 nfsd4: catch some false session retries
commit 53da6a53e1 upstream.

The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
the client is making a call that matches a previous (slot, seqid) pair
but that *isn't* actually a replay, because some detail of the call
doesn't actually match the previous one.

Catching every such case is difficult, but we may as well catch a few
easy ones.  This also handles the case described in the previous patch,
in a different way.

The spec does however require us to catch the case where the difference
is in the rpc credentials.  This prevents somebody from snooping another
user's replies by fabricating retries.

(But the practical value of the attack is limited by the fact that the
replies with the most sensitive data are READ replies, which are not
normally cached.)

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:07:39 +01:00
J. Bruce Fields
f92c45b798 nfsd4: fix cached replies to solo SEQUENCE compounds
commit 085def3ade upstream.

Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.

Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.

The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.

The protocol permits us to cache replies even if the client didn't ask
us to.  And it's easy to do so in the case of solo SEQUENCE compounds.

So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.

Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops.  This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original reply, it will increment the
sequence id.  If it did get the original reply, it won't increment, and
nothing else that about the reply really matters much.  But we can at
least attempt to return valid xdr!

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:07:39 +01:00
Greg Kroah-Hartman
b01311758a debugfs: fix debugfs_rename parameter checking
commit d88c93f090 upstream.

debugfs_rename() needs to check that the dentries passed into it really
are valid, as sometimes they are not (i.e. if the return value of
another debugfs call is passed into this one.)  So fix this up by
properly checking if the two parent directories are errors (they are
allowed to be NULL), and if the dentry to rename is not NULL or an
error.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:07:38 +01:00
Mauro (mdrjr) Ribeiro
3b7e1f914d Merge tag 'v4.9.156' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.156 stable release
2019-02-13 20:10:32 -02:00
Mauro (mdrjr) Ribeiro
a4c58a195e Merge tag 'v4.9.155' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.155 stable release
2019-02-13 20:10:25 -02:00
Mauro (mdrjr) Ribeiro
c3193985da Merge tag 'v4.9.154' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.154 stable release
2019-02-13 20:10:19 -02:00
Mauro (mdrjr) Ribeiro
33464973cc Merge tag 'v4.9.153' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.153 stable release
2019-02-13 20:10:13 -02:00
Mauro (mdrjr) Ribeiro
d6a65d5b04 Merge tag 'v4.9.152' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.152 stable release
2019-02-13 20:10:04 -02:00
Mauro (mdrjr) Ribeiro
0407aed495 Merge tag 'v4.9.151' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.151 stable release
2019-02-13 20:06:26 -02:00
Mauro (mdrjr) Ribeiro
16fbab977e Merge tag 'v4.9.150' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.150 stable release
2019-02-13 20:06:19 -02:00
Mauro (mdrjr) Ribeiro
6a990daa83 Merge tag 'v4.9.149' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.149 stable release
2019-02-13 20:06:12 -02:00
Mauro (mdrjr) Ribeiro
1167497010 Merge tag 'v4.9.148' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.148 stable release
2019-02-13 20:05:54 -02:00
Mauro (mdrjr) Ribeiro
dd003401a2 Merge tag 'v4.9.147' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.147 stable release
2019-02-13 20:02:58 -02:00
Mauro (mdrjr) Ribeiro
97de9566b3 Merge tag 'v4.9.146' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.146 stable release
2019-02-13 20:02:51 -02:00
Mauro (mdrjr) Ribeiro
a6422fb9fc Merge tag 'v4.9.145' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.145 stable release
2019-02-13 20:02:43 -02:00
Mauro (mdrjr) Ribeiro
836ef42e01 Merge tag 'v4.9.144' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.144 stable release
2019-02-13 20:02:29 -02:00
Mauro (mdrjr) Ribeiro
b598b5aeab Merge tag 'v4.9.143' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.143 stable release
2019-02-13 08:43:33 -02:00
Mauro (mdrjr) Ribeiro
663a04717d Merge tag 'v4.9.142' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidn2-4.9.y
This is the 4.9.142 stable release
2019-02-13 08:40:45 -02:00
Miklos Szeredi
64702dea06 fuse: handle zero sized retrieve correctly
commit 97e1532ef8 upstream.

Dereferencing req->page_descs[0] will Oops if req->max_pages is zero.

Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
Fixes: b2430d7567 ("fuse: add per-page descriptor <offset, length> to fuse_req")
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 19:45:00 +01:00
Miklos Szeredi
4f3d69898e fuse: decrement NR_WRITEBACK_TEMP on the right page
commit a2ebba8241 upstream.

NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not
the page cache page.

Fixes: 8b284dc472 ("fuse: writepages: handle same page rewrites")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 19:45:00 +01:00
Jann Horn
50449aafb8 fuse: call pipe_buf_release() under pipe lock
commit 9509941e9c upstream.

Some of the pipe_buf_release() handlers seem to assume that the pipe is
locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page
without taking any extra locks. From a glance through the callers of
pipe_buf_release(), it looks like FUSE is the only one that calls
pipe_buf_release() without having the pipe locked.

This bug should only lead to a memory leak, nothing terrible.

Fixes: dd3bb14f44 ("fuse: support splice() writing to fuse device")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 19:45:00 +01:00
Oleg Nesterov
9e5229ebb0 exec: load_script: don't blindly truncate shebang string
[ Upstream commit 8099b047ec ]

load_script() simply truncates bprm->buf and this is very wrong if the
length of shebang string exceeds BINPRM_BUF_SIZE-2.  This can silently
truncate i_arg or (worse) we can execute the wrong binary if buf[2:126]
happens to be the valid executable path.

Change load_script() to return ENOEXEC if it can't find '\n' or zero in
bprm->buf.  Note that '\0' can come from either
prepare_binprm()->memset() or from kernel_read(), we do not care.

Link: http://lkml.kernel.org/r/20181112160931.GA28463@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ben Woodard <woodard@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:59 +01:00
Davidlohr Bueso
cf96e96395 fs/epoll: drop ovflist branch prediction
[ Upstream commit 76699a67f3 ]

The ep->ovflist is a secondary ready-list to temporarily store events
that might occur when doing sproc without holding the ep->wq.lock.  This
accounts for every time we check for ready events and also send events
back to userspace; both callbacks, particularly the latter because of
copy_to_user, can account for a non-trivial time.

As such, the unlikely() check to see if the pointer is being used, seems
both misleading and sub-optimal.  In fact, we go to an awful lot of
trouble to sync both lists, and populating the ovflist is far from an
uncommon scenario.

For example, profiling a concurrent epoll_wait(2) benchmark, with
CONFIG_PROFILE_ANNOTATED_BRANCHES shows that for a two threads a 33%
incorrect rate was seen; and when incrementally increasing the number of
epoll instances (which is used, for example for multiple queuing load
balancing models), up to a 90% incorrect rate was seen.

Similarly, by deleting the prediction, 3% throughput boost was seen
across incremental threads.

Link: http://lkml.kernel.org/r/20181108051006.18751-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:59 +01:00
Junxiao Bi
bc51a8ac6c ocfs2: don't clear bh uptodate for block read
[ Upstream commit 70306d9dce ]

For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
and submit the io, second wait io done, last check whether bh uptodate, if
not return io error.

If two sync io for the same bh were issued, it could be the first io done
and set uptodate flag, but just before check that flag, the second io came
in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
will return IO error.

Indeed it's not necessary to clear uptodate flag, as the io end handler
end_buffer_read_sync() will set or clear it based on io succeed or failed.

The following message was found from a nfs server but the underlying
storage returned no error.

[4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
[4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
[4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
[4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
[4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5

Same issue in non sync read ocfs2_read_blocks(), fixed it as well.

Link: http://lkml.kernel.org/r/20181121020023.3034-4-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:58 +01:00
Sahitya Tummala
564651f1cb f2fs: fix sbi->extent_list corruption issue
[ Upstream commit e4589fa545 ]

When there is a failure in f2fs_fill_super() after/during
the recovery of fsync'd nodes, it frees the current sbi and
retries again. This time the mount is successful, but the files
that got recovered before retry, still holds the extent tree,
whose extent nodes list is corrupted since sbi and sbi->extent_list
is freed up. The list_del corruption issue is observed when the
file system is getting unmounted and when those recoverd files extent
node is being freed up in the below context.

list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
<...>
kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
lr : __list_del_entry_valid+0x94/0xb4
pc : __list_del_entry_valid+0x94/0xb4
<...>
Call trace:
__list_del_entry_valid+0x94/0xb4
__release_extent_node+0xb0/0x114
__free_extent_tree+0x58/0x7c
f2fs_shrink_extent_tree+0xdc/0x3b0
f2fs_leave_shrinker+0x28/0x7c
f2fs_put_super+0xfc/0x1e0
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78
__cleanup_mnt+0x1c/0x28
task_work_run+0x48/0xd0
do_notify_resume+0x678/0xe98
work_pending+0x8/0x14

Fix this by not creating extents for those recovered files if shrinker is
not registered yet. Once mount is successful and shrinker is registered,
those files can have extents again.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:58 +01:00
Ronnie Sahlberg
5c61a608d8 cifs: check ntwrk_buf_start for NULL before dereferencing it
[ Upstream commit 59a63e479c ]

RHBZ: 1021460

There is an issue where when multiple threads open/close the same directory
ntwrk_buf_start might end up being NULL, causing the call to smbCalcSize
later to oops with a NULL deref.

The real bug is why this happens and why this can become NULL for an
open cfile, which should not be allowed.
This patch tries to avoid a oops until the time when we fix the underlying
issue.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:58 +01:00
Chris Perl
a37706aca5 NFS: nfs_compare_mount_options always compare auth flavors.
[ Upstream commit 594d1644cd ]

This patch removes the check from nfs_compare_mount_options to see if a
`sec' option was passed for the current mount before comparing auth
flavors and instead just always compares auth flavors.

Consider the following scenario:

You have a server with the address 192.168.1.1 and two exports /export/a
and /export/b.  The first export supports `sys' and `krb5' security, the
second just `sys'.

Assume you start with no mounts from the server.

The following results in EIOs being returned as the kernel nfs client
incorrectly thinks it can share the underlying `struct nfs_server's:

$ mkdir /tmp/{a,b}
$ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a
$ sudo mount -t nfs -o vers=3          192.168.1.1:/export/b /tmp/b
$ df >/dev/null
df: ‘/tmp/b’: Input/output error

Signed-off-by: Chris Perl <cperl@janestreet.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:57 +01:00
Jan Kara
46171c3288 udf: Fix BUG on corrupted inode
[ Upstream commit d288d95842 ]

When inode is corrupted so that extent type is invalid, some functions
(such as udf_truncate_extents()) will just BUG. Check that extent type
is valid when loading the inode to memory.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:55 +01:00
J. Bruce Fields
60846115bb nfsd4: fix crash on writing v4_end_grace before nfsd startup
[ Upstream commit 62a063b8e7 ]

Anatoly Trosinenko reports that this:

1) Checkout fresh master Linux branch (tested with commit e195ca6cb)
2) Copy x84_64-config-4.14 to .config, then enable NFS server v4 and build
3) From `kvm-xfstests shell`:

results in NULL dereference in locks_end_grace.

Check that nfsd has been started before trying to end the grace period.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:53 +01:00
Tiezhu Yang
5415d1a3dc f2fs: fix wrong return value of f2fs_acl_create
[ Upstream commit f6176473a0 ]

When call f2fs_acl_create_masq() failed, the caller f2fs_acl_create()
should return -EIO instead of -ENOMEM, this patch makes it consistent
with posix_acl_create() which has been fixed in commit beaf226b86
("posix_acl: don't ignore return value of posix_acl_create_masq()").

Fixes: 83dfe53c18 ("f2fs: fix reference leaks in f2fs_acl_create")
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:53 +01:00
Yunlei He
6b4eec8a35 f2fs: move dir data flush to write checkpoint process
[ Upstream commit b61ac5b720 ]

This patch move dir data flush to write checkpoint process, by
doing this, it may reduce some time for dir fsync.

pre:
	-f2fs_do_sync_file enter
		-file_write_and_wait_range  <- flush & wait
		-write_checkpoint
			-do_checkpoint	    <- wait all
	-f2fs_do_sync_file exit

now:
	-f2fs_do_sync_file enter
		-write_checkpoint
			-block_operations   <- flush dir & no wait
			-do_checkpoint	    <- wait all
	-f2fs_do_sync_file exit

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:53 +01:00
Bob Peterson
035801540c dlm: Don't swamp the CPU with callbacks queued during recovery
[ Upstream commit 216f0efd19 ]

Before this patch, recovery would cause all callbacks to be delayed,
put on a queue, and afterward they were all queued to the callback
work queue. This patch does the same thing, but occasionally takes
a break after 25 of them so it won't swamp the CPU at the expense
of other RT processes like corosync.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:44:51 +01:00
Amir Goldstein
987d8ff3a2 fanotify: fix handling of events on child sub-directory
commit b469e7e47c upstream.

When an event is reported on a sub-directory and the parent inode has
a mark mask with FS_EVENT_ON_CHILD|FS_ISDIR, the event will be sent to
fsnotify() even if the event type is not in the parent mark mask
(e.g. FS_OPEN).

Further more, if that event happened on a mount or a filesystem with
a mount/sb mark that does have that event type in their mask, the "on
child" event will be reported on the mount/sb mark.  That is not
desired, because user will get a duplicate event for the same action.

Note that the event reported on the victim inode is never merged with
the event reported on the parent inode, because of the check in
should_merge(): old_fsn->inode == new_fsn->inode.

Fix this by looking for a match of an actual event type (i.e. not just
FS_ISDIR) in parent's inode mark mask and by not reporting an "on child"
event to group if event type is only found on mount/sb marks.

[backport hint: The bug seems to have always been in fanotify, but this
                patch will only apply cleanly to v4.19.y]

Cc: <stable@vger.kernel.org> # v4.19
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[amir: backport to v4.9]
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:33:30 +01:00
Dave Chinner
d6f62ecb9e fs: don't scan the inode cache before SB_BORN is set
commit 79f546a696 upstream.

We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land.  It produces
an oops down this path during the failed mount:

  radix_tree_gang_lookup_tag+0xc4/0x130
  xfs_perag_get_tag+0x37/0xf0
  xfs_reclaim_inodes_count+0x32/0x40
  xfs_fs_nr_cached_objects+0x11/0x20
  super_cache_count+0x35/0xc0
  shrink_slab.part.66+0xb1/0x370
  shrink_node+0x7e/0x1a0
  try_to_free_pages+0x199/0x470
  __alloc_pages_slowpath+0x3a1/0xd20
  __alloc_pages_nodemask+0x1c3/0x200
  cache_grow_begin+0x20b/0x2e0
  fallback_alloc+0x160/0x200
  kmem_cache_alloc+0x111/0x4e0

The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does it's setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.

To fix this, add a check for the SB_BORN flag in super_cache_count.
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour
which will not let super_cache_scan() run if SB_BORN is not set, and
hence will not allow the superblock shrinker from entering the
filesystem while it is being set up or after it has failed setup
and is being torn down.

Cc: stable@kernel.org
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:33:29 +01:00
Paulo Alcantara
fb713a1737 cifs: Always resolve hostname before reconnecting
commit 28eb24ff75 upstream.

In case a hostname resolves to a different IP address (e.g. long
running mounts), make sure to resolve it every time prior to calling
generic_ip_connect() in reconnect.

Suggested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:33:29 +01:00