Commit Graph

1060223 Commits

Author SHA1 Message Date
Olga Kornievskaia
1976b2b314 NFSv4.1 query for fs_location attr on a new file system
Query the server for other possible trunkable locations for a given
file system on a 4.1+ mount.

v2:
-- added missing static to nfs4_discover_trunking,
reported by the kernel test robot

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13 09:30:48 -05:00
Olga Kornievskaia
8a59bb93b7 NFSv4 store server support for fs_location attribute
Define and store if server returns it supports fs_locations attribute
as a capability.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-12 14:26:30 -05:00
Olga Kornievskaia
90e12a3191 NFSv4 remove zero number of fs_locations entries error check
Remove the check for the zero length fs_locations reply in the
xdr decoding, and instead check for that in the migration code.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-12 14:26:05 -05:00
Trond Myklebust
1751fc1db3 NFSv4: nfs_atomic_open() can race when looking up a non-regular file
If the file type changes back to being a regular file on the server
between the failed OPEN and our LOOKUP, then we need to re-run the OPEN.

Fixes: 0dd2b474d0 ("nfs: implement i_op->atomic_open()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-07 11:59:31 -05:00
Trond Myklebust
ac795161c9 NFSv4: Handle case where the lookup of a directory fails
If the application sets the O_DIRECTORY flag, and tries to open a
regular file, nfs_atomic_open() will punt to doing a regular lookup.
If the server then returns a regular file, we will happily return a
file descriptor with uninitialised open state.

The fix is to return the expected ENOTDIR error in these cases.

Reported-by: Lyu Tao <tao.lyu@epfl.ch>
Fixes: 0dd2b474d0 ("nfs: implement i_op->atomic_open()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-07 11:59:31 -05:00
Trond Myklebust
34bf20ce98 NFSv42: Fallocate and clone should also request 'blocks used'
Both fallocate and clone can end up updating the blocks used attribute.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:21 -05:00
Trond Myklebust
85847280b1 NFSv4: Allow writebacks to request 'blocks used'
When doing a non-pNFS write, allow the writeback code to specify that it
also needs to update 'blocks used'.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:21 -05:00
Greg Kroah-Hartman
86439fa267 SUNRPC: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the sunrpc sysfs code to use default_groups field which has
been the preferred way since aa30f47cf6 ("kobject: Add support for
default attribute groups to kobj_type") so that we can soon get rid of
the obsolete default_attrs field.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:21 -05:00
Greg Kroah-Hartman
01f3424572 NFS: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the NFS code to use default_groups field which has been the
preferred way since aa30f47cf6 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.

Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
68eaba4ca9 NFS: Fix the verifier for case sensitive filesystem in nfs_atomic_open()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
00bdadc7ac NFS: Add a helper to remove case-insensitive aliases
When dealing with case insensitive names, the client has no idea how the
server performs the mapping, so cannot collapse the dentries into a
single representative. So both rename and unlink need to deal with the
fact that there could be several dentries representing the file, and
have to somehow force them to be revalidated. Use d_prune_aliases() as a
big hammer approach.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
8ce37abdeb NFS: Invalidate negative dentries on all case insensitive directory changes
If we create a file, rename it, or hardlink it, then we need to assume
that cached negative dentries need to be revalidated.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
98ca3ee60b NFSv4: Just don't cache negative dentries on case insensitive servers
If the directory contents change, we cannot rely on the negative dentry
being cacheable.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
1ab5be4ac5 NFSv4: Add some support for case insensitive filesystems
Add capabilities to allow the NFS client to recognise when it is dealing
with case insensitive and case preserving filesystems.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
b05bf5c63b NFSv4.1: Fix uninitialised variable in devicenotify
When decode_devicenotify_args() exits with no entries, we need to
ensure that the struct cb_devicenotifyargs is initialised to
{ 0, NULL } in order to avoid problems in
nfs4_callback_devicenotify().

Reported-by: <rtm@csail.mit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Xiaoke Wang
fbd2057e53 nfs: nfs4clinet: check the return value of kstrdup()
kstrdup() returns NULL when some internal memory errors happen, it is
better to check the return value of it so to catch the memory error in
time.

Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Olga Kornievskaia
2c52c8376d NFSv4 only print the label when its queried
When the bitmask of the attributes doesn't include the security label,
don't bother printing it. Since the label might not be null terminated,
adjust the printing format accordingly.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Jiapeng Chong
c4f0396688 SUNRPC: clean up some inconsistent indenting
Eliminate the follow smatch warning:

net/sunrpc/xprtsock.c:1912 xs_local_connect() warn: inconsistent
indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Xu Wang
35e0f9a9af sunrpc: Remove unneeded null check
In g_verify_token_header, the null check of 'ret'
is unneeded to be done twice.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Gustavo A. R. Silva
c72a826829 nfs41: pnfs: filelayout: Replace one-element array with flexible-array member
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Refactor the code a bit according to the use of a flexible-array member
in struct nfs4_file_layout_dsaddr instead of a one-element array, and
use the struct_size() helper.

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

This issue was found with the help of Coccinelle and audited and fixed,
manually.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Pierguido Lambri
4b0c359b81 SUNRPC: Add source address/port to rpc_socket* traces
The rpc_socket* traces now show also the source address
and port. An example is:

kworker/u17:1-951   [005] 134218.925343: rpc_socket_close:
   socket:[46913] srcaddr=192.168.100.187:793 dstaddr=192.168.100.129:2049
   state=4 (DISCONNECTING) sk_state=7 (CLOSE)
kworker/u17:0-242   [006] 134360.841370: rpc_socket_connect:
   error=-115 socket:[56322] srcaddr=192.168.100.187:769
   dstaddr=192.168.100.129:2049 state=2 (CONNECTING) sk_state=2 (SYN_SENT)
       <idle>-0     [006] 134360.841859: rpc_socket_state_change: socket:[56322]
   srcaddr=192.168.100.187:769 dstaddr=192.168.100.129:2049 state=2 (CONNECTING)
   sk_state=1 (ESTABLISHED)

Signed-off-by: Pierguido Lambri <plambri@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
6ff9d99bb8 NFS: Ensure the server has an up to date ctime before renaming
Renaming a file is required by POSIX to update the file ctime, so
ensure that the file data is synced to disk so that we don't clobber the
updated ctime by writing back after creating the hard link.

Fixes: f2c2c552f1 ("NFS: Move delegation recall into the NFSv4 callback for rename_setup()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Trond Myklebust
204975036b NFS: Ensure the server has an up to date ctime before hardlinking
Creating a hard link is required by POSIX to update the file ctime, so
ensure that the file data is synced to disk so that we don't clobber the
updated ctime by writing back after creating the hard link.

Fixes: 9f76827287 ("NFS: Move the delegation return down into nfs4_proc_link()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
NeilBrown
6238aec83f NFS: don't store 'struct cred *' in struct nfs_access_entry
Storing the 'struct cred *' in nfs_access_entry is problematic.
An active 'cred' can keep a 'struct key *' active, and a quota is
imposed on the number of such keys that a user can maintain.
Cached 'nfs_access_entry' structs have indefinite lifetime, and having
these keep 'struct key's alive imposes on that quota.

So remove the 'struct cred *' and replace it with the fields we need:
  kuid_t, kgid_t, and struct group_info *

This makes the 'struct nfs_access_entry' 64 bits larger.

New function "access_cmp" is introduced which is identical to
cred_fscmp() except that the second arg is an 'nfs_access_entry', rather
than a 'cred'

Fixes: b68572e07c ("NFS: change access cache to use 'struct cred'.")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
NeilBrown
73fbb3fa64 NFS: pass cred explicitly for access tests
Storing the 'struct cred *' in nfs_access_entry is problematic.
An active 'cred' can keep a 'struct key *' active, and a quota is
imposed on the number of such keys that a user can maintain.
Cached 'nfs_access_entry' structs have indefinite lifetime, and having
these keep 'struct key's alive imposes on that quota.

So a future patch will remove the ->cred ref from nfs_access_entry.

To prepare, change various functions to not assume there is a 'cred' in
the nfs_access_entry, but to pass the cred around explicitly.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
NeilBrown
b5e7b59c34 NFS: change nfs_access_get_cached to only report the mask
Currently the nfs_access_get_cached family of functions report a
'struct nfs_access_entry' as the result, with both .mask and .cred set.
However the .cred is never used.  This is probably good and there is no
guarantee that it won't be freed before use.

Change to only report the 'mask' - as this is all that is used or needed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:19 -05:00
Linus Torvalds
c9e6606c7f Linux 5.16-rc8 2022-01-02 14:23:25 -08:00
Linus Torvalds
24a0b22061 Merge tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix TUI exit screen refresh race condition in 'perf top'.

 - Fix parsing of Intel PT VM time correlation arguments.

 - Honour CPU filtering command line request of a script's switch events
   in 'perf script'.

 - Fix printing of switch events in Intel PT python script.

 - Fix duplicate alias events list printing in 'perf list', noticed on
   heterogeneous arm64 systems.

 - Fix return value of ids__new(), users expect NULL for failure, not
   ERR_PTR(-ENOMEM).

* tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf top: Fix TUI exit screen refresh race condition
  perf pmu: Fix alias events list
  perf scripts python: intel-pt-events.py: Fix printing of switch events
  perf script: Fix CPU filtering of a script's switch events
  perf intel-pt: Fix parsing of VM time correlation arguments
  perf expr: Fix return value of ids__new()
2022-01-02 14:09:03 -08:00
Linus Torvalds
859431ac11 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Better input validation for compat ioctls and a documentation bugfix
  for 5.16"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Docs: Fixes link to I2C specification
  i2c: validate user data in compat ioctl
2022-01-02 10:36:09 -08:00
Linus Torvalds
1286cc4893 Merge tag 'x86_urgent_for_v5.16_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:

 - Use the proper CONFIG symbol in a preprocessor check.

* tag 'x86_urgent_for_v5.16_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Use the proper name CONFIG_FW_LOADER
2022-01-02 09:02:54 -08:00
yaowenbin
64f18d2d04 perf top: Fix TUI exit screen refresh race condition
When the following command is executed several times, a coredump file is
generated.

	$ timeout -k 9 5 perf top -e task-clock
	*******
	*******
	*******
	0.01%  [kernel]                  [k] __do_softirq
	0.01%  libpthread-2.28.so        [.] __pthread_mutex_lock
	0.01%  [kernel]                  [k] __ll_sc_atomic64_sub_return
	double free or corruption (!prev) perf top --sort comm,dso
	timeout: the monitored command dumped core

When we terminate "perf top" using sending signal method,
SLsmg_reset_smg() called. SLsmg_reset_smg() resets the SLsmg screen
management routines by freeing all memory allocated while it was active.

However SLsmg_reinit_smg() maybe be called by another thread.

SLsmg_reinit_smg() will free the same memory accessed by
SLsmg_reset_smg(), thus it results in a double free.

SLsmg_reinit_smg() is called already protected by ui__lock, so we fix
the problem by adding pthread_mutex_trylock of ui__lock when calling
SLsmg_reset_smg().

Signed-off-by: Wenyu Liu <liuwenyu7@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: wuxu.wu@huawei.com
Link: http://lore.kernel.org/lkml/a91e3943-7ddc-f5c0-a7f5-360f073c20e6@huawei.com
Signed-off-by: Hewenliang <hewenliang4@huawei.com>
Signed-off-by: yaowenbin <yaowenbin1@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-02 11:46:44 -03:00
John Garry
e0257a01d6 perf pmu: Fix alias events list
Commit 0e0ae87422 ("perf list: Display hybrid PMU events with cpu
type") changes the event list for uncore PMUs or arm64 heterogeneous CPU
systems, such that duplicate aliases are incorrectly listed per PMU
(which they should not be), like:

  # perf list
  ...
  unc_cbo_cache_lookup.any_es
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in E or S-state]
  unc_cbo_cache_lookup.any_es
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in E or S-state]
  unc_cbo_cache_lookup.any_i
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in I-state]
  unc_cbo_cache_lookup.any_i
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in I-state]
  ...

Notice how the events are listed twice.

The named commit changed how we remove duplicate events, in that events
for different PMUs are not treated as duplicates. I suppose this is to
handle how "Each hybrid pmu event has been assigned with a pmu name".

Fix PMU alias listing by restoring behaviour to remove duplicates for
non-hybrid PMUs.

Fixes: 0e0ae87422 ("perf list: Display hybrid PMU events with cpu type")
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1640103090-140490-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-02 11:29:05 -03:00
Linus Torvalds
278218f677 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "Two small fixups for spaceball joystick driver and appletouch touchpad
  driver"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: spaceball - fix parsing of movement data packets
  Input: appletouch - initialize work before device registration
2022-01-01 10:21:49 -08:00
Mel Gorman
8008293888 mm: vmscan: reduce throttling due to a failure to make progress -fix
Hugh Dickins reported the following

	My tmpfs swapping load (tweaked to use huge pages more heavily
	than in real life) is far from being a realistic load: but it was
	notably slowed down by your throttling mods in 5.16-rc, and this
	patch makes it well again - thanks.

	But: it very quickly hit NULL pointer until I changed that last
	line to

        if (first_pgdat)
                consider_reclaim_throttle(first_pgdat, sc);

The likely issue is that huge pages are a major component of the test
workload.  When this is the case, first_pgdat may never get set if
compaction is ready to continue due to this check

        if (IS_ENABLED(CONFIG_COMPACTION) &&
            sc->order > PAGE_ALLOC_COSTLY_ORDER &&
            compaction_ready(zone, sc)) {
                sc->compaction_ready = true;
                continue;
        }

If this was true for every zone in the zonelist, first_pgdat would never
get set resulting in a NULL pointer exception.

Link: https://lkml.kernel.org/r/20211209095453.GM3366@techsingularity.net
Fixes: 1b4e3f26f9 ("mm: vmscan: Reduce throttling due to a failure to make progress")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31 13:12:55 -08:00
Mel Gorman
1b4e3f26f9 mm: vmscan: Reduce throttling due to a failure to make progress
Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.  In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before stalling.  In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse.  Systems at or near an OOM state that cannot
be recovered must reach OOM quickly and memcg should kill tasks if a
memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback.  If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly.  Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.

Alexey's original case was the most straight forward

	for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily, after the patch the test
completes in a few seconds similar to 5.15.

Alexey's second test case added watching a youtube video while tail runs
10 times.  On 5.15, playback only jitters slightly, 5.16-rc1 stalls a
lot with lots of frames missing and numerous audio glitches.  With this
patch applies, the video plays similarly to 5.15.

[lkp@intel.com: Fix W=1 build warning]

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/
Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 69392a403f ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31 11:17:07 -08:00
Linus Torvalds
f87bcc88f3 Merge branch 'akpm' (patches from Andrew)
Merge misc mm fixes from Andrew Morton:
 "2 patches.

  Subsystems affected by this patch series: mm (userfaultfd and damon)"

* akpm:
  mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
  userfaultfd/selftests: fix hugetlb area allocations
2021-12-31 09:28:48 -08:00
Linus Torvalds
e46227bf38 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Three fixes, all in drivers. The lpfc one doesn't look exploitable,
  but nasty things could happen in string operations if mybuf ends up
  with an on stack unterminated string"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: vmw_pvscsi: Set residual data length conditionally
  scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
  scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
2021-12-31 09:22:25 -08:00
SeongJae Park
ebb3f994dd mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
DAMON debugfs interface increases the reference counts of 'struct pid's
for targets from the 'target_ids' file write callback
('dbgfs_target_ids_write()'), but decreases the counts only in DAMON
monitoring termination callback ('dbgfs_before_terminate()').

Therefore, when 'target_ids' file is repeatedly written without DAMON
monitoring start/termination, the reference count is not decreased and
therefore memory for the 'struct pid' cannot be freed.  This commit
fixes this issue by decreasing the reference counts when 'target_ids' is
written.

Link: https://lkml.kernel.org/r/20211229124029.23348-1-sj@kernel.org
Fixes: 4bc05954d0 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31 09:20:12 -08:00
Mike Kravetz
f5c7329718 userfaultfd/selftests: fix hugetlb area allocations
Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh
or any environment where there are 'just enough' hugetlb pages will
always fail with:

  testing events (fork, remap, remove):
		ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)

The ENOMEM error code implies there are not enough hugetlb pages.
However, there are free hugetlb pages but they are all reserved.  There
is a basic problem with the way the test allocates hugetlb pages which
has existed since the test was originally written.

Due to the way 'cleanup' was done between different phases of the test,
this issue was masked until recently.  The issue was uncovered by commit
8ba6e86408 ("userfaultfd/selftests: reinitialize test context in each
test").

For the hugetlb test, src and dst areas are allocated as PRIVATE
mappings of a hugetlb file.  This means that at mmap time, pages are
reserved for the src and dst areas.  At the start of event testing (and
other tests) the src area is populated which results in allocation of
huge pages to fill the area and consumption of reserves associated with
the area.  Then, a child is forked to fault in the dst area.  Note that
the dst area was allocated in the parent and hence the parent owns the
reserves associated with the mapping.  The child has normal access to
the dst area, but can not use the reserves created/owned by the parent.
Thus, if there are no other huge pages available allocation of a page
for the dst by the child will fail.

Fix by not creating reserves for the dst area.  In this way the child
can use free (non-reserved) pages.

Also, MAP_PRIVATE of a file only makes sense if you are interested in
the contents of the file before making a COW copy.  The test does not do
this.  So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
hugetlb mapping.  There is no need to create a hugetlb file in the
non-shared case.

Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31 09:20:12 -08:00
Deep Majumder
c116fe1e18 Docs: Fixes link to I2C specification
The link to the I2C specification is broken. Although
"https://www.nxp.com" hosts Rev 7 (2021) of this specification, it is
behind a login-wall. Thus, an additional link has been added (which
doesn't require a login) and the NXP official docs link has been
updated.

Signed-off-by: Deep Majumder <deep@fastmail.in>
[wsa: minor updates to text and commit message]
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-12-31 14:39:28 +01:00
Pavel Skripkin
bb436283e2 i2c: validate user data in compat ioctl
Wrong user data may cause warning in i2c_transfer(), ex: zero msgs.
Userspace should not be able to trigger warnings, so this patch adds
validation checks for user data in compact ioctl to prevent reported
warnings

Reported-and-tested-by: syzbot+e417648b303855b91d8a@syzkaller.appspotmail.com
Fixes: 7d5cb45655 ("i2c compat ioctls: move to ->compat_ioctl()")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-12-31 14:28:22 +01:00
Leo L. Schwab
bc7ec91718 Input: spaceball - fix parsing of movement data packets
The spaceball.c module was not properly parsing the movement reports
coming from the device.  The code read axis data as signed 16-bit
little-endian values starting at offset 2.

In fact, axis data in Spaceball movement reports are signed 16-bit
big-endian values starting at offset 3.  This was determined first by
visually inspecting the data packets, and later verified by consulting:
http://spacemice.org/pdf/SpaceBall_2003-3003_Protocol.pdf

If this ever worked properly, it was in the time before Git...

Signed-off-by: Leo L. Schwab <ewhac@ewhac.org>
Link: https://lore.kernel.org/r/20211221101630.1146385-1-ewhac@ewhac.org
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-12-30 21:09:29 -08:00
Pavel Skripkin
9f3ccdc3f6 Input: appletouch - initialize work before device registration
Syzbot has reported warning in __flush_work(). This warning is caused by
work->func == NULL, which means missing work initialization.

This may happen, since input_dev->close() calls
cancel_work_sync(&dev->work), but dev->work initalization happens _after_
input_register_device() call.

So this patch moves dev->work initialization before registering input
device

Fixes: 5a6eb676d3 ("Input: appletouch - improve powersaving for Geyser3 devices")
Reported-and-tested-by: syzbot+b88c5eae27386b252bbd@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/20211230141151.17300-1-paskripkin@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-12-30 21:04:04 -08:00
Linus Torvalds
4f3d93c6ea Merge tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "This is a bit bigger than I'd like, however it has two weeks of amdgpu
  fixes in it, since they missed last week, which was very small.

  The nouveau regression is probably the biggest fix in here, and it
  needs to go into 5.15 as well, two i915 fixes, and then a scattering
  of amdgpu fixes. The biggest fix in there is for a fencing NULL
  pointer dereference, the rest are pretty minor.

  For the misc team, I've pulled the two misc fixes manually since I'm
  not sure what is happening at this time of year!

  The amdgpu maintainers have the outstanding runpm regression to fix
  still, they are just working through the last bits of it now.

  Summary:

  nouveau:
   - fencing regression fix

  i915:
   - Fix possible uninitialized variable
   - Fix composite fence seqno icrement on each fence creation

  amdgpu:
   - Fencing fix
   - XGMI fix
   - VCN regression fix
   - IP discovery regression fixes
   - Fix runpm documentation
   - Suspend/resume fixes
   - Yellow Carp display fixes
   - MCLK power management fix
   - dma-buf fix"

* tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
  drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
  drm/amd/display: Set optimize_pwr_state for DCN31
  drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
  drm/amd/display: Added power down for DCN10
  drm/amd/display: fix B0 TMDS deepcolor no dislay issue
  drm/amdgpu: no DC support for headless chips
  drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
  drm/amdgpu: always reset the asic in suspend (v2)
  drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
  drm/i915: Increment composite fence seqno
  drm/i915: Fix possible uninitialized variable in parallel extension
  drm/amdgpu: fix runpm documentation
  drm/nouveau: wait for the exclusive fence after the shared ones v2
  drm/amdgpu: add support for IP discovery gc_info table v2
  drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  drm/amd/pm: Fix xgmi link control on aldebaran
  drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence
  drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify
2021-12-30 18:25:43 -08:00
Dave Airlie
ce9b333c73 Merge branch 'drm-misc-fixes' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes
This merges two fixes that haven't been sent to me yet, but I wanted to get in.

One amdgpu fix, but one nouveau regression fixer.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-12-31 11:40:29 +10:00
Christian Brauner
012e332286 fs/mount_setattr: always cleanup mount_kattr
Make sure that finish_mount_kattr() is called after mount_kattr was
succesfully built in both the success and failure case to prevent
leaking any references we took when we built it.  We returned early if
path lookup failed thereby risking to leak an additional reference we
took when building mount_kattr when an idmapped mount was requested.

Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 9caccd4154 ("fs: introduce MOUNT_ATTR_IDMAP")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-30 15:12:13 -08:00
Linus Torvalds
74c78b4291 Merge tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from.. Santa?

  No regressions on our radar at this point. The igc problem fixed here
  was the last one I was tracking but it was broken in previous
  releases, anyway. Mostly driver fixes and a couple of largish SMC
  fixes.

  Current release - regressions:

   - xsk: initialise xskb free_list_node, fixup for a -rc7 fix

  Current release - new code bugs:

   - mlx5: handful of minor fixes:

   - use first online CPU instead of hard coded CPU

   - fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'

   - fix skb memory leak when TC classifier action offloads are disabled

   - fix memory leak with rules with internal OvS port

  Previous releases - regressions:

   - igc: do not enable crosstimestamping for i225-V models

  Previous releases - always broken:

   - udp: use datalen to cap ipv6 udp max gso segments

   - fix use-after-free in tw_timer_handler due to early free of stats

   - smc: fix kernel panic caused by race of smc_sock

   - smc: don't send CDC/LLC message if link not ready, avoid timeouts

   - sctp: use call_rcu to free endpoint, avoid UAF in sock diag

   - bridge: mcast: add and enforce query interval minimum

   - usb: pegasus: do not drop long Ethernet frames

   - mlx5e: fix ICOSQ recovery flow for XSK

   - nfc: uapi: use kernel size_t to fix user-space builds"

* tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  fsl/fman: Fix missing put_device() call in fman_port_probe
  selftests: net: using ping6 for IPv6 in udpgro_fwd.sh
  Documentation: fix outdated interpretation of ip_no_pmtu_disc
  net/ncsi: check for error return from call to nla_put_u32
  net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
  net: fix use-after-free in tw_timer_handler
  selftests: net: Fix a typo in udpgro_fwd.sh
  selftests/net: udpgso_bench_tx: fix dst ip argument
  net: bridge: mcast: add and enforce startup query interval minimum
  net: bridge: mcast: add and enforce query interval minimum
  ipv6: raw: check passed optlen before reading
  xsk: Initialise xskb free_list_node
  net/mlx5e: Fix wrong features assignment in case of error
  net/mlx5e: TC, Fix memory leak with rules with internal port
  ionic: Initialize the 'lif->dbid_inuse' bitmap
  igc: Fix TX timestamp support for non-MSI-X platforms
  igc: Do not enable crosstimestamping for i225-V models
  net/smc: fix kernel panic caused by race of smc_sock
  net/smc: don't send CDC/LLC message if link not ready
  NFC: st21nfca: Fix memory leak in device probe and remove
  ...
2021-12-30 11:12:12 -08:00
Linus Torvalds
9bad743e8d Merge tag 'char-misc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
 "Here are two misc driver fixes for 5.16-final:

   - binder accounting fix to resolve reported problem

   - nitro_enclaves fix for mmap assert warning output

  Both of these have been for over a week with no reported issues"

* tag 'char-misc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert
  binder: fix async_free_space accounting for empty parcels
2021-12-30 09:52:32 -08:00
Linus Torvalds
2d40060bb5 Merge tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for 5.16 to resolve some reported
  problems:

   - mtu3 driver fixes

   - typec ucsi driver fix

   - xhci driver quirk added

   - usb gadget f_fs fix for reported crash

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Only check the contract if there is a connection
  xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
  usb: mtu3: set interval of FS intr and isoc endpoint
  usb: mtu3: fix list_head check warning
  usb: mtu3: add memory barrier before set GPD's HWO
  usb: mtu3: fix interval value for intr and isoc
  usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
2021-12-30 09:49:54 -08:00
Miaoqian Lin
bf2b09fedc fsl/fman: Fix missing put_device() call in fman_port_probe
The reference taken by 'of_find_device_by_node()' must be released when
not needed anymore.
Add the corresponding 'put_device()' in the and error handling paths.

Fixes: 18a6c85fcc ("fsl/fman: Add FMan Port Support")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30 13:34:06 +00:00