Commit Graph

122995 Commits

Author SHA1 Message Date
Mahesh Bandewar
316cdaa115 net: add option to not create fall-back tunnels in root-ns as well
The sysctl that was added  earlier by commit 79134e6ce2 ("net: do
not create fallback tunnels for non-default namespaces") to create
fall-back only in root-ns. This patch enhances that behavior to provide
option not to create fallback tunnels in root-ns as well. Since modules
that create fallback tunnels could be built-in and setting the sysctl
value after booting is pointless, so added a kernel cmdline options to
change this default. The default setting is preseved for backward
compatibility. The kernel command line option of fb_tunnels=initns will
set the sysctl value to 1 and will create fallback tunnels only in initns
while kernel cmdline fb_tunnels=none will set the sysctl value to 2 and
fallback tunnels are skipped in every netns.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Zenczykowski <maze@google.com>
Cc: Jian Yang <jianyang@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28 06:52:44 -07:00
David S. Miller
ae9a138f06 Merge tag 'mac80211-next-for-davem-2020-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
This time we have:
 * some code to support SAE (WPA3) offload in AP mode
 * many documentation (wording) fixes/updates
 * netlink policy updates, including the use of NLA_RANGE
   with binary attributes
 * regulatory improvements for adjacent frequency bands
 * and a few other small additions/refactorings/cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28 06:17:40 -07:00
Nicolas Dichtel
50aba46c23 gtp: add notification mechanism
Like all other network functions, let's notify gtp context on creation and
deletion.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Gabriel Ganne <gabriel.ganne@6wind.com>
Acked-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27 08:01:47 -07:00
Andrew Lunn
232e15e1d7 net: xgene: Move shared header file into include/linux
This header file is currently included into the ethernet driver via a
relative path into the PHY subsystem. This is bad practice, and causes
issues for the upcoming move of the MDIO driver. Move the header file
into include/linux to clean this up.

v2:
Move header to include/linux/mdio

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27 06:55:50 -07:00
Andrew Lunn
fcba68bd75 net/phy/mdio-i2c: Move header file to include/linux/mdio
In preparation for moving all MDIO drivers into drivers/net/mdio, move
the mdio-i2c header file into include/linux/mdio so it can be used by
both the MDIO driver and the SFP code which instantiates I2C MDIO
busses.

v2:
Add include/linux/mdio

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27 06:55:50 -07:00
Andrew Lunn
2fa4e4b799 net: pcs: Move XPCS into new PCS subdirectory
Create drivers/net/pcs and move the Synopsys DesignWare XPCS into the
new directory. Move the header file into a subdirectory
include/linux/pcs

Start a naming convention of all PCS files use the prefix pcs-, and
rename the XPCS files to fit.

v2:
Add include/linux/pcs

v4:
Fix include path in stmmac.
Remove PCS_DEVICES to avoid new prompts

Cc: Jose Abreu <Jose.Abreu@synopsys.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27 06:55:50 -07:00
Chung-Hsien Hsu
2831a63102 nl80211: support SAE authentication offload in AP mode
Let drivers advertise support for AP-mode SAE authentication offload
with a new NL80211_EXT_FEATURE_SAE_OFFLOAD_AP flag.

Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Link: https://lore.kernel.org/r/20200817073316.33402-4-stanley.hsu@cypress.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27 15:19:44 +02:00
John Crispin
8552a434b6 mac80211: rename csa counters to countdown counters
We want to reuse the functions and structs for other counters such as BSS
color change. Rename them to more generic names.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20200811080107.3615705-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27 14:12:15 +02:00
John Crispin
00c207edfb nl80211: rename csa counter attributes countdown counters
We want to reuse the attributes for other counters such as BSS color
change. Rename them to more generic names.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20200811080107.3615705-1-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27 14:12:15 +02:00
Miles Hu
eb89a6a6b7 nl80211: add support for setting fixed HE rate/gi/ltf
This patch adds the nl80211 structs, definitions, policies and parsing
code required to pass fixed HE rate, GI and LTF settings.

Signed-off-by: Miles Hu <milehu@codeaurora.org>
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20200804081630.2013619-1-john@phrozen.org
[fix comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27 14:12:14 +02:00
James Prestwood
493a0ebd80 nl80211: fix PORT_AUTHORIZED wording to reflect behavior
The CMD_PORT_AUTHORIZED event was described as an event which indicated
a successfully completed 4-way handshake. But the behavior was
not as advertized. The only driver which uses this is brcmfmac, and
this driver only sends the event after a successful 802.1X-FT roam.

This prevents userspace applications from knowing if the 4-way completed
on:

1. Normal 802.1X connects
2. Normal PSK connections
3. FT-PSK roams

wpa_supplicant handles this incorrect behavior by just completing
the connection after association, before the 4-way has completed.
If the 4-way ends up failing it disconnects at that point.

Since this behavior appears to be expected (wpa_s handles it this
way) I have changed the wording in the API description to reflect the
actual behavior.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
Link: https://lore.kernel.org/r/20200413162053.3711-1-prestwoj@gmail.com
[fix spelling of 802.1X throughout ...]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27 11:29:59 +02:00
Tariq Toukan
f468f21b7a net: Take common prefetch code structure into a function
Many device drivers use the same prefetch code structure to
deal with small L1 cacheline size.
Take this code into a function and call it from the drivers.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-26 15:55:53 -07:00
Eric Dumazet
24da79902e inet: remove inet_sk_copy_descendant()
This is no longer used, SCTP now uses a private helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-26 07:33:19 -07:00
Igor Russkikh
c5c642c55e qed: align adjacent indent
Remove extra indent on some of adjacent declarations.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24 18:01:33 -07:00
Igor Russkikh
4f5a8db27e qed: use devlink logic to report errors
Use devlink_health_report to push error indications.
We implement this in qede via callback function to make it possible
to reuse the same for other drivers sitting on top of qed in future.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24 18:01:33 -07:00
Igor Russkikh
9524067b9a qed: health reporter init deinit seq
Here we declare health reporter ops (empty for now)
and register these in qed probe and remove callbacks.

This way we get devlink attached to all kind of qed* PCI
device entities: networking or storage offload entity.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24 18:01:32 -07:00
Igor Russkikh
755f982bb1 qed/qede: make devlink survive recovery
Devlink instance lifecycle was linked to qed_dev object,
that caused devlink to be recreated on each recovery.

Changing it by making higher level driver (qede) responsible for its
life. This way devlink now survives recoveries.

qede now stores devlink structure pointer as a part of its device
object, devlink private data contains a linkage structure,
qed_devlink.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24 18:01:32 -07:00
Luke Hsiao
583bbf0624 io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
For TCP tx zero-copy, the kernel notifies the process of completions by
queuing completion notifications on the socket error queue. This patch
allows reading these notifications via recvmsg to support TCP tx
zero-copy.

Ancillary data was originally disallowed due to privilege escalation
via io_uring's offloading of sendmsg() onto a kernel thread with kernel
credentials (https://crbug.com/project-zero/1975). So, we must ensure
that the socket type is one where the ancillary data types that are
delivered on recvmsg are plain data (no file descriptors or values that
are translated based on the identity of the calling process).

This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
with tx zero-copy enabled. Before this patch, we received -EINVALID from
this specific code path. After this patch, we could read tcp tx
zero-copy completion notifications from the MSG_ERRQUEUE.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24 16:16:06 -07:00
David S. Miller
7611cbb900 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-23 11:48:27 -07:00
Linus Torvalds
9d045ed1eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Nothing earth shattering here, lots of small fixes (f.e. missing RCU
  protection, bad ref counting, missing memset(), etc.) all over the
  place:

   1) Use get_file_rcu() in task_file iterator, from Yonghong Song.

   2) There are two ways to set remote source MAC addresses in macvlan
      driver, but only one of which validates things properly. Fix this.
      From Alvin Šipraga.

   3) Missing of_node_put() in gianfar probing, from Sumera
      Priyadarsini.

   4) Preserve device wanted feature bits across multiple netlink
      ethtool requests, from Maxim Mikityanskiy.

   5) Fix rcu_sched stall in task and task_file bpf iterators, from
      Yonghong Song.

   6) Avoid reset after device destroy in ena driver, from Shay
      Agroskin.

   7) Missing memset() in netlink policy export reallocation path, from
      Johannes Berg.

   8) Fix info leak in __smc_diag_dump(), from Peilin Ye.

   9) Decapsulate ECN properly for ipv6 in ipv4 tunnels, from Mark
      Tomlinson.

  10) Fix number of data stream negotiation in SCTP, from David Laight.

  11) Fix double free in connection tracker action module, from Alaa
      Hleihel.

  12) Don't allow empty NHA_GROUP attributes, from Nikolay Aleksandrov"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  net: nexthop: don't allow empty NHA_GROUP
  bpf: Fix two typos in uapi/linux/bpf.h
  net: dsa: b53: check for timeout
  tipc: call rcu_read_lock() in tipc_aead_encrypt_done()
  net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow
  net: sctp: Fix negotiation of the number of data streams.
  dt-bindings: net: renesas, ether: Improve schema validation
  gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY
  hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
  hv_netvsc: Remove "unlikely" from netvsc_select_queue
  bpf: selftests: global_funcs: Check err_str before strstr
  bpf: xdp: Fix XDP mode when no mode flags specified
  selftests/bpf: Remove test_align leftovers
  tools/resolve_btfids: Fix sections with wrong alignment
  net/smc: Prevent kernel-infoleak in __smc_diag_dump()
  sfc: fix build warnings on 32-bit
  net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified"
  libbpf: Fix map index used in error message
  net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
  net: atlantic: Use readx_poll_timeout() for large timeout
  ...
2020-08-23 10:52:33 -07:00
Tom Parkin
eee049c0ef l2tp: remove tunnel and session debug flags field
The l2tp subsystem now uses standard kernel logging APIs for
informational and warning messages, and tracepoints for debug
information.

Now that the tunnel and session debug flags are unused, remove the field
from the core structures.

Various system calls (in the case of l2tp_ppp) and netlink messages
handle the getting and setting of debug flags.  To avoid userspace
breakage don't modify the API of these calls; simply ignore set
requests, and send dummy data for get requests.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22 12:44:37 -07:00
David S. Miller
4af7b32f84 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-08-21

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 5 day(s) which contain
a total of 12 files changed, 78 insertions(+), 24 deletions(-).

The main changes are:

1) three fixes in BPF task iterator logic, from Yonghong.

2) fix for compressed dwarf sections in vmlinux, from Jiri.

3) fix xdp attach regression, from Andrii.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-21 12:54:50 -07:00
Linus Torvalds
f22c5579a7 Merge tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - The CLINT driver has been split in two: one to handle the M-mode
   CLINT (memory mapped and used on NOMMU systems) and one to handle the
   S-mode CLINT (via SBI).

 - The addition of SiFive's drivers to rv32_defconfig

* tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Add SiFive drivers to rv32_defconfig
  dt-bindings: timer: Add CLINT bindings
  RISC-V: Remove CLINT related code from timer and arch
  clocksource/drivers: Add CLINT timer driver
  RISC-V: Add mechanism to provide custom IPI operations
2020-08-21 12:32:42 -07:00
Tobias Klauser
b16fc097bc bpf: Fix two typos in uapi/linux/bpf.h
Also remove trailing whitespaces in bpf_skb_get_tunnel_key example code.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821133642.18870-1-tklauser@distanz.ch
2020-08-21 12:26:17 -07:00
Linus Torvalds
d723b99ec9 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Improvements to ext4's block allocator performance for very large file
  systems, especially when the file system or files which are highly
  fragmented. There is a new mount option, prefetch_block_bitmaps which
  will pull in the block bitmaps and set up the in-memory buddy bitmaps
  when the file system is initially mounted.

  Beyond that, a lot of bug fixes and cleanups. In particular, a number
  of changes to make ext4 more robust in the face of write errors or
  file system corruptions"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
  ext4: limit the length of per-inode prealloc list
  ext4: reorganize if statement of ext4_mb_release_context()
  ext4: add mb_debug logging when there are lost chunks
  ext4: Fix comment typo "the the".
  jbd2: clean up checksum verification in do_one_pass()
  ext4: change to use fallthrough macro
  ext4: remove unused parameter of ext4_generic_delete_entry function
  mballoc: replace seq_printf with seq_puts
  ext4: optimize the implementation of ext4_mb_good_group()
  ext4: delete invalid comments near ext4_mb_check_limits()
  ext4: fix typos in ext4_mb_regular_allocator() comment
  ext4: fix checking of directory entry validity for inline directories
  fs: prevent BUG_ON in submit_bh_wbc()
  ext4: correctly restore system zone info when remount fails
  ext4: handle add_system_zone() failure in ext4_setup_system_zone()
  ext4: fold ext4_data_block_valid_rcu() into the caller
  ext4: check journal inode extents more carefully
  ext4: don't allow overlapping system zones
  ext4: handle error of ext4_setup_system_zone() on remount
  ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp()
  ...
2020-08-21 11:03:38 -07:00
Anup Patel
2ac6795fcc clocksource/drivers: Add CLINT timer driver
We add a separate CLINT timer driver for Linux RISC-V M-mode (i.e.
RISC-V NoMMU kernel).

The CLINT MMIO device provides three things:
1. 64bit free running counter register
2. 64bit per-CPU time compare registers
3. 32bit per-CPU inter-processor interrupt registers

Unlike other timer devices, CLINT provides IPI registers along with
timer registers. To use CLINT IPI registers, the CLINT timer driver
provides IPI related callbacks to arch/riscv.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Emil Renner Berhing <kernel@esmil.dk>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-08-20 10:57:29 -07:00
Linus Torvalds
d271b51c60 Merge tag 'dma-mapping-5.9-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
 "Fix more fallout from the dma-pool changes (Nicolas Saenz Julienne,
  me)"

* tag 'dma-mapping-5.9-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-pool: Only allocate from CMA when in same memory zone
  dma-pool: fix coherent pool allocations for IOMMU mappings
2020-08-20 10:48:17 -07:00
Kurt Kanzenbach
17060fb506 ptp: Remove unused macro
The offset for the control field is not needed anymore. Remove it.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19 16:09:19 -07:00
Kurt Kanzenbach
036c508ba9 ptp: Add generic ptp message type function
The message type is located at different offsets within the ptp header depending
on the ptp version (v1 or v2). Therefore, drivers which also deal with ptp v1
have some code for it.

Extract this into a helper function for drivers to be used.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19 16:07:49 -07:00
Kurt Kanzenbach
bdfbb63c31 ptp: Add generic ptp v2 header parsing function
Reason: A lot of the ptp drivers - which implement hardware time stamping - need
specific fields such as the sequence id from the ptp v2 header. Currently all
drivers implement that themselves.

Introduce a generic function to retrieve a pointer to the start of the ptp v2
header.

Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19 16:07:49 -07:00
brookxu
27bc446e2d ext4: limit the length of per-inode prealloc list
In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.

After patching, we observed that the sys ratio of cpu has dropped, and
the system throughput has increased significantly. We created a process
to write the sparse file, and the running time of the process on the
fixed kernel was significantly reduced, as follows:

Running time on unfixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m2.051s
user    0m0.008s
sys     0m2.026s

Running time on fixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m0.471s
user    0m0.004s
sys     0m0.395s

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
Xin Long
4ef1a7cb08 ipv6: some fixes for ipv6_dev_find()
This patch is to do 3 things for ipv6_dev_find():

  As David A. noticed,

  - rt6_lookup() is not really needed. Different from __ip_dev_find(),
    ipv6_dev_find() doesn't have a compatibility problem, so remove it.

  As Hideaki suggested,

  - "valid" (non-tentative) check for the address is also needed.
    ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will
    traverse the address hash list, but it's heavy to be called
    inside ipv6_dev_find(). This patch is to reuse the code of
    ipv6_chk_addr_and_flags() for ipv6_dev_find().

  - dev parameter is passed into ipv6_dev_find(), as link-local
    addresses from user space has sin6_scope_id set and the dev
    lookup needs it.

Fixes: 81f6cb3122 ("ipv6: add ipv6_dev_find()")
Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18 15:58:53 -07:00
Johannes Berg
8aa26c575f netlink: make NLA_BINARY validation more flexible
Add range validation for NLA_BINARY, allowing validation of any
combination of combination minimum or maximum lengths, using the
existing NLA_POLICY_RANGE()/NLA_POLICY_FULL_RANGE() macros, just
like for integers where the value is checked.

Also make NLA_POLICY_EXACT_LEN(), NLA_POLICY_EXACT_LEN_WARN()
and NLA_POLICY_MIN_LEN() special cases of this, removing the old
types NLA_EXACT_LEN and NLA_MIN_LEN.

This allows us to save some code where both minimum and maximum
lengths are requires, currently the policy only allows maximum
(NLA_BINARY), minimum (NLA_MIN_LEN) or exact (NLA_EXACT_LEN), so
a range of lengths cannot be accepted and must be checked by the
code that consumes the attributes later.

Also, this allows advertising the correct ranges in the policy
export to userspace. Here, NLA_MIN_LEN and NLA_EXACT_LEN already
were special cases of NLA_BINARY with min and min/max length
respectively.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18 12:28:45 -07:00
Linus Torvalds
9899b58758 Merge tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull ia64 page table fix from Mike Rapoport:
 "Fix regression in IA-64 caused by page table allocation refactoring

  The refactoring and consolidation of <asm/pgalloc.h> caused regression
  on parisc and ia64. The fix for parisc made it into v5.9-rc1 while the
  fix ia64 got delayed a bit and here it is"

* tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  arch/ia64: Restore arch-specific pgd_offset_k implementation
2020-08-18 12:05:46 -07:00
Linus Torvalds
4cf7562190 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Another batch of fixes:

  1) Remove nft_compat counter flush optimization, it generates warnings
     from the refcount infrastructure. From Florian Westphal.

  2) Fix BPF to search for build id more robustly, from Jiri Olsa.

  3) Handle bogus getopt lengths in ebtables, from Florian Westphal.

  4) Infoleak and other fixes to j1939 CAN driver, from Eric Dumazet and
     Oleksij Rempel.

  5) Reset iter properly on mptcp sendmsg() error, from Florian
     Westphal.

  6) Show a saner speed in bonding broadcast mode, from Jarod Wilson.

  7) Various kerneldoc fixes in bonding and elsewhere, from Lee Jones.

  8) Fix double unregister in bonding during namespace tear down, from
     Cong Wang.

  9) Disable RP filter during icmp_redirect selftest, from David Ahern"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
  otx2_common: Use devm_kcalloc() in otx2_config_npa()
  net: qrtr: fix usage of idr in port assignment to socket
  selftests: disable rp_filter for icmp_redirect.sh
  Revert "net: xdp: pull ethernet header off packet after computing skb->protocol"
  phylink: <linux/phylink.h>: fix function prototype kernel-doc warning
  mptcp: sendmsg: reset iter on error redux
  net: devlink: Remove overzealous WARN_ON with snapshots
  tipc: not enable tipc when ipv6 works as a module
  tipc: fix uninit skb->data in tipc_nl_compat_dumpit()
  net: Fix potential wrong skb->protocol in skb_vlan_untag()
  net: xdp: pull ethernet header off packet after computing skb->protocol
  ipvlan: fix device features
  bonding: fix a potential double-unregister
  can: j1939: add rxtimer for multipacket broadcast session
  can: j1939: abort multipacket broadcast session when timeout occurs
  can: j1939: cancel rxtimer on multipacket broadcast session complete
  can: j1939: fix support for multipacket broadcast message
  net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs'
  net: fddi: skfp: cfm: Remove set but unused variable 'oldstate'
  net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs'
  ...
2020-08-17 17:09:50 -07:00
Jessica Clarke
bd05220c7b arch/ia64: Restore arch-specific pgd_offset_k implementation
IA-64 is special and treats pgd_offset_k() differently to pgd_offset(),
using different formulae to calculate the indices into the kernel and user
PGDs.  The index into the user PGDs takes into account the region number,
but the index into the kernel (init_mm) PGD always assumes a predefined
kernel region number. Commit 974b9b2c68 ("mm: consolidate pte_index() and
pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which
incorrectly used pgd_index() for kernel page tables.  As a result, the
index into the kernel PGD was going out of bounds and the kernel hung
during early boot.

Allow overrides of pgd_offset_k() and override it on IA-64 with the old
implementation that will correctly index the kernel PGD.

Fixes: 974b9b2c68 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-08-17 21:50:54 +03:00
Randy Dunlap
0b76e642f9 phylink: <linux/phylink.h>: fix function prototype kernel-doc warning
Fix a kernel-doc warning for the pcs_config() function prototype:

../include/linux/phylink.h:406: warning: Excess function parameter 'permit_pause_to_mac' description in 'pcs_config'

Fixes: 7137e18f6f ("net: phylink: add struct phylink_pcs")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17 11:46:22 -07:00
David Howells
29e44f4535 watch_queue: Limit the number of watches a user can hold
Impose a limit on the number of watches that a user can hold so that
they can't use this mechanism to fill up all the available memory.

This is done by putting a counter in user_struct that's incremented when
a watch is allocated and decreased when it is released.  If the number
exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN.

This can be tested by the following means:

 (1) Create a watch queue and attach it to fd 5 in the program given - in
     this case, bash:

	keyctl watch_session /tmp/nlog /tmp/gclog 5 bash

 (2) In the shell, set the maximum number of files to, say, 99:

	ulimit -n 99

 (3) Add 200 keyrings:

	for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done

 (4) Try to watch all of the keyrings:

	for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done

     This should fail when the number of watches belonging to the user hits
     99.

 (5) Remove all the keyrings and all of those watches should go away:

	for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done

 (6) Kill off the watch queue by exiting the shell spawned by
     watch_session.

Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-17 09:39:18 -07:00
David S. Miller
8c26544f5a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Endianness issue in IPv4 option support in nft_exthdr,
   from Stephen Suryaputra.

2) Removes the waitcount optimization in nft_compat,
   from Florian Westphal.

3) Remove ipv6 -> nf_defrag_ipv6 module dependency, from
   Florian Westphal.

4) Memleak in chain binding support, also from Florian.

5) Simplify nft_flowtable.sh selftest, from Fabian Frederick.

6) Optional MTU arguments for selftest nft_flowtable.sh,
   also from Fabian.

7) Remove noise error report when killing process in
   selftest nft_flowtable.sh, from Fabian Frederick.

8) Reject bogus getsockopt option length in ebtables,
   from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16 16:05:36 -07:00
Linus Torvalds
2cc3c4b3c2 Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "A few differerent things in here.

  Seems like syzbot got some more io_uring bits wired up, and we got a
  handful of reports and the associated fixes are in here.

  General fixes too, and a lot of them marked for stable.

  Lastly, a bit of fallout from the async buffered reads, where we now
  more easily trigger short reads. Some applications don't really like
  that, so the io_read() code now handles short reads internally, and
  got a cleanup along the way so that it's now easier to read (and
  documented). We're now passing tests that failed before"

* tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
  io_uring: short circuit -EAGAIN for blocking read attempt
  io_uring: sanitize double poll handling
  io_uring: internally retry short reads
  io_uring: retain iov_iter state over io_read/io_write calls
  task_work: only grab task signal lock when needed
  io_uring: enable lookup of links holding inflight files
  io_uring: fail poll arm on queue proc failure
  io_uring: hold 'ctx' reference around task_work queue + execute
  fs: RWF_NOWAIT should imply IOCB_NOIO
  io_uring: defer file table grabbing request cleanup for locked requests
  io_uring: add missing REQ_F_COMP_LOCKED for nested requests
  io_uring: fix recursive completion locking on oveflow flush
  io_uring: use TWA_SIGNAL for task_work uncondtionally
  io_uring: account locked memory before potential error case
  io_uring: set ctx sq/cq entry count earlier
  io_uring: Fix NULL pointer dereference in loop_rw_iter()
  io_uring: add comments on how the async buffered read retry works
  io_uring: io_async_buf_func() need not test page bit
2020-08-16 10:55:12 -07:00
Linus Torvalds
7f5faaaa59 Merge tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes, an expansion of perf syscall access to CAP_PERFMON
  privileged tools, plus a RAPL HW-enablement for Intel SPR platforms"

* tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/rapl: Add support for Intel SPR platform
  perf/x86/rapl: Support multiple RAPL unit quirks
  perf/x86/rapl: Fix missing psys sysfs attributes
  hw_breakpoint: Remove unused __register_perf_hw_breakpoint() declaration
  kprobes: Remove show_registers() function prototype
  perf/core: Take over CAP_SYS_PTRACE creds to CAP_PERFMON capability
2020-08-15 10:34:24 -07:00
Linus Torvalds
37711e5e23 Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Stable fixes:
   - pNFS: Don't return layout segments that are being used for I/O
   - pNFS: Don't move layout segments off the active list when being used for I/O

  Features:
   - NFS: Add support for user xattrs through the NFSv4.2 protocol
   - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
   - NFSv4.0 allow nconnect for v4.0

  Bugfixes and cleanups:
   - nfs: ensure correct writeback errors are returned on close()
   - nfs: nfs_file_write() should check for writeback errors
   - nfs: Fix getxattr kernel panic and memory overflow
   - NFS: Fix the pNFS/flexfiles mirrored read failover code
   - SUNRPC: dont update timeout value on connection reset
   - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
   - sunrpc: destroy rpc_inode_cachep after unregister_filesystem"

* tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
  NFS: Fix flexfiles read failover
  fs: nfs: delete repeated words in comments
  rpc_pipefs: convert comma to semicolon
  nfs: Fix getxattr kernel panic and memory overflow
  NFS: Don't return layout segments that are in use
  NFS: Don't move layouts to plh_return_segs list while in use
  NFS: Add layout segment info to pnfs read/write/commit tracepoints
  NFS: Add tracepoints for layouterror and layoutstats.
  NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
  SUNRPC dont update timeout value on connection reset
  nfs: nfs_file_write() should check for writeback errors
  nfs: ensure correct writeback errors are returned on close()
  NFSv4.2: xattr cache: get rid of cache discard work queue
  NFS: remove redundant initialization of variable result
  NFSv4.0 allow nconnect for v4.0
  freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
  sunrpc: destroy rpc_inode_cachep after unregister_filesystem
  NFSv4.2: add client side xattr caching.
  NFSv4.2: hook in the user extended attribute handlers
  NFSv4.2: add the extended attribute proc functions.
  ...
2020-08-15 08:26:55 -07:00
Linus Torvalds
341323fa0e Merge tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
 "Add new hardware support to the ACPI driver for AMD SoCs, the x86 clk
  driver and the Designware i2c driver (changes from Akshu Agrawal and
  Pu Wen)"

* tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  clk: x86: Support RV architecture
  ACPI: APD: Add a fmw property is_raven
  clk: x86: Change name from ST to FCH
  ACPI: APD: Change name from ST to FCH
  i2c: designware: Add device HID for Hygon I2C controller
2020-08-15 08:18:22 -07:00
Linus Torvalds
1a5d9dbbaf Merge tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull one more power management update from Rafael Wysocki:
 "Modify the intel_pstate driver to allow it to work in the passive mode
  with hardware-managed P-states (HWP) enabled"

* tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Implement passive mode with HWP enabled
2020-08-15 08:17:01 -07:00
Linus Torvalds
884e0d3dd5 Merge tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "Core Frameworks
   - Make better attempt at matching device with the correct OF node
   - Allow batch removal of hierarchical sub-devices

  New Drivers
   - Add STM32 Clocksource driver
   - Add support for Khadas System Control Microcontroller

  Driver Removal
   - Remove unused driver for TI's SMSC ECE1099

  New Device Support
   - Add support for Intel Emmitsburg PCH to Intel LPSS PCI
   - Add support for Intel Tiger Lake PCH-H to Intel LPSS PCI
   - Add support for Dialog DA revision to Dialog DA9063

  New Functionality
   - Add support for AXP803 to be probed by I2C

  Fix-ups
   - Numerous W=1 warning fixes
   - Device Tree changes (stm32-lptimer, gateworks-gsc, khadas,mcu, stmfx, cros-ec, j721e-system-controller)
   - Enabled Regmap 'fast I/O' in stm32-lptimer
   - Change BUG_ON to WARN_ON in arizona-core
   - Remove superfluous code/initialisation (madera, max14577)
   - Trivial formatting/spelling issues (madera-core, madera-i2c, da9055, max77693-private)
   - Switch to of_platform_populate() in sprd-sc27xx-spi
   - Expand out set/get brightness/pwm macros in lm3533-ctrlbank
   - Disable IRQs on suspend in motorola-cpcap
   - Clean-up error handling in intel_soc_pmic_mrfld
   - Ensure correct removal order of sub-devices in madera
   - Many s/HTTP/HTTPS/ link changes
   - Ensure name used with Regmap is unique in syscon

  Bug Fixes
   - Properly 'put' clock on unbind and error in arizona-core
   - Fix revision handling in da9063
   - Fix 'assignment of read-only location' error in kempld-core
   - Avoid using the Regmap API when atomic in rn5t618
   - Redefine volatile register description in rn5t618
   - Use locking to protect event handler in dln2"

* tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (76 commits)
  mfd: syscon: Use a unique name with regmap_config
  mfd: Replace HTTP links with HTTPS ones
  mfd: dln2: Run event handler loop under spinlock
  mfd: madera: Improve handling of regulator unbinding
  mfd: mfd-core: Add mechanism for removal of a subset of children
  mfd: intel_soc_pmic_mrfld: Simplify the return expression of intel_scu_ipc_dev_iowrite8()
  mfd: max14577: Remove redundant initialization of variable current_bits
  mfd: rn5t618: Fix caching of battery related registers
  mfd: max77693-private: Drop a duplicated word
  mfd: da9055: pdata.h: Drop a duplicated word
  mfd: rn5t618: Make restart handler atomic safe
  mfd: kempld-core: Fix 'assignment of read-only location' error
  mfd: axp20x: Allow the AXP803 to be probed by I2C
  mfd: da9063: Add support for latest DA silicon revision
  mfd: da9063: Fix revision handling to correctly select reg tables
  dt-bindings: mfd: st,stmfx: Remove I2C unit name
  dt-bindings: mfd: ti,j721e-system-controller.yaml: Add J721e system controller
  mfd: motorola-cpcap: Disable interrupt for suspend
  mfd: smsc-ece1099: Remove driver
  mfd: core: Add OF_MFD_CELL_REG() helper
  ...
2020-08-15 08:09:38 -07:00
Linus Torvalds
18737f4243 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Subsystems affected by this patch series: mm/hotfixes, lz4, exec,
  mailmap, mm/thp, autofs, sysctl, mm/kmemleak, mm/misc and lib"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
  ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
  rtl818x: constify ioreadX() iomem argument (as in generic implementation)
  iomap: constify ioreadX() iomem argument (as in generic implementation)
  sh: use generic strncpy()
  sh: clkfwk: remove r8/r16/r32
  include/asm-generic/vmlinux.lds.h: align ro_after_init
  mm: annotate a data race in page_zonenum()
  mm/swap.c: annotate data races for lru_rotate_pvecs
  mm/rmap: annotate a data race at tlb_flush_batched
  mm/mempool: fix a data race in mempool_free()
  mm/list_lru: fix a data race in list_lru_count_one
  mm/memcontrol: fix a data race in scan count
  mm/page_counter: fix various data races at memsw
  mm/swapfile: fix and annotate various data races
  mm/filemap.c: fix a data race in filemap_fault()
  mm/swap_state: mark various intentional data races
  mm/page_io: mark various intentional data races
  mm/frontswap: mark various intentional data races
  mm/kmemleak: silence KCSAN splats in checksum
  ...
2020-08-15 08:02:03 -07:00
Krzysztof Kozlowski
8f28ca6bd8 iomap: constify ioreadX() iomem argument (as in generic implementation)
Patch series "iomap: Constify ioreadX() iomem argument", v3.

The ioread8/16/32() and others have inconsistent interface among the
architectures: some taking address as const, some not.

It seems there is nothing really stopping all of them to take pointer to
const.

This patch (of 4):

The ioreadX() and ioreadX_rep() helpers have inconsistent interface.  On
some architectures void *__iomem address argument is a pointer to const,
on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

[krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
  Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
[akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
  Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.com

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:57 -07:00
Romain Naour
7f897acbe5 include/asm-generic/vmlinux.lds.h: align ro_after_init
Since the patch [1], building the kernel using a toolchain built with
binutils 2.33.1 prevents booting a sh4 system under Qemu.  Apply the patch
provided by Alan Modra [2] that fix alignment of rodata.

[1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
[2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html

Signed-off-by: Romain Naour <romain.naour@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Link: https://marc.info/?l=linux-sh&m=158429470221261
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:57 -07:00
Qian Cai
c403f6a3a7 mm: annotate a data race in page_zonenum()
BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

 write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
  page_cpupid_xchg_last+0x51/0x80
  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
  wp_page_reuse+0x3e/0xc0
  wp_page_reuse at mm/memory.c:2453
  do_wp_page+0x472/0x7b0
  do_wp_page at mm/memory.c:2798
  __handle_mm_fault+0xcb0/0xd00
  handle_pte_fault at mm/memory.c:4049
  (inlined by) __handle_mm_fault at mm/memory.c:4163
  handle_mm_fault+0xfc/0x2f0
  handle_mm_fault at mm/memory.c:4200
  do_page_fault+0x263/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1465
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
  page_fault+0x34/0x40

 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
  put_page+0x15a/0x1f0
  page_zonenum at include/linux/mm.h:923
  (inlined by) is_zone_device_page at include/linux/mm.h:929
  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
  (inlined by) put_page at include/linux/mm.h:1023
  wp_page_copy+0x571/0x930
  wp_page_copy at mm/memory.c:2615
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xcb0/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A page never changes its zone number. The zone number happens to be
stored in the same word as other bits which are modified, but the zone
number bits will never be modified by any other write, so it can accept
a reload of the zone bits after an intervening write and it don't need
to use READ_ONCE(). Thus, annotate this data race using
ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
writes to it.

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/1581619089-14472-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:57 -07:00
Qian Cai
e0e3f42fd9 mm/memcontrol: fix a data race in scan count
struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size

 write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
  mem_cgroup_update_lru_size+0x11c/0x1d0
  mem_cgroup_update_lru_size at mm/memcontrol.c:1266
  isolate_lru_pages+0x6a9/0xf30
  shrink_active_list+0x123/0xcc0
  shrink_lruvec+0x8fd/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
  lruvec_lru_size+0xbb/0x270
  mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
  (inlined by) lruvec_lru_size at mm/vmscan.c:326
  shrink_lruvec+0x1d0/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_current+0xa6/0x120
  alloc_slab_page+0x3b1/0x540
  allocate_slab+0x70/0x660
  new_slab+0x46/0x70
  ___slab_alloc+0x4ad/0x7d0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2c3/0x420
  getname_flags+0x4c/0x230
  getname+0x22/0x30
  do_sys_openat2+0x205/0x3b0
  do_sys_open+0x9a/0xf0
  __x64_sys_openat+0x62/0x80
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The write is under lru_lock, but the read is done as lockless.  The scan
count is used to determine how aggressively the anon and file LRU lists
should be scanned.  Load tearing could generate an inefficient heuristic,
so fix it by adding READ_ONCE() for the read.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:57 -07:00