Commit Graph

1202612 Commits

Author SHA1 Message Date
Yonghong Song
a5d0c26a27 selftests/bpf: Add a cpuv4 test runner for cpu=v4 testing
Similar to no-alu32 runner, if clang compiler supports -mcpu=v4,
a cpuv4 runner is created to test bpf programs compiled with
-mcpu=v4.

The following are some num-of-insn statistics for each newer
instructions based on existing selftests, excluding subsequent
cpuv4 insn specific tests.

   insn pattern                # of instructions
   reg = (s8)reg               4
   reg = (s16)reg              4
   reg = (s32)reg              144
   reg = *(s8 *)(reg + off)    13
   reg = *(s16 *)(reg + off)   14
   reg = *(s32 *)(reg + off)   15215
   reg = bswap16 reg           142
   reg = bswap32 reg           38
   reg = bswap64 reg           14
   reg s/= reg                 0
   reg s%= reg                 0
   gotol <offset>              58

Note that in llvm -mcpu=v4 implementation, the compiler is a little
bit conservative about generating 'gotol' insn (32-bit branch offset)
as it didn't precise count the number of insns (e.g., some insns are
debug insns, etc.). Compared to old 'goto' insn, newer 'gotol' insn
should have comparable verification states to 'goto' insn.

With current patch set, all selftests passed with -mcpu=v4
when running test_progs-cpuv4 binary. The -mcpu=v3 and -mcpu=v2 run
are also successful.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011250.3718252-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:54:16 -07:00
Yonghong Song
86180493a2 selftests/bpf: Fix a test_verifier failure
The following test_verifier subtest failed due to
new encoding for BSWAP.

  $ ./test_verifier
  ...
  #99/u invalid 64-bit BPF_END FAIL
  Unexpected success to load!
  verification time 215 usec
  stack depth 0
  processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  #99/p invalid 64-bit BPF_END FAIL
  Unexpected success to load!
  verification time 198 usec
  stack depth 0
  processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Tighten the test so it still reports a failure.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011244.3717464-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:54:16 -07:00
Yonghong Song
f835bb6222 bpf: Add kernel/bpftool asm support for new instructions
Add asm support for new instructions so kernel verifier and bpftool
xlated insn dumps can have proper asm syntax for new instructions.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:54:02 -07:00
Dave Airlie
75da46c1fa Merge tag 'drm-intel-fixes-2023-07-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Use shmem for dpt objects [dpt] (Radhakrishna Sripada)
- Fix an error handling path in igt_write_huge() (Christophe JAILLET)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZMI4Mtom7pDhLB7M@tursulin-desk
2023-07-28 11:53:27 +10:00
Yonghong Song
4cd58e9af8 bpf: Support new 32bit offset jmp instruction
Add interpreter/jit/verifier support for 32bit offset jmp instruction.
If a conditional jmp instruction needs more than 16bit offset,
it can be simulated with a conditional jmp + a 32bit jmp insn.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011231.3716103-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:52:33 -07:00
Yonghong Song
7058e3a31e bpf: Fix jit blinding with new sdiv/smov insns
Handle new insns properly in bpf_jit_blind_insn() function.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:52:33 -07:00
Yonghong Song
ec0e2da95f bpf: Support new signed div/mod instructions.
Add interpreter/jit support for new signed div/mod insns.
The new signed div/mod instructions are encoded with
unsigned div/mod instructions plus insn->off == 1.
Also add basic verifier support to ensure new insns get
accepted.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011219.3714605-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:52:33 -07:00
Yonghong Song
0845c3db7b bpf: Support new unconditional bswap instruction
The existing 'be' and 'le' insns will do conditional bswap
depends on host endianness. This patch implements
unconditional bswap insns.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011213.3712808-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:52:33 -07:00
Yonghong Song
1f1e864b65 bpf: Handle sign-extenstin ctx member accesses
Currently, if user accesses a ctx member with signed types,
the compiler will generate an unsigned load followed by
necessary left and right shifts.

With the introduction of sign-extension load, compiler may
just emit a ldsx insn instead. Let us do a final movsx sign
extension to the final unsigned ctx load result to
satisfy original sign extension requirement.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011207.3712528-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:52:33 -07:00
Yonghong Song
8100928c88 bpf: Support new sign-extension mov insns
Add interpreter/jit support for new sign-extension mov insns.
The original 'MOV' insn is extended to support reg-to-reg
signed version for both ALU and ALU64 operations. For ALU mode,
the insn->off value of 8 or 16 indicates sign-extension
from 8- or 16-bit value to 32-bit value. For ALU64 mode,
the insn->off value of 8/16/32 indicates sign-extension
from 8-, 16- or 32-bit value to 64-bit value.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011202.3712300-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:52:33 -07:00
Yonghong Song
1f9a1ea821 bpf: Support new sign-extension load insns
Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In verifier, besides
to deduce proper bounds for the dst_reg, probed memory access
is also properly handled.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 18:52:33 -07:00
Dave Airlie
8e4bc0284c Merge tag 'drm-misc-fixes-2023-07-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A single patch to remove an unused function.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/dqvxednqyab5t7gvwvcq72x6yu7ug5gusmhpgs3kq6z7pf3co6@ofr6s7547gbe
2023-07-28 11:46:32 +10:00
Jose E. Marchesi
10d78a66a5 bpf, docs: fix BPF_NEG entry in instruction-set.rst
This patch fixes the documentation of the BPF_NEG instruction to
denote that it does not use the source register operand.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Acked-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230726092543.6362-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27 17:31:49 -07:00
Jakub Kicinski
fa46722666 MAINTAINERS: stmmac: retire Giuseppe Cavallaro
I tried to get stmmac maintainers to be more active by agreeing with
them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks
during his "shift month", no reviews are flowing.

All the contributions are much appreciated! But stmmac is quite
active, we need participating maintainers :(

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726151120.1649474-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:20:23 -07:00
Russell King (Oracle)
9945c1fb03 net: dsa: fix older DSA drivers using phylink
Older DSA drivers that do not provide an dsa_ops adjust_link method end
up using phylink. Unfortunately, a recent phylink change that requires
its supported_interfaces bitmap to be filled breaks these drivers
because the bitmap remains empty.

Rather than fixing each driver individually, fix it in the core code so
we have a sensible set of defaults.

Reported-by: Sergei Antonov <saproj@gmail.com>
Fixes: de5c9bf40c ("net: phylink: require supported_interfaces to be filled")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:19:46 -07:00
YueHaibing
994650353c net: datalink: Remove unused declarations
These declarations is not used after ipx protocol removed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726144054.28780-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:17:32 -07:00
YueHaibing
d0358c1a37 net: Remove unused declaration dev_restart()
This is not used, so can remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726143715.24700-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:17:28 -07:00
YueHaibing
d4a80cc69a dccp: Remove unused declaration dccp_feat_initialise_sysctls()
This is never used, so can remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726143239.9904-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:16:26 -07:00
Lin Ma
d73ef2d69c rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length
There are totally 9 ndo_bridge_setlink handlers in the current kernel,
which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.

By investigating the code, we find that 1-7 parse and use nlattr
IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
length 0) to be viewed as a 2 byte integer.

To avoid such issues, also for other ndo_bridge_setlink handlers in the
future. This patch adds the nla_len check in rtnl_bridge_setlink and
does an early error return if length mismatches. To make it works, the
break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
this nla_for_each_nested iterates every attribute.

Fixes: b1edc14a3f ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Fixes: 51616018dd ("i40e: Add support for getlink, setlink ndo ops")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:14:01 -07:00
YueHaibing
4d66f235c7 bridge: Remove unused declaration br_multicast_set_hash_max()
Since commit 19e3a9c90c ("net: bridge: convert multicast to generic rhashtable")
this is not used, so can remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230726143141.11704-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:11:29 -07:00
Patrick Rohr
ef27ba5c84 net: remove comment in ndisc_router_discovery
Removes superfluous (and misplaced) comment from ndisc_router_discovery.

Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230726184742.342825-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 16:56:23 -07:00
Arnd Bergmann
3fc2febb0f ata: pata_ns87415: mark ns87560_tf_read static
The global function triggers a warning because of the missing prototype

drivers/ata/pata_ns87415.c:263:6: warning: no previous prototype for 'ns87560_tf_read' [-Wmissing-prototypes]
  263 | void ns87560_tf_read(struct ata_port *ap, struct ata_taskfile *tf)

There are no other references to this, so just make it static.

Fixes: c4b5b7b6c4 ("pata_ns87415: Initial cut at 87415/87560 IDE support")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-07-28 08:52:42 +09:00
Jakub Kicinski
014acf2668 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 15:22:46 -07:00
Sidhartha Kumar
6c54312f96 mm/memory-failure: fix hardware poison check in unpoison_memory()
It was pointed out[1] that using folio_test_hwpoison() is wrong as we need
to check the indiviual page that has poison.  folio_test_hwpoison() only
checks the head page so go back to using PageHWPoison().

User-visible effects include existing hwpoison-inject tests possibly
failing as unpoisoning a single subpage could lead to unpoisoning an
entire folio.  Memory unpoisoning could also not work as expected as
the function will break early due to only checking the head page and
not the actually poisoned subpage.

[1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/

Link: https://lkml.kernel.org/r/20230717181812.167757-1-sidhartha.kumar@oracle.com
Fixes: a6fddef49e ("mm/memory-failure: convert unpoison_memory() to folios")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:05 -07:00
Dan Carpenter
641db40f3a proc/vmcore: fix signedness bug in read_from_oldmem()
The bug is the error handling:

	if (tmp < nr_bytes) {

"tmp" can hold negative error codes but because "nr_bytes" is type size_t
the negative error codes are treated as very high positive values
(success).  Fix this by changing "nr_bytes" to type ssize_t.  The
"nr_bytes" variable is used to store values between 1 and PAGE_SIZE and
they can fit in ssize_t without any issue.

Link: https://lkml.kernel.org/r/b55f7eed-1c65-4adc-95d1-6c7c65a54a6e@moroto.mountain
Fixes: 5d8de293c2 ("vmcore: convert copy_oldmem_page() to take an iov_iter")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:05 -07:00
Bjorn Andersson
286812b041 mailmap: update remaining active codeaurora.org email addresses
The lack of mailmap updates for @codeaurora.org addresses reduces the
usefulness of tools such as get_maintainer.pl.  Some recent (and welcome!)
additions has been made to improve the situation, this concludes the
effort.

Link: https://lkml.kernel.org/r/20230720210256.1296567-1-quic_bjorande@quicinc.com
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:05 -07:00
Jann Horn
d8ab9f7b64 mm: lock VMA in dup_anon_vma() before setting ->anon_vma
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA.  This currently happens while `dst` is not write-locked.

This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock.  This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`.  But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.

This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.

This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.

I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.

For a kernel-assisted reproducer, see the notes section of the patch mail.

Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:04 -07:00
Jann Horn
b1f02b9575 mm: fix memory ordering for mm_lock_seq and vm_lock_seq
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.

A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().

userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):

userfaultfd_register
  userfaultfd_set_vm_flags
    vm_flags_reset
      vma_start_write
        down_write(&vma->vm_lock->lock)
        vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
        up_write(&vma->vm_lock->lock)
      vm_flags_init
        [sets VM_UFFD_* in __vm_flags]
  vma->vm_userfaultfd_ctx.ctx = ctx
  mmap_write_unlock
    vma_end_write_all
      WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]

There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked.  That's bad, we definitely need a
store-release for the unlock operation.

The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock.  There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).

On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:

lock_vma_under_rcu
  vma_start_read
    vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
    down_read_trylock(&vma->vm_lock->lock)
    vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
  userfaultfd_armed
    checks vma->vm_flags & __VM_UFFD_FLAGS

Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked.  To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.

Some of the comment wording is based on suggestions by Suren.

BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything.  I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write().  If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.

Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:04 -07:00
Drew Fustini
15571273db scripts/spelling.txt: remove 'thead' as a typo
T-Head is a vendor of processor core IP, and they have recently introduced
the RISC-V TH1520 SoC.  Remove 'thead' as a typo of 'thread' to avoid
checkpatch incorrectly warning that 'thead' is typo in patches that add
support for T-Head designs in the kernel.

Link: https://lkml.kernel.org/r/20230723010329.674186-1-dfustini@baylibre.com
Link: https://www.t-head.cn/
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Diederik de Haas <didi.debian@cknow.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Luca Ceresoli <luca.ceresoli@bootlin.com> # versaclock5
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:04 -07:00
Hugh Dickins
8b1cb4a2e8 mm/pagewalk: fix EFI_PGT_DUMP of espfix area
Booting x86_64 with CONFIG_EFI_PGT_DUMP=y shows messages of the form
"mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)".

EFI_PGT_DUMP dumps all of efi_mm, including the espfix area, which is set
up with pmd entries which fit the pmd_bad() check: so 0d940a9b27 warns
and clears those entries, which would ruin running Win16 binaries.

The failing pte_offset_map() stopped such a kernel from even booting,
until a few commits later be872f83bf changed the pagewalk to tolerate
that: but it needs to be even more careful, to not spoil those entries.

I might have preferred to change init_espfix_ap() not to use "bad" pmd
entries; or to leave them out of the efi_mm dump.  But there is great
value in staying away from there, and a pagewalk check of address against
TASK_SIZE may protect from other such aberrations too.

Link: https://lkml.kernel.org/r/22bca736-4cab-9ee5-6a52-73a3b2bbe865@google.com
Closes: https://lore.kernel.org/linux-mm/CABXGCsN3JqXckWO=V7p=FhPU1tK03RE1w9UE6xL5Y86SMk209w@mail.gmail.com/
Fixes: 0d940a9b27 ("mm/pgtable: allow pte_offset_map[_lock]() to fail")
Fixes: be872f83bf ("mm/pagewalk: walk_pte_range() allow for pte_offset_map()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:04 -07:00
Hugh Dickins
fa598952fa shmem: minor fixes to splice-read implementation
HWPoison: my reading of folio_test_hwpoison() is that it only tests the
head page of a large folio, whereas splice_folio_into_pipe() will splice
as much of the folio as it can: so for safety we should also check the
has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned. 
(Perhaps that ugliness can be improved at the mm end later.)

The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask
it for "part" not "len".

Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com
Fixes: bd194b1871 ("shmem: Implement splice-read")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:03 -07:00
Hugh Dickins
253e5df8b8 tmpfs: fix Documentation of noswap and huge mount options
The noswap mount option is surely not one of the three options for sizing:
move its description down.

The huge= mount option does not accept numeric values: those are just in
an internal enum.  Delete those numbers, and follow the manpage text more
closely (but there's not yet any fadvise() or fcntl() which applies here).

/sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and
barely relevant to mounting a tmpfs: just refer to transhuge.rst (while
still using the words deny and force, to help as informal reminders).

[rdunlap@infradead.org: fixup Docs table for huge mount options]
  Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: d0f5a85442 ("shmem: update documentation")
Fixes: 2c6efe9cf2 ("shmem: add support to ignore swap")
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> 
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:03 -07:00
Andy Shevchenko
dddfa05eb5 Revert "um: Use swap() to make code cleaner"
This reverts commit 9b0da3f223.

The sigio.c is clearly user space code which is handled by
arch/um/scripts/Makefile.rules (see USER_OBJS rule).

The above mentioned commit simply broke this agreement,
we may not use Linux kernel internal headers in them without
thorough thinking.

Hence, revert the wrong commit.

Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Herve Codina <herve.codina@bootlin.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Richard Weinberger <richard@nod.at>
Cc: Yang Guang <yang.guang5@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:03 -07:00
Feng Tang
d1836a3b2a mm/damon/core-test: initialise context before test in damon_test_set_attrs()
Running kunit test for 6.5-rc1 hits one bug:

        ok 10 damon_test_update_monitoring_result
    general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G                 N 6.5.0-rc2 #15
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
    RIP: 0010:damon_set_attrs+0xb9/0x120
    Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d
    8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00
    49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89
    RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246
    RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70
    RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed
    R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78
    R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28
    FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     damon_test_set_attrs+0x63/0x1f0
     kunit_generic_run_threadfn_adapter+0x17/0x30
     kthread+0xfd/0x130

The problem seems to be related with the damon_ctx was used without
being initialized. Fix it by adding the initialization.

Link: https://lkml.kernel.org/r/20230718052811.1065173-1-feng.tang@intel.com
Fixes: aa13779be6 ("mm/damon/core-test: add a test for damon_set_attrs()")
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:03 -07:00
Linus Torvalds
57012c5753 Merge tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from can, netfilter.

  Current release - regressions:

   - core: fix splice_to_socket() for O_NONBLOCK socket

   - af_unix: fix fortify_panic() in unix_bind_bsd().

   - can: raw: fix lockdep issue in raw_release()

  Previous releases - regressions:

   - tcp: reduce chance of collisions in inet6_hashfn().

   - netfilter: skip immediate deactivate in _PREPARE_ERROR

   - tipc: stop tipc crypto on failure in tipc_node_create

   - eth: igc: fix kernel panic during ndo_tx_timeout callback

   - eth: iavf: fix potential deadlock on allocation failure

  Previous releases - always broken:

   - ipv6: fix bug where deleting a mngtmpaddr can create a new
     temporary address

   - eth: ice: fix memory management in ice_ethtool_fdir.c

   - eth: hns3: fix the imp capability bit cannot exceed 32 bits issue

   - eth: vxlan: calculate correct header length for GPE

   - eth: stmmac: apply redundant write work around on 4.xx too"

* tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  tipc: stop tipc crypto on failure in tipc_node_create
  af_unix: Terminate sun_path when bind()ing pathname socket.
  tipc: check return value of pskb_trim()
  benet: fix return value check in be_lancer_xmit_workarounds()
  virtio-net: fix race between set queues and probe
  net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
  splice, net: Fix splice_to_socket() for O_NONBLOCK socket
  net: fec: tx processing does not call XDP APIs if budget is 0
  mptcp: more accurate NL event generation
  selftests: mptcp: join: only check for ip6tables if needed
  tools: ynl-gen: fix parse multi-attr enum attribute
  tools: ynl-gen: fix enum index in _decode_enum(..)
  netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
  netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
  netfilter: nft_set_rbtree: fix overlap expiration walk
  igc: Fix Kernel Panic during ndo_tx_timeout callback
  net: dsa: qca8k: fix mdb add/del case with 0 VID
  net: dsa: qca8k: fix broken search_and_del
  net: dsa: qca8k: fix search_and_insert wrong handling of new rule
  net: dsa: qca8k: enable use_single_write for qca8xxx
  ...
2023-07-27 12:27:37 -07:00
Linus Torvalds
bc168790de Merge tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fixes from Vinod Koul:

 - Core fix for enumeration completion

 - Qualcomm driver fix to update status

 - AMD driver fix for probe error check

* tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: amd: Fix a check for errors in probe()
  soundwire: qcom: update status correctly with mask
  soundwire: fix enumeration completion
2023-07-27 12:07:41 -07:00
Linus Torvalds
53c8621b9e Merge tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:

 - Out of bound fix for hisilicon phy

 - Qualcomm synopsis femto phy for keeping clock enabled during suspend
   and enabling ref clocks

 - Mediatek driver fixes for upper limit test and error code

* tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
  phy: qcom-snps-femto-v2: use qcom_snps_hsphy_suspend/resume error code
  phy: qcom-snps-femto-v2: properly enable ref clock
  phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend
  phy: mediatek: hdmi: mt8195: fix prediv bad upper limit test
  phy: phy-mtk-dp: Fix an error code in probe()
2023-07-27 11:52:41 -07:00
Linus Torvalds
64de76ce8e Merge tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fix accounting of global block reserve size when block group tree is
   enabled

 - the async discard has been enabled in 6.2 unconditionally, but for
   zoned mode it does not make that much sense to do it asynchronously
   as the zones are reset as needed

 - error handling and proper error value propagation fixes

* tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: check for commit error at btrfs_attach_transaction_barrier()
  btrfs: check if the transaction was aborted at btrfs_wait_for_commit()
  btrfs: remove BUG_ON()'s in add_new_free_space()
  btrfs: account block group tree when calculating global reserve size
  btrfs: zoned: do not enable async discard
2023-07-27 11:44:08 -07:00
Linus Torvalds
379e66711b Merge tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
 "A call to memblock_free() or memblock_phys_free() issued after
  memblock data is discarded will result in use after free in
  memblock_isolate_range().

  Avoid those issues by making sure that memblock_discard points
  memblock.reserved.regions back at the static buffer"

* tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  mm,memblock: reset memblock.reserved to system init state to prevent UAF
2023-07-27 11:37:34 -07:00
Jiri Pirko
9eca8bb8da net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
As esw_offloads_load/unload_rep() are used outside eswitch.c it is nicer
for them to have "mlx5_" prefix. Add it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:31 -07:00
Jiri Pirko
329980d05d net/mlx5: Make mlx5_eswitch_load/unload_vport() static
mlx5_eswitch_load/unload_vport()() functions are not used
outside of eswitch.c. Make them static.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:30 -07:00
Jiri Pirko
b71863876f net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static
mlx5_esw_offloads_rep_load/unload() functions are not used
outside of eswitch_offloads.c. Make them static.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:30 -07:00
Jiri Pirko
3e82a9cf57 net/mlx5: Remove pointless devlink_rate checks
It is guaranteed that the devlink rate leaf is created during init paths.
No need to check during cleanup. Remove the checks.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:30 -07:00
Jiri Pirko
550449d8e3 net/mlx5: Don't check vport->enabled in port ops
vport->enabled is always set for a vport for which a devlink port is
registered, therefore the checks in the ops are pointless.
Remove those.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:30 -07:00
Parav Pandit
b9335a7572 net/mlx5e: Make flow classification filters static
Get and set flow classification filters are used in a single file.
Hence, make them static.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:30 -07:00
Parav Pandit
9ec85cc9c9 net/mlx5e: Remove duplicate code for user flow
Flow table and priority detection is same for IP user flows and other L4
flows. Hence, use same code for all these flow types.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:30 -07:00
Shay Drory
b90ebfc018 net/mlx5: Allocate command stats with xarray
Command stats is an array with more than 2K entries, which amounts to
~180KB. This is way more than actually needed, as only ~190 entries
are being used.

Therefore, replace the array with xarray.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:29 -07:00
Shay Drory
06cd555f73 net/mlx5: split mlx5_cmd_init() to probe and reload routines
There is no need to destroy and allocate cmd SW structs during reload,
this is time consuming for no reason.
Hence, split mlx5_cmd_init() to probe and reload routines.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:29 -07:00
Shay Drory
0714ec9ea1 net/mlx5: Remove redundant cmdif revision check
mlx5 is checking the cmdif revision twice, for no reason.
Remove the latter check.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:29 -07:00
Shay Drory
58db72869a net/mlx5: Re-organize mlx5_cmd struct
Downstream patch will split mlx5_cmd_init() to probe and reload
routines. As a preparation, organize mlx5_cmd struct so that any
field that will be used in the reload routine are grouped at new
nested struct.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27 11:37:29 -07:00