Commit Graph

1157677 Commits

Author SHA1 Message Date
Yoshihiro Shimoda
33f5d733b5 net: renesas: rswitch: Improve TX timestamp accuracy
In the previous code, TX timestamp accuracy was bad because the irq
handler got the timestamp from the timestamp register at that time.

This hardware has "Timestamp capture" feature which can store
each TX timestamp into the timestamp descriptors. To improve
TX timestamp accuracy, implement timestamp descriptors' handling.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:44:17 -08:00
Yoshihiro Shimoda
48cf0a2570 net: renesas: rswitch: Remove gptp flag from rswitch_gwca_queue
In the previous code, the gptp flag was completely related to
the !dir_tx in struct rswitch_gwca_queue because
rswitch_gwca_queue_alloc() was called below:

< In rswitch_txdmac_alloc() >
err = rswitch_gwca_queue_alloc(ndev, priv, rdev->tx_queue, true, false,
			      TX_RING_SIZE);
So, dir_tx = true, and gptp = false.

< In rswitch_rxdmac_alloc() >
err = rswitch_gwca_queue_alloc(ndev, priv, rdev->rx_queue, false, true,
			      RX_RING_SIZE);
So, dir_tx = false, and gptp = true.

In the future, a new queue handling for timestamp will be implemented
and this gptp flag is confusable. So, remove the gptp flag.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:44:17 -08:00
Yoshihiro Shimoda
e3f38039c6 net: renesas: rswitch: Move linkfix variables to rswitch_gwca
To improve readability, move linkfix related variables to
struct rswitch_gwca. Also, rename function names "desc" with "linkfix".

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:44:17 -08:00
Yoshihiro Shimoda
251eadcc64 net: renesas: rswitch: Rename rings in struct rswitch_gwca_queue
To add a new ring which is really related to timestamp (ts_ring)
in the future, rename the following members to improve readability:

    ring --> tx_ring
    ts_ring --> rx_ring

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:44:17 -08:00
Larysa Zaremba
1f09049417 ice: xsk: Fix cleaning of XDP_TX frames
Incrementation of xsk_frames inside the for-loop produces
infinite loop, if we have both normal AF_XDP-TX and XDP_TXed
buffers to complete.

Split xsk_frames into 2 variables (xsk_frames and completed_frames)
to eliminate this bug.

Fixes: 29322791bc ("ice: xsk: change batched Tx descriptor cleaning")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230209160130.1779890-1-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:42:10 -08:00
Pedro Tammela
ee059170b1 net/sched: tcindex: update imperfect hash filters respecting rcu
The imperfect hash area can be updated while packets are traversing,
which will cause a use-after-free when 'tcf_exts_exec()' is called
with the destroyed tcf_ext.

CPU 0:               CPU 1:
tcindex_set_parms    tcindex_classify
tcindex_lookup
                     tcindex_lookup
tcf_exts_change
                     tcf_exts_exec [UAF]

Stop operating on the shared area directly, by using a local copy,
and update the filter with 'rcu_replace_pointer()'. Delete the old
filter version only after a rcu grace period elapsed.

Fixes: 9b0d4446b5 ("net: sched: avoid atomic swap in tcf_exts_change")
Reported-by: valis <sec@valis.email>
Suggested-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:38:27 -08:00
Dan Carpenter
183514f7c5 net: libwx: fix an error code in wx_alloc_page_pool()
This function always returns success.  We need to preserve the error
code before setting rx_ring->page_pool = NULL.

Fixes: 850b971110 ("net: libwx: Allocate Rx and Tx resources")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y+T4aoefc1XWvGYb@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:37:13 -08:00
Arnd Bergmann
f99f22e02f net: dsa: ocelot: add PTP dependency for NET_DSA_MSCC_OCELOT_EXT
A new user of MSCC_OCELOT_SWITCH_LIB was added, bringing back an old
link failure that was fixed with e5f3155267 ("ethernet: fix
PTP_1588_CLOCK dependencies"):

x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_ptp_enable':
ocelot_ptp.c:(.text+0x8ee): undefined reference to `ptp_find_pin'
x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_get_ts_info':
ocelot_ptp.c:(.text+0xd5d): undefined reference to `ptp_clock_index'
x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_init_timestamp':
ocelot_ptp.c:(.text+0x15ca): undefined reference to `ptp_clock_register'
x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_deinit_timestamp':
ocelot_ptp.c:(.text+0x16b7): undefined reference to `ptp_clock_unregister'

Add the same PTP dependency here, as well as in the MSCC_OCELOT_SWITCH_LIB
symbol itself to make it more obvious what is going on when the next
driver selects it.

Fixes: 3d7316ac81 ("net: dsa: ocelot: add external ocelot switch control")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Colin Foster <colin.foster@in-advantage.com>
Link: https://lore.kernel.org/r/20230209124435.1317781-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:35:58 -08:00
Pietro Borrello
a1221703a0 sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list
Use list_is_first() to check whether tsp->asoc matches the first
element of ep->asocs, as the list is not guaranteed to have an entry.

Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniroma1.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:28:29 -08:00
Siddharth Vadapalli
0ed577e7e8 net: ethernet: ti: am65-cpsw: Add RX DMA Channel Teardown Quirk
In TI's AM62x/AM64x SoCs, successful teardown of RX DMA Channel raises an
interrupt. The process of servicing this interrupt involves flushing all
pending RX DMA descriptors and clearing the teardown completion marker
(TDCM). The am65_cpsw_nuss_rx_packets() function invoked from the RX
NAPI callback services the interrupt. Thus, it is necessary to wait for
this handler to run, drain all packets and clear TDCM, before calling
napi_disable() in am65_cpsw_nuss_common_stop() function post channel
teardown. If napi_disable() executes before ensuring that TDCM is
cleared, the TDCM remains set when the interfaces are down, resulting in
an interrupt storm when the interfaces are brought up again.

Since the interrupt raised to indicate the RX DMA Channel teardown is
specific to the AM62x and AM64x SoCs, add a quirk for it.

Fixes: 4f7cce2724 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230209084432.189222-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:22:32 -08:00
Jakub Kicinski
1bc6cc4f7b Merge branch 'bridge-mcast-preparations-for-vxlan-mdb'
Ido Schimmel says:

====================
bridge: mcast: Preparations for VXLAN MDB

This patchset contains small preparations for VXLAN MDB that were split
from this RFC [1]. Tested using existing bridge MDB forwarding
selftests.

[1] https://lore.kernel.org/netdev/20230204170801.3897900-1-idosch@nvidia.com/
====================

Link: https://lore.kernel.org/r/20230209071852.613102-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:21:15 -08:00
Ido Schimmel
049139126e selftests: forwarding: Add MDB dump test cases
The kernel maintains three markers for the MDB dump:

1. The last bridge device from which the MDB was dumped.
2. The last MDB entry from which the MDB was dumped.
3. The last port-group entry that was dumped.

Add test cases for large scale MDB dump to make sure that all the
configured entries are dumped and that the markers are used correctly.

Specifically, create 2 bridges with 32 ports and add 256 MDB entries in
which all the ports are member of. Test that each bridge reports 8192
(256 * 32) permanent entries. Do that with IPv4, IPv6 and L2 MDB
entries.

On my system, MDB dump of the above is contained in about 50 netlink
messages.

Example output:

 # ./bridge_mdb.sh
 [...]
 INFO: # Large scale dump tests
 TEST: IPv4 large scale dump tests                                   [ OK ]
 TEST: IPv6 large scale dump tests                                   [ OK ]
 TEST: L2 large scale dump tests                                     [ OK ]
 [...]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:21:13 -08:00
Ido Schimmel
170afa71e3 bridge: mcast: Move validation to a policy
Future patches are going to move parts of the bridge MDB code to the
common rtnetlink code in preparation for VXLAN MDB support. To
facilitate code sharing between both drivers, move the validation of the
top level attributes in RTM_{NEW,DEL}MDB messages to a policy that will
eventually be moved to the rtnetlink code.

Use 'NLA_NESTED' for 'MDBA_SET_ENTRY_ATTRS' instead of
NLA_POLICY_NESTED() as this attribute is going to be validated using
different policies in the underlying drivers.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:21:13 -08:00
Ido Schimmel
7ea829664d bridge: mcast: Remove pointless sequence generation counter assignment
The purpose of the sequence generation counter in the netlink callback
is to identify if a multipart dump is consistent or not by calling
nl_dump_check_consistent() whenever a message is generated.

The function is not invoked by the MDB code, rendering the sequence
generation counter assignment pointless. Remove it.

Note that even if the function was invoked, we still could not
accurately determine if the dump is consistent or not, as there is no
sequence generation counter for MDB entries, unlike nexthop objects, for
example.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:21:13 -08:00
Ido Schimmel
ccd7f25b5b bridge: mcast: Use correct define in MDB dump
'MDB_PG_FLAGS_PERMANENT' and 'MDB_PERMANENT' happen to have the same
value, but the latter is uAPI and cannot change, so use it when dumping
an MDB entry.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:21:13 -08:00
Jakub Kicinski
ee7e1788ae Merge tag 'for-net-next-2023-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:

====================
pull-request: bluetooth-next

 - Add new PID/VID 0489:e0f2 for MT7921
 - Add VID:PID 13d3:3529 for Realtek RTL8821CE
 - Add CIS feature bits to controller information
 - Set Per Platform Antenna Gain(PPAG) for Intel controllers

* tag 'for-net-next-2023-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next:
  Bluetooth: btintel: Set Per Platform Antenna Gain(PPAG)
  Bluetooth: Make sure LE create conn cancel is sent when timeout
  Bluetooth: Free potentially unfreed SCO connection
  Bluetooth: hci_qca: get wakeup status from serdev device handle
  Bluetooth: L2CAP: Fix potential user-after-free
  Bluetooth: MGMT: add CIS feature bits to controller information
  Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds
  Bluetooth: HCI: Replace zero-length arrays with flexible-array members
  Bluetooth: qca: Fix sparse warnings
  Bluetooth: btusb: Add VID:PID 13d3:3529 for Realtek RTL8821CE
  Bluetooth: btusb: Add new PID/VID 0489:e0f2 for MT7921
  Bluetooth: Fix issue with Actions Semi ATS2851 based devices
====================

Link: https://lore.kernel.org/r/20230209234922.3756173-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 19:07:10 -08:00
Alexei Starovoitov
ab86cf337a Merge branch 'bpf, mm: introduce cgroup.memory=nobpf'
Yafang Shao says:

====================

The bpf memory accouting has some known problems in contianer
environment,

- The container memory usage is not consistent if there's pinned bpf
  program
  After the container restart, the leftover bpf programs won't account
  to the new generation, so the memory usage of the container is not
  consistent. This issue can be resolved by introducing selectable
  memcg, but we don't have an agreement on the solution yet. See also
  the discussions at https://lwn.net/Articles/905150/ .

- The leftover non-preallocated bpf map can't be limited
  The leftover bpf map will be reparented, and thus it will be limited by
  the parent, rather than the container itself. Furthermore, if the
  parent is destroyed, it be will limited by its parent's parent, and so
  on. It can also be resolved by introducing selectable memcg.

- The memory dynamically allocated in bpf prog is charged into root memcg
  only
  Nowdays the bpf prog can dynamically allocate memory, for example via
  bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
  pool, so it will charge into root memcg only. That needs to be
  addressed by a new proposal.

So let's give the container user an option to disable bpf memory accouting.

The idea of "cgroup.memory=nobpf" is originally by Tejun[1].

[1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/

Changes,
v1->v2:
- squash patches (Roman)
- commit log improvement in patch #2. (Johannes)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10 19:00:13 -08:00
Yafang Shao
bf39650824 bpf: allow to disable bpf prog memory accounting
We can simply disable the bpf prog memory accouting by not setting the
GFP_ACCOUNT.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20230210154734.4416-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10 18:59:57 -08:00
Yafang Shao
ee53cbfb1e bpf: allow to disable bpf map memory accounting
We can simply set root memcg as the map's memcg to disable bpf memory
accounting. bpf_map_area_alloc is a little special as it gets the memcg
from current rather than from the map, so we need to disable GFP_ACCOUNT
specifically for it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20230210154734.4416-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10 18:59:57 -08:00
Yafang Shao
ddef81b5fd bpf: use bpf_map_kvcalloc in bpf_local_storage
Introduce new helper bpf_map_kvcalloc() for the memory allocation in
bpf_local_storage(). Then the allocation will charge the memory from the
map instead of from current, though currently they are the same thing as
it is only used in map creation path now. By charging map's memory into
the memcg from the map, it will be more clear.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10 18:59:56 -08:00
Yafang Shao
b6c1a8af5b mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
Add new kernel parameter cgroup.memory=nobpf to allow user disable bpf
memory accounting. This is a preparation for the followup patch.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20230210154734.4416-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10 18:59:56 -08:00
Daniel Borkmann
7e2a9ebe81 docs, bpf: Ensure IETF's BPF mailing list gets copied for ISA doc changes
Given BPF is increasingly being used beyond just the Linux kernel, with
implementations in NICs and other hardware, Windows, etc, there is an
ongoing effort to document and standardize parts of the existing BPF
infrastructure such as its ISA. As "source of truth" we decided some
time ago to rely on the in-tree documentation, in particular, starting
out with the Documentation/bpf/instruction-set.rst as a base for later
RFC drafts on the ISA. Therefore, we want to ensure that changes to that
document have bpf@ietf.org in Cc, so add a MAINTAINERS file entry with
a section on documents related to standardization efforts. For now, this
only relates to instruction-set.rst, and later additional files will be
added.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Thaler <dthaler@microsoft.com>
Cc: bpf@ietf.org
Link: https://datatracker.ietf.org/doc/bofreq-thaler-bpf-ebpf/
Link: https://lore.kernel.org/r/57619c0dd8e354d82bf38745f99405e3babdc970.1676068387.git.daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10 18:58:21 -08:00
Jakub Kicinski
de42873367 Daniel Borkmann says:
====================
pull-request: bpf-next 2023-02-11

We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).

There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad2 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:

        [...]
        ice_set_netdev_features(netdev);
        netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
                               NETDEV_XDP_ACT_XSK_ZEROCOPY;
        ice_set_ops(netdev);
        [...]

Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/

The main changes are:

1) Add support for BPF trampoline on s390x which finally allows to remove many
   test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.

2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.

3) Add capability to export the XDP features supported by the NIC.
   Along with that, add a XDP compliance test tool,
   from Lorenzo Bianconi & Marek Majtyka.

4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
   from David Vernet.

5) Add a deep dive documentation about the verifier's register
   liveness tracking algorithm, from Eduard Zingerman.

6) Fix and follow-up cleanups for resolve_btfids to be compiled
   as a host program to avoid cross compile issues,
   from Jiri Olsa & Ian Rogers.

7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
   when testing on different NICs, from Jesper Dangaard Brouer.

8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.

9) Extend libbpf to add an option for when the perf buffer should
   wake up, from Jon Doron.

10) Follow-up fix on xdp_metadata selftest to just consume on TX
    completion, from Stanislav Fomichev.

11) Extend the kfuncs.rst document with description on kfunc
    lifecycle & stability expectations, from David Vernet.

12) Fix bpftool prog profile to skip attaching to offline CPUs,
    from Tonghao Zhang.

====================

Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 17:51:27 -08:00
Randy Dunlap
d12f9ad028 Documentation: isdn: correct spelling
Correct spelling problems for Documentation/isdn/ as reported
by codespell.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230209071400.31476-9-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 16:28:13 -08:00
Jakub Kicinski
33c6ce4a4c Merge branch 'net-move-more-duplicate-code-of-ovs-and-tc-conntrack-into-nf_conntrack_ovs'
Xin Long says:

====================
net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs

We've moved some duplicate code into nf_nat_ovs in:

  "net: eliminate the duplicate code in the ct nat functions of ovs and tc"

This patchset addresses more code duplication in the conntrack of ovs
and tc then creates nf_conntrack_ovs for them, and four functions will
be extracted and moved into it:

  nf_ct_handle_fragments()
  nf_ct_skb_network_trim()
  nf_ct_helper()
  nf_ct_add_helper()
====================

Link: https://lore.kernel.org/r/cover.1675810210.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 16:23:05 -08:00
Xin Long
0785407e78 net: extract nf_ct_handle_fragments to nf_conntrack_ovs
Now handle_fragments() in OVS and TC have the similar code, and
this patch removes the duplicate code by moving the function
to nf_conntrack_ovs.

Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be
done only when defrag returns 0, as it does in other places
in kernel.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 16:23:03 -08:00
Xin Long
558d95e7e1 net: sched: move frag check and tc_skb_cb update out of handle_fragments
This patch has no functional changes and just moves frag check and
tc_skb_cb update out of handle_fragments, to make it easier to move
the duplicate code from handle_fragments() into nf_conntrack_ovs later.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 16:23:03 -08:00
Xin Long
1b83bf4489 openvswitch: move key and ovs_cb update out of handle_fragments
This patch has no functional changes and just moves key and ovs_cb update
out of handle_fragments, and skb_clear_hash() and skb->ignore_df change
into handle_fragments(), to make it easier to move the duplicate code
from handle_fragments() into nf_conntrack_ovs later.

Note that it changes to pass info->family to handle_fragments() instead
of key for the packet type check, as info->family is set according to
key->eth.type in ovs_ct_copy_action() when creating the action.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 16:23:03 -08:00
Xin Long
67fc5d7ffb net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs
There are almost the same code in ovs_skb_network_trim() and
tcf_ct_skb_network_trim(), this patch extracts them into a function
nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 16:23:03 -08:00
Xin Long
c0c3ab63de net: create nf_conntrack_ovs for ovs and tc use
Similar to nf_nat_ovs created by Commit ebddb14049 ("net: move the
nat function to nf_nat_ovs for ovs and tc"), this patch is to create
nf_conntrack_ovs to get these functions shared by OVS and TC only.

There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper
in this patch, and will be more in the following patches.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 16:23:03 -08:00
Linus Torvalds
420b2d431d Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
 "Two clk driver fixes

   - Use devm_kasprintf() to avoid overflows when forming clk names in
     the Microchip PolarFire driver

   - Fix the pretty broken Ingenic JZ4760 M/N/OD calculation to actually
     work and find proper divisors"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: ingenic: jz4760: Update M/N/OD calculation algorithm
  clk: microchip: mpfs-ccc: Use devm_kasprintf() for allocating formatted strings
2023-02-10 15:28:08 -08:00
Ilya Leoshkevich
17bcd27a08 libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()
The code assumes that everything that comes after nlmsgerr are nlattrs.
When calculating their size, it does not account for the initial
nlmsghdr. This may lead to accessing uninitialized memory.

Fixes: bbf48c18ee ("libbpf: add error reporting in XDP")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-8-iii@linux.ibm.com
2023-02-10 15:27:22 -08:00
Ilya Leoshkevich
202702e890 selftests/bpf: Attach to fopen()/fclose() in attach_probe
malloc() and free() may be completely replaced by sanitizers, use
fopen() and fclose() instead.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-7-iii@linux.ibm.com
2023-02-10 15:21:27 -08:00
Ilya Leoshkevich
907300c7a6 selftests/bpf: Attach to fopen()/fclose() in uprobe_autoattach
malloc() and free() may be completely replaced by sanitizers, use
fopen() and fclose() instead.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-6-iii@linux.ibm.com
2023-02-10 15:21:27 -08:00
Ilya Leoshkevich
24a87b477c selftests/bpf: Forward SAN_CFLAGS and SAN_LDFLAGS to runqslower and libbpf
To get useful results from the Memory Sanitizer, all code running in a
process needs to be instrumented. When building tests with other
sanitizers, it's not strictly necessary, but is also helpful.
So make sure runqslower and libbpf are compiled with SAN_CFLAGS and
linked with SAN_LDFLAGS.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-5-iii@linux.ibm.com
2023-02-10 15:21:27 -08:00
Ilya Leoshkevich
0589d16475 selftests/bpf: Split SAN_CFLAGS and SAN_LDFLAGS
Memory Sanitizer requires passing different options to CFLAGS and
LDFLAGS: besides the mandatory -fsanitize=memory, one needs to pass
header and library paths, and passing -L to a compilation step
triggers -Wunused-command-line-argument. So introduce a separate
variable for linker flags. Use $(SAN_CFLAGS) as a default in order to
avoid complicating the ASan usage.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-4-iii@linux.ibm.com
2023-02-10 15:21:27 -08:00
Ilya Leoshkevich
585bf4640e tools: runqslower: Add EXTRA_CFLAGS and EXTRA_LDFLAGS support
This makes it possible to add sanitizer flags.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-3-iii@linux.ibm.com
2023-02-10 15:21:26 -08:00
Ilya Leoshkevich
795deb3f97 selftests/bpf: Quote host tools
Using HOSTCC="ccache clang" breaks building the tests, since, when it's
forwarded to e.g. bpftool, the child make sees HOSTCC=ccache and
"clang" is considered a target. Fix by quoting it, and also HOSTLD and
HOSTAR for consistency.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230210001210.395194-2-iii@linux.ibm.com
2023-02-10 15:21:26 -08:00
Linus Torvalds
545c80ab34 Merge tag 'pinctrl-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
 "Some assorted pin control fixes, the most interesting will be the
  Intel patch fixing a classic problem: laptop touchpad IRQs...

   - Some pin drive register fixes in the Mediatek driver.

   - Return proper error code in the Aspeed driver, and revert and
     ill-advised force-disablement patch that needs to be reworked.

   - Fix AMD driver debug output.

   - Fix potential NULL dereference in the Single driver.

   - Fix a group definition error in the Qualcomm SM8450 LPASS driver.

   - Restore pins used in direct IRQ mode in the Intel driver (This
     fixes some laptop touchpads!)"

* tag 'pinctrl-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
  pinctrl: qcom: sm8450-lpass-lpi: correct swr_rx_data group
  pinctrl: aspeed: Revert "Force to disable the function's signal"
  pinctrl: single: fix potential NULL dereference
  pinctrl: amd: Fix debug output for debounce time
  pinctrl: aspeed: Fix confusing types in return value
  pinctrl: mediatek: Fix the drive register definition of some Pins
2023-02-10 15:02:16 -08:00
Linus Torvalds
4cfd5afcd8 Merge tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:

 - Move to a shared PCI git tree (Bjorn Helgaas)

 - Add Krzysztof Wilczyński as another PCI maintainer (Lorenzo
   Pieralisi)

 - Revert a couple ASPM patches to fix suspend/resume regressions (Bjorn
   Helgaas)

* tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"
  Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"
  MAINTAINERS: Promote Krzysztof to PCI controller maintainer
  MAINTAINERS: Move to shared PCI tree
2023-02-10 14:18:48 -08:00
Bjorn Helgaas
ff209ecc37 Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"
This reverts commit 5e85eba6f5.

Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates
Control Register programming") broke suspend/resume on a Tuxedo
Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard.

The main symptom is:

  iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
  nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible

and the machine is only partially usable after resume.  It can't run dmesg
and can't do a clean reboot.  This happens on every suspend/resume cycle.

Revert 5e85eba6f5 until we can figure out the root cause.

Fixes: 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates Control Register programming")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org	# v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
2023-02-10 15:30:24 -06:00
Bjorn Helgaas
a7152be79b Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"
This reverts commit 4ff116d0d5.

Tasev Nikola and Mark Enriquez reported that resume from suspend was broken
in v6.1-rc1.  Tasev bisected to a47126ec29 ("PCI/PTM: Cache PTM
Capability offset"), but we can't figure out how that could be related.

Mark saw the same symptoms and bisected to 4ff116d0d5 ("PCI/ASPM: Save L1
PM Substates Capability for suspend/resume"), which does have a connection:
it restores L1 Substates configuration while ASPM L1 may be enabled:

  pci_restore_state
    pci_restore_aspm_l1ss_state
      aspm_program_l1ss
        pci_write_config_dword(PCI_L1SS_CTL1, ctl1)         # L1SS restore
    pci_restore_pcie_state
      pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++])  # L1 restore

which is a problem because PCIe r6.0, sec 5.5.4, requires that:

  If setting either or both of the enable bits for ASPM L1 PM
  Substates, both ports must be configured as described in this
  section while ASPM L1 is disabled.

Separately, Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1
PM Substates Control Register programming") broke suspend/resume, and it
depends on 4ff116d0d5.

Revert 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") to fix the resume issue and enable revert of 5e85eba6f5
to fix the issue Thomas reported.

Note that reverting 4ff116d0d5 means L1 Substates config may be lost on
suspend/resume.  As far as we know the system will use more power but will
still *work* correctly.

Fixes: 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be>
Reported-by: Mark Enriquez <enriquezmark36@gmail.com>
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Mark Enriquez <enriquezmark36@gmail.com>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org	# v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
2023-02-10 15:29:53 -06:00
Linus Torvalds
4f72a263e1 Merge tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "All the changes this time are minor devicetree corrections, the
  majority being for 64-bit Rockchip SoC support. These are a couple of
  corrections for properties that are in violation of the binding, some
  that put the machine into safer operating points for the eMMC and
  thermal settings, and missing properties that prevented rk356x PCIe
  and ethernet from working correctly.

  The changes for amlogic and mediatek address incorrect properties that
  were preventing the display support on MT8195 and the MMC support on
  various Meson SoCs from working correctly.

  The stihxxx-b2120 change fixes the GPIO polarity for the DVB tuner to
  allow this to be used correctly after a futre driver change, though it
  has no effect on older kernels"

* tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
  arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
  arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
  ARM: dts: stihxxx-b2120: fix polarity of reset line of tsin0 port
  arm64: dts: mediatek: mt8195: Fix vdosys* compatible strings
  arm64: dts: rockchip: align rk3399 DMC OPP table with bindings
  arm64: dts: rockchip: set sdmmc0 speed to sd-uhs-sdr50 on rock-3a
  arm64: dts: rockchip: fix probe of analog sound card on rock-3a
  arm64: dts: rockchip: add missing #interrupt-cells to rk356x pcie2x1
  arm64: dts: rockchip: fix input enable pinconf on rk3399
  ARM: dts: rockchip: add power-domains property to dp node on rk3288
  arm64: dts: rockchip: add io domain setting to rk3566-box-demo
  arm64: dts: rockchip: remove unsupported property from sdmmc2 for rock-3a
  arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
  arm64: dts: rockchip: reduce thermal limits on rk3399-pinephone-pro
  arm64: dts: rockchip: use correct reset names for rk3399 crypto nodes
2023-02-10 09:48:42 -08:00
Linus Torvalds
8e9a8427a1 Merge tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
 "This is a little bigger that I'd hope for this late in the cycle, but
  they're all pretty concrete fixes and the only one that's bigger than
  a few lines is pmdp_collapse_flush() (which is almost all
  boilerplate/comment). It's also all bug fixes for issues that have
  been around for a while.

  So I think it's not all that scary, just bad timing.

   - avoid partial TLB fences for huge pages, which are disallowed by
     the ISA

   - avoid missing a frame when dumping stacks

   - avoid misaligned accesses (and possibly overflows) in kprobes

   - fix a race condition in tracking page dirtiness"

* tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
  riscv: kprobe: Fixup misaligned load text
  riscv: stacktrace: Fix missing the first frame
  riscv: mm: Implement pmdp_collapse_flush for THP
2023-02-10 09:27:52 -08:00
Linus Torvalds
3647d2d706 Merge tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
 "A fix for a pretty embarrassing omission in the session flush handler
  from Xiubo, marked for stable"

* tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client:
  ceph: flush cap releases when the session is flushed
2023-02-10 09:04:00 -08:00
Linus Torvalds
29716680ad Merge tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe:
 "A single fix for a smatch regression introduced in this merge window"

* tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux:
  nvme-auth: mark nvme_auth_wq static
2023-02-10 08:55:09 -08:00
Linus Torvalds
4fe3722397 Merge tag 'sound-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Hopefully the last one for 6.2, a collection of the fixes that have
  been gathered since the last pull.

  All changes are small and trivial device-specific fixes"

* tag 'sound-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Add Positivo N14KP6-TG
  ASoC: topology: Return -ENOMEM on memory allocation failure
  ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
  ASoC: fsl_sai: fix getting version from VERID
  ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform.
  ALSA: hda/realtek: Add quirk for ASUS UM3402 using CS35L41
  ASoC: codecs: es8326: Fix DTS properties reading
  ASoC: tas5805m: add missing page switch.
  ASoC: tas5805m: rework to avoid scheduling while atomic.
  ALSA: hda/realtek: Enable mute/micmute LEDs on HP Elitebook, 645 G9
  ASoC: SOF: amd: Fix for handling spurious interrupts from DSP
  ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro 360
  ALSA: pci: lx6464es: fix a debug loop
  ASoC: rt715-sdca: fix clock stop prepare timeout issue
2023-02-10 08:37:48 -08:00
Jocelyn Falempe
7484a5bc15 drm/ast: Fix start address computation
During the driver conversion to shmem, the start address for the
scanout buffer was set to the base PCI address.
In most cases it works because only the lower 24bits are used, and
due to alignment it was almost always 0.
But on some unlucky hardware, it's not the case, and some uninitialized
memory is displayed on the BMC.
With shmem, the primary plane is always at offset 0 in GPU memory.

 * v2: rewrite the patch to set the offset to 0. (Thomas Zimmermann)
 * v3: move the change to plane_init() and also fix the cursor plane.
       (Jammy Huang)

Tested on a sr645 affected by this bug.

Fixes: f2fa5a99ca ("drm/ast: Convert ast to SHMEM")
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jammy Huang <jammy_huang@aspeedtech.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230209094417.21630-1-jfalempe@redhat.com
2023-02-10 14:32:57 +01:00
Tom Lendacky
493a2c2d23 Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions
Add the admin guide for the Cross-Thread Return Predictions vulnerability.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <60f9c0b4396956ce70499ae180cb548720b25c7e.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-10 07:27:37 -05:00
Tom Lendacky
6f0f2d5ef8 KVM: x86: Mitigate the cross-thread return address predictions bug
By default, KVM/SVM will intercept attempts by the guest to transition
out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used
by a VMM to change this behavior. To mitigate the cross-thread return
address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to
override the default behavior to intercept C0 transitions.

Use a module parameter to control the mitigation on processors that are
vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the
X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug,
KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-10 07:27:37 -05:00