Commit Graph

1172621 Commits

Author SHA1 Message Date
Florian Fainelli
bdd034de3a mailmap: add an entry for Leonard Crestez
Link: https://lkml.kernel.org/r/20230324130737.3360169-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Leonard Crestez <cdleonard@gmail.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:32 -07:00
Muchun Song
1f2803b266 mm: kfence: fix handling discontiguous page
The struct pages could be discontiguous when the kfence pool is allocated
via alloc_contig_pages() with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP.

This may result in setting PG_slab and memcg_data to a arbitrary
address (may be not used as a struct page), which in the worst case
might corrupt the kernel.

So the iteration should use nth_page().

Link: https://lkml.kernel.org/r/20230323025003.94447-1-songmuchun@bytedance.com
Fixes: 0ce20dd840 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:32 -07:00
Muchun Song
3ee2d7471f mm: kfence: fix PG_slab and memcg_data clearing
It does not reset PG_slab and memcg_data when KFENCE fails to initialize
kfence pool at runtime.  It is reporting a "Bad page state" message when
kfence pool is freed to buddy.  The checking of whether it is a compound
head page seems unnecessary since we already guarantee this when
allocating kfence pool.   Remove the check to simplify the code.

Link: https://lkml.kernel.org/r/20230320030059.20189-1-songmuchun@bytedance.com
Fixes: 0ce20dd840 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:32 -07:00
Shiyang Ruan
e900ba10d1 fsdax: dedupe should compare the min of two iters' length
In an dedupe comparison iter loop, the length of iomap_iter decreases
because it implies the remaining length after each iteration.

The dedupe command will fail with -EIO if the range is larger than one 
page size and not aligned to the page size.  Also report warning in dmesg:

[ 4338.498374] ------------[ cut here ]------------
[ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16 
...

The compare function should use the min length of the current iters,
not the total length.

Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: 0e79e3736d ("fsdax: dedupe: iter two files at the same time")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:32 -07:00
Shiyang Ruan
13dd4e0462 fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN
unshare copies data from source to destination.  But if the source is
HOLE or UNWRITTEN extents, we should zero the destination, otherwise
the HOLE or UNWRITTEN part will be user-visible old data of the new
allocated extent.

Found by running generic/649 while mounting with -o dax=always on pmem.

Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: d984648e42 ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:32 -07:00
Tiezhu Yang
f478b9987c lib/Kconfig.debug: correct help info of LOCKDEP_STACK_TRACE_HASH_BITS
We can see the following definition in kernel/locking/lockdep_internals.h:

  #define STACK_TRACE_HASH_SIZE	(1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS)

CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS is related with STACK_TRACE_HASH_SIZE
instead of MAX_STACK_TRACE_ENTRIES, fix it.

Link: https://lkml.kernel.org/r/1679380508-20830-1-git-send-email-yangtiezhu@loongson.cn
Fixes: 5dc33592e9 ("lockdep: Allow tuning tracing capacity constants.")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:31 -07:00
ye xingchen
35260cf545 Kconfig.debug: fix SCHED_DEBUG dependency
The path for SCHED_DEBUG is /sys/kernel/debug/sched.  So, SCHED_DEBUG
should depend on DEBUG_FS, not PROC_FS.

Link: https://lkml.kernel.org/r/202301291110098787982@zte.com.cn
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:31 -07:00
Leonard Göhrs
1a4b52ce85 .mailmap: add entry for Leonard Göhrs
My very first kernel commit:

  e4e1d47c79 ("ALSA: ppc: remove redundant checks in PS3 driver probe")

was sent with the umlaut in my last name transcribed (Göhrs -> Goehrs).

Add a mailmap entry so all my commits use the same name.

Link: https://lkml.kernel.org/r/20230321145525.1317230-1-l.goehrs@pengutronix.de
Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:31 -07:00
Andrii Nakryiko
07561769e0 Merge branch 'verifier/xdp_direct_packet_access.c converted to inline assembly'
Eduard Zingerman says:

====================

verifier/xdp_direct_packet_access.c automatically converted to inline
assembly using [1].

This is a leftover from [2], the last patch in a batch was blocked by
mail server for being too long. This patch-set splits it in two:
- one to add migrated test to progs/
- one to remove old test from verifier/

[1] Migration tool
    https://github.com/eddyz87/verifier-tests-migrator
[2] First batch of migrated verifier/*.c tests
    https://lore.kernel.org/bpf/167979433109.17761.17302808621381963629.git-patchwork-notify@kernel.org/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-28 14:48:28 -07:00
Eduard Zingerman
c63a7d8bbb selftests/bpf: Remove verifier/xdp_direct_packet_access.c, converted to progs/verifier_xdp_direct_packet_access.c
Removing verifier/xdp_direct_packet_access.c.c as it was automatically converted to use
inline assembly in the previous commit. It is available in
progs/verifier_xdp_direct_packet_access.c.c.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-3-eddyz87@gmail.com
2023-03-28 14:48:27 -07:00
Eduard Zingerman
6e9e141a7a selftests/bpf: Verifier/xdp_direct_packet_access.c converted to inline assembly
Test verifier/xdp_direct_packet_access.c automatically converted to use inline assembly.
Original test would be removed in the next patch.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-2-eddyz87@gmail.com
2023-03-28 14:48:27 -07:00
Dragos Tatulea
3905f8d64c net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
The recycle parameter used during page release is no longer
necessary: the page pool can detect when the page cannot be
recycled to the cache or ring without any outside hint.

The page pool will also take care of cleaning up after itself
once all the inflight pages have been released. So no need to
explicitly release pages to the system.

Remove the internal page_cache stats as the mlx5e_page_cache
struct no longer exists.

Delete the documentation entries along with the stats.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
cd640b0503 net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
To avoid overflowing the page pool's cache, don't release the
whole bulk which is usually larger than the cache refill size.
Group release+alloc instead into cache refill units that
allow releasing to the cache and then allocating from the cache.

A refill_unit variable is added as a iteration unit over the
wqe_bulk when doing release+alloc.

For a single ring, single core, default MTU (1500) TCP stream
test the number of pages allocated from the cache directly
(rx_pp_recycle_cached) increases from 0% to 52%:

+---------------------------------------------+
| Page Pool stats (/sec)  |  Before |   After |
+-------------------------+---------+---------+
|rx_pp_alloc_fast         | 2145422 | 2193802 |
|rx_pp_alloc_slow         |       2 |       0 |
|rx_pp_alloc_empty        |       2 |       0 |
|rx_pp_alloc_refill       |   34059 |   16634 |
|rx_pp_alloc_waive        |       0 |       0 |
|rx_pp_recycle_cached     |       0 | 1145818 |
|rx_pp_recycle_cache_full |       0 |       0 |
|rx_pp_recycle_ring       | 2179361 | 1064616 |
|rx_pp_recycle_ring_full  |     121 |       0 |
+---------------------------------------------+

With this patch, the performance for legacy rq for the above test is
back to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
4ba2b4988c net/mlx5e: RX, Increase WQE bulk size for legacy rq
Deferred page release was added to legacy rq but its desired effect
(driver releases last fragment to page pool cache) is not yet visible
due to the WQE bulks being too small.

This patch increases the WQE bulk size to span 512 KB and clip it to
one quarter of the rx queue size.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
76238d0fbd net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
Don't mix xsk buffer releases with page releases anymore. This is
needed for handling of deferred page release.

Add a new bulk free function for xsk buffers from wqe frags.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
3f93f82988 net/mlx5e: RX, Defer page release in legacy rq for better recycling
Currently, fragmented pages from the page pool can be released
in two ways:

1) In the mlx5e driver when trimming off the unused fragments AND the
associated skb fragments have been released. This path allows
recycling of pages to the page pool cache (allow_direct == true).

2) On the skb release path (last fragment release), which
will always release pages to the page pool ring
(allow_direct == false).

Whichever is releasing the last fragment will be decisive on
where the page gets released: the cache or the ring. So we
obviously want to maximize for doing the release from 1.

This patch does that by deferring the release of page fragments
right before requesting new ones from the page pool. A flag is
added to make sure that there's no release before first alloc
and that XDP_TX fragments are not released prematurely.

This is a preparation patch that doesn't unlock the performance
improvements yet. A followup patch will do that.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
625dff29df net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
Change the bool flag to a bitfield as we'll use it in a downstream patch
in the series to add signaling about skipping a fragment release.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
4c2a132368 net/mlx5e: RX, Defer page release in striding rq for better recycling
Currently, for striding RQ, fragmented pages from the page pool can
get released in two ways:

1) In the mlx5e driver when trimming off the unused fragments AND the
   associated skb fragments have been released. This path allows
   recycling of pages to the page pool cache (allow_direct == true).

2) On the skb release path (last fragment release), which
   will always release pages to the page pool ring
   (allow_direct == false).

Whichever is releasing the last fragment will be decisive on
where the page gets released: the cache or the ring. So we
obviously want to maximize for doing the release from 1.

This patch does that by deferring the release of page fragments
right before requesting new ones from the page pool. Extra care
needs to be taken for the corner cases:

* On first call, make sure that release is not called. The
  skip_release_bitmap is used for this purpose.

* On rq shutdown, make sure that all wqes that were not
  in the linked list are released.

For a single ring, single core, default MTU (1500) TCP stream
test the number of pages allocated from the cache directly
(rx_pp_recycle_cached) increases from 31 % to 98 %:

+----------------------------------------------+
| Page Pool stats (/sec)  |  Before |   After  |
+-------------------------+---------+----------+
|rx_pp_alloc_fast         | 2137754 |  2261033 |
|rx_pp_alloc_slow         |      47 |        9 |
|rx_pp_alloc_empty        |      47 |        9 |
|rx_pp_alloc_refill       |   23230 |      819 |
|rx_pp_alloc_waive        |       0 |        0 |
|rx_pp_recycle_cached     |  672182 |  2209015 |
|rx_pp_recycle_cache_full |    1789 |        0 |
|rx_pp_recycle_ring       | 1485848 |    52259 |
|rx_pp_recycle_ring_full  |    3003 |      584 |
+----------------------------------------------+

With this patch, the performance in striding rq for the above test is
back to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
38a36efccd net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
The xdp_xmit_bitmap currently serves only one purpose: to avoid
releasing pages that are still in use due to XDP TX.

A following patch will use this bitmap in a slightly different context
but for the same purpose. So rename the bitmap to a more generic name
that reflects the purpose not the context.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
6f57428460 net/mlx5e: RX, Enable skb page recycling through the page_pool
Start using the page_pool skb recycling api to recycle all pages back to
the page pool and stop using atomic page reference counting.

The mlx5e driver used to manage in-flight pages using page refcounting:
for each fragment there were 2 atomic write operations happening (one
for building the skb and one on skb release).

The page_pool api introduced a method to track page fragments more
optimally:
* The page's pp_fragment_count is set to a large bias on page alloc
  (1 x atomic write operation).
* The driver tracks the actual page fragments in a non atomic variable.
* When the skb is recycled, pp_fragment_count is decremented
  (atomic write operation).
* When page is released in the driver, the unused number of fragments
  (relative to the bias) is deducted from pp_fragment_count (atomic
  write operation).
* Last page defragmentation will only be an atomic read.

So in total there are `number of fragments + 1` atomic write ops. As
opposed to previously: `2 * frags` atomic writes ops.

Pages are wrapped in a mlx5e_frag_page structure which also contains the
number of fragments. This makes it easy to count the fragments in the
driver.

This change brings performance improvements for the case when the old rx
page_cache had low recycling rates due to head of queue blocking. For a
iperf3 TCP test with a single stream, on a single core (iperf and receive
queue running on same core), the following improvements can be noticed:

* Striding rq:
  - before (net-next baseline): bitrate = 30.1 Gbits/sec
  - after                     : bitrate = 31.4 Gbits/sec (diff: 4.14 %)

* Legacy rq:
  - before (net-next baseline): bitrate = 30.2 Gbits/sec
  - after                     : bitrate = 33.0 Gbits/sec (diff: 8.48 %)

There are 2 temporary performance degradations introduced:

1) TCP streams that had a good recycling rate with the old page_cache
   have a degradation for both striding and linear rq. This is due to
   very low page pool cache recycling: the pages are released during skb
   recycle which will release pages to the page pool ring for safety.
   The following patches in this series will tackle this problem by
   deferring the page release in the driver to increase the
   chance of having pages recycled to the cache.

2) XDP performance is now lower (4-5 %) due to the higher number of
   atomic operations used for fragment management. But this opens the
   door for supporting multiple packets per page in XDP, which will
   bring a big gain.

Otherwise, performance is similar to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
4a5c5e2500 net/mlx5e: RX, Enable dma map and sync from page_pool allocator
Remove driver dma mapping and unmapping of pages. Let the
page_pool api do it.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
08c9b61b07 net/mlx5e: RX, Remove internal page_cache
This patch removes the internal rx page_cache and uses the generic
page_pool api only. It used to be that the page_pool couldn't handle all
the mlx5 driver usecases, but with the introduction of skb recycling and
page fragmentaton in the page_pool full switch can now be made. Some
benfits of this transition:
* Better page recycling in the cases when the page_cache was suffering
  from head of queue blocking. The page_pool doesn't have this issue.
* DMA mapping/unmapping can be managed by the page_pool.
* mlx5e_rq size reduced by more than 50% due to the page_cache array
  being deleted.

This patch only removes the page_cache. Downstream patches will enable
the required page_pool features and will add further fine-tuning.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:57 -07:00
Dragos Tatulea
ca6ef9f031 net/mlx5e: RX, Store SHAMPO header pages in array
Save allocated SHAMPO header pages to an array to which the
mlx5e_dma_info page will point to.

This change is a preparation for introducing mlx5e_frag_page structure
in a downstream patch. There's no new functionality introduced.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:57 -07:00
Dragos Tatulea
d39092caae net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
This change removes the usage of mlx5e_alloc_unit union for
striding rq. The change is more straightforward than legacy rq as
the alloc units union is already in place.

This patch only moves things around: instead of an array of unions make
it a union of arrays. This has the effect that each mlx5e_mpw_info will
allocate the largest possible size of the array member. For xsk this
means that the array of xdp_buff pointers for the wqe will still be
contiguous, but there will be some extra unused bytes at the end of the
array.

As further patch in the series will add the mlx5e_frag_page struct for
which the described size constraint will no longer hold.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:57 -07:00
Dragos Tatulea
8fb1814f58 net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
The mlx5e_alloc_unit union is conveniently used to store arrays of
pointers to struct page or struct xdp_buff (for xsk). The union is
currently expected to have the size of a pointer for xsk batch
allocations to work. This is conveniet for the current state of the
code but makes it impossible to add a structure of a different size
to the alloc unit.

A further patch in the series will add the mlx5e_frag_page struct for
which the described size constraint will no longer hold.

This change removes the usage of mlx5e_alloc_unit union for legacy rq:

- A union of arrays is introduced (mlx5e_alloc_units) to replace the
  array of unions to allow structures of different sizes.

- Each fragment has a pointer to a unit in the mlx5e_alloc_units array.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:57 -07:00
Dragos Tatulea
09df037017 net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
Change internal page cache and page pool api to use a struct page **
instead of a mlx5e_alloc_unit *.

This is the first change in a series which is meant to remove the
mlx5e_alloc_unit altogether.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:56 -07:00
Linus Torvalds
fcd476ea6a Merge tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
 "This brings the rcu_torture_read event trace into line with the new
  trace tools by replacing this event trace's __field() with the
  corresponding __array().

  Without this, the new trace tools will fail when presented wtih an
  rcu_torture_read event trace, which is a regression from the viewpoint
  of trace tools users"

Link: https://lore.kernel.org/all/20230320133650.5388a05e@gandalf.local.home/

* tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Fix rcu_torture_read ftrace event
2023-03-28 13:28:52 -07:00
Linus Torvalds
756c1a0593 Merge tag 'linux-kselftest-fixes-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
 "One single fix for sigaltstack test -Wuninitialized warning found when
  building with clang"

* tag 'linux-kselftest-fixes-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: sigaltstack: fix -Wuninitialized
2023-03-28 13:14:47 -07:00
Rafael J. Wysocki
b57841fb0b thermal: core: Drop excessive lockdep_assert_held() calls
The lockdep_assert_held() calls added to cooling_device_stats_setup()
and cooling_device_stats_destroy() by commit 790930f442 ("thermal:
core: Introduce thermal_cooling_device_update()") trigger false-positive
lockdep reports in code paths that are not subject to race conditions
(before cooling device registration and after cooling device removal).

For this reason, remove the lockdep_assert_held() calls from both
cooling_device_stats_setup() and cooling_device_stats_destroy() and
add one to thermal_cooling_device_stats_reinit() that has to be called
under the cdev lock.

Fixes: 790930f442 ("thermal: core: Introduce thermal_cooling_device_update()")
Link: https://lore.kernel.org/linux-acpi/ZCIDTLFt27Ei7+V6@ideak-desk.fi.intel.com
Reported-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-28 20:49:47 +02:00
Linus Torvalds
05c24161f4 Merge tag 's390-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:

 - Fix an error handling issue with PTRACE_GET_LAST_BREAK request so
   that -EFAULT is returned if put_user() fails, instead of ignoring it

 - Fix a build race for the modules_prepare target when
   CONFIG_EXPOLINE_EXTERN is enabled by reintroducing the dependence on
   scripts

 - Fix a memory leak in vfio_ap device driver

 - Add missing earlyclobber annotations to __clear_user() inline
   assembly to prevent incorrect register allocation

* tag 's390-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling
  s390: reintroduce expoline dependence to scripts
  s390/vfio-ap: fix memory leak in vfio_ap device driver
  s390/uaccess: add missing earlyclobber annotations to __clear_user()
2023-03-28 10:43:04 -07:00
Jakob Koschel
e9a1cc2e4c ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
The code implicitly assumes that the list iterator finds a correct
handle. If 'vsi_handle' is not found the 'old_agg_vsi_info' was
pointing to an bogus memory location. For safety a separate list
iterator variable should be used to make the != NULL check on
'old_agg_vsi_info' correct under any circumstances.

Additionally Linus proposed to avoid any use of the list iterator
variable after the loop, in the attempt to move the list iterator
variable declaration into the macro to avoid any potential misuse after
the loop. Using it in a pointer comparison after the loop is undefined
behavior and should be omitted if possible [1].

Fixes: 37c592062b ("ice: remove the VSI info from previous agg")
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jkl820.git@gmail.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28 09:48:49 -07:00
Junfeng Guo
29486b6df3 ice: add profile conflict check for AVF FDIR
Add profile conflict check while adding some FDIR rules to avoid
unexpected flow behavior, rules may have conflict including:
        IPv4 <---> {IPv4_UDP, IPv4_TCP, IPv4_SCTP}
        IPv6 <---> {IPv6_UDP, IPv6_TCP, IPv6_SCTP}

For example, when we create an FDIR rule for IPv4, this rule will work
on packets including IPv4, IPv4_UDP, IPv4_TCP and IPv4_SCTP. But if we
then create an FDIR rule for IPv4_UDP and then destroy it, the first
FDIR rule for IPv4 cannot work on pkt IPv4_UDP then.

To prevent this unexpected behavior, we add restriction in software
when creating FDIR rules by adding necessary profile conflict check.

Fixes: 1f7ea1cd6a ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28 09:48:41 -07:00
Brett Creeley
d94dbdc4e0 ice: Fix ice_cfg_rdma_fltr() to only update relevant fields
The current implementation causes ice_vsi_update() to update all VSI
fields based on the cached VSI context. This also assumes that the
ICE_AQ_VSI_PROP_Q_OPT_VALID bit is set. This can cause problems if the
VSI context is not correctly synced by the driver. Fix this by only
updating the fields that correspond to ICE_AQ_VSI_PROP_Q_OPT_VALID.
Also, make sure to save the updated result in the cached VSI context
on success.

Fixes: 348048e724 ("ice: Implement iidc operations")
Co-developed-by: Robert Malz <robertx.malz@intel.com>
Signed-off-by: Robert Malz <robertx.malz@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28 09:43:12 -07:00
Jesse Brandeburg
66ceaa4c45 ice: fix W=1 headers mismatch
make modules W=1 returns:
.../ice/ice_txrx_lib.c:448: warning: Function parameter or member 'first_idx' not described in 'ice_finalize_xdp_rx'
.../ice/ice_txrx.c:948: warning: Function parameter or member 'ntc' not described in 'ice_get_rx_buf'
.../ice/ice_txrx.c:1038: warning: Excess function parameter 'rx_buf' description in 'ice_construct_skb'

Fix these warnings by adding and deleting the deviant arguments.

Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Fixes: d7956d81f1 ("ice: Pull out next_to_clean bump out of ice_put_rx_buf()")
CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28 09:42:05 -07:00
Marek Szyprowski
f91bf3272a iommu/exynos: Fix set_platform_dma_ops() callback
There are some subtle differences between release_device() and
set_platform_dma_ops() callbacks, so separate those two callbacks. Device
links should be removed only in release_device(), because they were
created in probe_device() on purpose and they are needed for proper
Exynos IOMMU driver operation. While fixing this, remove the conditional
code as it is not really needed.

Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Fixes: 189d496b48 ("iommu/exynos: Add missing set_platform_dma_ops callback")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Link: https://lore.kernel.org/r/20230315232514.1046589-1-m.szyprowski@samsung.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-28 15:33:50 +02:00
Grygorii Strashko
86e2eca4dd net: ethernet: ti: am65-cpsw: enable p0 host port rx_vlan_remap
By default, the tagged ingress packets to the switch from the host port
P0 get internal switch priority assigned equal to the DMA CPPI channel
number they came from, unless CPSW_P0_CONTROL_REG.RX_REMAP_VLAN is enabled.
This causes issues with applying QoS policies and mapping packets on
external port fifos, because the default configuration is vlan_aware and
DMA CPPI channels are shared between all external ports.

Hence enable CPSW_P0_CONTROL_REG.RX_REMAP_VLAN so packet will preserve
internal switch priority assigned following the VLAN(priority) tag no
matter through which DMA CPPI Channels packets enter the switch.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://lore.kernel.org/r/20230327092103.3256118-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 15:29:50 +02:00
Grygorii Strashko
5c8560c4a1 net: ethernet: ti: am65-cpsw: add .ndo to set dma per-queue rate
Enable rate limiting TX DMA queues for CPSW interface by configuring the
rate in absolute Mb/s units per TX queue.

Example:
    ethtool -L eth0 tx 4

    echo 100 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
    echo 200 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
    echo 50 > /sys/class/net/eth0/queues/tx-2/tx_maxrate
    echo 30 > /sys/class/net/eth0/queues/tx-3/tx_maxrate

    # disable
    echo 0 > /sys/class/net/eth0/queues/tx-0/tx_maxrate

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://lore.kernel.org/r/20230327085758.3237155-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 15:29:22 +02:00
Kornel Dulęba
b26cd9325b pinctrl: amd: Disable and mask interrupts on resume
This fixes a similar problem to the one observed in:
commit 4e5a04be88 ("pinctrl: amd: disable and mask interrupts on probe").

On some systems, during suspend/resume cycle firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
This confuses the AMD pinctrl driver and causes spurious interrupts.

The driver already has logic to detect if a pin is used by the kernel.
Leverage it to re-initialize interrupt fields of a pin only if it's not
used by us.

Cc: stable@vger.kernel.org
Fixes: dbad75dd1f ("pinctrl: add AMD GPIO driver support.")
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Link: https://lore.kernel.org/r/20230320093259.845178-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-03-28 15:20:18 +02:00
Jens Axboe
005308f7bd io_uring/poll: clear single/double poll flags on poll arming
Unless we have at least one entry queued, then don't call into
io_poll_remove_entries(). Normally this isn't possible, but if we
retry poll then we can have ->nr_entries cleared again as we're
setting it up. If this happens for a poll retry, then we'll still have
at least REQ_F_SINGLE_POLL set. io_poll_remove_entries() then thinks
it has entries to remove.

Clear REQ_F_SINGLE_POLL and REQ_F_DOUBLE_POLL unconditionally when
arming a poll request.

Fixes: c16bda3759 ("io_uring/poll: allow some retries for poll triggering spuriously")
Cc: stable@vger.kernel.org
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-28 07:09:01 -06:00
Paolo Abeni
917fd7d6cd Merge branch 'xen-netback-fix-issue-introduced-recently'
Juergen Gross says:

====================
xen/netback: fix issue introduced recently

The fix for XSA-423 introduced a bug which resulted in loss of network
connection in some configurations.

The first patch is fixing the issue, while the second one is removing
a test which isn't needed.
====================

Link: https://lore.kernel.org/r/20230327083646.18690-1-jgross@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 14:16:42 +02:00
Juergen Gross
8fb8ebf948 xen/netback: remove not needed test in xenvif_tx_build_gops()
The tests for the number of grant mapping or copy operations reaching
the array size of the operations buffer at the end of the main loop in
xenvif_tx_build_gops() isn't needed.

The loop can handle at maximum MAX_PENDING_REQS transfer requests, as
XEN_RING_NR_UNCONSUMED_REQUESTS() is taking unsent responses into
consideration, too.

Remove the tests.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 14:16:40 +02:00
Juergen Gross
05310f31ca xen/netback: don't do grant copy across page boundary
Fix xenvif_get_requests() not to do grant copy operations across local
page boundaries. This requires to double the maximum number of copy
operations per queue, as each copy could now be split into 2.

Make sure that struct xenvif_tx_cb doesn't grow too large.

Cc: stable@vger.kernel.org
Fixes: ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 14:16:40 +02:00
Tanu Malhotra
38518593ec HID: intel-ish-hid: Fix kernel panic during warm reset
During warm reset device->fw_client is set to NULL. If a bus driver is
registered after this NULL setting and before new firmware clients are
enumerated by ISHTP, kernel panic will result in the function
ishtp_cl_bus_match(). This is because of reference to
device->fw_client->props.protocol_name.

ISH firmware after getting successfully loaded, sends a warm reset
notification to remove all clients from the bus and sets
device->fw_client to NULL. Until kernel v5.15, all enabled ISHTP kernel
module drivers were loaded right after any of the first ISHTP device was
registered, regardless of whether it was a matched or an unmatched
device. This resulted in all drivers getting registered much before the
warm reset notification from ISH.

Starting kernel v5.16, this issue got exposed after the change was
introduced to load only bus drivers for the respective matching devices.
In this scenario, cros_ec_ishtp device and cros_ec_ishtp driver are
registered after the warm reset device fw_client NULL setting.
cros_ec_ishtp driver_register() triggers the callback to
ishtp_cl_bus_match() to match ISHTP driver to the device and causes kernel
panic in guid_equal() when dereferencing fw_client NULL pointer to get
protocol_name.

Fixes: f155dfeaa4 ("platform/x86: isthp_eclite: only load for matching devices")
Fixes: facfe0a4fd ("platform/chrome: chros_ec_ishtp: only load for matching devices")
Fixes: 0d0cccc0fd ("HID: intel-ish-hid: hid-client: only load for matching devices")
Fixes: 44e2a58cb8 ("HID: intel-ish-hid: fw-loader: only load for matching devices")
Cc: <stable@vger.kernel.org> # 5.16+
Signed-off-by: Tanu Malhotra <tanu.malhotra@intel.com>
Tested-by: Shaunak Saha <shaunak.saha@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2023-03-28 14:04:03 +02:00
Wolfram Sang
f22c993f31 smsc911x: avoid PHY being resumed when interface is not up
SMSC911x doesn't need mdiobus suspend/resume, that's why it sets
'mac_managed_pm'. However, setting it needs to be moved from init to
probe, so mdiobus PM functions will really never be called (e.g. when
the interface is not up yet during suspend/resume).

Fixes: 3ce9f2bef7 ("net: smsc911x: Stop and start PHY during suspend and resume")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230327083138.6044-1-wsa+renesas@sang-engineering.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 13:39:47 +02:00
Greg Kroah-Hartman
4bffd2c7a3 Merge tag 'iio-fixes-for-6.3a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus
Jonathan writes:

1st set of IIO fixes for 6.3

Usual mixed bag:

- core - output buffers
  Fix return of bytes written when only some succeed.
  Fix O_NONBLOCK handling to not block.

- adi,ad7791
  Fix IRQ type.  Not confirmed to have any impact but good to correct it anyway

- adi,adis16400
  Missing CONFIG_CRC32

- capella,cm32181
  Unregister 2nd I2C client if one is used.

- cio-dac
  Fix bitdepth for range check on write.

- linear,ltc2497
  Fix a wrong shift of the LSB introduced when switching to be24 handling.

- maxim,max11410
  Fix handling of return code in read_poll_timeout()

- qcom,spmi-adc
  Fix an accidental change of channel name to include the reg value from OF.

- ti,palmas
  Fix a null dereference on remove due to wrong function used to get the
  drvdata.

- ti,ads7950
  Mark GPIO as can sleep.

* tag 'iio-fixes-for-6.3a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
  iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
  iio: adc: palmas_gpadc: fix NULL dereference on rmmod
  iio: adc: max11410: fix read_poll_timeout() usage
  iio: dac: cio-dac: Fix max DAC write value check for 12-bit
  iio: light: cm32181: Unregister second I2C client if present
  iio: accel: kionix-kx022a: Get the timestamp from the driver's private data in the trigger_handler
  iio: adc: ad7791: fix IRQ flags
  iio: buffer: make sure O_NONBLOCK is respected
  iio: buffer: correctly return bytes written in output buffers
  iio: light: vcnl4000: Fix WARN_ON on uninitialized lock
  iio: adis16480: select CONFIG_CRC32
  drivers: iio: adc: ltc2497: fix LSB shift
  iio: adc: qcom-spmi-adc5: Fix the channel name
2023-03-28 13:30:55 +02:00
Jens Axboe
fd72761894 powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:

  Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
  BUG: Kernel NULL pointer dereference on read at 0x00000160
  Faulting instruction address: 0xc0000000000c3a60
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
  Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
  CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  NIP:  c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
  REGS: c0000000041833b0 TRAP: 0300   Not tainted  (6.3.0-rc2+)
  MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88082828  XER: 200400f8
  ...
  NIP memcpy_power7+0x200/0x7d0
  LR  ppr_get+0x64/0xb0
  Call Trace:
    ppr_get+0x40/0xb0 (unreliable)
    __regset_get+0x180/0x1f0
    regset_get_alloc+0x64/0x90
    elf_core_dump+0xb98/0x1b60
    do_coredump+0x1c34/0x24a0
    get_signal+0x71c/0x1410
    do_notify_resume+0x140/0x6f0
    interrupt_exit_user_prepare_main+0x29c/0x320
    interrupt_exit_user_prepare+0x6c/0xa0
    interrupt_return_srr_user+0x8/0x138

Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.

Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.

Fixes: fa439810cc ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
2023-03-28 22:11:41 +11:00
Benjamin Gray
1abce0580b powerpc/64s: Fix __pte_needs_flush() false positive warning
Userspace PROT_NONE ptes set _PAGE_PRIVILEGED, triggering a false
positive debug assertion that __pte_flags_need_flush() is not called
on a kernel mapping.

Detect when it is a userspace PROT_NONE page by checking the required
bits of PAGE_NONE are set, and none of the RWX bits are set.
pte_protnone() is insufficient here because it always returns 0 when
CONFIG_NUMA_BALANCING=n.

Fixes: b11931e9ad ("powerpc/64s: add pte_needs_flush and huge_pmd_needs_flush")
Cc: stable@vger.kernel.org # v6.1+
Reported-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230302225947.81083-1-bgray@linux.ibm.com
2023-03-28 21:37:23 +11:00
Paolo Abeni
d8b0c963e9 Merge branch 'allocate-multiple-skbuffs-on-tx'
Arseniy Krasnov says:

====================
allocate multiple skbuffs on tx

This adds small optimization for tx path: instead of allocating single
skbuff on every call to transport, allocate multiple skbuff's until
credit space allows, thus trying to send as much as possible data without
return to af_vsock.c.

Also this patchset includes second patch which adds check and return from
'virtio_transport_get_credit()' and 'virtio_transport_put_credit()' when
these functions are called with 0 argument. This is needed, because zero
argument makes both functions to behave as no-effect, but both of them
always tries to acquire spinlock. Moreover, first patch always calls
function 'virtio_transport_put_credit()' with zero argument in case of
successful packet transmission.
====================

Link: https://lore.kernel.org/r/b0d15942-65ba-3a32-ba8d-fed64332d8f6@sberdevices.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 12:03:54 +02:00
Arseniy Krasnov
e3ec366eb0 virtio/vsock: check argument to avoid no effect call
Both of these functions have no effect when input argument is 0, so to
avoid useless spinlock access, check argument before it.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 12:03:50 +02:00
Arseniy Krasnov
b68ffb1b3b virtio/vsock: allocate multiple skbuffs on tx
This adds small optimization for tx path: instead of allocating single
skbuff on every call to transport, allocate multiple skbuff's until
credit space allows, thus trying to send as much as possible data without
return to af_vsock.c.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28 12:03:50 +02:00