For disabling ASPM during NAPI poll we'll have to unlock access
to the config registers in atomic context. Other code parts
running with config register access unlocked are partially
longer and can sleep. Add a usage counter to enable parallel
execution of code parts requiring unlocked config registers.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For disabling ASPM during NAPI poll we'll have to access mac ocp
registers in atomic context. This could result in races because
a mac ocp read consists of a write to register OCPDR, followed
by a read from the same register. Therefore add a spinlock to
protect access to mac ocp registers.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the feature was added it was enabled for SW timestamps only but
with current hardware the same out-of-order timestamps can be seen.
Let's expand the area for the feature to all types of timestamps.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.
Transform zero-length array into flexible-array member in struct
nx_cardrsp_rx_ctx_t.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c:361:26: warning: array subscript <unknown> is outside array bounds of ‘char[0]’ [-Warray-bounds=]
drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c:372:25: warning: array subscript <unknown> is outside array bounds of ‘char[0]’ [-Warray-bounds=]
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/265
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZAZ57I6WdQEwWh7v@work
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Puranjay Mohan says:
====================
This series add the support of the ARM architecture to libbpf USDT. This
involves implementing the parse_usdt_arg() function for ARM.
It was seen that the last part of parse_usdt_arg() is repeated for all architectures,
so, the first patch in this series refactors these functions and moved the post
processing to parse_usdt_spec()
Changes in V2[1] to V3:
- Use a tabular approach to find register offsets.
- Add the patch for refactoring parse_usdt_arg()
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Parsing of USDT arguments is architecture-specific; on arm it is
relatively easy since registers used are r[0-10], fp, ip, sp, lr,
pc. Format is slightly different compared to aarch64; forms are
- "size @ [ reg, #offset ]" for dereferences, for example
"-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
- "size @ reg" for register values; for example
"-4@r0"
- "size @ #value" for raw values; for example
"-8@#1"
Add support for parsing USDT arguments for ARM architecture.
To test the above changes QEMU's virt[1] board with cortex-a15
CPU was used. libbpf-bootstrap's usdt example[2] was modified to attach
to a test program with DTRACE_PROBE1/2/3/4... probes to test different
combinations.
[1] https://www.qemu.org/docs/master/system/arm/virt.html
[2] https://github.com/libbpf/libbpf-bootstrap/blob/master/examples/c/usdt.bpf.c
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-3-puranjay12@gmail.com
The parse_usdt_arg() function is defined differently for each
architecture but the last part of the function is repeated
verbatim for each architecture.
Refactor parse_usdt_arg() to fill the arg_sz and then do the repeated
post-processing in parse_usdt_spec().
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-2-puranjay12@gmail.com
Coverity reported a potential underflow of the offset variable used in
the find_cd() function. Switch to using a signed 64 bit integer for the
representation of offset to make sure we can never underflow.
Fixes: 1eebcb6063 ("libbpf: Implement basic zip archive parsing support")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307215504.837321-1-deso@posteo.net
i225/i226 parts used only one media type copper. The copper media type is
not replaceable. Clean up the code accordingly, and remove the obsolete
media replacement and reset options.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add ConfigChangeError(qbv_config_change_errors) when user try to set the
AdminBaseTime to past value while the current GCL is still running.
The ConfigChangeError counter should not be increased when a gate control
list is scheduled into the future.
User can use "ethtool -S <interface> | grep qbv_config_change_errors"
command to check the counter values.
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In monitor mode, we try to report the EOF bit on the
first MPDU of an A-MPDU (hardware duplicates this bit
over all MPDUs, so it's only trustable on the first).
However, due to reshuffling in an ealier commit, the
toggle_bit != mvm->ampdu_toggle logic can no longer
work since mvm->ampdu_toggle is now set before this
code runs.
Fix this by tracking the first_subframe status in the
phy data struct and using that instead of checking.
Fixes: f1490546be ("wifi: iwlwifi: mvm: rxmq: refactor mac80211 rx_status setting")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230305124407.e273aa0d3fdc.I77db4cc247898eae8a98b80659386d6737052b95@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With older compilers like gcc-9, the calculation of the vlan
priority field causes a false-positive warning from the byteswap:
In file included from drivers/net/ethernet/intel/ice/ice_tc_lib.c:4:
drivers/net/ethernet/intel/ice/ice_tc_lib.c: In function 'ice_parse_cls_flower':
include/uapi/linux/swab.h:15:15: error: integer overflow in expression '(int)(short unsigned int)((int)match.key-><U67c8>.<U6698>.vlan_priority << 13) & 57344 & 255' of type 'int' results in '0' [-Werror=overflow]
15 | (((__u16)(x) & (__u16)0x00ffU) << 8) | \
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:106:2: note: in expansion of macro '___constant_swab16'
106 | ___constant_swab16(x) : \
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/little_endian.h:42:43: note: in expansion of macro '__swab16'
42 | #define __cpu_to_be16(x) ((__force __be16)__swab16((x)))
| ^~~~~~~~
include/linux/byteorder/generic.h:96:21: note: in expansion of macro '__cpu_to_be16'
96 | #define cpu_to_be16 __cpu_to_be16
| ^~~~~~~~~~~~~
drivers/net/ethernet/intel/ice/ice_tc_lib.c:1458:5: note: in expansion of macro 'cpu_to_be16'
1458 | cpu_to_be16((match.key->vlan_priority <<
| ^~~~~~~~~~~
After a change to be16_encode_bits(), the code becomes more
readable to both people and compilers, which avoids the warning.
Fixes: 34800178b3 ("ice: Add support for VLAN priority filters in switchdev")
Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There were few smatch warnings reported by Dan:
- ice_vsi_cfg_xdp_txqs can return 0 instead of ret, which is cleaner
- return values in ice_vsi_cfg_def were ignored
- in ice_vsi_rebuild return value was ignored in case rebuild failed,
it was a never reached code, however, rewrite it for clarity.
- ice_vsi_cfg_tc can return 0 instead of ret
Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When creating the TLV to send to the FW for configuring DSCP mode PFC,the
PFCENABLE field was being masked with a 4 bit mask (0xF), but this is an 8
bit bitmask for enabled classes for PFC. This means that traffic classes
4-7 could not be enabled for PFC.
Remove the mask completely, as it is not necessary, as we are assigning 8
bits to an 8 bit field.
Fixes: 2a87bd73e5 ("ice: Add DSCP support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 596ff4a09b ("cpumask: re-introduce constant-sized cpumask
optimizations") changed cpumask_setall() to use "bitmap_set()" instead
of "bitmap_fill()", because bitmap_fill() would explicitly set all the
bits of a constant sized small bitmap, and that's exactly what we don't
want: we want to only set bits up to 'nr_cpu_ids', which is what
"bitmap_set()" does.
However, Yury correctly points out that while "bitmap_set()" does indeed
only set bits up to the required bitmap size, it doesn't _clear_ bits
above that size, so the upper bits would still not have well-defined
values.
Now, none of this should really matter, since any bits set past
'nr_cpu_ids' should always be ignored in the first place. Yes, the bit
scanning functions might return them as a result, but since users should
always consider the ">= nr_cpu_ids" condition to mean "no more bits",
that shouldn't have any actual effect (see previous commit 8ca09d5fa3
"cpumask: fix incorrect cpumask scanning result checks").
But let's just do it right, the way the code was _intended_ to work. We
have had enough lazy code that works but bites us in the *rse later
(again, see previous commit) that there's no reason to not just do this
properly.
It turns out that "bitmap_fill()" gets this all right for the complex
case, and really only fails for the inlined optimized case that just
fills the whole word. And while we could just fix bitmap_fill() to use
the proper last word mask, there's two issues with that:
- the cpumask case wants to do the _optimization_ based on "NR_CPUS is
a small constant", but then wants to do the actual bit _fill_ based
on "nr_cpu_ids" that isn't necessarily that same constant
- we have lots of non-cpumask users of bitmap_fill(), and while they
hopefully don't care, and probably would want the proper semantics
anyway ("only set bits up to the limit"), I do not want the cpumask
changes to impact other parts
So this ends up just doing the single-word optimization by hand in the
cpumask code. If our cpumask is fundamentally limited to a single word,
just do the proper "fill in that word" exactly. And if it's the more
complex multi-word case, then the generic bitmap_fill() will DTRT.
This is all an example of how our bitmap function optimizations really
are somewhat broken. They conflate the "this is size of the bitmap"
optimizations with the actual bit(s) we want to set.
In many cases we really want to have the two be separate things:
sometimes we base our optimizations on the size of the whole bitmap ("I
know this whole bitmap fits in a single word, so I'll just use
single-word accesses"), and sometimes we base them on the bit we are
looking at ("this is just acting on bits that are in the first word, so
I'll use single-word accesses").
Notice how the end result of the two optimizations are the same, but the
way we get to them are quite different.
And all our cpumask optimization games are really about that fundamental
distinction, and we'd often really want to pass in both the "this is the
bit I'm working on" (which _can_ be a small constant but might be
variable), and "I know it's in this range even if it's variable" (based
on CONFIG_NR_CPUS).
So this cpumask_setall() implementation just makes that explicit. It
checks the "I statically know the size is small" using the known static
size of the cpumask (which is what that 'small_cpumask_bits' is all
about), but then sets the actual bits using the exact number of cpus we
have (ie 'nr_cpumask_bits')
Of course, in a perfect world, the compiler would have done all the
range analysis (possibly with help from us just telling it that
"this value is always in this range"), and would do all of this for us.
But that is not the world we live in.
While we dream of that perfect world, this does that manual logic to
make it all work out. And this was a very long explanation for a small
code change that shouldn't even matter.
Reported-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao says:
====================
Currently we can't get bpf memory usage reliably either from memcg or
from bpftool.
In memcg, there's not a 'bpf' item in memory.stat, but only 'kernel',
'sock', 'vmalloc' and 'percpu' which may related to bpf memory. With
these items we still can't get the bpf memory usage, because bpf memory
usage may far less than the kmem in a memcg, for example, the dentry may
consume lots of kmem.
bpftool now shows the bpf memory footprint, which is difference with bpf
memory usage. The difference can be quite great in some cases, for example,
- non-preallocated bpf map
The non-preallocated bpf map memory usage is dynamically changed. The
allocated elements count can be from 0 to the max entries. But the
memory footprint in bpftool only shows a fixed number.
- bpf metadata consumes more memory than bpf element
In some corner cases, the bpf metadata can consumes a lot more memory
than bpf element consumes. For example, it can happen when the element
size is quite small.
- some maps don't have key, value or max_entries
For example the key_size and value_size of ringbuf is 0, so its
memlock is always 0.
We need a way to show the bpf memory usage especially there will be more
and more bpf programs running on the production environment and thus the
bpf memory usage is not trivial.
This patchset introduces a new map ops ->map_mem_usage to calculate the
memory usage. Note that we don't intend to make the memory usage 100%
accurate, while our goal is to make sure there is only a small difference
between what bpftool reports and the real memory. That small difference
can be ignored compared to the total usage. That is enough to monitor
the bpf memory usage. For example, the user can rely on this value to
monitor the trend of bpf memory usage, compare the difference in bpf
memory usage between different bpf program versions, figure out which
maps consume large memory, and etc.
This patchset implements the bpf memory usage for all maps, and yet there's
still work to do. We don't want to introduce runtime overhead in the
element update and delete path, but we have to do it for some
non-preallocated maps,
- devmap, xskmap
When we update or delete an element, it will allocate or free memory.
In order to track this dynamic memory, we have to track the count in
element update and delete path.
- cpumap
The element size of each cpumap element is not determinated. If we
want to track the usage, we have to count the size of all elements in
the element update and delete path. So I just put it aside currently.
- local_storage, bpf_local_storage
When we attach or detach a cgroup, it will allocate or free memory. If
we want to track the dynamic memory, we also need to do something in
the update and delete path. So I just put it aside currently.
- offload map
The element update and delete of offload map is via the netdev dev_ops,
in which it may dynamically allocate or free memory, but this dynamic
memory isn't counted in offload map memory usage currently.
The result of each map can be found in the individual patch.
We may also need to track per-container bpf memory usage, that will be
addressed by a different patchset.
Changes:
v3->v4: code improvement on ringbuf (Andrii)
use READ_ONCE() to read lpm_trie (Tao)
explain why we can't get bpf memory usage from memcg.
v2->v3: check callback at map creation time and avoid warning (Alexei)
fix build error under CONFIG_BPF=n (lkp@intel.com)
v1->v2: calculate the memory usage within bpf (Alexei)
- [v1] bpf, mm: bpf memory usage
https://lwn.net/Articles/921991/
- [RFC PATCH v2] mm, bpf: Add BPF into /proc/meminfo
https://lwn.net/Articles/919848/
- [RFC PATCH v1] mm, bpf: Add BPF into /proc/meminfo
https://lwn.net/Articles/917647/
- [RFC PATCH] bpf, mm: Add a new item bpf into memory.stat
https://lore.kernel.org/bpf/20220921170002.29557-1-laoar.shao@gmail].com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A new helper is introduced to calculate xskmap memory usage.
The xfsmap memory usage can be dynamically changed when we add or remove
a xsk_map_node. Hence we need to track the count of xsk_map_node to get
its memory usage.
The result as follows,
- before
10: xskmap name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524288B
- after
10: xskmap name count_map flags 0x0 <<< no elements case
key 4B value 4B max_entries 65536 memlock 524608B
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-17-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
sockmap and sockhash don't have something in common in allocation, so let's
introduce different helpers to calculate their memory usage.
The reuslt as follows,
- before
28: sockmap name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524288B
29: sockhash name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524288B
- after
28: sockmap name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524608B
29: sockhash name count_map flags 0x0 <<<< no updated elements
key 4B value 4B max_entries 65536 memlock 1048896B
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-16-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A new helper is introduced into bpf_local_storage map to calculate the
memory usage. This helper is also used by other maps like
bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage and etc.
Note that currently the dynamically allocated storage elements are not
counted in the usage, since it will take extra runtime overhead in the
elements update or delete path. So let's put it aside now, and implement
it in the future when someone really need it.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>