Previously, only tunnel decap rules required egdev registration for
offload in NFP. These are now supported via indirect TC block callbacks.
Remove the egdev code from NFP.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, TC block tunnel decap rules were only offloaded when a
callback was triggered through registration of the rules egress device.
This meant that the driver had no access to the ingress netdev and so
could not verify it was the same tunnel type that the rule implied.
Register tunnel devices for indirect TC block offloads in NFP, giving
access to new rules based on the ingress device rather than egress. Use
this to verify the netdev type of VXLAN and Geneve based rules and offload
the rules to HW if applicable.
Tunnel registration is done via a netdev notifier. On notifier
registration, this is triggered for already existing netdevs. This means
that NFP can register for offloads from devices that exist before it is
loaded (filter rules will be replayed from the TC core). Similarly, on
notifier unregister, a call is triggered for each currently active netdev.
This allows the driver to unregister any indirect block callbacks that may
still be active.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both the actions and tunnel_conf files contain local functions that check
the type of an input netdev. In preparation for re-use with tunnel offload
via indirect blocks, move these to static inline functions in a header
file.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the offload functions in NFP assumed that the ingress (or
egress) netdev passed to them was an nfp repr.
Modify the driver to permit the passing of non repr netdevs as the ingress
device for an offload rule candidate. This may include devices such as
tunnels. The driver should then base its offload decision on a combination
of ingress device and egress port for a rule.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently drivers can register to receive TC block bind/unbind callbacks
by implementing the setup_tc ndo in any of their given netdevs. However,
drivers may also be interested in binds to higher level devices (e.g.
tunnel drivers) to potentially offload filters applied to them.
Introduce indirect block devs which allows drivers to register callbacks
for block binds on other devices. The callback is triggered when the
device is bound to a block, allowing the driver to register for rules
applied to that block using already available functions.
Freeing an indirect block callback will trigger an unbind event (if
necessary) to direct the driver to remove any offloaded rules and unreg
any block rule callbacks. It is the responsibility of the implementing
driver to clean any registered indirect block callbacks before exiting,
if the block it still active at such a time.
Allow registering an indirect block dev callback for a device that is
already bound to a block. In this case (if it is an ingress block),
register and also trigger the callback meaning that any already installed
rules can be replayed to the calling driver.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
net: phy: add macros for PHYID matching in PHY driver config
Add macros for PHYID matching to be used in PHY driver configs.
By using these macros some boilerplate code can be avoided.
Use them initially in the Realtek PHY drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use new macros for PHYID matching to avoid boilerplate code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add macros for PHYID matching to be used in PHY driver configs.
By using these macros some boilerplate code can be avoided.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
net: phy: further phylib simplifications after recent changes to the state machine
After the recent changes to the state machine phylib can be further
simplified (w/o having to make any assumptions).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that phy_mac_interrupt() doesn't call phy_change() any longer it's
called from phy_interrupt() only. Therefore phy_interrupt_is_valid()
returns true always and the check can be removed.
In case of PHY_HALTED phy_interrupt() bails out immediately,
therefore the second check for PHY_HALTED including the call to
phy_disable_interrupts() can be removed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using phy_mac_interrupt() the irq number is set to
PHY_IGNORE_INTERRUPT, therefore phy_interrupt_is_valid() returns false.
As a result phy_change() effectively just calls phy_trigger_machine()
when called from phy_mac_interrupt() via phy_change_work(). So we can
call phy_trigger_machine() from phy_mac_interrupt() directly and
remove some now unneeded code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
State PHY_CHANGELINK isn't needed here, we can call the state machine
directly. We just have to remove the check for phy_polling_mode() to
make this work also in interrupt mode. Removing this check doesn't
cause any overhead because when not polling the state machine is
called only if required by some event.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt
Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
using interrupts isn't possible if a driver defines neither
config_intr nor ack_interrupts callback. So we can replace checking
flag PHY_HAS_INTERRUPT with checking for these callbacks.
This allows to remove this flag from all driver configs.
v2:
- add helper for check in patch 1
- remove PHY_HAS_INTERRUPT from all drivers, not only Realtek
- remove flag PHY_HAS_INTERRUPT completely
v3:
- rebase patch 2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that flag PHY_HAS_INTERRUPT has been replaced with a check for
callbacks config_intr and ack_interrupt, we can remove setting this
flag from all driver configs.
Last but not least remove flag PHY_HAS_INTERRUPT completely.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
using interrupts isn't possible if a driver defines neither
config_intr nor ack_interrupts callback. So we can replace checking
flag PHY_HAS_INTERRUPT with checking for these callbacks.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was trying to solve a double free but I introduced a more serious
NULL dereference bug. The problem is that if there is an IRQ which
triggers immediately, then we need "info->uio_dev" but it's not set yet.
This patch puts the original initialization back to how it was and just
sets info->uio_dev to NULL on the error path so it should solve both
the Oops and the double free.
Fixes: f019f07ecf ("uio: potential double frees if __uio_register_device() fails")
Reported-by: Mathias Thore <Mathias.Thore@infinera.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Tested-by: Mathias Thore <Mathias.Thore@infinera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the case where eq->fw->size > PAGE_SIZE the error return rc
is being set to EINVAL however this is being overwritten to
rc = req->fw->size because the error exit path via label 'out' is
not being taken. Fix this by adding the jump to the error exit
path 'out'.
Detected by CoverityScan, CID#1453465 ("Unused value")
Fixes: c92316bf8e ("test_firmware: add batched firmware tests")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NVMEM DT support seems to be totally broken after
commit e888d445ac ("nvmem: resolve cells from DT at registration time")
Fix this!
Index used in of_nvmem_cell_get() to find cell is specific to
consumer, It can not be used for searching the cell in provider.
Use device_node instead of this to find the matching cell in device
tree case.
Fixes: e888d445ac ("nvmem: resolve cells from DT at registration time")
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After building the kernel with Clang, the following section mismatch
warning appears:
WARNING: vmlinux.o(.text+0x3bf19a6): Section mismatch in reference from
the function ssc_probe() to the function
.init.text:atmel_ssc_get_driver_data()
The function ssc_probe() references
the function __init atmel_ssc_get_driver_data().
This is often because ssc_probe lacks a __init
annotation or the annotation of atmel_ssc_get_driver_data is wrong.
Remove __init from atmel_ssc_get_driver_data to get rid of the mismatch.
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
req.gid can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
vers/misc/sgi-gru/grukdump.c:200 gru_dump_chiplet_request() warn:
potential spectre issue 'gru_base' [w]
Fix this by sanitizing req.gid before calling macro GID_TO_GRU, which
uses it to index gru_base.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In kvp_send_key(), we do need call process_ib_ipinfo() if
message->kvp_hdr.operation is KVP_OP_GET_IP_INFO, because it turns out
the userland hv_kvp_daemon needs the info of operation, adapter_id and
addr_family. With the incorrect fc62c3b197, the host can't get the
VM's IP via KVP.
And, fc62c3b197 added a "break;", but actually forgot to initialize
the key_size/value in the case of KVP_OP_SET, so the default key_size of
0 is passed to the kvp daemon, and the pool files
/var/lib/hyperv/.kvp_pool_* can't be updated.
This patch effectively rolls back the previous fc62c3b197, and
correctly fixes the "this statement may fall through" warnings.
This patch is tested on WS 2012 R2 and 2016.
Fixes: fc62c3b197 ("Drivers: hv: kvp: Fix two "this statement may fall through" warnings")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 37c8a5fafa ("kbuild: consolidate Devicetree dtb build rules")
moved the location of 'dtbs_install' target which caused dtbs to not be
installed when building debian package with 'bindeb-pkg' target. Update
the builddeb script to use the same logic that determines if there's a
'dtbs_install' target which is presence of the arch dts directory. Also,
use CONFIG_OF_EARLY_FLATTREE instead of CONFIG_OF as that's a better
indication of whether we are building dtbs.
This commit will also have the side effect of installing dtbs on any
arch that has dts files. Previously, it was dependent on whether the
arch defined 'dtbs_install'.
Fixes: 37c8a5fafa ("kbuild: consolidate Devicetree dtb build rules")
Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This reverts commit 6147b1cf19.
The reverted patch results in attempted write access to the source
repository, even if that repository is mounted read-only.
Output from "strace git status -uno --porcelain":
getcwd("/tmp/linux-test", 129) = 16
open("/tmp/linux-test/.git/index.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) =
-1 EROFS (Read-only file system)
While git appears to be able to handle this situation, a monitored
build environment (such as the one used for Chrome OS kernel builds)
may detect it and bail out with an access violation error. On top of
that, the attempted write access suggests that git _will_ write to the
file even if a build output directory is specified. Users may have the
reasonable expectation that the source repository remains untouched in
that situation.
Fixes: 6147b1cf19 ("scripts/setlocalversion: git: Make -dirty check more robust"
Cc: Genki Sky <sky@genki.is>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Since commit b41d920acf ("kbuild: deb-pkg: split generating packaging
and build"), the build version of the kernel contained in a deb package
is too low by 1.
Prior to the bad commit, the kernel was built first, then the number
in .version file was read out, and written into the debian control file.
Now, the debian control file is created before the kernel is actually
compiled, which is causing the version number mismatch.
Let the mkdebian script pass KBUILD_BUILD_VERSION=${revision} to require
the build system to use the specified version number.
Fixes: b41d920acf ("kbuild: deb-pkg: split generating packaging and build")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
The current SED_CONFIG_EXP could match to comment lines in config
fragment files, especially when CONFIG_PREFIX_ is empty. For example,
Buildroot uses empty prefixing; starting symbols with BR2_ is just
convention.
Make the sed expression more robust against false positives from
comment lines. The new sed expression matches to only valid patterns.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Reviewed-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Logic is there twice, and we'll need a third place
soon for ini dumping. In addition move the dumping
to a function, also to enable reuse.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
TDLS discovery response frame is a unicast direct frame to the peer.
Since we don't have a STA for this peer, this frame goes through
iwl_tx_skb_non_sta(). As the result aux_sta and some completely
arbitrary queue would be selected for this frame, resulting in a queue
hang. Fix that by sending such frames through AP sta instead.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
FW asserts 0x70, 0x71, and 0x73 all just mean that the real error
happened in another MAC, and to look there for the problem. Add their
descriptions to the assert number lookup table so users get a nicer
error message in the logs.
Also, since the 4 most-significant bits of the assert number are
dynamic, and depend on which MAC the assert occurred on, ignore those
bits when looking up the assert name.
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The trigger structure is being passed around, when
all we care about is whether to dump only monitor
or not. Pass a bool instead.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Currently code sets the write pointer when getting the TX queue
allocate response. This causes a redundant interrupt with any actual
change in the pointer. Remove this write altogether.
Fixes: 310181ec34 ("iwlwifi: move to TVQM mode")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
If tx fails during connection establishment, try another antenna for
the next tx. This will increase the chance to establish connection if
one of the antennas is blocked. Note that the antenna is toggled even
when failing to tx data frames since connection establishment may use
EAPOLs for 802.1X authentication/ 4 way handshake.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
If the association supports HE, HT/VHT rates will never be used for Tx
and therefore there's no need to set the sgi-per-channel-width-support
bits, so don't set them in this case.
Fixes: 110b32f065 ("iwlwifi: mvm: rs: add basic implementation of the new RS API handlers")
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Switch the antenna used for management tx only if previous tx failed.
If previous tx succeeded, there is no reason to switch antennas.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Enable low latency for softAP in all modes (standalone, SCM
and DCM).
This is in order to minimize the time the softAP leaves the channel for
other operations
Signed-off-by: Tova Mussai <tova.mussai@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In D3 suspend flow in 9260 gen2 HW, the NIC receives two PERST signals.
The first PERST is expected and indicates the device on coming resume flow.
The second PERST causes FW restart FW restart.
In order to avoid this issue, the FW set the persistence bit on.
Once this bit is set, the FW ignores reset attempts.
The problem is when the FW gets assert during D3 and then the persistence
bit is set and causes the FW to ignore reset.
To handle this issue, the FW opens the preg bit which allows access
to the persistence bit, so that the driver clear the persistence bit
and reset the NIC.
The flow is as follows:
the driver checks if the persistence bit is set.
If the bit is set, the driver checks if he can clear the bit.
If the driver can not clear the bit then there is no point to continue
configuring the NIC since it will fail.
The fix was added is in start HW flow instead of the resume flow since in
general, if the persistence bit is set, the driver can not start the FW.
So it is good to check it when we start configuring the NIC.
The driver does not need to close the preg bit since the FW close it
during the start flow.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
All the queue management code runs under mvm->mutex, so there are
only very few cases of accessing the data structures without it:
* TX path, which doesn't take any locks anyway
* iwl_mvm_wake_sw_queue() and iwl_mvm_stop_sw_queue() where we
just (atomically) read a bitmap, so the lock isn't needed.
Therefore, we can remove the spinlock. This enables some cleanup
in the ugly locking in iwl_mvm_inactivity_check().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
When we mark a TID as no longer having a queue, there's no
guarantee the TX path isn't using this txq_id right now,
having accessed it just before we reset the value. To fix
this, add synchronize_net() when we change the TIDs from
having a queue to not having one, so that we can then be
sure that the TX path is no longer accessing that queue.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c: In function 'iwl_mvm_rx_mpdu_mq':
drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c:1386:7: warning:
variable 'he_phy_data' set but not used [-Wunused-but-set-variable]
u64 he_phy_data;
'he_phy_data' never used since be introduce in
commit 18ead597da ("iwlwifi: support new rx_mpdu_desc api")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Andrey Ignatov says:
====================
This patch set adds support for narrow loads with offset > 0 to BPF
verifier.
Patch 1 provides more details and is the main patch in the set.
Patches 2 and 3 add new test cases to test_verifier and test_sock_addr
selftests.
v1->v2:
- fix -Wdeclaration-after-statement warning.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add more test cases for context bpf_sock_addr to test narrow loads with
offset > 0 for ctx->user_ip4 field (__u32):
* off=1, size=1;
* off=2, size=1;
* off=3, size=1;
* off=2, size=2.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently BPF verifier allows narrow loads for a context field only with
offset zero. E.g. if there is a __u32 field then only the following
loads are permitted:
* off=0, size=1 (narrow);
* off=0, size=2 (narrow);
* off=0, size=4 (full).
On the other hand LLVM can generate a load with offset different than
zero that make sense from program logic point of view, but verifier
doesn't accept it.
E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:
#define DST_IP4 0xC0A801FEU /* 192.168.1.254 */
...
if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&
where ctx is struct bpf_sock_addr.
Some versions of LLVM can produce the following byte code for it:
8: 71 12 07 00 00 00 00 00 r2 = *(u8 *)(r1 + 7)
9: 67 02 00 00 18 00 00 00 r2 <<= 24
10: 18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00 r3 = 4261412864 ll
12: 5d 32 07 00 00 00 00 00 if r2 != r3 goto +7 <LBB0_6>
where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1
and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
rejected by verifier.
Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok()
what means any is_valid_access implementation, that uses the function,
works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
sock_addr_is_valid_access() for bpf_sock_addr.
The patch makes such loads supported. Offset can be in [0; size_default)
but has to be multiple of load size. E.g. for __u32 field the following
loads are supported now:
* off=0, size=1 (narrow);
* off=1, size=1 (narrow);
* off=2, size=1 (narrow);
* off=3, size=1 (narrow);
* off=0, size=2 (narrow);
* off=2, size=2 (narrow);
* off=0, size=4 (full).
Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Same change as made to sctp_intl_store_reasm().
To be fully correct, an iterator has an undefined value when something
like skb_queue_walk() naturally terminates.
This will actually matter when SKB queues are converted over to
list_head.
Formalize what this code ends up doing with the current
implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
To be fully correct, an iterator has an undefined value when something
like skb_queue_walk() naturally terminates.
This will actually matter when SKB queues are converted over to
list_head.
Formalize what this code ends up doing with the current
implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the assumption that SKBs and SKB list heads can
be cast to eachother in SKB list handling code.
This change also appears to fix a bug since the list->next pointer is
sampled outside of holding the SKB queue lock.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of direct SKB list pointer accesses.
The loops in this function had to be rewritten to accommodate this
more easily.
The first loop iterates now over the target list in the outer loop,
and triggers an mmc data operation when the per-operation limits are
hit.
Then after the loops, if we have any residue, we trigger the last
and final operation.
For the page aligned workaround, where we have to copy the read data
back into the original list of SKBs, we use a two-tiered loop. The
outer loop stays the same and iterates over pktlist, and then we have
an inner loop which uses skb_peek_next(). The break logic has been
simplified because we know that the aggregate length of the SKBs in
the source and destination lists are the same.
This change also ends up fixing a bug, having to do with the
maintainance of the seg_sz variable and how it drove the outermost
loop. It begins as:
seg_sz = target_list->qlen;
ie. the number of packets in the target_list queue. The loop
structure was then:
while (seq_sz) {
...
while (not at end of target_list) {
...
sg_cnt++
...
}
...
seg_sz -= sg_cnt;
The assumption built into that last statement is that sg_cnt counts
how many packets from target_list have been fully processed by the
inner loop. But this not true.
If we hit one of the limits, such as the max segment size or the max
request size, we will break and copy a partial packet then contine
back up to the top of the outermost loop.
With the new loops we don't have this problem as we don't guard the
loop exit with a packet count, but instead use the progression of the
pkt_next SKB through the list to the end. The general structure is:
sg_cnt = 0;
skb_queue_walk(target_list, pkt_next) {
pkt_offset = 0;
...
sg_cnt++;
...
while (pkt_offset < pkt_next->len) {
pkt_offset += sg_data_size;
if (queued up max per request)
mmc_submit_one();
}
}
if (sg_cnt)
mmc_submit_one();
The variables that maintain where we are in the MMC command state such
as req_sz, sg_cnt, and sgl are reset when we emit one of these full
sized requests.
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislav Fomichev says:
====================
v5 changes:
* FILE -> PATH for load/loadall (can be either file or directory now)
* simpler implementation for __bpf_program__pin_name
* removed p_err for REQ_ARGS checks
* parse_atach_detach_args -> parse_attach_detach_args
* for -> while in bpf_object__pin_{programs,maps} recovery
v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
used them in bpf_program__pin
* added new pin_name to bpf_program so bpf_program__pin
works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
load/loadall
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* forth patch adds pin_name to the bpf_program struct
which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
an example
====================
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>