While the stackleak plugin was already using notrace, objtool is now a
bit more picky. Update the notrace uses to noinstr. Silences the
following objtool warnings when building with:
CONFIG_DEBUG_ENTRY=y
CONFIG_STACK_VALIDATION=y
CONFIG_VMLINUX_VALIDATION=y
CONFIG_GCC_PLUGIN_STACKLEAK=y
vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section
Note that the plugin's addition of calls to stackleak_track_stack() from
noinstr functions is expected to be safe, as it isn't runtime
instrumentation and is self-contained.
Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, netfilter, and ieee802154.
Current release - regressions:
- Partially revert "net/smc: Add netlink net namespace support", fix
uABI breakage
- netfilter:
- nft_ct: fix use after free when attaching zone template
- nft_byteorder: track register operations
Previous releases - regressions:
- ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
- phy: qca8081: fix speeds lower than 2.5Gb/s
- sched: fix use-after-free in tc_new_tfilter()
Previous releases - always broken:
- tcp: fix mem under-charging with zerocopy sendmsg()
- tcp: add missing tcp_skb_can_collapse() test in
tcp_shift_skb_data()
- neigh: do not trigger immediate probes on NUD_FAILED from
neigh_managed_work, avoid a deadlock
- bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
false-positives
- netfilter: nft_reject_bridge: fix for missing reply from prerouting
- smc: forward wakeup to smc socket waitqueue after fallback
- ieee802154:
- return meaningful error codes from the netlink helpers
- mcr20a: fix lifs/sifs periods
- at86rf230, ca8210: stop leaking skbs on error paths
- macsec: add missing un-offload call for NETDEV_UNREGISTER of parent
- ax25: add refcount in ax25_dev to avoid UAF bugs
- eth: mlx5e:
- fix SFP module EEPROM query
- fix broken SKB allocation in HW-GRO
- IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows
- eth: amd-xgbe:
- fix skb data length underflow
- ensure reset of the tx_timer_active flag, avoid Tx timeouts
- eth: stmmac: fix runtime pm use in stmmac_dvr_remove()
- eth: e1000e: handshake with CSME starts from Alder Lake platforms"
* tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
ax25: fix reference count leaks of ax25_dev
net: stmmac: ensure PTP time register reads are consistent
net: ipa: request IPA register values be retained
dt-bindings: net: qcom,ipa: add optional qcom,qmp property
tools/resolve_btfids: Do not print any commands when building silently
bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
net: sparx5: do not refer to skb after passing it on
Partially revert "net/smc: Add netlink net namespace support"
net/mlx5e: Avoid field-overflowing memcpy()
net/mlx5e: Use struct_group() for memcpy() region
net/mlx5e: Avoid implicit modify hdr for decap drop rule
net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
net/mlx5: E-Switch, Fix uninitialized variable modact
net/mlx5e: Fix handling of wrong devices during bond netevent
net/mlx5e: Fix broken SKB allocation in HW-GRO
net/mlx5e: Fix wrong calculation of header index in HW_GRO
...
Pull selinux fix from Paul Moore:
"One small SELinux patch to ensure that a policy structure field is
properly reset after freeing so that we don't inadvertently do a
double-free on certain error conditions"
* tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix double free of cond_list on error paths
Pull Kselftest fixes from Shuah Khan:
"Important fixes to several tests and documentation clarification on
running mainline kselftest on stable releases. A few notable fixes:
- fix kselftest run hang due to child processes that haven't been
terminated. Fix signals all child processes
- fix false pass/fail results from vdso_test_abi, openat2, mincore
- build failures when using -j (multiple jobs) option
- exec test build failure due to incorrect build rule for a run-time
created "pipe"
- zram test fixes related to interaction with zram-generator to make
sure zram test to coordinate deleted with zram-generator
- zram test compression ratio calculation fix and skipping
max_comp_streams.
- increasing rtc test timeout
- cpufreq test to write test results to stdout which will necessary
on automated test systems"
* tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: Fix vdso_test_abi return status
selftests: skip mincore.check_file_mmap when fs lacks needed support
selftests: openat2: Skip testcases that fail with EOPNOTSUPP
selftests: openat2: Add missing dependency in Makefile
selftests: openat2: Print also errno in failure messages
selftests: futex: Use variable MAKE instead of make
selftests/exec: Remove pipe from TEST_GEN_FILES
selftests/zram: Adapt the situation that /dev/zram0 is being used
selftests/zram01.sh: Fix compression ratio calculation
selftests/zram: Skip max_comp_streams interface on newer kernel
docs/kselftest: clarify running mainline tests on stables
kselftest: signal all child processes
selftests: cpufreq: Write test output to stdout as well
selftests: rtc: Increase test timeout so that all tests run
The ice driver provides QoS information to auxiliary drivers
through the exported function ice_get_qos_params. This function
doesn't currently support L3 DSCP QoS.
Add the necessary defines, structure elements and code to support
DSCP QoS through the IIDC functions.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The previous commit d01ffb9eee ("ax25: add refcount in ax25_dev
to avoid UAF bugs") introduces refcount into ax25_dev, but there
are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().
This patch uses ax25_dev_put() and adjusts the position of
ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev.
Fixes: d01ffb9eee ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Even if protected from preemption and interrupts, a small time window
remains when the 2 register reads could return inconsistent values,
each time the "seconds" register changes. This could lead to an about
1-second error in the reported time.
Add logic to ensure the "seconds" and "nanoseconds" values are consistent.
Fixes: 92ba688851 ("stmmac: add the support for PTP hw clock driver")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220203160025.750632-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2022-02-03
We've added 6 non-merge commits during the last 10 day(s) which contain
a total of 7 files changed, 11 insertions(+), 236 deletions(-).
The main changes are:
1) Fix BPF ringbuf to allocate its area with VM_MAP instead of VM_ALLOC
flag which otherwise trips over KASAN, from Hou Tao.
2) Fix unresolved symbol warning in resolve_btfids due to LSM callback
rename, from Alexei Starovoitov.
3) Fix a possible race in inc_misses_counter() when IRQ would trigger
during counter update, from He Fengqing.
4) Fix tooling infra for cross-building with clang upon probing whether
gcc provides the standard libraries, from Jean-Philippe Brucker.
5) Fix silent mode build for resolve_btfids, from Nathan Chancellor.
6) Drop unneeded and outdated lirc.h header copy from tooling infra as
BPF does not require it anymore, from Sean Young.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
tools/resolve_btfids: Do not print any commands when building silently
bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
tools: Ignore errors from `which' when searching a GCC toolchain
tools headers UAPI: remove stale lirc.h
bpf: Fix possible race in inc_misses_counter
bpf: Fix renaming task_getsecid_subj->current_getsecid_subj.
====================
Link: https://lore.kernel.org/r/20220203155815.25689-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There was a race condition in access to hw->aq.asq_last_status
while adding and deleting MAC/VLAN filters causing
incorrect error status to be printed as ERROR OK instead of
the correct error.
Change calls to i40e_aq_add_macvlan in i40e_aqc_add_filters
and i40e_aq_remove_macvlan in i40e_aqc_del_filters
to _v2 versions that return Admin Queue status on the stack
to avoid race conditions in access to hw->aq.asq_last_status.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ASQ send command functions are returning only i40e status codes
yet some calling functions also need Admin Queue status
that is stored in hw->aq.asq_last_status. Since hw object
is stored on a heap it introduces a possibility for
a race condition in access to hw if calling function is not
fast enough to read hw->aq.asq_last_status before next
send ASQ command is executed.
Add new _v2 version of i40e_aq_add_macvlan that is using
new _v2 versions of ASQ send command functions and returns
the Admin Queue status on the stack.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ASQ send command functions are returning only i40e status codes
yet some calling functions also need Admin Queue status
that is stored in hw->aq.asq_last_status. Since hw object
is stored on a heap it introduces a possibility for
a race condition in access to hw if calling function is not
fast enough to read hw->aq.asq_last_status before next
send ASQ command is executed.
Add new versions of send ASQ command functions that return
Admin Queue status on the stack to avoid race conditions
in access to hw->aq.asq_last_status.
Add new _v2 version of i40e_aq_remove_macvlan that is using
new _v2 versions of ASQ send command functions and returns
the Admin Queue status on the stack.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Change functions:
- i40e_aq_add_macvlan
- i40e_aq_remove_macvlan
- i40e_aq_delete_element
- i40e_aq_add_vsi
- i40e_aq_update_vsi_params
to explicitly use i40e_asq_send_command_atomic(..., true)
instead of i40e_asq_send_command, as they use mutexes and do some
work in an atomic context.
Without this change setting vlan via netdev will fail with
call trace cased by bug "BUG: scheduling while atomic".
Signed-off-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After commit 1a557afc4d ("i40e: Refactor receive routine"),
rx_stats.realloc_count is no longer being incremented, so remove it.
The debugfs string was left, but hardcoded to 0. This is intended to
prevent breaking any existing code / scripts that are parsing debugfs
for i40e.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After loading driver hw-tc-offload is enabled by default.
Change the behaviour of driver to disable hw-tc-offload by default as
this is the expected state. Additionally since this impacts ntuple
feature state change the way of checking NETIF_F_HW_TC flag.
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
syzbot reported a btf decl_tag bug with stack trace below:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 3592 Comm: syz-executor914 Not tainted 5.16.0-syzkaller-11424-gb7892f7d5cb2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:btf_type_vlen include/linux/btf.h:231 [inline]
RIP: 0010:btf_decl_tag_resolve+0x83e/0xaa0 kernel/bpf/btf.c:3910
...
Call Trace:
<TASK>
btf_resolve+0x251/0x1020 kernel/bpf/btf.c:4198
btf_check_all_types kernel/bpf/btf.c:4239 [inline]
btf_parse_type_sec kernel/bpf/btf.c:4280 [inline]
btf_parse kernel/bpf/btf.c:4513 [inline]
btf_new_fd+0x19fe/0x2370 kernel/bpf/btf.c:6047
bpf_btf_load kernel/bpf/syscall.c:4039 [inline]
__sys_bpf+0x1cbb/0x5970 kernel/bpf/syscall.c:4679
__do_sys_bpf kernel/bpf/syscall.c:4738 [inline]
__se_sys_bpf kernel/bpf/syscall.c:4736 [inline]
__x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4736
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The kasan error is triggered with an illegal BTF like below:
type 0: void
type 1: int
type 2: decl_tag to func type 3
type 3: func to func_proto type 8
The total number of types is 4 and the type 3 is illegal
since its func_proto type is out of range.
Currently, the target type of decl_tag can be struct/union, var or func.
Both struct/union and var implemented their own 'resolve' callback functions
and hence handled properly in kernel.
But func type doesn't have 'resolve' callback function. When
btf_decl_tag_resolve() tries to check func type, it tries to get
vlen of its func_proto type, which triggered the above kasan error.
To fix the issue, btf_decl_tag_resolve() needs to do btf_func_check()
before trying to accessing func_proto type.
In the current implementation, func type is checked with
btf_func_check() in the main checking function btf_check_all_types().
To fix the above kasan issue, let us implement 'resolve' callback
func type properly. The 'resolve' callback will be also called
in btf_check_all_types() for func types.
Fixes: b5ea834dde ("bpf: Support for new btf kind BTF_KIND_TAG")
Reported-by: syzbot+53619be9444215e785ed@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220203191727.741862-1-yhs@fb.com
This reverts commit 774a1221e8.
We need to finish all async code before the module init sequence is
done. In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule(). Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked. This works when modprobe thread is calling async_schedule(),
but it does not work if module dispatches init code to a worker thread
which then calls async_schedule().
For example, PCI driver probing is invoked from a worker thread based on
a node where device is attached:
if (cpu < nr_cpu_ids)
error = work_on_cpu(cpu, local_pci_probe, &ddi);
else
error = local_pci_probe(&ddi);
We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread. As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.
The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)
modprobe pm80xx worker
...
do_init_module()
...
pci_call_probe()
work_on_cpu(local_pci_probe)
local_pci_probe()
pm8001_pci_probe()
scsi_scan_host()
async_schedule()
worker->flags |= PF_USED_ASYNC;
...
< return from worker >
...
if (current->flags & PF_USED_ASYNC) <--- false
async_synchronize_full();
Commit 21c3c5d280 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e8
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.
Since commit 0fdff3ec6d ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.
Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In addition to the normal 64-bit instruction encoding, eBPF also has
a single instruction that uses a second 64-bit bits for a second
immediate value. Instead of only documenting this format deep down
in the document mention it in the instruction encoding section.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220131183638.3934982-5-hch@lst.de
Pull cgroup fixes from Tejun Heo:
- Eric's fix for a long standing cgroup1 permission issue where it only
checks for uid 0 instead of CAP which inadvertently allows
unprivileged userns roots to modify release_agent userhelper
- Fixes for the fallout from Waiman's recent cpuset work
* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
cgroup-v1: Require capabilities to set release_agent
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
Alex Elder says:
====================
net: ipa: enable register retention
With runtime power management in place, we sometimes need to issue
a command to enable retention of IPA register values before power
collapse. This requires a new Device Tree property, whose presence
will also be used to signal that the command is required.
====================
Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In some cases, the IPA hardware needs to request the always-on
subsystem (AOSS) to coordinate with the IPA microcontroller to
retain IPA register values at power collapse. This is done by
issuing a QMP request to the AOSS microcontroller. A similar
request ondoes that request.
We must get and hold the "QMP" handle early, because we might get
back EPROBE_DEFER for that. But the actual request should be sent
while we know the IPA clock is active, and when we know the
microcontroller is operational.
Fixes: 1aac309d32 ("net: ipa: use autosuspend")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For some systems, the IPA driver must make a request to ensure that
its registers are retained across power collapse of the IPA hardware.
On such systems, we'll use the existence of the "qcom,qmp" property
as a signal that this request is required.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks(). It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.
Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.
Fixes: 4716909cc5 ("cpuset: Track cpusets that use parent's effective_cpus")
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Andrii Nakryiko says:
====================
Clean up remaining missed uses of deprecated libbpf APIs across samples/bpf,
selftests/bpf, libbpf, and bpftool.
Also fix uninit variable warning in bpftool.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Switch to using new bpf_xdp_*() APIs across all selftests. Take
advantage of a more straightforward and user-friendly semantics of
old_prog_fd (0 means "don't care") in few places.
This is a redo of 544356524d ("selftests/bpf: switch to new libbpf XDP
APIs"), which was previously reverted to minimize conflicts during bpf
and bpf-next tree merge.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-6-andrii@kernel.org
When building with 'make -s', there is some output from resolve_btfids:
$ make -sj"$(nproc)" oldconfig prepare
MKDIR .../tools/bpf/resolve_btfids/libbpf/
MKDIR .../tools/bpf/resolve_btfids//libsubcmd
LINK resolve_btfids
Silent mode means that no information should be emitted about what is
currently being done. Use the $(silent) variable from Makefile.include
to avoid defining the msg macro so that there is no information printed.
Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
This reverts commit 54d516b1d6
That commit did a refactoring that effectively combined fast and slow
gup paths (again). And that was again incorrect, for two reasons:
a) Fast gup and slow gup get reference counts on pages in different
ways and with different goals: see Linus' writeup in commit
cd1adf1b63 ("Revert "mm/gup: remove try_get_page(), call
try_get_compound_head() directly""), and
b) try_grab_compound_head() also has a specific check for
"FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller
can fall back to slow gup. This resulted in new failures, as
recently report by Will McVicker [1].
But (a) has problems too, even though they may not have been reported
yet. So just revert this.
Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com [1]
Fixes: 54d516b1d6 ("mm/gup: small refactoring: simplify try_grab_page()")
Reported-and-tested-by: Will McVicker <willmcvicker@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Minchan Kim <minchan@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King says:
====================
net: dsa: mv88e6xxx: convert to phylink_generic_validate()
The overall objective of this series is to convert the mv88e6xxx DSA
driver to use phylink_generic_validate().
Patch 1 adds a new helper mv88e6352_g2_scratch_port_has_serdes() which
indicates whether an 88e6352 port has a serdes associated with it. This
is necessary as ports 4 and 5 will normally be in automedia mode, where
the CMODE field in the port status register will change e.g. between 15
(internal PHY) and 9 (1000base-X) depending on whether the serdes has
link.
The existing code caches the cmode field, and depending whether the
serdes has link at probe time, determines whether we allow things such
as the serdes statistics to be accessed. This means if the link isn't
up at probe time, the serdes is essentially unavailable.
Patch 1 addresses this by reading the pin configuration to find out
whether the serdes is attached to port 4 or port 5.
Patch 2 is a joint effort between myself and Marek Behún, adding the
supported interfaces and MAC capabilities to all mv88e6xxx supported
switch devices. This is slightly more restrictive than the original
code as we didn't used to care too much about the interface mode, but
with this we do - which is why we must know if there's a serdes
associated now.
Patch 3 switches mv88e6xxx to use the generic validation by removing
the initialisation of the phylink_validate pointer in the dsa_ops
struct.
Patch 4 updates the statistics code to use the new helper in patch 1,
so the serdes statistics are available even if the link was down at
driver probe time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The decision whether to report serdes statistics currently depends on
the cached C_Mode value for the port, read at probe time or updated by
configuration. However, port 4 can be in "automedia" mode when it is
used as a serdes port, meaning it switches between the internal PHY and
the serdes, changing the read-only C_Mode value depending on which
first gains link. Consequently, the C_Mode value read at probe does not
accurately reflect whether the port has the serdes associated with it.
In "net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()",
we added a way to read the hardware configuration to determine which
port has the serdes associated with it. Use this to determine which
port reports the serdes statistics.
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the mv88e6xxx chip drivers are supplying the supported
interfaces and MAC capabilities, switch the driver to use the generic
phylink validation implementation by removing our own validation
implementations. This causes DSA to call phylink_generic_validate()
on our behalf.
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Populate the supported interfaces and MAC capabilities for the
Marvell MV88E6xxx DSA switches in preparation to using these for the
validation functionality.
Patch co-authored by Marek.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org> [ fixed 6341 and 6393x ]
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Read the hardware configuration to determine which port is attached
to the serdes.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Waldekranz says:
====================
net: dsa: mv88e6xxx: Improve standalone port isolation
The ideal isolation between standalone ports satisfies two properties:
1. Packets from one standalone port must not be forwarded to any other
port.
2. Packets from a standalone port must be sent to the CPU port.
mv88e6xxx solves (1) by isolating standalone ports using the PVT. Up
to this point though, (2) has not guaranteed; as the ATU is still
consulted, there is a chance that incoming packets never reach the CPU
if its DA has previously been used as the SA of an earlier packet (see
1/5 for more details). This is typically not a problem, except for one
very useful setup in which switch ports are looped in order to run the
bridge kselftests in tools/testing/selftests/net/forwarding. This
series attempts to solve (2).
Ideally, we could simply use the "ForceMap" bit of more modern chips
(Agate and newer) to classify all incoming packets as MGMT. This is
not available on older silicon that is still widely used (Opal Plus
chips like the 6097 for example).
Instead, this series takes a two pronged approach:
1/5: Always clear MapDA on standalone ports to make sure that no ATU
entry can lead packets astray. This solves (2) for single-chip
systems.
2/5: Trivial prep work for 4/5.
3/5: Trivial prep work for 4/5.
4/5: On multi-chip systems though, this is not enough. On the incoming
chip, the packet will be forced out towards the CPU thanks to
1/5, but on any intermediate chips the ATU is still consulted. We
override this behavior by marking the reserved standalone VID (0)
as a policy VID, the DSA ports' VID policy is set to TRAP. This
will cause the packet to be reclassified as MGMT on the first
intermediate chip, after which it's a straight shot towards the
CPU.
Finally, we allow more tests to be run on mv88e6xxx:
5/5: The bridge_vlan{,un}aware suites sets an ageing_time of 10s on
the bridge it creates, but mv88e6xxx has a minimum supported time
of 15s. Allow this time to be overridden in forwarding.config.
With this series in place, mv88e6xxx passes the following kselftest
suites:
- bridge_port_isolation.sh
- bridge_sticky_fdb.sh
- bridge_vlan_aware.sh
- bridge_vlan_unaware.sh
v1 -> v2:
- Wording/spelling (Vladimir)
- Use standard iterator in dsa_switch_upstream_port (Vladimir)
- Limit enabling of VTU port policy to downstream DSA ports (Vladimir)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the ageing timeout that is set on bridges to be customized from
forwarding.config. This allows the tests to be run on hardware which
does not support a 10s timeout (e.g. mv88e6xxx).
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>