Now {pmd,pte}_mkdirty() set _PAGE_DIRTY bit unconditionally, this causes
random segmentation fault after commit 0ccf7f168e ("mm/thp: carry
over dirty bit when thp splits on pmd").
The reason is: when fork(), parent process use pmd_wrprotect() to clear
huge page's _PAGE_WRITE and _PAGE_DIRTY (for COW); then pte_mkdirty() set
_PAGE_DIRTY as well as _PAGE_MODIFIED while splitting dirty huge pages;
once _PAGE_DIRTY is set, there will be no tlb modify exception so the COW
machanism fails; and at last memory corruption occurred between parent
and child processes.
So, we should set _PAGE_DIRTY only when _PAGE_WRITE is set in {pmd,pte}_
mkdirty().
Cc: stable@vger.kernel.org
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
SMP operations can be shared by Loongson-2 series and Loongson-3 series,
so we change the prefix from loongson3 to loongson for all functions and
data structures.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Combine acpi_boot_table_init() and acpi_boot_init() since they are very
simple, and we don't need to check the return value of acpi_boot_init().
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
Fix this up by changing the LoongArch Makefile to use "grep -E" instead.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Function sparx5_tc_setup_qdisc_ets() always returns negative value
because it return -EOPNOTSUPP in the end. This patch returns the
rersult of sparx5_tc_ets_add() and sparx5_tc_ets_del() directly.
Fixes: 211225428d ("net: microchip: sparx5: add support for offloading ets qdisc")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If device_register() returns error in vmbus_device_register(),
the name allocated by dev_set_name() must be freed. As comment
of device_register() says, it should use put_device() to give
up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup().
Fixes: 09d50ff8a2 ("Staging: hv: make the Hyper-V virtual bus code build")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20221119081135.1564691-3-yangyingliang@huawei.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Jeroen de Borst says:
====================
gve: Handle alternate miss-completions
Some versions of the virtual NIC present miss-completions in
an alternative way. Let the diver handle these alternate completions
and announce this capability to the device.
The capability is announced uing a new AdminQ command that sends
driver information to the device. The device can refuse a driver
if it is lacking support for a capability, or it can adopt it's
behavior to work around OS specific issues.
Changed in v5:
- Removed comments in fucntion calls
- Switched ENOTSUPP back to EOPNOTSUPP and made sure it gets passed
Changed in v4:
- Clarified new AdminQ command in cover letter
- Changed EOPNOTSUPP to ENOTSUPP to match device's response
Changed in v3:
- Rewording cover letter
- Added 'Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>'
Changes in v2:
- Changed the subject to include 'gve:'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The virtual NIC has 2 ways of indicating a miss-path
completion. This handles the alternate.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu says:
====================
net: axienet: Use a DT property to configure frequency of the MDIO bus
Some FPGA platforms have to set frequency of the MDIO bus lower than 2.5
MHz. Thus, we use a DT property, which is "clock-frequency", to work
with it at boot time. The default 2.5 MHz would be set if the property
is not pressent. Also, factor out mdio enable/disable functions due to
the api change since 253761a0e6.
Changelog:
--- v5 ---
1. Make dt-binding patch prior to the implementation patch.
2. Disable mdio bus in error path.
3. Update description of some functions.
--- v4 ---
1. change MAX_MDIO_FREQ to DEFAULT_MDIO_FREQ as suggested by Andrew.
--- v3 RESEND ---
1. Repost the exact same patch again
--- v3 ---
1. Fix coding style, and make probing of the driver fail if MDC overflow
--- v2 ---
1. Use clock-frequency, as defined in mdio.yaml, to configure MDIO
clock.
2. Only print out frequency if it is set to a non-standard value.
3. Reduce the scope of axienet_mdio_enable and remove
axienet_mdio_disable because no one really uses it anymore.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some FPGA platforms have 80KHz MDIO bus frequency constraint when
connecting Ethernet to its on-board external Marvell PHY. Thus, we may
have to set MDIO clock according to the DT. Otherwise, use the default
2.5 MHz, as specified by 802.3, if the entry is not present.
Also, change MAX_MDIO_FREQ to DEFAULT_MDIO_FREQ because we may actually
set MDIO bus frequency higher than 2.5MHz if undelying devices support
it. And properly disable the mdio bus clock in error path.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both axienet_mdio_{enable/disable} functions are no longer used in
xilinx_axienet_main.c due to 253761a0e6. And axienet_mdio_disable is
not even used in the mdio.c. So unexport and remove them.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains that:
drivers/net/ethernet/microchip/sparx5/sparx5_dcb.c:112
sparx5_dcb_apptrust_validate() error: uninitialized symbol 'match'.
This would only happen if the:
if (sparx5_dcb_apptrust_policies[i].nselectors != nselectors)
condition is always true (they are not equal). The "nselectors"
variable comes from dcbnl_ieee_set() and it is a number between 0-256.
This seems like a probably a real bug.
Fixes: 23f8382cd9 ("net: microchip: sparx5: add support for apptrust")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AMD Secure Processor (ASP) and an SNP guest use a series of
AES-GCM keys called VMPCKs to communicate securely with each other.
The IV to this scheme is a sequence number that both the ASP and the
guest track.
Currently, this sequence number in a guest request must exactly match
the sequence number tracked by the ASP. This means that if the guest
sees an error from the host during a request it can only retry that
exact request or disable the VMPCK to prevent an IV reuse. AES-GCM
cannot tolerate IV reuse, see: "Authentication Failures in NIST version
of GCM" - Antoine Joux et al.
In order to address this, make handle_guest_request() delete the VMPCK
on any non successful return. To allow userspace querying the cert_data
length make handle_guest_request() save the number of pages required by
the host, then have handle_guest_request() retry the request without
requesting the extended data, then return the number of pages required
back to userspace.
[ bp: Massage, incorporate Tom's review comments. ]
Fixes: fce96cf044 ("virt: Add SEV-SNP guest driver")
Reported-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20221116175558.2373112-1-pgonda@google.com
Fix RSTCTRL_PPE0 and RSTCTRL_PPE1 register mask definitions for
MTK_NETSYS_V2.
Remove duplicated definitions.
Fixes: 160d3a9b19 ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VCAP_KUNIT_TEST is enabled the following warnings are generated:
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:257:34: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:258:41: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:342:23: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:359:23: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1327:34: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1328:41: warning: Using plain integer as NULL pointer
Therefore fix this.
Fixes: dccc30cc49 ("net: microchip: sparx5: Add KUNIT test of counters and sorted rules")
Fixes: c956b9b318 ("net: microchip: sparx5: Adding KUNIT tests of key/action values in VCAP API")
Fixes: 67d637516f ("net: microchip: sparx5: Adding KUNIT test for the VCAP API")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In i915_gem_madvise_ioctl() we immediately purge the object is not
currently used, like when the mm.pages are NULL. With shmem the pages
might still be hanging around or are perhaps swapped out. Similarly with
ttm we might still have the pages hanging around on the ttm resource,
like with lmem or shmem, but here we need to be extra careful since
async unbinds are possible as well as in-progress kernel moves. In
i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource
for us, however if it's busy the memory is only moved to a ghost object,
which then leads to broken behaviour when for example clearing the
i915_tt->filp, since the actual ttm_tt is still alive and populated,
even though it's been moved to the ghost object. When we later destroy
the ghost object we hit the following, since the filp is now NULL:
[ +0.006982] #PF: supervisor read access in kernel mode
[ +0.005149] #PF: error_code(0x0000) - not-present page
[ +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0
[ +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm]
[ +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915]
[ +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68
8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70
30 e9 42 b2 ff ff 4c 89
[ +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202
[ +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000
[ +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0
[ +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001
[ +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0
[ +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248
[ +0.007154] FS: 0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000
[ +0.008108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0
[ +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.007154] Call Trace:
[ +0.002459] <TASK>
[ +0.002126] ttm_tt_unpopulate.part.0+0x17/0x70 [ttm]
[ +0.005068] ttm_bo_tt_destroy+0x1c/0x50 [ttm]
[ +0.004464] ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm]
[ +0.005244] ttm_bo_cleanup_refs+0x90/0x2c0 [ttm]
[ +0.004721] ttm_bo_delayed_delete+0x235/0x250 [ttm]
[ +0.004981] ttm_device_delayed_workqueue+0x13/0x40 [ttm]
[ +0.005422] process_one_work+0x248/0x560
[ +0.004028] worker_thread+0x4b/0x390
[ +0.003682] ? process_one_work+0x560/0x560
[ +0.004199] kthread+0xeb/0x120
[ +0.003163] ? kthread_complete_and_exit+0x20/0x20
[ +0.004815] ret_from_fork+0x1f/0x30
v2:
- Just use ttm_bo_wait() directly (Niranjana)
- Add testcase reference
Testcase: igt@gem_madvise@dontneed-evict-race
Fixes: 213d509277 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Acked-by: Nirmoy Das <Nirmoy.Das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221115104620.120432-1-matthew.auld@intel.com
(cherry picked from commit 5524b5e52e)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Simon Horman says:
====================
nfp: IPsec offload support
Huanhuan Wang says:
this series adds support for IPsec offload to the NFP driver.
It covers three enhancements:
1. Patches 1/3:
- Extend the capability word and control word to to support
new features.
2. Patch 2/3:
- Add framework to support IPsec offloading for NFP driver,
but IPsec offload control plane interface xfrm callbacks which
interact with upper layer are not implemented in this patch.
3. Patch 3/3:
- IPsec control plane interface xfrm callbacks are implemented
in this patch.
Changes since v3
* Remove structure fields that describe firmware but
are not used for Kernel offload
* Add WARN_ON(!xa_empty()) before call to xa_destroy()
* Added helpers for hash methods
Changes since v2
* OFFLOAD_HANDLE_ERROR macro and the associated code removed
* Unnecessary logging removed
* Hook function xdo_dev_state_free in struct xfrmdev_ops removed
* Use Xarray to maintain SA entries
Changes since v1
* Explicitly return failure when XFRM_STATE_ESN is set
* Fix the issue that AEAD algorithm is not correctly offloaded
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A new metadata type and config structure are introduced to
interact with firmware to support ipsec offloading. This
feature relies on specific firmware that supports ipsec
encrypt/decrypt by advertising related capability bit.
The xfrm callbacks which interact with upper layer are
implemented in the following patch.
Based on initial work of Norm Bagley <norman.bagley@netronome.com>.
Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the 32-bit capability word is almost exhausted, now
allocate some more words to support new features, and control
word is also extended accordingly. Packet-type offloading is
implemented in NIC application firmware, but it's not used in
kernel driver, so reserve this bit here in case it's redefined
for other use.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shang XiaoJing says:
====================
nfc: Fix potential memory leak of skb
There are still somewhere maybe leak the skb, fix the memleaks by adding
fail path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
s3fwrn5_nci_send() won't free the skb when it failed for the check
before s3fwrn5_write(). As the result, the skb will memleak. Free the
skb when the check failed.
Fixes: c04c674fad ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Suggested-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
nxp_nci_send() won't free the skb when it failed for the check before
write(). As the result, the skb will memleak. Free the skb when the
check failed.
Fixes: dece45855a ("NFC: nxp-nci: Add support for NXP NCI chips")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Suggested-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfcmrvl_i2c_nci_send() will be called by nfcmrvl_nci_send(), and skb
should be freed in nfcmrvl_i2c_nci_send(). However, nfcmrvl_nci_send()
won't free the skb when it failed for the test_bit(). Free the skb when
test_bit() failed.
Fixes: b5b3e23e4c ("NFC: nfcmrvl: add i2c driver")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Suggested-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].
Fix a total of 227 warnings like these:
drivers/net/ethernet/brocade/bna/bna_enet.c:519:3: warning: cast from 'void (*)(struct bna_ethport *, enum bna_ethport_event)' to 'bfa_fsm_t' (aka 'void (*)(void *, int)') converts to incompatible function type [-Wcast-function-type-strict]
bfa_fsm_set_state(ethport, bna_ethport_sm_down);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The bna state machine code heavily overloads its state machine functions,
so these have been separated into their own sets of structs, enums,
typedefs, and helper functions. There are almost zero binary code changes,
all seem to be related to header file line numbers changing, or the
addition of the new stats helper.
Important to mention is that while I was manually implementing this changes
I was staring at this[2] patch from Kees Cook. Thanks, Kees. :)
Link: https://github.com/KSPP/linux/issues/240
[1] https://reviews.llvm.org/D134831
[2] https://lore.kernel.org/linux-hardening/20220929230334.2109344-1-keescook@chromium.org/
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sahid Orentino Ferdjaoui says:
====================
As part of commit 93b8952d22 ("libbpf: deprecate legacy BPF map
definitions") and commit bd054102a8 ("libbpf: enforce strict libbpf
1.0 behaviors") The --legacy option is not relevant anymore. #1 is
removing it. #4 is cleaning the code from using libbpf_get_error().
About patches #2 and #3 They are changes discovered while working on
this series (credits to Quentin Monnet). #2 is cleaning-up usage of an
unnecessary PTR_ERR(NULL), finally #3 is fixing an invalid value
passed to strerror().
v1 -> v2:
- Addressed review comments from Yonghong Song on patch #4
- Added a patch #5 that removes unwanted function noticed by
Yonghong Song
v2 -> v3
- Addressed review comments from Andrii Nakryiko on patch #2, #3, #4
* clean-up usage of libbpf_get_error() (#2, #3)
* fix possible return of an uninitialized local variable err
* fix returned errors using errno
v3 -> v4
- Addressed review comments from Quentin Monnet
* fix line moved from patch #2 to patch #3
* fix missing returned errors using errno
* fix some returned values to errno instead of -1
====================
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpftool is now totally compliant with libbpf 1.0 mode and is not
expected to be compiled with pre-1.0, let's clean-up the usage of
libbpf_get_error().
The changes stay aligned with returned errors always negative.
- In tools/bpf/bpftool/btf.c This fixes an uninitialized local
variable `err` in function do_dump() because it may now be returned
without having been set.
- This also removes the checks on NULL pointers before calling
btf__free() because that function already does the check.
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Link: https://lore.kernel.org/r/20221120112515.38165-5-sahid.ferdjaoui@industrialdiscipline.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yonghong Song says:
====================
Currenty, a non-tracing bpf program typically has a single 'context' argument
with predefined uapi struct type. Following these uapi struct, user is able
to access other fields defined in uapi header. Inside the kernel, the
user-seen 'context' argument is replaced with 'kernel context' (or 'kctx'
in short) which can access more information than what uapi header provides.
To access other info not in uapi header, people typically do two things:
(1). extend uapi to access more fields rooted from 'context'.
(2). use bpf_probe_read_kernl() helper to read particular field based on
kctx.
Using (1) needs uapi change and using (2) makes code more complex since
direct memory access is not allowed.
There are already a few instances trying to access more information from
kctx:
. trying to access some fields from perf_event kctx ([1]).
. trying to access some fields from xdp kctx ([2]).
This patch set tried to allow direct memory access for kctx fields
by introducing bpf_cast_to_kern_ctx() kfunc.
Martin mentioned a use case like type casting below:
#define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
basically a 'unsigned char *" casted to 'struct skb_shared_info *'. This patch
set tries to support such a use case as well with bpf_rdonly_cast().
For the patch series, Patch 1 added support for a kfunc available to all
prog types. Patch 2 added bpf_cast_to_kern_ctx() kfunc. Patch 3 added
bpf_rdonly_cast() kfunc. Patch 4 added a few positive and negative tests.
[1] https://lore.kernel.org/bpf/ad15b398-9069-4a0e-48cb-4bb651ec3088@meta.com/
[2] https://lore.kernel.org/bpf/20221109215242.1279993-1-john.fastabend@gmail.com/
Changelog:
v3 -> v4:
- remove unnecessary bpf_ctx_convert.t error checking
- add and use meta.ret_btf_id instead of meta.arg_constant.value for
bpf_cast_to_kern_ctx().
- add PTR_TRUSTED to the return PTR_TO_BTF_ID type for bpf_cast_to_kern_ctx().
v2 -> v3:
- rebase on top of bpf-next (for merging conflicts)
- add the selftest to s390x deny list
rfcv1 -> v2:
- break original one kfunc into two.
- add missing error checks and error logs.
- adapt to the new conventions in
https://lore.kernel.org/all/20221118015614.2013203-1-memxor@gmail.com/
for example, with __ign and __k suffix.
- added support in fixup_kfunc_call() to replace kfunc calls with a single mov.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Three tests are added. One is from John Fastabend ({1]) which tests
tracing style access for xdp program from the kernel ctx.
Another is a tc test to test both kernel ctx tracing style access
and explicit non-ctx type cast. The third one is for negative tests
including two tests, a tp_bpf test where the bpf_rdonly_cast()
returns a untrusted ptr which cannot be used as helper argument,
and a tracepoint test where the kernel ctx is a u64.
Also added the test to DENYLIST.s390x since s390 does not currently
support calling kernel functions in JIT mode.
[1] https://lore.kernel.org/bpf/20221109215242.1279993-1-john.fastabend@gmail.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221120195442.3114844-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement bpf_rdonly_cast() which tries to cast the object
to a specified type. This tries to support use case like below:
#define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
where skb_end_pointer(SKB) is a 'unsigned char *' and needs to
be casted to 'struct skb_shared_info *'.
The signature of bpf_rdonly_cast() looks like
void *bpf_rdonly_cast(void *obj, __u32 btf_id)
The function returns the same 'obj' but with PTR_TO_BTF_ID with
btf_id. The verifier will ensure btf_id being a struct type.
Since the supported type cast may not reflect what the 'obj'
represents, the returned btf_id is marked as PTR_UNTRUSTED, so
the return value and subsequent pointer chasing cannot be
used as helper/kfunc arguments.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221120195437.3114585-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement bpf_cast_to_kern_ctx() kfunc which does a type cast
of a uapi ctx object to the corresponding kernel ctx. Previously
if users want to access some data available in kctx but not
in uapi ctx, bpf_probe_read_kernel() helper is needed.
The introduction of bpf_cast_to_kern_ctx() allows direct
memory access which makes code simpler and easier to understand.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221120195432.3113982-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Later on, we will introduce kfuncs bpf_cast_to_kern_ctx() and
bpf_rdonly_cast() which apply to all program types. Currently kfunc set
only supports individual prog types. This patch added support for kfunc
applying to all program types.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221120195426.3113828-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In the unlikely event that bpf_global_ma is not correctly initialized,
instead of checking the boolean everytime bpf_obj_new_impl is called,
simply check it while loading the program and return an error if
bpf_global_ma_set is false.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221120212610.2361700-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull tracing/probes fixes from Steven Rostedt:
- Fix possible NULL pointer dereference on trace_event_file in
kprobe_event_gen_test_exit()
- Fix NULL pointer dereference for trace_array in
kprobe_event_gen_test_exit()
- Fix memory leak of filter string for eprobes
- Fix a possible memory leak in rethook_alloc()
- Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case which
can cause a possible use-after-free
- Fix warning in eprobe filter creation
- Fix eprobe filter creation as it picked the wrong event for the
fields
* tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/eprobe: Fix eprobe filter to make a filter correctly
tracing/eprobe: Fix warning in filter creation
kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
rethook: fix a potential memleak in rethook_alloc()
tracing/eprobe: Fix memory leak of filter string
tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
Pull tracing fixes from Steven Rostedt:
- Fix polling to block on watermark like the reads do, as user space
applications get confused when the select says read is available, and
then the read blocks
- Fix accounting of ring buffer dropped pages as it is what is used to
determine if the buffer is empty or not
- Fix memory leak in tracing_read_pipe()
- Fix struct trace_array warning about being declared in parameters
- Fix accounting of ftrace pages used in output at start up.
- Fix allocation of dyn_ftrace pages by subtracting one from order
instead of diving it by 2
- Static analyzer found a case were a pointer being used outside of a
NULL check (rb_head_page_deactivate())
- Fix possible NULL pointer dereference if kstrdup() fails in
ftrace_add_mod()
- Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
- Fix bad pointer dereference in register_synth_event() on error path
- Remove unused __bad_type_size() method
- Fix possible NULL pointer dereference of entry in list 'tr->err_log'
- Fix NULL pointer deference race if eprobe is called before the event
setup
* tag 'trace-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix race where eprobes can be called before the event
tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'
tracing: Remove unused __bad_type_size() method
tracing: Fix wild-memory-access in register_synth_event()
tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
ftrace: Fix null pointer dereference in ftrace_add_mod()
ring_buffer: Do not deactivate non-existant pages
ftrace: Optimize the allocation for mcount entries
ftrace: Fix the possible incorrect kernel message
tracing: Fix warning on variable 'struct trace_array'
tracing: Fix memory leak in tracing_read_pipe()
ring-buffer: Include dropped pages in counting dirty patches
tracing/ring-buffer: Have polling block on watermark
Pull x86 fixes from Borislav Petkov:
- Do not hold fpregs lock when inheriting FPU permissions because the
fpregs lock disables preemption on RT but fpu_inherit_perms() does
spin_lock_irq(), which, on RT, uses rtmutexes and they need to be
preemptible.
- Check the page offset and the length of the data supplied by
userspace for overflow when specifying a set of pages to add to an
SGX enclave
* tag 'x86_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Drop fpregs lock before inheriting FPU permissions
x86/sgx: Add overflow check in sgx_validate_offset_length()
Pull scheduler fixes from Borislav Petkov:
- Fix a small race on the task's exit path where there's a
misunderstanding whether the task holds rq->lock or not
- Prevent processes from getting killed when using deprecated or
unknown rseq ABI flags in order to be able to fuzz the rseq() syscall
with syzkaller
* tag 'sched_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix race in task_call_func()
rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered
Pull perf fixes from Borislav Petkov:
- Fix an intel PT erratum where CPUs do not support single range output
for more than 4K
- Fix a NULL ptr dereference which can happen after an NMI interferes
with the event enabling dance in amd_pmu_enable_all()
- Free the events array too when freeing uncore contexts on CPU online,
thereby fixing a memory leak
- Improve the pending SIGTRAP check
* tag 'perf_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/pt: Fix sampling using single range output
perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and throttling
perf/x86/amd/uncore: Fix memory leak for events array
perf: Improve missing SIGTRAP checking