Merge 1e57930e9f ("Merge tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu") into android-mainline

Steps on the way to 5.19-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0b3c706909f2309bcfe1d4748e6bcb67df0337a0
This commit is contained in:
Greg Kroah-Hartman
2022-05-28 15:08:23 +02:00
107 changed files with 6401 additions and 2965 deletions

View File

@@ -0,0 +1,51 @@
What: security/secrets/coco
Date: February 2022
Contact: Dov Murik <dovmurik@linux.ibm.com>
Description:
Exposes confidential computing (coco) EFI secrets to
userspace via securityfs.
EFI can declare memory area used by confidential computing
platforms (such as AMD SEV and SEV-ES) for secret injection by
the Guest Owner during VM's launch. The secrets are encrypted
by the Guest Owner and decrypted inside the trusted enclave,
and therefore are not readable by the untrusted host.
The efi_secret module exposes the secrets to userspace. Each
secret appears as a file under <securityfs>/secrets/coco,
where the filename is the GUID of the entry in the secrets
table. This module is loaded automatically by the EFI driver
if the EFI secret area is populated.
Two operations are supported for the files: read and unlink.
Reading the file returns the content of secret entry.
Unlinking the file overwrites the secret data with zeroes and
removes the entry from the filesystem. A secret cannot be read
after it has been unlinked.
For example, listing the available secrets::
# modprobe efi_secret
# ls -l /sys/kernel/security/secrets/coco
-r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
-r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
-r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
-r--r----- 1 root root 0 Jun 28 11:54 e6f5a162-d67f-4750-a67c-5d065f2a9910
Reading the secret data by reading a file::
# cat /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
the-content-of-the-secret-data
Wiping a secret by unlinking a file::
# rm /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
# ls -l /sys/kernel/security/secrets/coco
-r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
-r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
-r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
Note: The binary format of the secrets table injected by the
Guest Owner is described in
drivers/virt/coco/efi_secret/efi_secret.c under "Structure of
the EFI secret area".

View File

@@ -973,7 +973,7 @@ The ``->dynticks`` field counts the corresponding CPU's transitions to
and from either dyntick-idle or user mode, so that this counter has an
even value when the CPU is in dyntick-idle mode or user mode and an odd
value otherwise. The transitions to/from user mode need to be counted
for user mode adaptive-ticks support (see timers/NO_HZ.txt).
for user mode adaptive-ticks support (see Documentation/timers/no_hz.rst).
The ``->rcu_need_heavy_qs`` field is used to record the fact that the
RCU core code would really like to see a quiescent state from the

View File

@@ -406,7 +406,7 @@ In earlier implementations, the task requesting the expedited grace
period also drove it to completion. This straightforward approach had
the disadvantage of needing to account for POSIX signals sent to user
tasks, so more recent implemementations use the Linux kernel's
`workqueues <https://www.kernel.org/doc/Documentation/core-api/workqueue.rst>`__.
workqueues (see Documentation/core-api/workqueue.rst).
The requesting task still does counter snapshotting and funnel-lock
processing, but the task reaching the top of the funnel lock does a

View File

@@ -370,8 +370,8 @@ pointer fetched by rcu_dereference() may not be used outside of the
outermost RCU read-side critical section containing that
rcu_dereference(), unless protection of the corresponding data
element has been passed from RCU to some other synchronization
mechanism, most commonly locking or `reference
counting <https://www.kernel.org/doc/Documentation/RCU/rcuref.txt>`__.
mechanism, most commonly locking or reference counting
(see ../../rcuref.rst).
.. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF]
.. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf
@@ -2654,6 +2654,38 @@ synchronize_rcu(), and rcu_barrier(), respectively. In
three APIs are therefore implemented by separate functions that check
for voluntary context switches.
Tasks Rude RCU
~~~~~~~~~~~~~~
Some forms of tracing need to wait for all preemption-disabled regions
of code running on any online CPU, including those executed when RCU is
not watching. This means that synchronize_rcu() is insufficient, and
Tasks Rude RCU must be used instead. This flavor of RCU does its work by
forcing a workqueue to be scheduled on each online CPU, hence the "Rude"
moniker. And this operation is considered to be quite rude by real-time
workloads that don't want their ``nohz_full`` CPUs receiving IPIs and
by battery-powered systems that don't want their idle CPUs to be awakened.
The tasks-rude-RCU API is also reader-marking-free and thus quite compact,
consisting of call_rcu_tasks_rude(), synchronize_rcu_tasks_rude(),
and rcu_barrier_tasks_rude().
Tasks Trace RCU
~~~~~~~~~~~~~~~
Some forms of tracing need to sleep in readers, but cannot tolerate
SRCU's read-side overhead, which includes a full memory barrier in both
srcu_read_lock() and srcu_read_unlock(). This need is handled by a
Tasks Trace RCU that uses scheduler locking and IPIs to synchronize with
readers. Real-time systems that cannot tolerate IPIs may build their
kernels with ``CONFIG_TASKS_TRACE_RCU_READ_MB=y``, which avoids the IPIs at
the expense of adding full memory barriers to the read-side primitives.
The tasks-trace-RCU API is also reasonably compact,
consisting of rcu_read_lock_trace(), rcu_read_unlock_trace(),
rcu_read_lock_trace_held(), call_rcu_tasks_trace(),
synchronize_rcu_tasks_trace(), and rcu_barrier_tasks_trace().
Possible Future Changes
-----------------------

View File

@@ -33,8 +33,8 @@ Situation 1: Hash Tables
Hash tables are often implemented as an array, where each array entry
has a linked-list hash chain. Each hash chain can be protected by RCU
as described in the listRCU.txt document. This approach also applies
to other array-of-list situations, such as radix trees.
as described in listRCU.rst. This approach also applies to other
array-of-list situations, such as radix trees.
.. _static_arrays:

View File

@@ -140,8 +140,7 @@ over a rather long period of time, but improvements are always welcome!
prevents destructive compiler optimizations. However,
with a bit of devious creativity, it is possible to
mishandle the return value from rcu_dereference().
Please see rcu_dereference.txt in this directory for
more information.
Please see rcu_dereference.rst for more information.
The rcu_dereference() primitive is used by the
various "_rcu()" list-traversal primitives, such
@@ -151,7 +150,7 @@ over a rather long period of time, but improvements are always welcome!
primitives. This is particularly useful in code that
is common to readers and updaters. However, lockdep
will complain if you access rcu_dereference() outside
of an RCU read-side critical section. See lockdep.txt
of an RCU read-side critical section. See lockdep.rst
to learn what to do about this.
Of course, neither rcu_dereference() nor the "_rcu()"
@@ -323,7 +322,7 @@ over a rather long period of time, but improvements are always welcome!
primitives when the update-side lock is held is that doing so
can be quite helpful in reducing code bloat when common code is
shared between readers and updaters. Additional primitives
are provided for this case, as discussed in lockdep.txt.
are provided for this case, as discussed in lockdep.rst.
One exception to this rule is when data is only ever added to
the linked data structure, and is never removed during any
@@ -480,4 +479,4 @@ over a rather long period of time, but improvements are always welcome!
both rcu_barrier() and synchronize_rcu(), if necessary, using
something like workqueues to to execute them concurrently.
See rcubarrier.txt for more information.
See rcubarrier.rst for more information.

View File

@@ -10,9 +10,8 @@ A "grace period" must elapse between the two parts, and this grace period
must be long enough that any readers accessing the item being deleted have
since dropped their references. For example, an RCU-protected deletion
from a linked list would first remove the item from the list, wait for
a grace period to elapse, then free the element. See the
:ref:`Documentation/RCU/listRCU.rst <list_rcu_doc>` for more information on
using RCU with linked lists.
a grace period to elapse, then free the element. See listRCU.rst for more
information on using RCU with linked lists.
Frequently Asked Questions
--------------------------
@@ -50,7 +49,7 @@ Frequently Asked Questions
- If I am running on a uniprocessor kernel, which can only do one
thing at a time, why should I wait for a grace period?
See :ref:`Documentation/RCU/UP.rst <up_doc>` for more information.
See UP.rst for more information.
- How can I see where RCU is currently used in the Linux kernel?
@@ -64,13 +63,13 @@ Frequently Asked Questions
- What guidelines should I follow when writing code that uses RCU?
See the checklist.txt file in this directory.
See checklist.rst.
- Why the name "RCU"?
"RCU" stands for "read-copy update".
:ref:`Documentation/RCU/listRCU.rst <list_rcu_doc>` has more information on where
this name came from, search for "read-copy update" to find it.
listRCU.rst has more information on where this name came from, search
for "read-copy update" to find it.
- I hear that RCU is patented? What is with that?

View File

@@ -8,7 +8,7 @@ This section describes how to use hlist_nulls to
protect read-mostly linked lists and
objects using SLAB_TYPESAFE_BY_RCU allocations.
Please read the basics in Documentation/RCU/listRCU.rst
Please read the basics in listRCU.rst.
Using 'nulls'
=============

View File

@@ -162,6 +162,26 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
Stall-warning messages may be enabled and disabled completely via
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
--------------------------------
Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
the expedited grace period. This parameter defines the period
of time that RCU will wait from the beginning of an expedited
grace period until it issues an RCU CPU stall warning. This time
period is normally 20 milliseconds on Android devices. A zero
value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
after conversion to milliseconds.
This configuration parameter may be changed at runtime via the
/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
this parameter is checked only at the beginning of a cycle. If you
are in a current stall cycle, setting it to a new value will change
the timeout for the -next- stall.
Stall-warning messages may be enabled and disabled completely via
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
RCU_STALL_DELAY_DELTA
---------------------

View File

@@ -224,7 +224,7 @@ synchronize_rcu()
be delayed. This property results in system resilience in face
of denial-of-service attacks. Code using call_rcu() should limit
update rate in order to gain this same sort of resilience. See
checklist.txt for some approaches to limiting the update rate.
checklist.rst for some approaches to limiting the update rate.
rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
@@ -318,7 +318,7 @@ rcu_dereference()
must prohibit. The rcu_dereference_protected() variant takes
a lockdep expression to indicate which locks must be acquired
by the caller. If the indicated protection is not provided,
a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst
a lockdep splat is emitted. See Design/Requirements/Requirements.rst
and the API's code comments for more details and example usage.
.. [2] If the list_for_each_entry_rcu() instance might be used by
@@ -399,8 +399,7 @@ for specialized uses, but are relatively uncommon.
This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure. More-typical
uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`,
:ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`.
uses of RCU may be found in listRCU.rst, arrayRCU.rst, and NMI-RCU.rst.
::
struct foo {
@@ -482,10 +481,9 @@ So, to sum up:
RCU read-side critical sections that might be referencing that
data item.
See checklist.txt for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in :ref:`listRCU.rst
<list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst
<NMI_rcu_doc>`.
See checklist.rst for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.rst,
arrayRCU.rst, and NMI-RCU.rst.
.. _4_whatisRCU:
@@ -579,7 +577,7 @@ to avoid having to write your own callback::
kfree_rcu(old_fp, rcu);
Again, see checklist.txt for additional rules governing the use of RCU.
Again, see checklist.rst for additional rules governing the use of RCU.
.. _5_whatisRCU:
@@ -663,7 +661,7 @@ been able to write-acquire the lock otherwise. The smp_mb__after_spinlock()
promotes synchronize_rcu() to a full memory barrier in compliance with
the "Memory-Barrier Guarantees" listed in:
Documentation/RCU/Design/Requirements/Requirements.rst
Design/Requirements/Requirements.rst
It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired. Note also that rcu_read_lock() is immune

View File

@@ -4901,6 +4901,18 @@
rcupdate.rcu_cpu_stall_timeout= [KNL]
Set timeout for RCU CPU stall warning messages.
The value is in seconds and the maximum allowed
value is 300 seconds.
rcupdate.rcu_exp_cpu_stall_timeout= [KNL]
Set timeout for expedited RCU CPU stall warning
messages. The value is in milliseconds
and the maximum allowed value is 21000
milliseconds. Please note that this value is
adjusted to an arch timer tick resolution.
Setting this to zero causes the value from
rcupdate.rcu_cpu_stall_timeout to be used (after
conversion from seconds to milliseconds).
rcupdate.rcu_expedited= [KNL]
Use expedited grace-period primitives, for
@@ -4963,10 +4975,34 @@
number avoids disturbing real-time workloads,
but lengthens grace periods.
rcupdate.rcu_task_stall_info= [KNL]
Set initial timeout in jiffies for RCU task stall
informational messages, which give some indication
of the problem for those not patient enough to
wait for ten minutes. Informational messages are
only printed prior to the stall-warning message
for a given grace period. Disable with a value
less than or equal to zero. Defaults to ten
seconds. A change in value does not take effect
until the beginning of the next grace period.
rcupdate.rcu_task_stall_info_mult= [KNL]
Multiplier for time interval between successive
RCU task stall informational messages for a given
RCU tasks grace period. This value is clamped
to one through ten, inclusive. It defaults to
the value three, so that the first informational
message is printed 10 seconds into the grace
period, the second at 40 seconds, the third at
160 seconds, and then the stall warning at 600
seconds would prevent a fourth at 640 seconds.
rcupdate.rcu_task_stall_timeout= [KNL]
Set timeout in jiffies for RCU task stall warning
messages. Disable with a value less than or equal
to zero.
Set timeout in jiffies for RCU task stall
warning messages. Disable with a value less
than or equal to zero. Defaults to ten minutes.
A change in value does not take effect until
the beginning of the next grace period.
rcupdate.rcu_self_test= [KNL]
Run the RCU early boot self tests
@@ -5385,6 +5421,17 @@
smart2= [HW]
Format: <io1>[,<io2>[,...,<io8>]]
smp.csd_lock_timeout= [KNL]
Specify the period of time in milliseconds
that smp_call_function() and friends will wait
for a CPU to release the CSD lock. This is
useful when diagnosing bugs involving CPUs
disabling interrupts for extended periods
of time. Defaults to 5,000 milliseconds, and
setting a value of zero disables this feature.
This feature may be more efficiently disabled
using the csdlock_debug- kernel parameter.
smsc-ircc2.nopnp [HW] Don't use PNP to discover SMC devices
smsc-ircc2.ircc_cfg= [HW] Device configuration I/O port
smsc-ircc2.ircc_sir= [HW] SIR base I/O port
@@ -5616,6 +5663,30 @@
off: Disable mitigation and remove
performance impact to RDRAND and RDSEED
srcutree.big_cpu_lim [KNL]
Specifies the number of CPUs constituting a
large system, such that srcu_struct structures
should immediately allocate an srcu_node array.
This kernel-boot parameter defaults to 128,
but takes effect only when the low-order four
bits of srcutree.convert_to_big is equal to 3
(decide at boot).
srcutree.convert_to_big [KNL]
Specifies under what conditions an SRCU tree
srcu_struct structure will be converted to big
form, that is, with an rcu_node tree:
0: Never.
1: At init_srcu_struct() time.
2: When rcutorture decides to.
3: Decide at boot time (default).
0x1X: Above plus if high contention.
Either way, the srcu_node tree will be sized based
on the actual runtime number of CPUs (nr_cpu_ids)
instead of the compile-time CONFIG_NR_CPUS.
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
@@ -5633,6 +5704,14 @@
expediting. Set to zero to disable automatic
expediting.
srcutree.small_contention_lim [KNL]
Specifies the number of update-side contention
events per jiffy will be tolerated before
initiating a conversion of an srcu_struct
structure to big form. Note that the value of
srcutree.convert_to_big must have the 0x10 bit
set for contention-based conversions to occur.
ssbd= [ARM64,HW]
Speculative Store Bypass Disable control

View File

@@ -17,3 +17,4 @@ Security Documentation
tpm/index
digsig
landlock
secrets/index

View File

@@ -0,0 +1,103 @@
.. SPDX-License-Identifier: GPL-2.0
==============================
Confidential Computing secrets
==============================
This document describes how Confidential Computing secret injection is handled
from the firmware to the operating system, in the EFI driver and the efi_secret
kernel module.
Introduction
============
Confidential Computing (coco) hardware such as AMD SEV (Secure Encrypted
Virtualization) allows guest owners to inject secrets into the VMs
memory without the host/hypervisor being able to read them. In SEV,
secret injection is performed early in the VM launch process, before the
guest starts running.
The efi_secret kernel module allows userspace applications to access these
secrets via securityfs.
Secret data flow
================
The guest firmware may reserve a designated memory area for secret injection,
and publish its location (base GPA and length) in the EFI configuration table
under a ``LINUX_EFI_COCO_SECRET_AREA_GUID`` entry
(``adf956ad-e98c-484c-ae11-b51c7d336447``). This memory area should be marked
by the firmware as ``EFI_RESERVED_TYPE``, and therefore the kernel should not
be use it for its own purposes.
During the VM's launch, the virtual machine manager may inject a secret to that
area. In AMD SEV and SEV-ES this is performed using the
``KVM_SEV_LAUNCH_SECRET`` command (see [sev]_). The strucutre of the injected
Guest Owner secret data should be a GUIDed table of secret values; the binary
format is described in ``drivers/virt/coco/efi_secret/efi_secret.c`` under
"Structure of the EFI secret area".
On kernel start, the kernel's EFI driver saves the location of the secret area
(taken from the EFI configuration table) in the ``efi.coco_secret`` field.
Later it checks if the secret area is populated: it maps the area and checks
whether its content begins with ``EFI_SECRET_TABLE_HEADER_GUID``
(``1e74f542-71dd-4d66-963e-ef4287ff173b``). If the secret area is populated,
the EFI driver will autoload the efi_secret kernel module, which exposes the
secrets to userspace applications via securityfs. The details of the
efi_secret filesystem interface are in [secrets-coco-abi]_.
Application usage example
=========================
Consider a guest performing computations on encrypted files. The Guest Owner
provides the decryption key (= secret) using the secret injection mechanism.
The guest application reads the secret from the efi_secret filesystem and
proceeds to decrypt the files into memory and then performs the needed
computations on the content.
In this example, the host can't read the files from the disk image
because they are encrypted. Host can't read the decryption key because
it is passed using the secret injection mechanism (= secure channel).
Host can't read the decrypted content from memory because it's a
confidential (memory-encrypted) guest.
Here is a simple example for usage of the efi_secret module in a guest
to which an EFI secret area with 4 secrets was injected during launch::
# ls -la /sys/kernel/security/secrets/coco
total 0
drwxr-xr-x 2 root root 0 Jun 28 11:54 .
drwxr-xr-x 3 root root 0 Jun 28 11:54 ..
-r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
-r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
-r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
-r--r----- 1 root root 0 Jun 28 11:54 e6f5a162-d67f-4750-a67c-5d065f2a9910
# hd /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
00000000 74 68 65 73 65 2d 61 72 65 2d 74 68 65 2d 6b 61 |these-are-the-ka|
00000010 74 61 2d 73 65 63 72 65 74 73 00 01 02 03 04 05 |ta-secrets......|
00000020 06 07 |..|
00000022
# rm /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
# ls -la /sys/kernel/security/secrets/coco
total 0
drwxr-xr-x 2 root root 0 Jun 28 11:55 .
drwxr-xr-x 3 root root 0 Jun 28 11:54 ..
-r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
-r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
-r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
References
==========
See [sev-api-spec]_ for more info regarding SEV ``LAUNCH_SECRET`` operation.
.. [sev] Documentation/virt/kvm/amd-memory-encryption.rst
.. [secrets-coco-abi] Documentation/ABI/testing/securityfs-secrets-coco
.. [sev-api-spec] https://www.amd.com/system/files/TechDocs/55766_SEV-KM_API_Specification.pdf

View File

@@ -0,0 +1,9 @@
.. SPDX-License-Identifier: GPL-2.0
=====================
Secrets documentation
=====================
.. toctree::
coco

View File

@@ -35,6 +35,7 @@ config KPROBES
depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
select TASKS_RCU if PREEMPTION
help
Kprobes allows you to trap at almost any kernel address and
execute a callback function. register_kprobe() establishes

View File

@@ -163,7 +163,11 @@ extra_header_fields:
.long 0x200 # SizeOfHeaders
.long 0 # CheckSum
.word IMAGE_SUBSYSTEM_EFI_APPLICATION # Subsystem (EFI application)
#ifdef CONFIG_DXE_MEM_ATTRIBUTES
.word IMAGE_DLL_CHARACTERISTICS_NX_COMPAT # DllCharacteristics
#else
.word 0 # DllCharacteristics
#endif
#ifdef CONFIG_X86_32
.long 0 # SizeOfStackReserve
.long 0 # SizeOfStackCommit

View File

@@ -357,6 +357,11 @@ static inline u32 efi64_convert_status(efi_status_t status)
runtime), \
func, __VA_ARGS__))
#define efi_dxe_call(func, ...) \
(efi_is_native() \
? efi_dxe_table->func(__VA_ARGS__) \
: __efi64_thunk_map(efi_dxe_table, func, __VA_ARGS__))
#else /* CONFIG_EFI_MIXED */
static inline bool efi_is_64bit(void)

View File

@@ -93,6 +93,9 @@ static const unsigned long * const efi_tables[] = {
#ifdef CONFIG_LOAD_UEFI_KEYS
&efi.mokvar_table,
#endif
#ifdef CONFIG_EFI_COCO_SECRET
&efi.coco_secret,
#endif
};
u64 efi_setup; /* efi setup_data physical address */

View File

@@ -91,6 +91,18 @@ config EFI_SOFT_RESERVE
If unsure, say Y.
config EFI_DXE_MEM_ATTRIBUTES
bool "Adjust memory attributes in EFISTUB"
depends on EFI && EFI_STUB && X86
default y
help
UEFI specification does not guarantee all memory to be
accessible for both write and execute as the kernel expects
it to be.
Use DXE services to check and alter memory protection
attributes during boot via EFISTUB to ensure that memory
ranges used by the kernel are writable and executable.
config EFI_PARAMS_FROM_FDT
bool
help
@@ -284,3 +296,34 @@ config EFI_CUSTOM_SSDT_OVERLAYS
See Documentation/admin-guide/acpi/ssdt-overlays.rst for more
information.
config EFI_DISABLE_RUNTIME
bool "Disable EFI runtime services support by default"
default y if PREEMPT_RT
help
Allow to disable the EFI runtime services support by default. This can
already be achieved by using the efi=noruntime option, but it could be
useful to have this default without any kernel command line parameter.
The EFI runtime services are disabled by default when PREEMPT_RT is
enabled, because measurements have shown that some EFI functions calls
might take too much time to complete, causing large latencies which is
an issue for Real-Time kernels.
This default can be overridden by using the efi=runtime option.
config EFI_COCO_SECRET
bool "EFI Confidential Computing Secret Area Support"
depends on EFI
help
Confidential Computing platforms (such as AMD SEV) allow the
Guest Owner to securely inject secrets during guest VM launch.
The secrets are placed in a designated EFI reserved memory area.
In order to use the secrets in the kernel, the location of the secret
area (as published in the EFI config table) must be kept.
If you say Y here, the address of the EFI secret area will be kept
for usage inside the kernel. This will allow the
virt/coco/efi_secret module to access the secrets, which in turn
allows userspace programs to access the injected secrets.

View File

@@ -46,6 +46,9 @@ struct efi __read_mostly efi = {
#ifdef CONFIG_LOAD_UEFI_KEYS
.mokvar_table = EFI_INVALID_TABLE_ADDR,
#endif
#ifdef CONFIG_EFI_COCO_SECRET
.coco_secret = EFI_INVALID_TABLE_ADDR,
#endif
};
EXPORT_SYMBOL(efi);
@@ -66,7 +69,7 @@ struct mm_struct efi_mm = {
struct workqueue_struct *efi_rts_wq;
static bool disable_runtime = IS_ENABLED(CONFIG_PREEMPT_RT);
static bool disable_runtime = IS_ENABLED(CONFIG_EFI_DISABLE_RUNTIME);
static int __init setup_noefi(char *arg)
{
disable_runtime = true;
@@ -422,6 +425,11 @@ static int __init efisubsys_init(void)
if (efi_enabled(EFI_DBG) && efi_enabled(EFI_PRESERVE_BS_REGIONS))
efi_debugfs_init();
#ifdef CONFIG_EFI_COCO_SECRET
if (efi.coco_secret != EFI_INVALID_TABLE_ADDR)
platform_device_register_simple("efi_secret", 0, NULL, 0);
#endif
return 0;
err_remove_group:
@@ -528,6 +536,9 @@ static const efi_config_table_type_t common_tables[] __initconst = {
#endif
#ifdef CONFIG_LOAD_UEFI_KEYS
{LINUX_EFI_MOK_VARIABLE_TABLE_GUID, &efi.mokvar_table, "MOKvar" },
#endif
#ifdef CONFIG_EFI_COCO_SECRET
{LINUX_EFI_COCO_SECRET_AREA_GUID, &efi.coco_secret, "CocoSecret" },
#endif
{},
};

View File

@@ -117,7 +117,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
unsigned long *image_size,
unsigned long *reserve_addr,
unsigned long *reserve_size,
efi_loaded_image_t *image)
efi_loaded_image_t *image,
efi_handle_t image_handle)
{
const int slack = TEXT_OFFSET - 5 * PAGE_SIZE;
int alloc_size = MAX_UNCOMP_KERNEL_SIZE + EFI_PHYS_ALIGN;

View File

@@ -83,7 +83,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
unsigned long *image_size,
unsigned long *reserve_addr,
unsigned long *reserve_size,
efi_loaded_image_t *image)
efi_loaded_image_t *image,
efi_handle_t image_handle)
{
efi_status_t status;
unsigned long kernel_size, kernel_memsize = 0;
@@ -100,7 +101,15 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
u64 min_kimg_align = efi_nokaslr ? MIN_KIMG_ALIGN : EFI_KIMG_ALIGN;
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
if (!efi_nokaslr) {
efi_guid_t li_fixed_proto = LINUX_EFI_LOADED_IMAGE_FIXED_GUID;
void *p;
if (efi_nokaslr) {
efi_info("KASLR disabled on kernel command line\n");
} else if (efi_bs_call(handle_protocol, image_handle,
&li_fixed_proto, &p) == EFI_SUCCESS) {
efi_info("Image placement fixed by loader\n");
} else {
status = efi_get_random_bytes(sizeof(phys_seed),
(u8 *)&phys_seed);
if (status == EFI_NOT_FOUND) {
@@ -111,8 +120,6 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
status);
efi_nokaslr = true;
}
} else {
efi_info("KASLR disabled on kernel command line\n");
}
}

View File

@@ -198,7 +198,7 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
status = handle_kernel_image(&image_addr, &image_size,
&reserve_addr,
&reserve_size,
image);
image, handle);
if (status != EFI_SUCCESS) {
efi_err("Failed to relocate kernel\n");
goto fail_free_screeninfo;

View File

@@ -36,6 +36,9 @@ extern bool efi_novamap;
extern const efi_system_table_t *efi_system_table;
typedef union efi_dxe_services_table efi_dxe_services_table_t;
extern const efi_dxe_services_table_t *efi_dxe_table;
efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
efi_system_table_t *sys_table_arg);
@@ -44,6 +47,7 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
#define efi_is_native() (true)
#define efi_bs_call(func, ...) efi_system_table->boottime->func(__VA_ARGS__)
#define efi_rt_call(func, ...) efi_system_table->runtime->func(__VA_ARGS__)
#define efi_dxe_call(func, ...) efi_dxe_table->func(__VA_ARGS__)
#define efi_table_attr(inst, attr) (inst->attr)
#define efi_call_proto(inst, func, ...) inst->func(inst, ##__VA_ARGS__)
@@ -329,6 +333,76 @@ union efi_boot_services {
} mixed_mode;
};
typedef enum {
EfiGcdMemoryTypeNonExistent,
EfiGcdMemoryTypeReserved,
EfiGcdMemoryTypeSystemMemory,
EfiGcdMemoryTypeMemoryMappedIo,
EfiGcdMemoryTypePersistent,
EfiGcdMemoryTypeMoreReliable,
EfiGcdMemoryTypeMaximum
} efi_gcd_memory_type_t;
typedef struct {
efi_physical_addr_t base_address;
u64 length;
u64 capabilities;
u64 attributes;
efi_gcd_memory_type_t gcd_memory_type;
void *image_handle;
void *device_handle;
} efi_gcd_memory_space_desc_t;
/*
* EFI DXE Services table
*/
union efi_dxe_services_table {
struct {
efi_table_hdr_t hdr;
void *add_memory_space;
void *allocate_memory_space;
void *free_memory_space;
void *remove_memory_space;
efi_status_t (__efiapi *get_memory_space_descriptor)(efi_physical_addr_t,
efi_gcd_memory_space_desc_t *);
efi_status_t (__efiapi *set_memory_space_attributes)(efi_physical_addr_t,
u64, u64);
void *get_memory_space_map;
void *add_io_space;
void *allocate_io_space;
void *free_io_space;
void *remove_io_space;
void *get_io_space_descriptor;
void *get_io_space_map;
void *dispatch;
void *schedule;
void *trust;
void *process_firmware_volume;
void *set_memory_space_capabilities;
};
struct {
efi_table_hdr_t hdr;
u32 add_memory_space;
u32 allocate_memory_space;
u32 free_memory_space;
u32 remove_memory_space;
u32 get_memory_space_descriptor;
u32 set_memory_space_attributes;
u32 get_memory_space_map;
u32 add_io_space;
u32 allocate_io_space;
u32 free_io_space;
u32 remove_io_space;
u32 get_io_space_descriptor;
u32 get_io_space_map;
u32 dispatch;
u32 schedule;
u32 trust;
u32 process_firmware_volume;
u32 set_memory_space_capabilities;
} mixed_mode;
};
typedef union efi_uga_draw_protocol efi_uga_draw_protocol_t;
union efi_uga_draw_protocol {
@@ -720,6 +794,13 @@ union efi_tcg2_protocol {
} mixed_mode;
};
struct riscv_efi_boot_protocol {
u64 revision;
efi_status_t (__efiapi *get_boot_hartid)(struct riscv_efi_boot_protocol *,
unsigned long *boot_hartid);
};
typedef union efi_load_file_protocol efi_load_file_protocol_t;
typedef union efi_load_file_protocol efi_load_file2_protocol_t;
@@ -865,7 +946,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
unsigned long *image_size,
unsigned long *reserve_addr,
unsigned long *reserve_size,
efi_loaded_image_t *image);
efi_loaded_image_t *image,
efi_handle_t image_handle);
asmlinkage void __noreturn efi_enter_kernel(unsigned long entrypoint,
unsigned long fdt_addr,

View File

@@ -56,6 +56,7 @@ efi_status_t efi_random_alloc(unsigned long size,
unsigned long random_seed)
{
unsigned long map_size, desc_size, total_slots = 0, target_slot;
unsigned long total_mirrored_slots = 0;
unsigned long buff_size;
efi_status_t status;
efi_memory_desc_t *memory_map;
@@ -86,8 +87,14 @@ efi_status_t efi_random_alloc(unsigned long size,
slots = get_entry_num_slots(md, size, ilog2(align));
MD_NUM_SLOTS(md) = slots;
total_slots += slots;
if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
total_mirrored_slots += slots;
}
/* consider only mirrored slots for randomization if any exist */
if (total_mirrored_slots > 0)
total_slots = total_mirrored_slots;
/* find a random number between 0 and total_slots */
target_slot = (total_slots * (u64)(random_seed & U32_MAX)) >> 32;
@@ -107,6 +114,10 @@ efi_status_t efi_random_alloc(unsigned long size,
efi_physical_addr_t target;
unsigned long pages;
if (total_mirrored_slots > 0 &&
!(md->attribute & EFI_MEMORY_MORE_RELIABLE))
continue;
if (target_slot >= MD_NUM_SLOTS(md)) {
target_slot -= MD_NUM_SLOTS(md);
continue;

View File

@@ -21,9 +21,9 @@
#define MIN_KIMG_ALIGN SZ_4M
#endif
typedef void __noreturn (*jump_kernel_func)(unsigned int, unsigned long);
typedef void __noreturn (*jump_kernel_func)(unsigned long, unsigned long);
static u32 hartid;
static unsigned long hartid;
static int get_boot_hartid_from_fdt(void)
{
@@ -47,14 +47,31 @@ static int get_boot_hartid_from_fdt(void)
return 0;
}
static efi_status_t get_boot_hartid_from_efi(void)
{
efi_guid_t boot_protocol_guid = RISCV_EFI_BOOT_PROTOCOL_GUID;
struct riscv_efi_boot_protocol *boot_protocol;
efi_status_t status;
status = efi_bs_call(locate_protocol, &boot_protocol_guid, NULL,
(void **)&boot_protocol);
if (status != EFI_SUCCESS)
return status;
return efi_call_proto(boot_protocol, get_boot_hartid, &hartid);
}
efi_status_t check_platform_features(void)
{
efi_status_t status;
int ret;
ret = get_boot_hartid_from_fdt();
if (ret) {
efi_err("/chosen/boot-hartid missing or invalid!\n");
return EFI_UNSUPPORTED;
status = get_boot_hartid_from_efi();
if (status != EFI_SUCCESS) {
ret = get_boot_hartid_from_fdt();
if (ret) {
efi_err("Failed to get boot hartid!\n");
return EFI_UNSUPPORTED;
}
}
return EFI_SUCCESS;
}
@@ -80,7 +97,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
unsigned long *image_size,
unsigned long *reserve_addr,
unsigned long *reserve_size,
efi_loaded_image_t *image)
efi_loaded_image_t *image,
efi_handle_t image_handle)
{
unsigned long kernel_size = 0;
unsigned long preferred_addr;

View File

@@ -22,6 +22,7 @@
#define MAXMEM_X86_64_4LEVEL (1ull << 46)
const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
extern u32 image_offset;
static efi_loaded_image_t *image = NULL;
@@ -211,9 +212,110 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
}
}
static void
adjust_memory_range_protection(unsigned long start, unsigned long size)
{
efi_status_t status;
efi_gcd_memory_space_desc_t desc;
unsigned long end, next;
unsigned long rounded_start, rounded_end;
unsigned long unprotect_start, unprotect_size;
int has_system_memory = 0;
if (efi_dxe_table == NULL)
return;
rounded_start = rounddown(start, EFI_PAGE_SIZE);
rounded_end = roundup(start + size, EFI_PAGE_SIZE);
/*
* Don't modify memory region attributes, they are
* already suitable, to lower the possibility to
* encounter firmware bugs.
*/
for (end = start + size; start < end; start = next) {
status = efi_dxe_call(get_memory_space_descriptor, start, &desc);
if (status != EFI_SUCCESS)
return;
next = desc.base_address + desc.length;
/*
* Only system memory is suitable for trampoline/kernel image placement,
* so only this type of memory needs its attributes to be modified.
*/
if (desc.gcd_memory_type != EfiGcdMemoryTypeSystemMemory ||
(desc.attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0)
continue;
unprotect_start = max(rounded_start, (unsigned long)desc.base_address);
unprotect_size = min(rounded_end, next) - unprotect_start;
status = efi_dxe_call(set_memory_space_attributes,
unprotect_start, unprotect_size,
EFI_MEMORY_WB);
if (status != EFI_SUCCESS) {
efi_warn("Unable to unprotect memory range [%08lx,%08lx]: %d\n",
unprotect_start,
unprotect_start + unprotect_size,
(int)status);
}
}
}
/*
* Trampoline takes 2 pages and can be loaded in first megabyte of memory
* with its end placed between 128k and 640k where BIOS might start.
* (see arch/x86/boot/compressed/pgtable_64.c)
*
* We cannot find exact trampoline placement since memory map
* can be modified by UEFI, and it can alter the computed address.
*/
#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
void startup_32(struct boot_params *boot_params);
static void
setup_memory_protection(unsigned long image_base, unsigned long image_size)
{
/*
* Allow execution of possible trampoline used
* for switching between 4- and 5-level page tables
* and relocated kernel image.
*/
adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
TRAMPOLINE_PLACEMENT_SIZE);
#ifdef CONFIG_64BIT
if (image_base != (unsigned long)startup_32)
adjust_memory_range_protection(image_base, image_size);
#else
/*
* Clear protection flags on a whole range of possible
* addresses used for KASLR. We don't need to do that
* on x86_64, since KASLR/extraction is performed after
* dedicated identity page tables are built and we only
* need to remove possible protection on relocated image
* itself disregarding further relocations.
*/
adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
#endif
}
static const efi_char16_t apple[] = L"Apple";
static void setup_quirks(struct boot_params *boot_params)
static void setup_quirks(struct boot_params *boot_params,
unsigned long image_base,
unsigned long image_size)
{
efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
efi_table_attr(efi_system_table, fw_vendor);
@@ -222,6 +324,9 @@ static void setup_quirks(struct boot_params *boot_params)
if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
retrieve_apple_device_properties(boot_params);
}
if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
setup_memory_protection(image_base, image_size);
}
/*
@@ -341,8 +446,6 @@ static void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
asm("hlt");
}
void startup_32(struct boot_params *boot_params);
void __noreturn efi_stub_entry(efi_handle_t handle,
efi_system_table_t *sys_table_arg,
struct boot_params *boot_params);
@@ -677,11 +780,17 @@ unsigned long efi_main(efi_handle_t handle,
efi_status_t status;
efi_system_table = sys_table_arg;
/* Check if we were booted by the EFI firmware */
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
if (efi_dxe_table &&
efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
efi_warn("Ignoring DXE services table: invalid signature\n");
efi_dxe_table = NULL;
}
/*
* If the kernel isn't already loaded at a suitable address,
* relocate it.
@@ -791,7 +900,7 @@ unsigned long efi_main(efi_handle_t handle,
setup_efi_pci(boot_params);
setup_quirks(boot_params);
setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
status = exit_boot(boot_params, handle);
if (status != EFI_SUCCESS) {

View File

@@ -47,4 +47,7 @@ source "drivers/virt/vboxguest/Kconfig"
source "drivers/virt/nitro_enclaves/Kconfig"
source "drivers/virt/acrn/Kconfig"
source "drivers/virt/coco/efi_secret/Kconfig"
endif

View File

@@ -9,3 +9,4 @@ obj-y += vboxguest/
obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
obj-$(CONFIG_ACRN_HSM) += acrn/
obj-$(CONFIG_EFI_SECRET) += coco/efi_secret/

View File

@@ -0,0 +1,16 @@
# SPDX-License-Identifier: GPL-2.0-only
config EFI_SECRET
tristate "EFI secret area securityfs support"
depends on EFI && X86_64
select EFI_COCO_SECRET
select SECURITYFS
help
This is a driver for accessing the EFI secret area via securityfs.
The EFI secret area is a memory area designated by the firmware for
confidential computing secret injection (for example for AMD SEV
guests). The driver exposes the secrets as files in
<securityfs>/secrets/coco. Files can be read and deleted (deleting
a file wipes the secret from memory).
To compile this driver as a module, choose M here.
The module will be called efi_secret.

View File

@@ -0,0 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_EFI_SECRET) += efi_secret.o

View File

@@ -0,0 +1,349 @@
// SPDX-License-Identifier: GPL-2.0
/*
* efi_secret module
*
* Copyright (C) 2022 IBM Corporation
* Author: Dov Murik <dovmurik@linux.ibm.com>
*/
/**
* DOC: efi_secret: Allow reading EFI confidential computing (coco) secret area
* via securityfs interface.
*
* When the module is loaded (and securityfs is mounted, typically under
* /sys/kernel/security), a "secrets/coco" directory is created in securityfs.
* In it, a file is created for each secret entry. The name of each such file
* is the GUID of the secret entry, and its content is the secret data.
*/
#include <linux/platform_device.h>
#include <linux/seq_file.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/io.h>
#include <linux/security.h>
#include <linux/efi.h>
#include <linux/cacheflush.h>
#define EFI_SECRET_NUM_FILES 64
struct efi_secret {
struct dentry *secrets_dir;
struct dentry *fs_dir;
struct dentry *fs_files[EFI_SECRET_NUM_FILES];
void __iomem *secret_data;
u64 secret_data_len;
};
/*
* Structure of the EFI secret area
*
* Offset Length
* (bytes) (bytes) Usage
* ------- ------- -----
* 0 16 Secret table header GUID (must be 1e74f542-71dd-4d66-963e-ef4287ff173b)
* 16 4 Length of bytes of the entire secret area
*
* 20 16 First secret entry's GUID
* 36 4 First secret entry's length in bytes (= 16 + 4 + x)
* 40 x First secret entry's data
*
* 40+x 16 Second secret entry's GUID
* 56+x 4 Second secret entry's length in bytes (= 16 + 4 + y)
* 60+x y Second secret entry's data
*
* (... and so on for additional entries)
*
* The GUID of each secret entry designates the usage of the secret data.
*/
/**
* struct secret_header - Header of entire secret area; this should be followed
* by instances of struct secret_entry.
* @guid: Must be EFI_SECRET_TABLE_HEADER_GUID
* @len: Length in bytes of entire secret area, including header
*/
struct secret_header {
efi_guid_t guid;
u32 len;
} __attribute((packed));
/**
* struct secret_entry - Holds one secret entry
* @guid: Secret-specific GUID (or NULL_GUID if this secret entry was deleted)
* @len: Length of secret entry, including its guid and len fields
* @data: The secret data (full of zeros if this secret entry was deleted)
*/
struct secret_entry {
efi_guid_t guid;
u32 len;
u8 data[];
} __attribute((packed));
static size_t secret_entry_data_len(struct secret_entry *e)
{
return e->len - sizeof(*e);
}
static struct efi_secret the_efi_secret;
static inline struct efi_secret *efi_secret_get(void)
{
return &the_efi_secret;
}
static int efi_secret_bin_file_show(struct seq_file *file, void *data)
{
struct secret_entry *e = file->private;
if (e)
seq_write(file, e->data, secret_entry_data_len(e));
return 0;
}
DEFINE_SHOW_ATTRIBUTE(efi_secret_bin_file);
/*
* Overwrite memory content with zeroes, and ensure that dirty cache lines are
* actually written back to memory, to clear out the secret.
*/
static void wipe_memory(void *addr, size_t size)
{
memzero_explicit(addr, size);
#ifdef CONFIG_X86
clflush_cache_range(addr, size);
#endif
}
static int efi_secret_unlink(struct inode *dir, struct dentry *dentry)
{
struct efi_secret *s = efi_secret_get();
struct inode *inode = d_inode(dentry);
struct secret_entry *e = (struct secret_entry *)inode->i_private;
int i;
if (e) {
/* Zero out the secret data */
wipe_memory(e->data, secret_entry_data_len(e));
e->guid = NULL_GUID;
}
inode->i_private = NULL;
for (i = 0; i < EFI_SECRET_NUM_FILES; i++)
if (s->fs_files[i] == dentry)
s->fs_files[i] = NULL;
/*
* securityfs_remove tries to lock the directory's inode, but we reach
* the unlink callback when it's already locked
*/
inode_unlock(dir);
securityfs_remove(dentry);
inode_lock(dir);
return 0;
}
static const struct inode_operations efi_secret_dir_inode_operations = {
.lookup = simple_lookup,
.unlink = efi_secret_unlink,
};
static int efi_secret_map_area(struct platform_device *dev)
{
int ret;
struct efi_secret *s = efi_secret_get();
struct linux_efi_coco_secret_area *secret_area;
if (efi.coco_secret == EFI_INVALID_TABLE_ADDR) {
dev_err(&dev->dev, "Secret area address is not available\n");
return -EINVAL;
}
secret_area = memremap(efi.coco_secret, sizeof(*secret_area), MEMREMAP_WB);
if (secret_area == NULL) {
dev_err(&dev->dev, "Could not map secret area EFI config entry\n");
return -ENOMEM;
}
if (!secret_area->base_pa || secret_area->size < sizeof(struct secret_header)) {
dev_err(&dev->dev,
"Invalid secret area memory location (base_pa=0x%llx size=0x%llx)\n",
secret_area->base_pa, secret_area->size);
ret = -EINVAL;
goto unmap;
}
s->secret_data = ioremap_encrypted(secret_area->base_pa, secret_area->size);
if (s->secret_data == NULL) {
dev_err(&dev->dev, "Could not map secret area\n");
ret = -ENOMEM;
goto unmap;
}
s->secret_data_len = secret_area->size;
ret = 0;
unmap:
memunmap(secret_area);
return ret;
}
static void efi_secret_securityfs_teardown(struct platform_device *dev)
{
struct efi_secret *s = efi_secret_get();
int i;
for (i = (EFI_SECRET_NUM_FILES - 1); i >= 0; i--) {
securityfs_remove(s->fs_files[i]);
s->fs_files[i] = NULL;
}
securityfs_remove(s->fs_dir);
s->fs_dir = NULL;
securityfs_remove(s->secrets_dir);
s->secrets_dir = NULL;
dev_dbg(&dev->dev, "Removed securityfs entries\n");
}
static int efi_secret_securityfs_setup(struct platform_device *dev)
{
struct efi_secret *s = efi_secret_get();
int ret = 0, i = 0, bytes_left;
unsigned char *ptr;
struct secret_header *h;
struct secret_entry *e;
struct dentry *dent;
char guid_str[EFI_VARIABLE_GUID_LEN + 1];
ptr = (void __force *)s->secret_data;
h = (struct secret_header *)ptr;
if (efi_guidcmp(h->guid, EFI_SECRET_TABLE_HEADER_GUID)) {
/*
* This is not an error: it just means that EFI defines secret
* area but it was not populated by the Guest Owner.
*/
dev_dbg(&dev->dev, "EFI secret area does not start with correct GUID\n");
return -ENODEV;
}
if (h->len < sizeof(*h)) {
dev_err(&dev->dev, "EFI secret area reported length is too small\n");
return -EINVAL;
}
if (h->len > s->secret_data_len) {
dev_err(&dev->dev, "EFI secret area reported length is too big\n");
return -EINVAL;
}
s->secrets_dir = NULL;
s->fs_dir = NULL;
memset(s->fs_files, 0, sizeof(s->fs_files));
dent = securityfs_create_dir("secrets", NULL);
if (IS_ERR(dent)) {
dev_err(&dev->dev, "Error creating secrets securityfs directory entry err=%ld\n",
PTR_ERR(dent));
return PTR_ERR(dent);
}
s->secrets_dir = dent;
dent = securityfs_create_dir("coco", s->secrets_dir);
if (IS_ERR(dent)) {
dev_err(&dev->dev, "Error creating coco securityfs directory entry err=%ld\n",
PTR_ERR(dent));
return PTR_ERR(dent);
}
d_inode(dent)->i_op = &efi_secret_dir_inode_operations;
s->fs_dir = dent;
bytes_left = h->len - sizeof(*h);
ptr += sizeof(*h);
while (bytes_left >= (int)sizeof(*e) && i < EFI_SECRET_NUM_FILES) {
e = (struct secret_entry *)ptr;
if (e->len < sizeof(*e) || e->len > (unsigned int)bytes_left) {
dev_err(&dev->dev, "EFI secret area is corrupted\n");
ret = -EINVAL;
goto err_cleanup;
}
/* Skip deleted entries (which will have NULL_GUID) */
if (efi_guidcmp(e->guid, NULL_GUID)) {
efi_guid_to_str(&e->guid, guid_str);
dent = securityfs_create_file(guid_str, 0440, s->fs_dir, (void *)e,
&efi_secret_bin_file_fops);
if (IS_ERR(dent)) {
dev_err(&dev->dev, "Error creating efi_secret securityfs entry\n");
ret = PTR_ERR(dent);
goto err_cleanup;
}
s->fs_files[i++] = dent;
}
ptr += e->len;
bytes_left -= e->len;
}
dev_info(&dev->dev, "Created %d entries in securityfs secrets/coco\n", i);
return 0;
err_cleanup:
efi_secret_securityfs_teardown(dev);
return ret;
}
static void efi_secret_unmap_area(void)
{
struct efi_secret *s = efi_secret_get();
if (s->secret_data) {
iounmap(s->secret_data);
s->secret_data = NULL;
s->secret_data_len = 0;
}
}
static int efi_secret_probe(struct platform_device *dev)
{
int ret;
ret = efi_secret_map_area(dev);
if (ret)
return ret;
ret = efi_secret_securityfs_setup(dev);
if (ret)
goto err_unmap;
return ret;
err_unmap:
efi_secret_unmap_area();
return ret;
}
static int efi_secret_remove(struct platform_device *dev)
{
efi_secret_securityfs_teardown(dev);
efi_secret_unmap_area();
return 0;
}
static struct platform_driver efi_secret_driver = {
.probe = efi_secret_probe,
.remove = efi_secret_remove,
.driver = {
.name = "efi_secret",
},
};
module_platform_driver(efi_secret_driver);
MODULE_DESCRIPTION("Confidential computing EFI secret area access");
MODULE_AUTHOR("IBM");
MODULE_LICENSE("GPL");
MODULE_ALIAS("platform:efi_secret");

View File

@@ -213,6 +213,8 @@ struct capsule_info {
size_t page_bytes_remain;
};
int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
size_t hdr_bytes);
int __efi_capsule_setup_info(struct capsule_info *cap_info);
/*
@@ -383,6 +385,7 @@ void efi_native_runtime_setup(void);
#define EFI_LOAD_FILE_PROTOCOL_GUID EFI_GUID(0x56ec3091, 0x954c, 0x11d2, 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b)
#define EFI_LOAD_FILE2_PROTOCOL_GUID EFI_GUID(0x4006c0c1, 0xfcb3, 0x403e, 0x99, 0x6d, 0x4a, 0x6c, 0x87, 0x24, 0xe0, 0x6d)
#define EFI_RT_PROPERTIES_TABLE_GUID EFI_GUID(0xeb66918a, 0x7eef, 0x402a, 0x84, 0x2e, 0x93, 0x1d, 0x21, 0xc3, 0x8a, 0xe9)
#define EFI_DXE_SERVICES_TABLE_GUID EFI_GUID(0x05ad34ba, 0x6f02, 0x4214, 0x95, 0x2e, 0x4d, 0xa0, 0x39, 0x8e, 0x2b, 0xb9)
#define EFI_IMAGE_SECURITY_DATABASE_GUID EFI_GUID(0xd719b2cb, 0x3d3a, 0x4596, 0xa3, 0xbc, 0xda, 0xd0, 0x0e, 0x67, 0x65, 0x6f)
#define EFI_SHIM_LOCK_GUID EFI_GUID(0x605dab50, 0xe046, 0x4300, 0xab, 0xb6, 0x3d, 0xd8, 0x10, 0xdd, 0x8b, 0x23)
@@ -405,6 +408,20 @@ void efi_native_runtime_setup(void);
#define LINUX_EFI_MEMRESERVE_TABLE_GUID EFI_GUID(0x888eb0c6, 0x8ede, 0x4ff5, 0xa8, 0xf0, 0x9a, 0xee, 0x5c, 0xb9, 0x77, 0xc2)
#define LINUX_EFI_INITRD_MEDIA_GUID EFI_GUID(0x5568e427, 0x68fc, 0x4f3d, 0xac, 0x74, 0xca, 0x55, 0x52, 0x31, 0xcc, 0x68)
#define LINUX_EFI_MOK_VARIABLE_TABLE_GUID EFI_GUID(0xc451ed2b, 0x9694, 0x45d3, 0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89)
#define LINUX_EFI_COCO_SECRET_AREA_GUID EFI_GUID(0xadf956ad, 0xe98c, 0x484c, 0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47)
#define RISCV_EFI_BOOT_PROTOCOL_GUID EFI_GUID(0xccd15fec, 0x6f73, 0x4eec, 0x83, 0x95, 0x3e, 0x69, 0xe4, 0xb9, 0x40, 0xbf)
/*
* This GUID may be installed onto the kernel image's handle as a NULL protocol
* to signal to the stub that the placement of the image should be respected,
* and moving the image in physical memory is undesirable. To ensure
* compatibility with 64k pages kernels with virtually mapped stacks, and to
* avoid defeating physical randomization, this protocol should only be
* installed if the image was placed at a randomized 128k aligned address in
* memory.
*/
#define LINUX_EFI_LOADED_IMAGE_FIXED_GUID EFI_GUID(0xf5a37b6d, 0x3344, 0x42a5, 0xb6, 0xbb, 0x97, 0x86, 0x48, 0xc1, 0x89, 0x0a)
/* OEM GUIDs */
#define DELLEMC_EFI_RCI2_TABLE_GUID EFI_GUID(0x2d9f28a2, 0xa886, 0x456a, 0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
@@ -435,6 +452,7 @@ typedef struct {
} efi_config_table_type_t;
#define EFI_SYSTEM_TABLE_SIGNATURE ((u64)0x5453595320494249ULL)
#define EFI_DXE_SERVICES_TABLE_SIGNATURE ((u64)0x565245535f455844ULL)
#define EFI_2_30_SYSTEM_TABLE_REVISION ((2 << 16) | (30))
#define EFI_2_20_SYSTEM_TABLE_REVISION ((2 << 16) | (20))
@@ -596,6 +614,7 @@ extern struct efi {
unsigned long tpm_log; /* TPM2 Event Log table */
unsigned long tpm_final_log; /* TPM2 Final Events Log table */
unsigned long mokvar_table; /* MOK variable config table */
unsigned long coco_secret; /* Confidential computing secret table */
efi_get_time_t *get_time;
efi_set_time_t *set_time;
@@ -1335,4 +1354,12 @@ extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) { }
#endif
struct linux_efi_coco_secret_area {
u64 base_pa;
u64 size;
};
/* Header of a populated EFI secret area */
#define EFI_SECRET_TABLE_HEADER_GUID EFI_GUID(0x1e74f542, 0x71dd, 0x4d66, 0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b)
#endif /* _LINUX_EFI_H */

View File

@@ -196,6 +196,7 @@ void synchronize_rcu_tasks_rude(void);
void exit_tasks_rcu_start(void);
void exit_tasks_rcu_finish(void);
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
#define rcu_tasks_classic_qs(t, preempt) do { } while (0)
#define rcu_tasks_qs(t, preempt) do { } while (0)
#define rcu_note_voluntary_context_switch(t) do { } while (0)
#define call_rcu_tasks call_rcu

View File

@@ -2126,6 +2126,47 @@ static inline void cond_resched_rcu(void)
#endif
}
#ifdef CONFIG_PREEMPT_DYNAMIC
extern bool preempt_model_none(void);
extern bool preempt_model_voluntary(void);
extern bool preempt_model_full(void);
#else
static inline bool preempt_model_none(void)
{
return IS_ENABLED(CONFIG_PREEMPT_NONE);
}
static inline bool preempt_model_voluntary(void)
{
return IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY);
}
static inline bool preempt_model_full(void)
{
return IS_ENABLED(CONFIG_PREEMPT);
}
#endif
static inline bool preempt_model_rt(void)
{
return IS_ENABLED(CONFIG_PREEMPT_RT);
}
/*
* Does the preemption model allow non-cooperative preemption?
*
* For !CONFIG_PREEMPT_DYNAMIC kernels this is an exact match with
* CONFIG_PREEMPTION; for CONFIG_PREEMPT_DYNAMIC this doesn't work as the
* kernel is *built* with CONFIG_PREEMPTION=y but may run with e.g. the
* PREEMPT_NONE model.
*/
static inline bool preempt_model_preemptible(void)
{
return preempt_model_full() || preempt_model_rt();
}
/*
* Does a critical section need to be broken due to another
* task waiting?: (technically does not depend on CONFIG_PREEMPTION,

View File

@@ -47,11 +47,9 @@ struct srcu_data {
*/
struct srcu_node {
spinlock_t __private lock;
unsigned long srcu_have_cbs[4]; /* GP seq for children */
/* having CBs, but only */
/* is > ->srcu_gq_seq. */
unsigned long srcu_data_have_cbs[4]; /* Which srcu_data structs */
/* have CBs for given GP? */
unsigned long srcu_have_cbs[4]; /* GP seq for children having CBs, but only */
/* if greater than ->srcu_gq_seq. */
unsigned long srcu_data_have_cbs[4]; /* Which srcu_data structs have CBs for given GP? */
unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */
struct srcu_node *srcu_parent; /* Next up in tree. */
int grplo; /* Least CPU for node. */
@@ -62,18 +60,24 @@ struct srcu_node {
* Per-SRCU-domain structure, similar in function to rcu_state.
*/
struct srcu_struct {
struct srcu_node node[NUM_RCU_NODES]; /* Combining tree. */
struct srcu_node *node; /* Combining tree. */
struct srcu_node *level[RCU_NUM_LVLS + 1];
/* First node at each level. */
int srcu_size_state; /* Small-to-big transition state. */
struct mutex srcu_cb_mutex; /* Serialize CB preparation. */
spinlock_t __private lock; /* Protect counters */
spinlock_t __private lock; /* Protect counters and size state. */
struct mutex srcu_gp_mutex; /* Serialize GP work. */
unsigned int srcu_idx; /* Current rdr array element. */
unsigned long srcu_gp_seq; /* Grace-period seq #. */
unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */
unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */
unsigned long srcu_gp_start; /* Last GP start timestamp (jiffies) */
unsigned long srcu_last_gp_end; /* Last GP end timestamp (ns) */
unsigned long srcu_size_jiffies; /* Current contention-measurement interval. */
unsigned long srcu_n_lock_retries; /* Contention events in current interval. */
unsigned long srcu_n_exp_nodelay; /* # expedited no-delays in current GP phase. */
struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */
bool sda_is_static; /* May ->sda be passed to free_percpu()? */
unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */
struct mutex srcu_barrier_mutex; /* Serialize barrier ops. */
struct completion srcu_barrier_completion;
@@ -81,10 +85,23 @@ struct srcu_struct {
atomic_t srcu_barrier_cpu_cnt; /* # CPUs not yet posting a */
/* callback for the barrier */
/* operation. */
unsigned long reschedule_jiffies;
unsigned long reschedule_count;
struct delayed_work work;
struct lockdep_map dep_map;
};
/* Values for size state variable (->srcu_size_state). */
#define SRCU_SIZE_SMALL 0
#define SRCU_SIZE_ALLOC 1
#define SRCU_SIZE_WAIT_BARRIER 2
#define SRCU_SIZE_WAIT_CALL 3
#define SRCU_SIZE_WAIT_CBS1 4
#define SRCU_SIZE_WAIT_CBS2 5
#define SRCU_SIZE_WAIT_CBS3 6
#define SRCU_SIZE_WAIT_CBS4 7
#define SRCU_SIZE_BIG 8
/* Values for state variable (bottom bits of ->srcu_gp_seq). */
#define SRCU_STATE_IDLE 0
#define SRCU_STATE_SCAN1 1
@@ -121,6 +138,7 @@ struct srcu_struct {
#ifdef MODULE
# define __DEFINE_SRCU(name, is_static) \
is_static struct srcu_struct name; \
extern struct srcu_struct * const __srcu_struct_##name; \
struct srcu_struct * const __srcu_struct_##name \
__section("___srcu_struct_ptrs") = &name
#else

View File

@@ -118,7 +118,7 @@ void _torture_stop_kthread(char *m, struct task_struct **tp);
_torture_stop_kthread("Stopping " #n " task", &(tp))
#ifdef CONFIG_PREEMPTION
#define torture_preempt_schedule() preempt_schedule()
#define torture_preempt_schedule() __preempt_schedule()
#else
#define torture_preempt_schedule() do { } while (0)
#endif

View File

@@ -27,6 +27,7 @@ config BPF_SYSCALL
bool "Enable bpf() system call"
select BPF
select IRQ_WORK
select TASKS_RCU if PREEMPTION
select TASKS_TRACE_RCU
select BINARY_PRINTF
select NET_SOCK_MSG if NET

View File

@@ -77,31 +77,56 @@ config TASKS_RCU_GENERIC
This option enables generic infrastructure code supporting
task-based RCU implementations. Not for manual selection.
config TASKS_RCU
def_bool PREEMPTION
config FORCE_TASKS_RCU
bool "Force selection of TASKS_RCU"
depends on RCU_EXPERT
select TASKS_RCU
default n
help
This option enables a task-based RCU implementation that uses
only voluntary context switch (not preemption!), idle, and
user-mode execution as quiescent states. Not for manual selection.
This option force-enables a task-based RCU implementation
that uses only voluntary context switch (not preemption!),
idle, and user-mode execution as quiescent states. Not for
manual selection in most cases.
config TASKS_RCU
bool
default n
select IRQ_WORK
config FORCE_TASKS_RUDE_RCU
bool "Force selection of Tasks Rude RCU"
depends on RCU_EXPERT
select TASKS_RUDE_RCU
default n
help
This option force-enables a task-based RCU implementation
that uses only context switch (including preemption) and
user-mode execution as quiescent states. It forces IPIs and
context switches on all online CPUs, including idle ones,
so use with caution. Not for manual selection in most cases.
config TASKS_RUDE_RCU
def_bool 0
help
This option enables a task-based RCU implementation that uses
only context switch (including preemption) and user-mode
execution as quiescent states. It forces IPIs and context
switches on all online CPUs, including idle ones, so use
with caution.
config TASKS_TRACE_RCU
def_bool 0
bool
default n
select IRQ_WORK
config FORCE_TASKS_TRACE_RCU
bool "Force selection of Tasks Trace RCU"
depends on RCU_EXPERT
select TASKS_TRACE_RCU
default n
help
This option enables a task-based RCU implementation that uses
explicit rcu_read_lock_trace() read-side markers, and allows
these readers to appear in the idle loop as well as on the CPU
hotplug code paths. It can force IPIs on online CPUs, including
idle ones, so use with caution.
these readers to appear in the idle loop as well as on the
CPU hotplug code paths. It can force IPIs on online CPUs,
including idle ones, so use with caution. Not for manual
selection in most cases.
config TASKS_TRACE_RCU
bool
default n
select IRQ_WORK
config RCU_STALL_COMMON
def_bool TREE_RCU
@@ -195,6 +220,20 @@ config RCU_BOOST_DELAY
Accept the default if unsure.
config RCU_EXP_KTHREAD
bool "Perform RCU expedited work in a real-time kthread"
depends on RCU_BOOST && RCU_EXPERT
default !PREEMPT_RT && NR_CPUS <= 32
help
Use this option to further reduce the latencies of expedited
grace periods at the expense of being more disruptive.
This option is disabled by default on PREEMPT_RT=y kernels which
disable expedited grace periods after boot by unconditionally
setting rcupdate.rcu_normal_after_boot=1.
Accept the default if unsure.
config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU
@@ -225,7 +264,7 @@ config RCU_NOCB_CPU
config TASKS_TRACE_RCU_READ_MB
bool "Tasks Trace RCU readers use memory barriers in user and idle"
depends on RCU_EXPERT
depends on RCU_EXPERT && TASKS_TRACE_RCU
default PREEMPT_RT || NR_CPUS < 8
help
Use this option to further reduce the number of IPIs sent

View File

@@ -28,9 +28,6 @@ config RCU_SCALE_TEST
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
select TASKS_RUDE_RCU
select TASKS_TRACE_RCU
default n
help
This option provides a kernel module that runs performance
@@ -47,9 +44,6 @@ config RCU_TORTURE_TEST
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
select TASKS_RUDE_RCU
select TASKS_TRACE_RCU
default n
help
This option provides a kernel module that runs torture tests
@@ -66,9 +60,6 @@ config RCU_REF_SCALE_TEST
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
select TASKS_RUDE_RCU
select TASKS_TRACE_RCU
default n
help
This option provides a kernel module that runs performance tests
@@ -91,6 +82,20 @@ config RCU_CPU_STALL_TIMEOUT
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.
config RCU_EXP_CPU_STALL_TIMEOUT
int "Expedited RCU CPU stall timeout in milliseconds"
depends on RCU_STALL_COMMON
range 0 21000
default 20 if ANDROID
default 0 if !ANDROID
help
If a given expedited RCU grace period extends more than the
specified number of milliseconds, a CPU stall warning is printed.
If the RCU grace period persists, additional CPU stall warnings
are printed at more widely spaced intervals. A value of zero
says to use the RCU_CPU_STALL_TIMEOUT value converted from
seconds to milliseconds.
config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL

View File

@@ -210,7 +210,9 @@ static inline bool rcu_stall_is_suppressed_at_boot(void)
extern int rcu_cpu_stall_ftrace_dump;
extern int rcu_cpu_stall_suppress;
extern int rcu_cpu_stall_timeout;
extern int rcu_exp_cpu_stall_timeout;
int rcu_jiffies_till_stall_check(void);
int rcu_exp_jiffies_till_stall_check(void);
static inline bool rcu_stall_is_suppressed(void)
{
@@ -523,6 +525,8 @@ static inline bool rcu_check_boost_fail(unsigned long gp_state, int *cpup) { ret
static inline void show_rcu_gp_kthreads(void) { }
static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
static inline void rcu_fwd_progress_check(unsigned long j) { }
static inline void rcu_gp_slow_register(atomic_t *rgssp) { }
static inline void rcu_gp_slow_unregister(atomic_t *rgssp) { }
#else /* #ifdef CONFIG_TINY_RCU */
bool rcu_dynticks_zero_in_eqs(int cpu, int *vp);
unsigned long rcu_get_gp_seq(void);
@@ -534,14 +538,19 @@ int rcu_get_gp_kthreads_prio(void);
void rcu_fwd_progress_check(unsigned long j);
void rcu_force_quiescent_state(void);
extern struct workqueue_struct *rcu_gp_wq;
#ifdef CONFIG_RCU_EXP_KTHREAD
extern struct kthread_worker *rcu_exp_gp_kworker;
extern struct kthread_worker *rcu_exp_par_gp_kworker;
#else /* !CONFIG_RCU_EXP_KTHREAD */
extern struct workqueue_struct *rcu_par_gp_wq;
#endif /* CONFIG_RCU_EXP_KTHREAD */
void rcu_gp_slow_register(atomic_t *rgssp);
void rcu_gp_slow_unregister(atomic_t *rgssp);
#endif /* #else #ifdef CONFIG_TINY_RCU */
#ifdef CONFIG_RCU_NOCB_CPU
bool rcu_is_nocb_cpu(int cpu);
void rcu_bind_current_to_nocb(void);
#else
static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
static inline void rcu_bind_current_to_nocb(void) { }
#endif

View File

@@ -505,10 +505,10 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
WRITE_ONCE(rsclp->tails[j], rsclp->tails[RCU_DONE_TAIL]);
/*
* Callbacks moved, so clean up the misordered ->tails[] pointers
* that now point into the middle of the list of ready-to-invoke
* callbacks. The overall effect is to copy down the later pointers
* into the gap that was created by the now-ready segments.
* Callbacks moved, so there might be an empty RCU_WAIT_TAIL
* and a non-empty RCU_NEXT_READY_TAIL. If so, copy the
* RCU_NEXT_READY_TAIL segment to fill the RCU_WAIT_TAIL gap
* created by the now-ready-to-invoke segments.
*/
for (j = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++, j++) {
if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])

View File

@@ -268,6 +268,8 @@ static struct rcu_scale_ops srcud_ops = {
.name = "srcud"
};
#ifdef CONFIG_TASKS_RCU
/*
* Definitions for RCU-tasks scalability testing.
*/
@@ -295,6 +297,16 @@ static struct rcu_scale_ops tasks_ops = {
.name = "tasks"
};
#define TASKS_OPS &tasks_ops,
#else // #ifdef CONFIG_TASKS_RCU
#define TASKS_OPS
#endif // #else // #ifdef CONFIG_TASKS_RCU
#ifdef CONFIG_TASKS_TRACE_RCU
/*
* Definitions for RCU-tasks-trace scalability testing.
*/
@@ -324,6 +336,14 @@ static struct rcu_scale_ops tasks_tracing_ops = {
.name = "tasks-tracing"
};
#define TASKS_TRACING_OPS &tasks_tracing_ops,
#else // #ifdef CONFIG_TASKS_TRACE_RCU
#define TASKS_TRACING_OPS
#endif // #else // #ifdef CONFIG_TASKS_TRACE_RCU
static unsigned long rcuscale_seq_diff(unsigned long new, unsigned long old)
{
if (!cur_ops->gp_diff)
@@ -797,7 +817,7 @@ rcu_scale_init(void)
long i;
int firsterr = 0;
static struct rcu_scale_ops *scale_ops[] = {
&rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops, &tasks_tracing_ops
&rcu_ops, &srcu_ops, &srcud_ops, TASKS_OPS TASKS_TRACING_OPS
};
if (!torture_init_begin(scale_type, verbose))

View File

@@ -737,6 +737,50 @@ static struct rcu_torture_ops busted_srcud_ops = {
.name = "busted_srcud"
};
/*
* Definitions for trivial CONFIG_PREEMPT=n-only torture testing.
* This implementation does not necessarily work well with CPU hotplug.
*/
static void synchronize_rcu_trivial(void)
{
int cpu;
for_each_online_cpu(cpu) {
rcutorture_sched_setaffinity(current->pid, cpumask_of(cpu));
WARN_ON_ONCE(raw_smp_processor_id() != cpu);
}
}
static int rcu_torture_read_lock_trivial(void) __acquires(RCU)
{
preempt_disable();
return 0;
}
static void rcu_torture_read_unlock_trivial(int idx) __releases(RCU)
{
preempt_enable();
}
static struct rcu_torture_ops trivial_ops = {
.ttype = RCU_TRIVIAL_FLAVOR,
.init = rcu_sync_torture_init,
.readlock = rcu_torture_read_lock_trivial,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = rcu_torture_read_unlock_trivial,
.readlock_held = torture_readlock_not_held,
.get_gp_seq = rcu_no_completed,
.sync = synchronize_rcu_trivial,
.exp_sync = synchronize_rcu_trivial,
.fqs = NULL,
.stats = NULL,
.irq_capable = 1,
.name = "trivial"
};
#ifdef CONFIG_TASKS_RCU
/*
* Definitions for RCU-tasks torture testing.
*/
@@ -780,47 +824,16 @@ static struct rcu_torture_ops tasks_ops = {
.name = "tasks"
};
/*
* Definitions for trivial CONFIG_PREEMPT=n-only torture testing.
* This implementation does not necessarily work well with CPU hotplug.
*/
#define TASKS_OPS &tasks_ops,
static void synchronize_rcu_trivial(void)
{
int cpu;
#else // #ifdef CONFIG_TASKS_RCU
for_each_online_cpu(cpu) {
rcutorture_sched_setaffinity(current->pid, cpumask_of(cpu));
WARN_ON_ONCE(raw_smp_processor_id() != cpu);
}
}
#define TASKS_OPS
static int rcu_torture_read_lock_trivial(void) __acquires(RCU)
{
preempt_disable();
return 0;
}
#endif // #else #ifdef CONFIG_TASKS_RCU
static void rcu_torture_read_unlock_trivial(int idx) __releases(RCU)
{
preempt_enable();
}
static struct rcu_torture_ops trivial_ops = {
.ttype = RCU_TRIVIAL_FLAVOR,
.init = rcu_sync_torture_init,
.readlock = rcu_torture_read_lock_trivial,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = rcu_torture_read_unlock_trivial,
.readlock_held = torture_readlock_not_held,
.get_gp_seq = rcu_no_completed,
.sync = synchronize_rcu_trivial,
.exp_sync = synchronize_rcu_trivial,
.fqs = NULL,
.stats = NULL,
.irq_capable = 1,
.name = "trivial"
};
#ifdef CONFIG_TASKS_RUDE_RCU
/*
* Definitions for rude RCU-tasks torture testing.
@@ -851,6 +864,17 @@ static struct rcu_torture_ops tasks_rude_ops = {
.name = "tasks-rude"
};
#define TASKS_RUDE_OPS &tasks_rude_ops,
#else // #ifdef CONFIG_TASKS_RUDE_RCU
#define TASKS_RUDE_OPS
#endif // #else #ifdef CONFIG_TASKS_RUDE_RCU
#ifdef CONFIG_TASKS_TRACE_RCU
/*
* Definitions for tracing RCU-tasks torture testing.
*/
@@ -893,6 +917,15 @@ static struct rcu_torture_ops tasks_tracing_ops = {
.name = "tasks-tracing"
};
#define TASKS_TRACING_OPS &tasks_tracing_ops,
#else // #ifdef CONFIG_TASKS_TRACE_RCU
#define TASKS_TRACING_OPS
#endif // #else #ifdef CONFIG_TASKS_TRACE_RCU
static unsigned long rcutorture_seq_diff(unsigned long new, unsigned long old)
{
if (!cur_ops->gp_diff)
@@ -1178,7 +1211,7 @@ rcu_torture_writer(void *arg)
" GP expediting controlled from boot/sysfs for %s.\n",
torture_type, cur_ops->name);
if (WARN_ONCE(nsynctypes == 0,
"rcu_torture_writer: No update-side primitives.\n")) {
"%s: No update-side primitives.\n", __func__)) {
/*
* No updates primitives, so don't try updating.
* The resulting test won't be testing much, hence the
@@ -1186,6 +1219,7 @@ rcu_torture_writer(void *arg)
*/
rcu_torture_writer_state = RTWS_STOPPING;
torture_kthread_stopping("rcu_torture_writer");
return 0;
}
do {
@@ -1322,6 +1356,17 @@ rcu_torture_fakewriter(void *arg)
VERBOSE_TOROUT_STRING("rcu_torture_fakewriter task started");
set_user_nice(current, MAX_NICE);
if (WARN_ONCE(nsynctypes == 0,
"%s: No update-side primitives.\n", __func__)) {
/*
* No updates primitives, so don't try updating.
* The resulting test won't be testing much, hence the
* above WARN_ONCE().
*/
torture_kthread_stopping("rcu_torture_fakewriter");
return 0;
}
do {
torture_hrtimeout_jiffies(torture_random(&rand) % 10, &rand);
if (cur_ops->cb_barrier != NULL &&
@@ -2916,10 +2961,12 @@ rcu_torture_cleanup(void)
pr_info("%s: Invoking %pS().\n", __func__, cur_ops->cb_barrier);
cur_ops->cb_barrier();
}
rcu_gp_slow_unregister(NULL);
return;
}
if (!cur_ops) {
torture_cleanup_end();
rcu_gp_slow_unregister(NULL);
return;
}
@@ -3016,6 +3063,7 @@ rcu_torture_cleanup(void)
else
rcu_torture_print_module_parms(cur_ops, "End of test: SUCCESS");
torture_cleanup_end();
rcu_gp_slow_unregister(&rcu_fwd_cb_nodelay);
}
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
@@ -3096,9 +3144,9 @@ rcu_torture_init(void)
int flags = 0;
unsigned long gp_seq = 0;
static struct rcu_torture_ops *torture_ops[] = {
&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
&busted_srcud_ops, &tasks_ops, &tasks_rude_ops,
&tasks_tracing_ops, &trivial_ops,
&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, &busted_srcud_ops,
TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS
&trivial_ops,
};
if (!torture_init_begin(torture_type, verbose))
@@ -3320,6 +3368,7 @@ rcu_torture_init(void)
if (object_debug)
rcu_test_debug_objects();
torture_init_end();
rcu_gp_slow_register(&rcu_fwd_cb_nodelay);
return 0;
unwind:

View File

@@ -207,6 +207,8 @@ static struct ref_scale_ops srcu_ops = {
.name = "srcu"
};
#ifdef CONFIG_TASKS_RCU
// Definitions for RCU Tasks ref scale testing: Empty read markers.
// These definitions also work for RCU Rude readers.
static void rcu_tasks_ref_scale_read_section(const int nloops)
@@ -232,6 +234,16 @@ static struct ref_scale_ops rcu_tasks_ops = {
.name = "rcu-tasks"
};
#define RCU_TASKS_OPS &rcu_tasks_ops,
#else // #ifdef CONFIG_TASKS_RCU
#define RCU_TASKS_OPS
#endif // #else // #ifdef CONFIG_TASKS_RCU
#ifdef CONFIG_TASKS_TRACE_RCU
// Definitions for RCU Tasks Trace ref scale testing.
static void rcu_trace_ref_scale_read_section(const int nloops)
{
@@ -261,6 +273,14 @@ static struct ref_scale_ops rcu_trace_ops = {
.name = "rcu-trace"
};
#define RCU_TRACE_OPS &rcu_trace_ops,
#else // #ifdef CONFIG_TASKS_TRACE_RCU
#define RCU_TRACE_OPS
#endif // #else // #ifdef CONFIG_TASKS_TRACE_RCU
// Definitions for reference count
static atomic_t refcnt;
@@ -790,7 +810,7 @@ ref_scale_init(void)
long i;
int firsterr = 0;
static struct ref_scale_ops *scale_ops[] = {
&rcu_ops, &srcu_ops, &rcu_trace_ops, &rcu_tasks_ops, &refcnt_ops, &rwlock_ops,
&rcu_ops, &srcu_ops, RCU_TRACE_OPS RCU_TASKS_OPS &refcnt_ops, &rwlock_ops,
&rwsem_ops, &lock_ops, &lock_irq_ops, &acqrel_ops, &clock_ops,
};

View File

@@ -24,6 +24,7 @@
#include <linux/smp.h>
#include <linux/delay.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/srcu.h>
#include "rcu.h"
@@ -38,6 +39,35 @@ module_param(exp_holdoff, ulong, 0444);
static ulong counter_wrap_check = (ULONG_MAX >> 2);
module_param(counter_wrap_check, ulong, 0444);
/*
* Control conversion to SRCU_SIZE_BIG:
* 0: Don't convert at all.
* 1: Convert at init_srcu_struct() time.
* 2: Convert when rcutorture invokes srcu_torture_stats_print().
* 3: Decide at boot time based on system shape (default).
* 0x1x: Convert when excessive contention encountered.
*/
#define SRCU_SIZING_NONE 0
#define SRCU_SIZING_INIT 1
#define SRCU_SIZING_TORTURE 2
#define SRCU_SIZING_AUTO 3
#define SRCU_SIZING_CONTEND 0x10
#define SRCU_SIZING_IS(x) ((convert_to_big & ~SRCU_SIZING_CONTEND) == x)
#define SRCU_SIZING_IS_NONE() (SRCU_SIZING_IS(SRCU_SIZING_NONE))
#define SRCU_SIZING_IS_INIT() (SRCU_SIZING_IS(SRCU_SIZING_INIT))
#define SRCU_SIZING_IS_TORTURE() (SRCU_SIZING_IS(SRCU_SIZING_TORTURE))
#define SRCU_SIZING_IS_CONTEND() (convert_to_big & SRCU_SIZING_CONTEND)
static int convert_to_big = SRCU_SIZING_AUTO;
module_param(convert_to_big, int, 0444);
/* Number of CPUs to trigger init_srcu_struct()-time transition to big. */
static int big_cpu_lim __read_mostly = 128;
module_param(big_cpu_lim, int, 0444);
/* Contention events per jiffy to initiate transition to big. */
static int small_contention_lim __read_mostly = 100;
module_param(small_contention_lim, int, 0444);
/* Early-boot callback-management, so early that no lock is required! */
static LIST_HEAD(srcu_boot_list);
static bool __read_mostly srcu_init_done;
@@ -48,39 +78,90 @@ static void process_srcu(struct work_struct *work);
static void srcu_delay_timer(struct timer_list *t);
/* Wrappers for lock acquisition and release, see raw_spin_lock_rcu_node(). */
#define spin_lock_rcu_node(p) \
do { \
spin_lock(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
#define spin_lock_rcu_node(p) \
do { \
spin_lock(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
} while (0)
#define spin_unlock_rcu_node(p) spin_unlock(&ACCESS_PRIVATE(p, lock))
#define spin_lock_irq_rcu_node(p) \
do { \
spin_lock_irq(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
#define spin_lock_irq_rcu_node(p) \
do { \
spin_lock_irq(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
} while (0)
#define spin_unlock_irq_rcu_node(p) \
#define spin_unlock_irq_rcu_node(p) \
spin_unlock_irq(&ACCESS_PRIVATE(p, lock))
#define spin_lock_irqsave_rcu_node(p, flags) \
do { \
spin_lock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
smp_mb__after_unlock_lock(); \
#define spin_lock_irqsave_rcu_node(p, flags) \
do { \
spin_lock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
smp_mb__after_unlock_lock(); \
} while (0)
#define spin_unlock_irqrestore_rcu_node(p, flags) \
spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags) \
#define spin_trylock_irqsave_rcu_node(p, flags) \
({ \
bool ___locked = spin_trylock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
\
if (___locked) \
smp_mb__after_unlock_lock(); \
___locked; \
})
#define spin_unlock_irqrestore_rcu_node(p, flags) \
spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags) \
/*
* Initialize SRCU combining tree. Note that statically allocated
* Initialize SRCU per-CPU data. Note that statically allocated
* srcu_struct structures might already have srcu_read_lock() and
* srcu_read_unlock() running against them. So if the is_static parameter
* is set, don't initialize ->srcu_lock_count[] and ->srcu_unlock_count[].
*/
static void init_srcu_struct_nodes(struct srcu_struct *ssp)
static void init_srcu_struct_data(struct srcu_struct *ssp)
{
int cpu;
struct srcu_data *sdp;
/*
* Initialize the per-CPU srcu_data array, which feeds into the
* leaves of the srcu_node tree.
*/
WARN_ON_ONCE(ARRAY_SIZE(sdp->srcu_lock_count) !=
ARRAY_SIZE(sdp->srcu_unlock_count));
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(ssp->sda, cpu);
spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
rcu_segcblist_init(&sdp->srcu_cblist);
sdp->srcu_cblist_invoking = false;
sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq;
sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq;
sdp->mynode = NULL;
sdp->cpu = cpu;
INIT_WORK(&sdp->work, srcu_invoke_callbacks);
timer_setup(&sdp->delay_work, srcu_delay_timer, 0);
sdp->ssp = ssp;
}
}
/* Invalid seq state, used during snp node initialization */
#define SRCU_SNP_INIT_SEQ 0x2
/*
* Check whether sequence number corresponding to snp node,
* is invalid.
*/
static inline bool srcu_invl_snp_seq(unsigned long s)
{
return rcu_seq_state(s) == SRCU_SNP_INIT_SEQ;
}
/*
* Allocated and initialize SRCU combining tree. Returns @true if
* allocation succeeded and @false otherwise.
*/
static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
{
int cpu;
int i;
@@ -92,6 +173,9 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp)
/* Initialize geometry if it has not already been initialized. */
rcu_init_geometry();
ssp->node = kcalloc(rcu_num_nodes, sizeof(*ssp->node), gfp_flags);
if (!ssp->node)
return false;
/* Work out the overall tree geometry. */
ssp->level[0] = &ssp->node[0];
@@ -105,10 +189,10 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp)
WARN_ON_ONCE(ARRAY_SIZE(snp->srcu_have_cbs) !=
ARRAY_SIZE(snp->srcu_data_have_cbs));
for (i = 0; i < ARRAY_SIZE(snp->srcu_have_cbs); i++) {
snp->srcu_have_cbs[i] = 0;
snp->srcu_have_cbs[i] = SRCU_SNP_INIT_SEQ;
snp->srcu_data_have_cbs[i] = 0;
}
snp->srcu_gp_seq_needed_exp = 0;
snp->srcu_gp_seq_needed_exp = SRCU_SNP_INIT_SEQ;
snp->grplo = -1;
snp->grphi = -1;
if (snp == &ssp->node[0]) {
@@ -129,39 +213,31 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp)
* Initialize the per-CPU srcu_data array, which feeds into the
* leaves of the srcu_node tree.
*/
WARN_ON_ONCE(ARRAY_SIZE(sdp->srcu_lock_count) !=
ARRAY_SIZE(sdp->srcu_unlock_count));
level = rcu_num_lvls - 1;
snp_first = ssp->level[level];
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(ssp->sda, cpu);
spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
rcu_segcblist_init(&sdp->srcu_cblist);
sdp->srcu_cblist_invoking = false;
sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq;
sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq;
sdp->mynode = &snp_first[cpu / levelspread[level]];
for (snp = sdp->mynode; snp != NULL; snp = snp->srcu_parent) {
if (snp->grplo < 0)
snp->grplo = cpu;
snp->grphi = cpu;
}
sdp->cpu = cpu;
INIT_WORK(&sdp->work, srcu_invoke_callbacks);
timer_setup(&sdp->delay_work, srcu_delay_timer, 0);
sdp->ssp = ssp;
sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
}
smp_store_release(&ssp->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
return true;
}
/*
* Initialize non-compile-time initialized fields, including the
* associated srcu_node and srcu_data structures. The is_static
* parameter is passed through to init_srcu_struct_nodes(), and
* also tells us that ->sda has already been wired up to srcu_data.
* associated srcu_node and srcu_data structures. The is_static parameter
* tells us that ->sda has already been wired up to srcu_data.
*/
static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
{
ssp->srcu_size_state = SRCU_SIZE_SMALL;
ssp->node = NULL;
mutex_init(&ssp->srcu_cb_mutex);
mutex_init(&ssp->srcu_gp_mutex);
ssp->srcu_idx = 0;
@@ -170,13 +246,25 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
mutex_init(&ssp->srcu_barrier_mutex);
atomic_set(&ssp->srcu_barrier_cpu_cnt, 0);
INIT_DELAYED_WORK(&ssp->work, process_srcu);
ssp->sda_is_static = is_static;
if (!is_static)
ssp->sda = alloc_percpu(struct srcu_data);
if (!ssp->sda)
return -ENOMEM;
init_srcu_struct_nodes(ssp);
init_srcu_struct_data(ssp);
ssp->srcu_gp_seq_needed_exp = 0;
ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
if (READ_ONCE(ssp->srcu_size_state) == SRCU_SIZE_SMALL && SRCU_SIZING_IS_INIT()) {
if (!init_srcu_struct_nodes(ssp, GFP_ATOMIC)) {
if (!ssp->sda_is_static) {
free_percpu(ssp->sda);
ssp->sda = NULL;
return -ENOMEM;
}
} else {
WRITE_ONCE(ssp->srcu_size_state, SRCU_SIZE_BIG);
}
}
smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */
return 0;
}
@@ -213,6 +301,86 @@ EXPORT_SYMBOL_GPL(init_srcu_struct);
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/*
* Initiate a transition to SRCU_SIZE_BIG with lock held.
*/
static void __srcu_transition_to_big(struct srcu_struct *ssp)
{
lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
smp_store_release(&ssp->srcu_size_state, SRCU_SIZE_ALLOC);
}
/*
* Initiate an idempotent transition to SRCU_SIZE_BIG.
*/
static void srcu_transition_to_big(struct srcu_struct *ssp)
{
unsigned long flags;
/* Double-checked locking on ->srcu_size-state. */
if (smp_load_acquire(&ssp->srcu_size_state) != SRCU_SIZE_SMALL)
return;
spin_lock_irqsave_rcu_node(ssp, flags);
if (smp_load_acquire(&ssp->srcu_size_state) != SRCU_SIZE_SMALL) {
spin_unlock_irqrestore_rcu_node(ssp, flags);
return;
}
__srcu_transition_to_big(ssp);
spin_unlock_irqrestore_rcu_node(ssp, flags);
}
/*
* Check to see if the just-encountered contention event justifies
* a transition to SRCU_SIZE_BIG.
*/
static void spin_lock_irqsave_check_contention(struct srcu_struct *ssp)
{
unsigned long j;
if (!SRCU_SIZING_IS_CONTEND() || ssp->srcu_size_state)
return;
j = jiffies;
if (ssp->srcu_size_jiffies != j) {
ssp->srcu_size_jiffies = j;
ssp->srcu_n_lock_retries = 0;
}
if (++ssp->srcu_n_lock_retries <= small_contention_lim)
return;
__srcu_transition_to_big(ssp);
}
/*
* Acquire the specified srcu_data structure's ->lock, but check for
* excessive contention, which results in initiation of a transition
* to SRCU_SIZE_BIG. But only if the srcutree.convert_to_big module
* parameter permits this.
*/
static void spin_lock_irqsave_sdp_contention(struct srcu_data *sdp, unsigned long *flags)
{
struct srcu_struct *ssp = sdp->ssp;
if (spin_trylock_irqsave_rcu_node(sdp, *flags))
return;
spin_lock_irqsave_rcu_node(ssp, *flags);
spin_lock_irqsave_check_contention(ssp);
spin_unlock_irqrestore_rcu_node(ssp, *flags);
spin_lock_irqsave_rcu_node(sdp, *flags);
}
/*
* Acquire the specified srcu_struct structure's ->lock, but check for
* excessive contention, which results in initiation of a transition
* to SRCU_SIZE_BIG. But only if the srcutree.convert_to_big module
* parameter permits this.
*/
static void spin_lock_irqsave_ssp_contention(struct srcu_struct *ssp, unsigned long *flags)
{
if (spin_trylock_irqsave_rcu_node(ssp, *flags))
return;
spin_lock_irqsave_rcu_node(ssp, *flags);
spin_lock_irqsave_check_contention(ssp);
}
/*
* First-use initialization of statically allocated srcu_struct
* structure. Wiring up the combining tree is more than can be
@@ -343,7 +511,10 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
return sum;
}
#define SRCU_INTERVAL 1
#define SRCU_INTERVAL 1 // Base delay if no expedited GPs pending.
#define SRCU_MAX_INTERVAL 10 // Maximum incremental delay from slow readers.
#define SRCU_MAX_NODELAY_PHASE 1 // Maximum per-GP-phase consecutive no-delay instances.
#define SRCU_MAX_NODELAY 100 // Maximum consecutive no-delay instances.
/*
* Return grace-period delay, zero if there are expedited grace
@@ -351,10 +522,18 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
*/
static unsigned long srcu_get_delay(struct srcu_struct *ssp)
{
if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq),
READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
return 0;
return SRCU_INTERVAL;
unsigned long jbase = SRCU_INTERVAL;
if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
jbase = 0;
if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
if (!jbase) {
WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
jbase = 1;
}
return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
}
/**
@@ -382,13 +561,20 @@ void cleanup_srcu_struct(struct srcu_struct *ssp)
return; /* Forgot srcu_barrier(), so just leak it! */
}
if (WARN_ON(rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) != SRCU_STATE_IDLE) ||
WARN_ON(rcu_seq_current(&ssp->srcu_gp_seq) != ssp->srcu_gp_seq_needed) ||
WARN_ON(srcu_readers_active(ssp))) {
pr_info("%s: Active srcu_struct %p state: %d\n",
__func__, ssp, rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)));
pr_info("%s: Active srcu_struct %p read state: %d gp state: %lu/%lu\n",
__func__, ssp, rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)),
rcu_seq_current(&ssp->srcu_gp_seq), ssp->srcu_gp_seq_needed);
return; /* Caller forgot to stop doing call_srcu()? */
}
free_percpu(ssp->sda);
ssp->sda = NULL;
if (!ssp->sda_is_static) {
free_percpu(ssp->sda);
ssp->sda = NULL;
}
kfree(ssp->node);
ssp->node = NULL;
ssp->srcu_size_state = SRCU_SIZE_SMALL;
}
EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
@@ -434,9 +620,13 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
*/
static void srcu_gp_start(struct srcu_struct *ssp)
{
struct srcu_data *sdp = this_cpu_ptr(ssp->sda);
struct srcu_data *sdp;
int state;
if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
sdp = per_cpu_ptr(ssp->sda, 0);
else
sdp = this_cpu_ptr(ssp->sda);
lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
spin_lock_rcu_node(sdp); /* Interrupts already disabled. */
@@ -445,6 +635,8 @@ static void srcu_gp_start(struct srcu_struct *ssp)
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
rcu_seq_snap(&ssp->srcu_gp_seq));
spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
WRITE_ONCE(ssp->srcu_gp_start, jiffies);
WRITE_ONCE(ssp->srcu_n_exp_nodelay, 0);
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
rcu_seq_start(&ssp->srcu_gp_seq);
state = rcu_seq_state(ssp->srcu_gp_seq);
@@ -517,7 +709,9 @@ static void srcu_gp_end(struct srcu_struct *ssp)
int idx;
unsigned long mask;
struct srcu_data *sdp;
unsigned long sgsne;
struct srcu_node *snp;
int ss_state;
/* Prevent more than one additional grace period. */
mutex_lock(&ssp->srcu_cb_mutex);
@@ -526,7 +720,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
spin_lock_irq_rcu_node(ssp);
idx = rcu_seq_state(ssp->srcu_gp_seq);
WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
cbdelay = srcu_get_delay(ssp);
cbdelay = !!srcu_get_delay(ssp);
WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
rcu_seq_end(&ssp->srcu_gp_seq);
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
@@ -537,38 +731,45 @@ static void srcu_gp_end(struct srcu_struct *ssp)
/* A new grace period can start at this point. But only one. */
/* Initiate callback invocation as needed. */
idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
srcu_for_each_node_breadth_first(ssp, snp) {
spin_lock_irq_rcu_node(snp);
cbs = false;
last_lvl = snp >= ssp->level[rcu_num_lvls - 1];
if (last_lvl)
cbs = snp->srcu_have_cbs[idx] == gpseq;
snp->srcu_have_cbs[idx] = gpseq;
rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
mask = snp->srcu_data_have_cbs[idx];
snp->srcu_data_have_cbs[idx] = 0;
spin_unlock_irq_rcu_node(snp);
if (cbs)
srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
/* Occasionally prevent srcu_data counter wrap. */
if (!(gpseq & counter_wrap_check) && last_lvl)
for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
sdp = per_cpu_ptr(ssp->sda, cpu);
spin_lock_irqsave_rcu_node(sdp, flags);
if (ULONG_CMP_GE(gpseq,
sdp->srcu_gp_seq_needed + 100))
sdp->srcu_gp_seq_needed = gpseq;
if (ULONG_CMP_GE(gpseq,
sdp->srcu_gp_seq_needed_exp + 100))
sdp->srcu_gp_seq_needed_exp = gpseq;
spin_unlock_irqrestore_rcu_node(sdp, flags);
}
ss_state = smp_load_acquire(&ssp->srcu_size_state);
if (ss_state < SRCU_SIZE_WAIT_BARRIER) {
srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, 0), cbdelay);
} else {
idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
srcu_for_each_node_breadth_first(ssp, snp) {
spin_lock_irq_rcu_node(snp);
cbs = false;
last_lvl = snp >= ssp->level[rcu_num_lvls - 1];
if (last_lvl)
cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
snp->srcu_have_cbs[idx] = gpseq;
rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
sgsne = snp->srcu_gp_seq_needed_exp;
if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
if (ss_state < SRCU_SIZE_BIG)
mask = ~0;
else
mask = snp->srcu_data_have_cbs[idx];
snp->srcu_data_have_cbs[idx] = 0;
spin_unlock_irq_rcu_node(snp);
if (cbs)
srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
}
}
/* Occasionally prevent srcu_data counter wrap. */
if (!(gpseq & counter_wrap_check))
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(ssp->sda, cpu);
spin_lock_irqsave_rcu_node(sdp, flags);
if (ULONG_CMP_GE(gpseq, sdp->srcu_gp_seq_needed + 100))
sdp->srcu_gp_seq_needed = gpseq;
if (ULONG_CMP_GE(gpseq, sdp->srcu_gp_seq_needed_exp + 100))
sdp->srcu_gp_seq_needed_exp = gpseq;
spin_unlock_irqrestore_rcu_node(sdp, flags);
}
/* Callback initiation done, allow grace periods after next. */
mutex_unlock(&ssp->srcu_cb_mutex);
@@ -583,6 +784,14 @@ static void srcu_gp_end(struct srcu_struct *ssp)
} else {
spin_unlock_irq_rcu_node(ssp);
}
/* Transition to big if needed. */
if (ss_state != SRCU_SIZE_SMALL && ss_state != SRCU_SIZE_BIG) {
if (ss_state == SRCU_SIZE_ALLOC)
init_srcu_struct_nodes(ssp, GFP_KERNEL);
else
smp_store_release(&ssp->srcu_size_state, ss_state + 1);
}
}
/*
@@ -596,20 +805,24 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
unsigned long s)
{
unsigned long flags;
unsigned long sgsne;
for (; snp != NULL; snp = snp->srcu_parent) {
if (rcu_seq_done(&ssp->srcu_gp_seq, s) ||
ULONG_CMP_GE(READ_ONCE(snp->srcu_gp_seq_needed_exp), s))
return;
spin_lock_irqsave_rcu_node(snp, flags);
if (ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s)) {
if (snp)
for (; snp != NULL; snp = snp->srcu_parent) {
sgsne = READ_ONCE(snp->srcu_gp_seq_needed_exp);
if (rcu_seq_done(&ssp->srcu_gp_seq, s) ||
(!srcu_invl_snp_seq(sgsne) && ULONG_CMP_GE(sgsne, s)))
return;
spin_lock_irqsave_rcu_node(snp, flags);
sgsne = snp->srcu_gp_seq_needed_exp;
if (!srcu_invl_snp_seq(sgsne) && ULONG_CMP_GE(sgsne, s)) {
spin_unlock_irqrestore_rcu_node(snp, flags);
return;
}
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(snp, flags);
return;
}
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(snp, flags);
}
spin_lock_irqsave_rcu_node(ssp, flags);
spin_lock_irqsave_ssp_contention(ssp, &flags);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(ssp, flags);
@@ -630,39 +843,47 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
{
unsigned long flags;
int idx = rcu_seq_ctr(s) % ARRAY_SIZE(sdp->mynode->srcu_have_cbs);
struct srcu_node *snp = sdp->mynode;
unsigned long sgsne;
struct srcu_node *snp;
struct srcu_node *snp_leaf;
unsigned long snp_seq;
/* Each pass through the loop does one level of the srcu_node tree. */
for (; snp != NULL; snp = snp->srcu_parent) {
if (rcu_seq_done(&ssp->srcu_gp_seq, s) && snp != sdp->mynode)
return; /* GP already done and CBs recorded. */
spin_lock_irqsave_rcu_node(snp, flags);
if (ULONG_CMP_GE(snp->srcu_have_cbs[idx], s)) {
/* Ensure that snp node tree is fully initialized before traversing it */
if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
snp_leaf = NULL;
else
snp_leaf = sdp->mynode;
if (snp_leaf)
/* Each pass through the loop does one level of the srcu_node tree. */
for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) {
if (rcu_seq_done(&ssp->srcu_gp_seq, s) && snp != snp_leaf)
return; /* GP already done and CBs recorded. */
spin_lock_irqsave_rcu_node(snp, flags);
snp_seq = snp->srcu_have_cbs[idx];
if (snp == sdp->mynode && snp_seq == s)
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
spin_unlock_irqrestore_rcu_node(snp, flags);
if (snp == sdp->mynode && snp_seq != s) {
srcu_schedule_cbs_sdp(sdp, do_norm
? SRCU_INTERVAL
: 0);
if (!srcu_invl_snp_seq(snp_seq) && ULONG_CMP_GE(snp_seq, s)) {
if (snp == snp_leaf && snp_seq == s)
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
spin_unlock_irqrestore_rcu_node(snp, flags);
if (snp == snp_leaf && snp_seq != s) {
srcu_schedule_cbs_sdp(sdp, do_norm ? SRCU_INTERVAL : 0);
return;
}
if (!do_norm)
srcu_funnel_exp_start(ssp, snp, s);
return;
}
if (!do_norm)
srcu_funnel_exp_start(ssp, snp, s);
return;
snp->srcu_have_cbs[idx] = s;
if (snp == snp_leaf)
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
sgsne = snp->srcu_gp_seq_needed_exp;
if (!do_norm && (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, s)))
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(snp, flags);
}
snp->srcu_have_cbs[idx] = s;
if (snp == sdp->mynode)
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(snp, flags);
}
/* Top of tree, must ensure the grace period will be started. */
spin_lock_irqsave_rcu_node(ssp, flags);
spin_lock_irqsave_ssp_contention(ssp, &flags);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed, s)) {
/*
* Record need for grace period s. Pair with load
@@ -678,9 +899,15 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
srcu_gp_start(ssp);
// And how can that list_add() in the "else" clause
// possibly be safe for concurrent execution? Well,
// it isn't. And it does not have to be. After all, it
// can only be executed during early boot when there is only
// the one boot CPU running with interrupts still disabled.
if (likely(srcu_init_done))
queue_delayed_work(rcu_gp_wq, &ssp->work,
srcu_get_delay(ssp));
!!srcu_get_delay(ssp));
else if (list_empty(&ssp->work.work.entry))
list_add(&ssp->work.work.entry, &srcu_boot_list);
}
@@ -814,11 +1041,17 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
bool needgp = false;
unsigned long s;
struct srcu_data *sdp;
struct srcu_node *sdp_mynode;
int ss_state;
check_init_srcu_struct(ssp);
idx = srcu_read_lock(ssp);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
ss_state = smp_load_acquire(&ssp->srcu_size_state);
if (ss_state < SRCU_SIZE_WAIT_CALL)
sdp = per_cpu_ptr(ssp->sda, 0);
else
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_sdp_contention(sdp, &flags);
if (rhp)
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
@@ -834,10 +1067,17 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
needexp = true;
}
spin_unlock_irqrestore_rcu_node(sdp, flags);
/* Ensure that snp node tree is fully initialized before traversing it */
if (ss_state < SRCU_SIZE_WAIT_BARRIER)
sdp_mynode = NULL;
else
sdp_mynode = sdp->mynode;
if (needgp)
srcu_funnel_gp_start(ssp, sdp, s, do_norm);
else if (needexp)
srcu_funnel_exp_start(ssp, sdp->mynode, s);
srcu_funnel_exp_start(ssp, sdp_mynode, s);
srcu_read_unlock(ssp, idx);
return s;
}
@@ -1097,6 +1337,28 @@ static void srcu_barrier_cb(struct rcu_head *rhp)
complete(&ssp->srcu_barrier_completion);
}
/*
* Enqueue an srcu_barrier() callback on the specified srcu_data
* structure's ->cblist. but only if that ->cblist already has at least one
* callback enqueued. Note that if a CPU already has callbacks enqueue,
* it must have already registered the need for a future grace period,
* so all we need do is enqueue a callback that will use the same grace
* period as the last callback already in the queue.
*/
static void srcu_barrier_one_cpu(struct srcu_struct *ssp, struct srcu_data *sdp)
{
spin_lock_irq_rcu_node(sdp);
atomic_inc(&ssp->srcu_barrier_cpu_cnt);
sdp->srcu_barrier_head.func = srcu_barrier_cb;
debug_rcu_head_queue(&sdp->srcu_barrier_head);
if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
&sdp->srcu_barrier_head)) {
debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
atomic_dec(&ssp->srcu_barrier_cpu_cnt);
}
spin_unlock_irq_rcu_node(sdp);
}
/**
* srcu_barrier - Wait until all in-flight call_srcu() callbacks complete.
* @ssp: srcu_struct on which to wait for in-flight callbacks.
@@ -1104,7 +1366,7 @@ static void srcu_barrier_cb(struct rcu_head *rhp)
void srcu_barrier(struct srcu_struct *ssp)
{
int cpu;
struct srcu_data *sdp;
int idx;
unsigned long s = rcu_seq_snap(&ssp->srcu_barrier_seq);
check_init_srcu_struct(ssp);
@@ -1120,27 +1382,13 @@ void srcu_barrier(struct srcu_struct *ssp)
/* Initial count prevents reaching zero until all CBs are posted. */
atomic_set(&ssp->srcu_barrier_cpu_cnt, 1);
/*
* Each pass through this loop enqueues a callback, but only
* on CPUs already having callbacks enqueued. Note that if
* a CPU already has callbacks enqueue, it must have already
* registered the need for a future grace period, so all we
* need do is enqueue a callback that will use the same
* grace period as the last callback already in the queue.
*/
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(ssp->sda, cpu);
spin_lock_irq_rcu_node(sdp);
atomic_inc(&ssp->srcu_barrier_cpu_cnt);
sdp->srcu_barrier_head.func = srcu_barrier_cb;
debug_rcu_head_queue(&sdp->srcu_barrier_head);
if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
&sdp->srcu_barrier_head)) {
debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
atomic_dec(&ssp->srcu_barrier_cpu_cnt);
}
spin_unlock_irq_rcu_node(sdp);
}
idx = srcu_read_lock(ssp);
if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, 0));
else
for_each_possible_cpu(cpu)
srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, cpu));
srcu_read_unlock(ssp, idx);
/* Remove the initial count, at which point reaching zero can happen. */
if (atomic_dec_and_test(&ssp->srcu_barrier_cpu_cnt))
@@ -1214,6 +1462,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
srcu_flip(ssp);
spin_lock_irq_rcu_node(ssp);
rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
ssp->srcu_n_exp_nodelay = 0;
spin_unlock_irq_rcu_node(ssp);
}
@@ -1228,6 +1477,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
mutex_unlock(&ssp->srcu_gp_mutex);
return; /* readers present, retry later. */
}
ssp->srcu_n_exp_nodelay = 0;
srcu_gp_end(ssp); /* Releases ->srcu_gp_mutex. */
}
}
@@ -1318,12 +1568,28 @@ static void srcu_reschedule(struct srcu_struct *ssp, unsigned long delay)
*/
static void process_srcu(struct work_struct *work)
{
unsigned long curdelay;
unsigned long j;
struct srcu_struct *ssp;
ssp = container_of(work, struct srcu_struct, work.work);
srcu_advance_state(ssp);
srcu_reschedule(ssp, srcu_get_delay(ssp));
curdelay = srcu_get_delay(ssp);
if (curdelay) {
WRITE_ONCE(ssp->reschedule_count, 0);
} else {
j = jiffies;
if (READ_ONCE(ssp->reschedule_jiffies) == j) {
WRITE_ONCE(ssp->reschedule_count, READ_ONCE(ssp->reschedule_count) + 1);
if (READ_ONCE(ssp->reschedule_count) > SRCU_MAX_NODELAY)
curdelay = 1;
} else {
WRITE_ONCE(ssp->reschedule_count, 1);
WRITE_ONCE(ssp->reschedule_jiffies, j);
}
}
srcu_reschedule(ssp, curdelay);
}
void srcutorture_get_gp_data(enum rcutorture_type test_type,
@@ -1337,43 +1603,69 @@ void srcutorture_get_gp_data(enum rcutorture_type test_type,
}
EXPORT_SYMBOL_GPL(srcutorture_get_gp_data);
static const char * const srcu_size_state_name[] = {
"SRCU_SIZE_SMALL",
"SRCU_SIZE_ALLOC",
"SRCU_SIZE_WAIT_BARRIER",
"SRCU_SIZE_WAIT_CALL",
"SRCU_SIZE_WAIT_CBS1",
"SRCU_SIZE_WAIT_CBS2",
"SRCU_SIZE_WAIT_CBS3",
"SRCU_SIZE_WAIT_CBS4",
"SRCU_SIZE_BIG",
"SRCU_SIZE_???",
};
void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf)
{
int cpu;
int idx;
unsigned long s0 = 0, s1 = 0;
int ss_state = READ_ONCE(ssp->srcu_size_state);
int ss_state_idx = ss_state;
idx = ssp->srcu_idx & 0x1;
pr_alert("%s%s Tree SRCU g%ld per-CPU(idx=%d):",
tt, tf, rcu_seq_current(&ssp->srcu_gp_seq), idx);
for_each_possible_cpu(cpu) {
unsigned long l0, l1;
unsigned long u0, u1;
long c0, c1;
struct srcu_data *sdp;
if (ss_state < 0 || ss_state >= ARRAY_SIZE(srcu_size_state_name))
ss_state_idx = ARRAY_SIZE(srcu_size_state_name) - 1;
pr_alert("%s%s Tree SRCU g%ld state %d (%s)",
tt, tf, rcu_seq_current(&ssp->srcu_gp_seq), ss_state,
srcu_size_state_name[ss_state_idx]);
if (!ssp->sda) {
// Called after cleanup_srcu_struct(), perhaps.
pr_cont(" No per-CPU srcu_data structures (->sda == NULL).\n");
} else {
pr_cont(" per-CPU(idx=%d):", idx);
for_each_possible_cpu(cpu) {
unsigned long l0, l1;
unsigned long u0, u1;
long c0, c1;
struct srcu_data *sdp;
sdp = per_cpu_ptr(ssp->sda, cpu);
u0 = data_race(sdp->srcu_unlock_count[!idx]);
u1 = data_race(sdp->srcu_unlock_count[idx]);
sdp = per_cpu_ptr(ssp->sda, cpu);
u0 = data_race(sdp->srcu_unlock_count[!idx]);
u1 = data_race(sdp->srcu_unlock_count[idx]);
/*
* Make sure that a lock is always counted if the corresponding
* unlock is counted.
*/
smp_rmb();
/*
* Make sure that a lock is always counted if the corresponding
* unlock is counted.
*/
smp_rmb();
l0 = data_race(sdp->srcu_lock_count[!idx]);
l1 = data_race(sdp->srcu_lock_count[idx]);
l0 = data_race(sdp->srcu_lock_count[!idx]);
l1 = data_race(sdp->srcu_lock_count[idx]);
c0 = l0 - u0;
c1 = l1 - u1;
pr_cont(" %d(%ld,%ld %c)",
cpu, c0, c1,
"C."[rcu_segcblist_empty(&sdp->srcu_cblist)]);
s0 += c0;
s1 += c1;
c0 = l0 - u0;
c1 = l1 - u1;
pr_cont(" %d(%ld,%ld %c)",
cpu, c0, c1,
"C."[rcu_segcblist_empty(&sdp->srcu_cblist)]);
s0 += c0;
s1 += c1;
}
pr_cont(" T(%ld,%ld)\n", s0, s1);
}
pr_cont(" T(%ld,%ld)\n", s0, s1);
if (SRCU_SIZING_IS_TORTURE())
srcu_transition_to_big(ssp);
}
EXPORT_SYMBOL_GPL(srcu_torture_stats_print);
@@ -1390,6 +1682,17 @@ void __init srcu_init(void)
{
struct srcu_struct *ssp;
/* Decide on srcu_struct-size strategy. */
if (SRCU_SIZING_IS(SRCU_SIZING_AUTO)) {
if (nr_cpu_ids >= big_cpu_lim) {
convert_to_big = SRCU_SIZING_INIT; // Don't bother waiting for contention.
pr_info("%s: Setting srcu_struct sizes to big.\n", __func__);
} else {
convert_to_big = SRCU_SIZING_NONE | SRCU_SIZING_CONTEND;
pr_info("%s: Setting srcu_struct sizes based on contention.\n", __func__);
}
}
/*
* Once that is set, call_srcu() can follow the normal path and
* queue delayed work. This must follow RCU workqueues creation
@@ -1400,6 +1703,8 @@ void __init srcu_init(void)
ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
work.work.entry);
list_del_init(&ssp->work.work.entry);
if (SRCU_SIZING_IS(SRCU_SIZING_INIT) && ssp->srcu_size_state == SRCU_SIZE_SMALL)
ssp->srcu_size_state = SRCU_SIZE_ALLOC;
queue_work(rcu_gp_wq, &ssp->work.work);
}
}

View File

@@ -111,7 +111,7 @@ static void rcu_sync_func(struct rcu_head *rhp)
* a slowpath during the update. After this function returns, all
* subsequent calls to rcu_sync_is_idle() will return false, which
* tells readers to stay off their fastpaths. A later call to
* rcu_sync_exit() re-enables reader slowpaths.
* rcu_sync_exit() re-enables reader fastpaths.
*
* When called in isolation, rcu_sync_enter() must wait for a grace
* period, however, closely spaced calls to rcu_sync_enter() can

View File

@@ -46,7 +46,7 @@ struct rcu_tasks_percpu {
/**
* struct rcu_tasks - Definition for a Tasks-RCU-like mechanism.
* @cbs_wq: Wait queue allowing new callback to get kthread's attention.
* @cbs_wait: RCU wait allowing a new callback to get kthread's attention.
* @cbs_gbl_lock: Lock protecting callback list.
* @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
* @gp_func: This flavor's grace-period-wait function.
@@ -77,7 +77,7 @@ struct rcu_tasks_percpu {
* @kname: This flavor's kthread name.
*/
struct rcu_tasks {
struct wait_queue_head cbs_wq;
struct rcuwait cbs_wait;
raw_spinlock_t cbs_gbl_lock;
int gp_state;
int gp_sleep;
@@ -113,11 +113,11 @@ static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp);
#define DEFINE_RCU_TASKS(rt_name, gp, call, n) \
static DEFINE_PER_CPU(struct rcu_tasks_percpu, rt_name ## __percpu) = { \
.lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name ## __percpu.cbs_pcpu_lock), \
.rtp_irq_work = IRQ_WORK_INIT(call_rcu_tasks_iw_wakeup), \
.rtp_irq_work = IRQ_WORK_INIT_HARD(call_rcu_tasks_iw_wakeup), \
}; \
static struct rcu_tasks rt_name = \
{ \
.cbs_wq = __WAIT_QUEUE_HEAD_INITIALIZER(rt_name.cbs_wq), \
.cbs_wait = __RCUWAIT_INITIALIZER(rt_name.wait), \
.cbs_gbl_lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name.cbs_gbl_lock), \
.gp_func = gp, \
.call_func = call, \
@@ -143,6 +143,11 @@ module_param(rcu_task_ipi_delay, int, 0644);
#define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10)
static int rcu_task_stall_timeout __read_mostly = RCU_TASK_STALL_TIMEOUT;
module_param(rcu_task_stall_timeout, int, 0644);
#define RCU_TASK_STALL_INFO (HZ * 10)
static int rcu_task_stall_info __read_mostly = RCU_TASK_STALL_INFO;
module_param(rcu_task_stall_info, int, 0644);
static int rcu_task_stall_info_mult __read_mostly = 3;
module_param(rcu_task_stall_info_mult, int, 0444);
static int rcu_task_enqueue_lim __read_mostly = -1;
module_param(rcu_task_enqueue_lim, int, 0444);
@@ -261,14 +266,16 @@ static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp)
struct rcu_tasks_percpu *rtpcp = container_of(iwp, struct rcu_tasks_percpu, rtp_irq_work);
rtp = rtpcp->rtpp;
wake_up(&rtp->cbs_wq);
rcuwait_wake_up(&rtp->cbs_wait);
}
// Enqueue a callback for the specified flavor of Tasks RCU.
static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
struct rcu_tasks *rtp)
{
int chosen_cpu;
unsigned long flags;
int ideal_cpu;
unsigned long j;
bool needadjust = false;
bool needwake;
@@ -278,8 +285,9 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
rhp->func = func;
local_irq_save(flags);
rcu_read_lock();
rtpcp = per_cpu_ptr(rtp->rtpcpu,
smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
ideal_cpu = smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift);
chosen_cpu = cpumask_next(ideal_cpu - 1, cpu_possible_mask);
rtpcp = per_cpu_ptr(rtp->rtpcpu, chosen_cpu);
if (!raw_spin_trylock_rcu_node(rtpcp)) { // irqs already disabled.
raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
j = jiffies;
@@ -460,7 +468,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
}
}
if (rcu_segcblist_empty(&rtpcp->cblist))
if (rcu_segcblist_empty(&rtpcp->cblist) || !cpu_possible(cpu))
return;
raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
@@ -509,7 +517,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
/* If there were none, wait a bit and start over. */
wait_event_idle(rtp->cbs_wq, (needgpcb = rcu_tasks_need_gpcb(rtp)));
rcuwait_wait_event(&rtp->cbs_wait,
(needgpcb = rcu_tasks_need_gpcb(rtp)),
TASK_IDLE);
if (needgpcb & 0x2) {
// Wait for one grace period.
@@ -548,8 +558,15 @@ static void __init rcu_spawn_tasks_kthread_generic(struct rcu_tasks *rtp)
static void __init rcu_tasks_bootup_oddness(void)
{
#if defined(CONFIG_TASKS_RCU) || defined(CONFIG_TASKS_TRACE_RCU)
int rtsimc;
if (rcu_task_stall_timeout != RCU_TASK_STALL_TIMEOUT)
pr_info("\tTasks-RCU CPU stall warnings timeout set to %d (rcu_task_stall_timeout).\n", rcu_task_stall_timeout);
rtsimc = clamp(rcu_task_stall_info_mult, 1, 10);
if (rtsimc != rcu_task_stall_info_mult) {
pr_info("\tTasks-RCU CPU stall info multiplier clamped to %d (rcu_task_stall_info_mult).\n", rtsimc);
rcu_task_stall_info_mult = rtsimc;
}
#endif /* #ifdef CONFIG_TASKS_RCU */
#ifdef CONFIG_TASKS_RCU
pr_info("\tTrampoline variant of Tasks RCU enabled.\n");
@@ -568,7 +585,17 @@ static void __init rcu_tasks_bootup_oddness(void)
/* Dump out rcutorture-relevant state common to all RCU-tasks flavors. */
static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
{
struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, 0); // for_each...
int cpu;
bool havecbs = false;
for_each_possible_cpu(cpu) {
struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
if (!data_race(rcu_segcblist_empty(&rtpcp->cblist))) {
havecbs = true;
break;
}
}
pr_info("%s: %s(%d) since %lu g:%lu i:%lu/%lu %c%c %s\n",
rtp->kname,
tasks_gp_state_getname(rtp), data_race(rtp->gp_state),
@@ -576,7 +603,7 @@ static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
data_race(rcu_seq_current(&rtp->tasks_gp_seq)),
data_race(rtp->n_ipis_fails), data_race(rtp->n_ipis),
".k"[!!data_race(rtp->kthread_ptr)],
".C"[!data_race(rcu_segcblist_empty(&rtpcp->cblist))],
".C"[havecbs],
s);
}
#endif // #ifndef CONFIG_TINY_RCU
@@ -592,10 +619,15 @@ static void exit_tasks_rcu_finish_trace(struct task_struct *t);
/* Wait for one RCU-tasks grace period. */
static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
{
struct task_struct *g, *t;
unsigned long lastreport;
LIST_HEAD(holdouts);
struct task_struct *g;
int fract;
LIST_HEAD(holdouts);
unsigned long j;
unsigned long lastinfo;
unsigned long lastreport;
bool reported = false;
int rtsi;
struct task_struct *t;
set_tasks_gp_state(rtp, RTGS_PRE_WAIT_GP);
rtp->pregp_func();
@@ -621,30 +653,50 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
* is empty, we are done.
*/
lastreport = jiffies;
lastinfo = lastreport;
rtsi = READ_ONCE(rcu_task_stall_info);
// Start off with initial wait and slowly back off to 1 HZ wait.
fract = rtp->init_fract;
while (!list_empty(&holdouts)) {
ktime_t exp;
bool firstreport;
bool needreport;
int rtst;
/* Slowly back off waiting for holdouts */
// Slowly back off waiting for holdouts
set_tasks_gp_state(rtp, RTGS_WAIT_SCAN_HOLDOUTS);
schedule_timeout_idle(fract);
if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
schedule_timeout_idle(fract);
} else {
exp = jiffies_to_nsecs(fract);
__set_current_state(TASK_IDLE);
schedule_hrtimeout_range(&exp, jiffies_to_nsecs(HZ / 2), HRTIMER_MODE_REL_HARD);
}
if (fract < HZ)
fract++;
rtst = READ_ONCE(rcu_task_stall_timeout);
needreport = rtst > 0 && time_after(jiffies, lastreport + rtst);
if (needreport)
if (needreport) {
lastreport = jiffies;
reported = true;
}
firstreport = true;
WARN_ON(signal_pending(current));
set_tasks_gp_state(rtp, RTGS_SCAN_HOLDOUTS);
rtp->holdouts_func(&holdouts, needreport, &firstreport);
// Print pre-stall informational messages if needed.
j = jiffies;
if (rtsi > 0 && !reported && time_after(j, lastinfo + rtsi)) {
lastinfo = j;
rtsi = rtsi * rcu_task_stall_info_mult;
pr_info("%s: %s grace period %lu is %lu jiffies old.\n",
__func__, rtp->kname, rtp->tasks_gp_seq, j - rtp->gp_start);
}
}
set_tasks_gp_state(rtp, RTGS_POST_GP);
@@ -950,6 +1002,9 @@ static void rcu_tasks_be_rude(struct work_struct *work)
// Wait for one rude RCU-tasks grace period.
static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
{
if (num_online_cpus() <= 1)
return; // Fastpath for only one CPU.
rtp->n_ipis += cpumask_weight(cpu_online_mask);
schedule_on_each_cpu(rcu_tasks_be_rude);
}

View File

@@ -1679,6 +1679,8 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
if (IS_ENABLED(CONFIG_PROVE_RCU) && READ_ONCE(rdp->gpwrap))
WRITE_ONCE(rdp->last_sched_clock, jiffies);
WRITE_ONCE(rdp->gpwrap, false);
rcu_gpnum_ovf(rnp, rdp);
return ret;
@@ -1705,11 +1707,37 @@ static void note_gp_changes(struct rcu_data *rdp)
rcu_gp_kthread_wake();
}
static atomic_t *rcu_gp_slow_suppress;
/* Register a counter to suppress debugging grace-period delays. */
void rcu_gp_slow_register(atomic_t *rgssp)
{
WARN_ON_ONCE(rcu_gp_slow_suppress);
WRITE_ONCE(rcu_gp_slow_suppress, rgssp);
}
EXPORT_SYMBOL_GPL(rcu_gp_slow_register);
/* Unregister a counter, with NULL for not caring which. */
void rcu_gp_slow_unregister(atomic_t *rgssp)
{
WARN_ON_ONCE(rgssp && rgssp != rcu_gp_slow_suppress);
WRITE_ONCE(rcu_gp_slow_suppress, NULL);
}
EXPORT_SYMBOL_GPL(rcu_gp_slow_unregister);
static bool rcu_gp_slow_is_suppressed(void)
{
atomic_t *rgssp = READ_ONCE(rcu_gp_slow_suppress);
return rgssp && atomic_read(rgssp);
}
static void rcu_gp_slow(int delay)
{
if (delay > 0 &&
!(rcu_seq_ctr(rcu_state.gp_seq) %
(rcu_num_nodes * PER_RCU_NODE_PERIOD * delay)))
if (!rcu_gp_slow_is_suppressed() && delay > 0 &&
!(rcu_seq_ctr(rcu_state.gp_seq) % (rcu_num_nodes * PER_RCU_NODE_PERIOD * delay)))
schedule_timeout_idle(delay);
}
@@ -2096,14 +2124,29 @@ static noinline void rcu_gp_cleanup(void)
/* Advance CBs to reduce false positives below. */
offloaded = rcu_rdp_is_offloaded(rdp);
if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
// We get here if a grace period was needed (“needgp”)
// and the above call to rcu_accelerate_cbs() did not set
// the RCU_GP_FLAG_INIT bit in ->gp_state (which records
// the need for another grace period).  The purpose
// of the “offloaded” check is to avoid invoking
// rcu_accelerate_cbs() on an offloaded CPU because we do not
// hold the ->nocb_lock needed to safely access an offloaded
// ->cblist.  We do not want to acquire that lock because
// it can be heavily contended during callback floods.
WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
trace_rcu_grace_period(rcu_state.name,
rcu_state.gp_seq,
TPS("newreq"));
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("newreq"));
} else {
WRITE_ONCE(rcu_state.gp_flags,
rcu_state.gp_flags & RCU_GP_FLAG_INIT);
// We get here either if there is no need for an
// additional grace period or if rcu_accelerate_cbs() has
// already set the RCU_GP_FLAG_INIT bit in ->gp_flags. 
// So all we need to do is to clear all of the other
// ->gp_flags bits.
WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags & RCU_GP_FLAG_INIT);
}
raw_spin_unlock_irq_rcu_node(rnp);
@@ -2609,6 +2652,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
*/
void rcu_sched_clock_irq(int user)
{
unsigned long j;
if (IS_ENABLED(CONFIG_PROVE_RCU)) {
j = jiffies;
WARN_ON_ONCE(time_before(j, __this_cpu_read(rcu_data.last_sched_clock)));
__this_cpu_write(rcu_data.last_sched_clock, j);
}
trace_rcu_utilization(TPS("Start scheduler-tick"));
lockdep_assert_irqs_disabled();
raw_cpu_inc(rcu_data.ticks_this_gp);
@@ -2624,6 +2674,8 @@ void rcu_sched_clock_irq(int user)
rcu_flavor_sched_clock_irq(user);
if (rcu_pending(user))
invoke_rcu_core();
if (user)
rcu_tasks_classic_qs(current, false);
lockdep_assert_irqs_disabled();
trace_rcu_utilization(TPS("End scheduler-tick"));
@@ -3717,7 +3769,9 @@ static int rcu_blocking_is_gp(void)
{
int ret;
if (IS_ENABLED(CONFIG_PREEMPTION))
// Invoking preempt_model_*() too early gets a splat.
if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE ||
preempt_model_full() || preempt_model_rt())
return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
might_sleep(); /* Check for RCU read-side critical section. */
preempt_disable();
@@ -4179,6 +4233,7 @@ rcu_boot_init_percpu_data(int cpu)
rdp->rcu_ofl_gp_flags = RCU_GP_CLEANED;
rdp->rcu_onl_gp_seq = rcu_state.gp_seq;
rdp->rcu_onl_gp_flags = RCU_GP_CLEANED;
rdp->last_sched_clock = jiffies;
rdp->cpu = cpu;
rcu_boot_init_nocb_percpu_data(rdp);
}
@@ -4471,6 +4526,51 @@ static int rcu_pm_notify(struct notifier_block *self,
return NOTIFY_OK;
}
#ifdef CONFIG_RCU_EXP_KTHREAD
struct kthread_worker *rcu_exp_gp_kworker;
struct kthread_worker *rcu_exp_par_gp_kworker;
static void __init rcu_start_exp_gp_kworkers(void)
{
const char *par_gp_kworker_name = "rcu_exp_par_gp_kthread_worker";
const char *gp_kworker_name = "rcu_exp_gp_kthread_worker";
struct sched_param param = { .sched_priority = kthread_prio };
rcu_exp_gp_kworker = kthread_create_worker(0, gp_kworker_name);
if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) {
pr_err("Failed to create %s!\n", gp_kworker_name);
return;
}
rcu_exp_par_gp_kworker = kthread_create_worker(0, par_gp_kworker_name);
if (IS_ERR_OR_NULL(rcu_exp_par_gp_kworker)) {
pr_err("Failed to create %s!\n", par_gp_kworker_name);
kthread_destroy_worker(rcu_exp_gp_kworker);
return;
}
sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, &param);
sched_setscheduler_nocheck(rcu_exp_par_gp_kworker->task, SCHED_FIFO,
&param);
}
static inline void rcu_alloc_par_gp_wq(void)
{
}
#else /* !CONFIG_RCU_EXP_KTHREAD */
struct workqueue_struct *rcu_par_gp_wq;
static void __init rcu_start_exp_gp_kworkers(void)
{
}
static inline void rcu_alloc_par_gp_wq(void)
{
rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
WARN_ON(!rcu_par_gp_wq);
}
#endif /* CONFIG_RCU_EXP_KTHREAD */
/*
* Spawn the kthreads that handle RCU's grace periods.
*/
@@ -4480,6 +4580,7 @@ static int __init rcu_spawn_gp_kthread(void)
struct rcu_node *rnp;
struct sched_param sp;
struct task_struct *t;
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
rcu_scheduler_fully_active = 1;
t = kthread_create(rcu_gp_kthread, NULL, "%s", rcu_state.name);
@@ -4497,9 +4598,17 @@ static int __init rcu_spawn_gp_kthread(void)
smp_store_release(&rcu_state.gp_kthread, t); /* ^^^ */
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
wake_up_process(t);
rcu_spawn_nocb_kthreads();
rcu_spawn_boost_kthreads();
/* This is a pre-SMP initcall, we expect a single CPU */
WARN_ON(num_online_cpus() > 1);
/*
* Those kthreads couldn't be created on rcu_init() -> rcutree_prepare_cpu()
* due to rcu_scheduler_fully_active.
*/
rcu_spawn_cpu_nocb_kthread(smp_processor_id());
rcu_spawn_one_boost_kthread(rdp->mynode);
rcu_spawn_core_kthreads();
/* Create kthread worker for expedited GPs */
rcu_start_exp_gp_kworkers();
return 0;
}
early_initcall(rcu_spawn_gp_kthread);
@@ -4745,7 +4854,6 @@ static void __init rcu_dump_rcu_node_tree(void)
}
struct workqueue_struct *rcu_gp_wq;
struct workqueue_struct *rcu_par_gp_wq;
static void __init kfree_rcu_batch_init(void)
{
@@ -4782,7 +4890,7 @@ static void __init kfree_rcu_batch_init(void)
void __init rcu_init(void)
{
int cpu;
int cpu = smp_processor_id();
rcu_early_boot_tests();
@@ -4802,17 +4910,15 @@ void __init rcu_init(void)
* or the scheduler are operational.
*/
pm_notifier(rcu_pm_notify, 0);
for_each_online_cpu(cpu) {
rcutree_prepare_cpu(cpu);
rcu_cpu_starting(cpu);
rcutree_online_cpu(cpu);
}
WARN_ON(num_online_cpus() > 1); // Only one CPU this early in boot.
rcutree_prepare_cpu(cpu);
rcu_cpu_starting(cpu);
rcutree_online_cpu(cpu);
/* Create workqueue for Tree SRCU and for expedited GPs. */
rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
WARN_ON(!rcu_gp_wq);
rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
WARN_ON(!rcu_par_gp_wq);
rcu_alloc_par_gp_wq();
/* Fill in default value for rcutree.qovld boot parameter. */
/* -After- the rcu_node ->lock fields are initialized! */

View File

@@ -10,6 +10,7 @@
*/
#include <linux/cache.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/rtmutex.h>
#include <linux/threads.h>
@@ -23,7 +24,11 @@
/* Communicate arguments to a workqueue handler. */
struct rcu_exp_work {
unsigned long rew_s;
#ifdef CONFIG_RCU_EXP_KTHREAD
struct kthread_work rew_work;
#else
struct work_struct rew_work;
#endif /* CONFIG_RCU_EXP_KTHREAD */
};
/* RCU's kthread states for tracing. */
@@ -254,6 +259,7 @@ struct rcu_data {
unsigned long rcu_onl_gp_seq; /* ->gp_seq at last online. */
short rcu_onl_gp_flags; /* ->gp_flags at last online. */
unsigned long last_fqs_resched; /* Time of last rcu_resched(). */
unsigned long last_sched_clock; /* Jiffies of last rcu_sched_clock_irq(). */
int cpu;
};
@@ -364,6 +370,7 @@ struct rcu_state {
arch_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
/* Synchronize offline with */
/* GP pre-initialization. */
int nocb_is_setup; /* nocb is setup from boot */
};
/* Values for rcu_state structure's gp_flags field. */
@@ -421,7 +428,6 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
static bool rcu_is_callbacks_kthread(void);
static void rcu_cpu_kthread_setup(unsigned int cpu);
static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp);
static void __init rcu_spawn_boost_kthreads(void);
static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
static void rcu_preempt_deferred_qs(struct task_struct *t);
@@ -439,7 +445,6 @@ static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
static void rcu_spawn_cpu_nocb_kthread(int cpu);
static void __init rcu_spawn_nocb_kthreads(void);
static void show_rcu_nocb_state(struct rcu_data *rdp);
static void rcu_nocb_lock(struct rcu_data *rdp);
static void rcu_nocb_unlock(struct rcu_data *rdp);

View File

@@ -334,15 +334,13 @@ fastpath:
* Select the CPUs within the specified rcu_node that the upcoming
* expedited grace period needs to wait for.
*/
static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
{
int cpu;
unsigned long flags;
unsigned long mask_ofl_test;
unsigned long mask_ofl_ipi;
int ret;
struct rcu_exp_work *rewp =
container_of(wp, struct rcu_exp_work, rew_work);
struct rcu_node *rnp = container_of(rewp, struct rcu_node, rew);
raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -417,13 +415,119 @@ retry_ipi:
rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
}
static void rcu_exp_sel_wait_wake(unsigned long s);
#ifdef CONFIG_RCU_EXP_KTHREAD
static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp)
{
struct rcu_exp_work *rewp =
container_of(wp, struct rcu_exp_work, rew_work);
__sync_rcu_exp_select_node_cpus(rewp);
}
static inline bool rcu_gp_par_worker_started(void)
{
return !!READ_ONCE(rcu_exp_par_gp_kworker);
}
static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp)
{
kthread_init_work(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
/*
* Use rcu_exp_par_gp_kworker, because flushing a work item from
* another work item on the same kthread worker can result in
* deadlock.
*/
kthread_queue_work(rcu_exp_par_gp_kworker, &rnp->rew.rew_work);
}
static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp)
{
kthread_flush_work(&rnp->rew.rew_work);
}
/*
* Work-queue handler to drive an expedited grace period forward.
*/
static void wait_rcu_exp_gp(struct kthread_work *wp)
{
struct rcu_exp_work *rewp;
rewp = container_of(wp, struct rcu_exp_work, rew_work);
rcu_exp_sel_wait_wake(rewp->rew_s);
}
static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew)
{
kthread_init_work(&rew->rew_work, wait_rcu_exp_gp);
kthread_queue_work(rcu_exp_gp_kworker, &rew->rew_work);
}
static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew)
{
}
#else /* !CONFIG_RCU_EXP_KTHREAD */
static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
{
struct rcu_exp_work *rewp =
container_of(wp, struct rcu_exp_work, rew_work);
__sync_rcu_exp_select_node_cpus(rewp);
}
static inline bool rcu_gp_par_worker_started(void)
{
return !!READ_ONCE(rcu_par_gp_wq);
}
static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp)
{
int cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
/* If all offline, queue the work on an unbound CPU. */
if (unlikely(cpu > rnp->grphi - rnp->grplo))
cpu = WORK_CPU_UNBOUND;
else
cpu += rnp->grplo;
queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
}
static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp)
{
flush_work(&rnp->rew.rew_work);
}
/*
* Work-queue handler to drive an expedited grace period forward.
*/
static void wait_rcu_exp_gp(struct work_struct *wp)
{
struct rcu_exp_work *rewp;
rewp = container_of(wp, struct rcu_exp_work, rew_work);
rcu_exp_sel_wait_wake(rewp->rew_s);
}
static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew)
{
INIT_WORK_ONSTACK(&rew->rew_work, wait_rcu_exp_gp);
queue_work(rcu_gp_wq, &rew->rew_work);
}
static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew)
{
destroy_work_on_stack(&rew->rew_work);
}
#endif /* CONFIG_RCU_EXP_KTHREAD */
/*
* Select the nodes that the upcoming expedited grace period needs
* to wait for.
*/
static void sync_rcu_exp_select_cpus(void)
{
int cpu;
struct rcu_node *rnp;
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("reset"));
@@ -435,28 +539,21 @@ static void sync_rcu_exp_select_cpus(void)
rnp->exp_need_flush = false;
if (!READ_ONCE(rnp->expmask))
continue; /* Avoid early boot non-existent wq. */
if (!READ_ONCE(rcu_par_gp_wq) ||
if (!rcu_gp_par_worker_started() ||
rcu_scheduler_active != RCU_SCHEDULER_RUNNING ||
rcu_is_last_leaf_node(rnp)) {
/* No workqueues yet or last leaf, do direct call. */
/* No worker started yet or last leaf, do direct call. */
sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work);
continue;
}
INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
/* If all offline, queue the work on an unbound CPU. */
if (unlikely(cpu > rnp->grphi - rnp->grplo))
cpu = WORK_CPU_UNBOUND;
else
cpu += rnp->grplo;
queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
sync_rcu_exp_select_cpus_queue_work(rnp);
rnp->exp_need_flush = true;
}
/* Wait for workqueue jobs (if any) to complete. */
/* Wait for jobs (if any) to complete. */
rcu_for_each_leaf_node(rnp)
if (rnp->exp_need_flush)
flush_work(&rnp->rew.rew_work);
sync_rcu_exp_select_cpus_flush_work(rnp);
}
/*
@@ -496,7 +593,7 @@ static void synchronize_rcu_expedited_wait(void)
struct rcu_node *rnp_root = rcu_get_root();
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
jiffies_stall = rcu_jiffies_till_stall_check();
jiffies_stall = rcu_exp_jiffies_till_stall_check();
jiffies_start = jiffies;
if (tick_nohz_full_enabled() && rcu_inkernel_boot_has_ended()) {
if (synchronize_rcu_expedited_wait_once(1))
@@ -571,7 +668,7 @@ static void synchronize_rcu_expedited_wait(void)
dump_cpu_task(cpu);
}
}
jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
}
}
@@ -622,17 +719,6 @@ static void rcu_exp_sel_wait_wake(unsigned long s)
rcu_exp_wait_wake(s);
}
/*
* Work-queue handler to drive an expedited grace period forward.
*/
static void wait_rcu_exp_gp(struct work_struct *wp)
{
struct rcu_exp_work *rewp;
rewp = container_of(wp, struct rcu_exp_work, rew_work);
rcu_exp_sel_wait_wake(rewp->rew_s);
}
#ifdef CONFIG_PREEMPT_RCU
/*
@@ -848,20 +934,19 @@ void synchronize_rcu_expedited(void)
} else {
/* Marshall arguments & schedule the expedited grace period. */
rew.rew_s = s;
INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
queue_work(rcu_gp_wq, &rew.rew_work);
synchronize_rcu_expedited_queue_work(&rew);
}
/* Wait for expedited grace period to complete. */
rnp = rcu_get_root();
wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
sync_exp_work_done(s));
smp_mb(); /* Workqueue actions happen before return. */
smp_mb(); /* Work actions happen before return. */
/* Let the next expedited grace period start. */
mutex_unlock(&rcu_state.exp_mutex);
if (likely(!boottime))
destroy_work_on_stack(&rew.rew_work);
synchronize_rcu_expedited_destroy_work(&rew);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);

View File

@@ -60,9 +60,6 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
* Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
* If the list is invalid, a warning is emitted and all CPUs are offloaded.
*/
static bool rcu_nocb_is_setup;
static int __init rcu_nocb_setup(char *str)
{
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
@@ -72,7 +69,7 @@ static int __init rcu_nocb_setup(char *str)
cpumask_setall(rcu_nocb_mask);
}
}
rcu_nocb_is_setup = true;
rcu_state.nocb_is_setup = true;
return 1;
}
__setup("rcu_nocbs", rcu_nocb_setup);
@@ -215,14 +212,6 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
init_swait_queue_head(&rnp->nocb_gp_wq[1]);
}
/* Is the specified CPU a no-CBs CPU? */
bool rcu_is_nocb_cpu(int cpu)
{
if (cpumask_available(rcu_nocb_mask))
return cpumask_test_cpu(cpu, rcu_nocb_mask);
return false;
}
static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
struct rcu_data *rdp,
bool force, unsigned long flags)
@@ -1180,10 +1169,10 @@ void __init rcu_init_nohz(void)
return;
}
}
rcu_nocb_is_setup = true;
rcu_state.nocb_is_setup = true;
}
if (!rcu_nocb_is_setup)
if (!rcu_state.nocb_is_setup)
return;
#if defined(CONFIG_NO_HZ_FULL)
@@ -1241,7 +1230,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
struct task_struct *t;
struct sched_param sp;
if (!rcu_scheduler_fully_active || !rcu_nocb_is_setup)
if (!rcu_scheduler_fully_active || !rcu_state.nocb_is_setup)
return;
/* If there already is an rcuo kthread, then nothing to do. */
@@ -1277,22 +1266,6 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
}
/*
* Once the scheduler is running, spawn rcuo kthreads for all online
* no-CBs CPUs. This assumes that the early_initcall()s happen before
* non-boot CPUs come online -- if this changes, we will need to add
* some mutual exclusion.
*/
static void __init rcu_spawn_nocb_kthreads(void)
{
int cpu;
if (rcu_nocb_is_setup) {
for_each_online_cpu(cpu)
rcu_spawn_cpu_nocb_kthread(cpu);
}
}
/* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */
static int rcu_nocb_gp_stride = -1;
module_param(rcu_nocb_gp_stride, int, 0444);
@@ -1549,10 +1522,6 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
{
}
static void __init rcu_spawn_nocb_kthreads(void)
{
}
static void show_rcu_nocb_state(struct rcu_data *rdp)
{
}

View File

@@ -486,6 +486,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
t->rcu_read_unlock_special.s = 0;
if (special.b.need_qs) {
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
rdp->cpu_no_qs.b.norm = false;
rcu_report_qs_rdp(rdp);
udelay(rcu_unlock_delay);
} else {
@@ -660,7 +661,13 @@ static void rcu_read_unlock_special(struct task_struct *t)
expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
// Get scheduler to re-evaluate and call hooks.
// If !IRQ_WORK, FQS scan will eventually IPI.
init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
IS_ENABLED(CONFIG_PREEMPT_RT))
rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(
rcu_preempt_deferred_qs_handler);
else
init_irq_work(&rdp->defer_qs_iw,
rcu_preempt_deferred_qs_handler);
rdp->defer_qs_iw_pending = true;
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
}
@@ -1124,7 +1131,8 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
__releases(rnp->lock)
{
raw_lockdep_assert_held_rcu_node(rnp);
if (!rcu_preempt_blocked_readers_cgp(rnp) && rnp->exp_tasks == NULL) {
if (!rnp->boost_kthread_task ||
(!rcu_preempt_blocked_readers_cgp(rnp) && !rnp->exp_tasks)) {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
@@ -1226,18 +1234,6 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
free_cpumask_var(cm);
}
/*
* Spawn boost kthreads -- called as soon as the scheduler is running.
*/
static void __init rcu_spawn_boost_kthreads(void)
{
struct rcu_node *rnp;
rcu_for_each_leaf_node(rnp)
if (rcu_rnp_online_cpus(rnp))
rcu_spawn_one_boost_kthread(rnp);
}
#else /* #ifdef CONFIG_RCU_BOOST */
static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
@@ -1263,10 +1259,6 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
{
}
static void __init rcu_spawn_boost_kthreads(void)
{
}
#endif /* #else #ifdef CONFIG_RCU_BOOST */
/*

View File

@@ -25,6 +25,34 @@ int sysctl_max_rcu_stall_to_panic __read_mostly;
#define RCU_STALL_MIGHT_DIV 8
#define RCU_STALL_MIGHT_MIN (2 * HZ)
int rcu_exp_jiffies_till_stall_check(void)
{
int cpu_stall_timeout = READ_ONCE(rcu_exp_cpu_stall_timeout);
int exp_stall_delay_delta = 0;
int till_stall_check;
// Zero says to use rcu_cpu_stall_timeout, but in milliseconds.
if (!cpu_stall_timeout)
cpu_stall_timeout = jiffies_to_msecs(rcu_jiffies_till_stall_check());
// Limit check must be consistent with the Kconfig limits for
// CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, so check the allowed range.
// The minimum clamped value is "2UL", because at least one full
// tick has to be guaranteed.
till_stall_check = clamp(msecs_to_jiffies(cpu_stall_timeout), 2UL, 21UL * HZ);
if (cpu_stall_timeout && jiffies_to_msecs(till_stall_check) != cpu_stall_timeout)
WRITE_ONCE(rcu_exp_cpu_stall_timeout, jiffies_to_msecs(till_stall_check));
#ifdef CONFIG_PROVE_RCU
/* Add extra ~25% out of till_stall_check. */
exp_stall_delay_delta = ((till_stall_check * 25) / 100) + 1;
#endif
return till_stall_check + exp_stall_delay_delta;
}
EXPORT_SYMBOL_GPL(rcu_exp_jiffies_till_stall_check);
/* Limit-check stall timeouts specified at boottime and runtime. */
int rcu_jiffies_till_stall_check(void)
{
@@ -565,9 +593,9 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
for_each_possible_cpu(cpu)
totqlen += rcu_get_n_cbs_cpu(cpu);
pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu)\n",
pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n",
smp_processor_id(), (long)(jiffies - gps),
(long)rcu_seq_current(&rcu_state.gp_seq), totqlen);
(long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
if (ndetected) {
rcu_dump_cpu_stacks();
@@ -626,9 +654,9 @@ static void print_cpu_stall(unsigned long gps)
raw_spin_unlock_irqrestore_rcu_node(rdp->mynode, flags);
for_each_possible_cpu(cpu)
totqlen += rcu_get_n_cbs_cpu(cpu);
pr_cont("\t(t=%lu jiffies g=%ld q=%lu)\n",
pr_cont("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n",
jiffies - gps,
(long)rcu_seq_current(&rcu_state.gp_seq), totqlen);
(long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
rcu_check_gp_kthread_expired_fqs_timer();
rcu_check_gp_kthread_starvation();

View File

@@ -506,6 +506,8 @@ EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress);
module_param(rcu_cpu_stall_suppress, int, 0644);
int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
module_param(rcu_cpu_stall_timeout, int, 0644);
int rcu_exp_cpu_stall_timeout __read_mostly = CONFIG_RCU_EXP_CPU_STALL_TIMEOUT;
module_param(rcu_exp_cpu_stall_timeout, int, 0644);
#endif /* #ifdef CONFIG_RCU_STALL_COMMON */
// Suppress boot-time RCU CPU stall warnings and rcutorture writer stall

View File

@@ -267,9 +267,10 @@ static void scf_handler(void *scfc_in)
}
this_cpu_inc(scf_invoked_count);
if (longwait <= 0) {
if (!(r & 0xffc0))
if (!(r & 0xffc0)) {
udelay(r & 0x3f);
goto out;
goto out;
}
}
if (r & 0xfff)
goto out;

View File

@@ -8488,6 +8488,18 @@ static void __init preempt_dynamic_init(void)
}
}
#define PREEMPT_MODEL_ACCESSOR(mode) \
bool preempt_model_##mode(void) \
{ \
WARN_ON_ONCE(preempt_dynamic_mode == preempt_dynamic_undefined); \
return preempt_dynamic_mode == preempt_dynamic_##mode; \
} \
EXPORT_SYMBOL_GPL(preempt_model_##mode)
PREEMPT_MODEL_ACCESSOR(none);
PREEMPT_MODEL_ACCESSOR(voluntary);
PREEMPT_MODEL_ACCESSOR(full);
#else /* !CONFIG_PREEMPT_DYNAMIC */
static inline void preempt_dynamic_init(void) { }

View File

@@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
static DEFINE_PER_CPU(void *, cur_csd_info);
static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
static ulong csd_lock_timeout = 5000; /* CSD lock timeout in milliseconds. */
module_param(csd_lock_timeout, ulong, 0444);
static atomic_t csd_bug_count = ATOMIC_INIT(0);
static u64 cfd_seq;
@@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
u64 ts2, ts_delta;
call_single_data_t *cpu_cur_csd;
unsigned int flags = READ_ONCE(csd->node.u_flags);
unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
if (!(flags & CSD_FLAG_LOCK)) {
if (!unlikely(*bug_id))
@@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
ts2 = sched_clock();
ts_delta = ts2 - *ts1;
if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
return false;
firsttime = !*bug_id;

View File

@@ -144,6 +144,7 @@ config TRACING
select BINARY_PRINTF
select EVENT_TRACING
select TRACE_CLOCK
select TASKS_RCU if PREEMPTION
config GENERIC_TRACER
bool

View File

@@ -24,6 +24,7 @@ help:
@echo ' intel-speed-select - Intel Speed Select tool'
@echo ' kvm_stat - top-like utility for displaying kvm statistics'
@echo ' leds - LEDs tools'
@echo ' nolibc - nolibc headers testing and installation'
@echo ' objtool - an ELF object analysis tool'
@echo ' pci - PCI tools'
@echo ' perf - Linux performance measurement and analysis tool'
@@ -74,6 +75,9 @@ bpf/%: FORCE
libapi: FORCE
$(call descend,lib/api)
nolibc_%: FORCE
$(call descend,include/nolibc,$(patsubst nolibc_%,%,$@))
# The perf build does not follow the descend function setup,
# invoking it via it's own make rule.
PERF_O = $(if $(O),$(O)/tools/perf,)

View File

@@ -0,0 +1,42 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for nolibc installation and tests
include ../../scripts/Makefile.include
# we're in ".../tools/include/nolibc"
ifeq ($(srctree),)
srctree := $(patsubst %/tools/include/,%,$(dir $(CURDIR)))
endif
nolibc_arch := $(patsubst arm64,aarch64,$(ARCH))
arch_file := arch-$(nolibc_arch).h
all_files := ctype.h errno.h nolibc.h signal.h std.h stdio.h stdlib.h string.h \
sys.h time.h types.h unistd.h
# install all headers needed to support a bare-metal compiler
all:
# Note: when ARCH is "x86" we concatenate both x86_64 and i386
headers:
$(Q)mkdir -p $(OUTPUT)sysroot
$(Q)mkdir -p $(OUTPUT)sysroot/include
$(Q)cp $(all_files) $(OUTPUT)sysroot/include/
$(Q)if [ "$(ARCH)" = "x86" ]; then \
sed -e \
's,^#ifndef _NOLIBC_ARCH_X86_64_H,#if !defined(_NOLIBC_ARCH_X86_64_H) \&\& defined(__x86_64__),' \
arch-x86_64.h; \
sed -e \
's,^#ifndef _NOLIBC_ARCH_I386_H,#if !defined(_NOLIBC_ARCH_I386_H) \&\& !defined(__x86_64__),' \
arch-i386.h; \
elif [ -e "$(arch_file)" ]; then \
cat $(arch_file); \
else \
echo "Fatal: architecture $(ARCH) not yet supported by nolibc." >&2; \
exit 1; \
fi > $(OUTPUT)sysroot/include/arch.h
headers_standalone: headers
$(Q)$(MAKE) -C $(srctree) headers
$(Q)$(MAKE) -C $(srctree) headers_install INSTALL_HDR_PATH=$(OUTPUT)/sysroot
clean:
$(call QUIET_CLEAN, nolibc) rm -rf "$(OUTPUT)sysroot"

View File

@@ -0,0 +1,199 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* AARCH64 specific definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_ARCH_AARCH64_H
#define _NOLIBC_ARCH_AARCH64_H
/* O_* macros for fcntl/open are architecture-specific */
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x4000
/* The struct returned by the newfstatat() syscall. Differs slightly from the
* x86_64's stat one by field ordering, so be careful.
*/
struct sys_stat_struct {
unsigned long st_dev;
unsigned long st_ino;
unsigned int st_mode;
unsigned int st_nlink;
unsigned int st_uid;
unsigned int st_gid;
unsigned long st_rdev;
unsigned long __pad1;
long st_size;
int st_blksize;
int __pad2;
long st_blocks;
long st_atime;
unsigned long st_atime_nsec;
long st_mtime;
unsigned long st_mtime_nsec;
long st_ctime;
unsigned long st_ctime_nsec;
unsigned int __unused[2];
};
/* Syscalls for AARCH64 :
* - registers are 64-bit
* - stack is 16-byte aligned
* - syscall number is passed in x8
* - arguments are in x0, x1, x2, x3, x4, x5
* - the system call is performed by calling svc 0
* - syscall return comes in x0.
* - the arguments are cast to long and assigned into the target registers
* which are then simply passed as registers to the asm code, so that we
* don't have to experience issues with register constraints.
*
* On aarch64, select() is not implemented so we have to use pselect6().
*/
#define __ARCH_WANT_SYS_PSELECT6
#define my_syscall0(num) \
({ \
register long _num __asm__ ("x8") = (num); \
register long _arg1 __asm__ ("x0"); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall1(num, arg1) \
({ \
register long _num __asm__ ("x8") = (num); \
register long _arg1 __asm__ ("x0") = (long)(arg1); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall2(num, arg1, arg2) \
({ \
register long _num __asm__ ("x8") = (num); \
register long _arg1 __asm__ ("x0") = (long)(arg1); \
register long _arg2 __asm__ ("x1") = (long)(arg2); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall3(num, arg1, arg2, arg3) \
({ \
register long _num __asm__ ("x8") = (num); \
register long _arg1 __asm__ ("x0") = (long)(arg1); \
register long _arg2 __asm__ ("x1") = (long)(arg2); \
register long _arg3 __asm__ ("x2") = (long)(arg3); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall4(num, arg1, arg2, arg3, arg4) \
({ \
register long _num __asm__ ("x8") = (num); \
register long _arg1 __asm__ ("x0") = (long)(arg1); \
register long _arg2 __asm__ ("x1") = (long)(arg2); \
register long _arg3 __asm__ ("x2") = (long)(arg3); \
register long _arg4 __asm__ ("x3") = (long)(arg4); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
({ \
register long _num __asm__ ("x8") = (num); \
register long _arg1 __asm__ ("x0") = (long)(arg1); \
register long _arg2 __asm__ ("x1") = (long)(arg2); \
register long _arg3 __asm__ ("x2") = (long)(arg3); \
register long _arg4 __asm__ ("x3") = (long)(arg4); \
register long _arg5 __asm__ ("x4") = (long)(arg5); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r" (_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \
({ \
register long _num __asm__ ("x8") = (num); \
register long _arg1 __asm__ ("x0") = (long)(arg1); \
register long _arg2 __asm__ ("x1") = (long)(arg2); \
register long _arg3 __asm__ ("x2") = (long)(arg3); \
register long _arg4 __asm__ ("x3") = (long)(arg4); \
register long _arg5 __asm__ ("x4") = (long)(arg5); \
register long _arg6 __asm__ ("x5") = (long)(arg6); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r" (_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_arg6), "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
/* startup code */
__asm__ (".section .text\n"
".weak _start\n"
"_start:\n"
"ldr x0, [sp]\n" // argc (x0) was in the stack
"add x1, sp, 8\n" // argv (x1) = sp
"lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
"add x2, x2, 8\n" // + 8 (skip null)
"add x2, x2, x1\n" // + argv
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
"mov x8, 93\n" // NR_exit == 93
"svc #0\n"
"");
#endif // _NOLIBC_ARCH_AARCH64_H

View File

@@ -0,0 +1,204 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* ARM specific definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_ARCH_ARM_H
#define _NOLIBC_ARCH_ARM_H
/* O_* macros for fcntl/open are architecture-specific */
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x4000
/* The struct returned by the stat() syscall, 32-bit only, the syscall returns
* exactly 56 bytes (stops before the unused array). In big endian, the format
* differs as devices are returned as short only.
*/
struct sys_stat_struct {
#if defined(__ARMEB__)
unsigned short st_dev;
unsigned short __pad1;
#else
unsigned long st_dev;
#endif
unsigned long st_ino;
unsigned short st_mode;
unsigned short st_nlink;
unsigned short st_uid;
unsigned short st_gid;
#if defined(__ARMEB__)
unsigned short st_rdev;
unsigned short __pad2;
#else
unsigned long st_rdev;
#endif
unsigned long st_size;
unsigned long st_blksize;
unsigned long st_blocks;
unsigned long st_atime;
unsigned long st_atime_nsec;
unsigned long st_mtime;
unsigned long st_mtime_nsec;
unsigned long st_ctime;
unsigned long st_ctime_nsec;
unsigned long __unused[2];
};
/* Syscalls for ARM in ARM or Thumb modes :
* - registers are 32-bit
* - stack is 8-byte aligned
* ( http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4127.html)
* - syscall number is passed in r7
* - arguments are in r0, r1, r2, r3, r4, r5
* - the system call is performed by calling svc #0
* - syscall return comes in r0.
* - only lr is clobbered.
* - the arguments are cast to long and assigned into the target registers
* which are then simply passed as registers to the asm code, so that we
* don't have to experience issues with register constraints.
* - the syscall number is always specified last in order to allow to force
* some registers before (gcc refuses a %-register at the last position).
*
* Also, ARM supports the old_select syscall if newselect is not available
*/
#define __ARCH_WANT_SYS_OLD_SELECT
#define my_syscall0(num) \
({ \
register long _num __asm__ ("r7") = (num); \
register long _arg1 __asm__ ("r0"); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_num) \
: "memory", "cc", "lr" \
); \
_arg1; \
})
#define my_syscall1(num, arg1) \
({ \
register long _num __asm__ ("r7") = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), \
"r"(_num) \
: "memory", "cc", "lr" \
); \
_arg1; \
})
#define my_syscall2(num, arg1, arg2) \
({ \
register long _num __asm__ ("r7") = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), \
"r"(_num) \
: "memory", "cc", "lr" \
); \
_arg1; \
})
#define my_syscall3(num, arg1, arg2, arg3) \
({ \
register long _num __asm__ ("r7") = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
register long _arg3 __asm__ ("r2") = (long)(arg3); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc", "lr" \
); \
_arg1; \
})
#define my_syscall4(num, arg1, arg2, arg3, arg4) \
({ \
register long _num __asm__ ("r7") = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
register long _arg3 __asm__ ("r2") = (long)(arg3); \
register long _arg4 __asm__ ("r3") = (long)(arg4); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
"r"(_num) \
: "memory", "cc", "lr" \
); \
_arg1; \
})
#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
({ \
register long _num __asm__ ("r7") = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
register long _arg3 __asm__ ("r2") = (long)(arg3); \
register long _arg4 __asm__ ("r3") = (long)(arg4); \
register long _arg5 __asm__ ("r4") = (long)(arg5); \
\
__asm__ volatile ( \
"svc #0\n" \
: "=r" (_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_num) \
: "memory", "cc", "lr" \
); \
_arg1; \
})
/* startup code */
__asm__ (".section .text\n"
".weak _start\n"
"_start:\n"
#if defined(__THUMBEB__) || defined(__THUMBEL__)
/* We enter here in 32-bit mode but if some previous functions were in
* 16-bit mode, the assembler cannot know, so we need to tell it we're in
* 32-bit now, then switch to 16-bit (is there a better way to do it than
* adding 1 by hand ?) and tell the asm we're now in 16-bit mode so that
* it generates correct instructions. Note that we do not support thumb1.
*/
".code 32\n"
"add r0, pc, #1\n"
"bx r0\n"
".code 16\n"
#endif
"pop {%r0}\n" // argc was in the stack
"mov %r1, %sp\n" // argv = sp
"add %r2, %r1, %r0, lsl #2\n" // envp = argv + 4*argc ...
"add %r2, %r2, $4\n" // ... + 4
"and %r3, %r1, $-8\n" // AAPCS : sp must be 8-byte aligned in the
"mov %sp, %r3\n" // callee, an bl doesn't push (lr=pc)
"bl main\n" // main() returns the status code, we'll exit with it.
"movs r7, $1\n" // NR_exit == 1
"svc $0x00\n"
"");
#endif // _NOLIBC_ARCH_ARM_H

View File

@@ -0,0 +1,219 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* i386 specific definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_ARCH_I386_H
#define _NOLIBC_ARCH_I386_H
/* O_* macros for fcntl/open are architecture-specific */
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x10000
/* The struct returned by the stat() syscall, 32-bit only, the syscall returns
* exactly 56 bytes (stops before the unused array).
*/
struct sys_stat_struct {
unsigned long st_dev;
unsigned long st_ino;
unsigned short st_mode;
unsigned short st_nlink;
unsigned short st_uid;
unsigned short st_gid;
unsigned long st_rdev;
unsigned long st_size;
unsigned long st_blksize;
unsigned long st_blocks;
unsigned long st_atime;
unsigned long st_atime_nsec;
unsigned long st_mtime;
unsigned long st_mtime_nsec;
unsigned long st_ctime;
unsigned long st_ctime_nsec;
unsigned long __unused[2];
};
/* Syscalls for i386 :
* - mostly similar to x86_64
* - registers are 32-bit
* - syscall number is passed in eax
* - arguments are in ebx, ecx, edx, esi, edi, ebp respectively
* - all registers are preserved (except eax of course)
* - the system call is performed by calling int $0x80
* - syscall return comes in eax
* - the arguments are cast to long and assigned into the target registers
* which are then simply passed as registers to the asm code, so that we
* don't have to experience issues with register constraints.
* - the syscall number is always specified last in order to allow to force
* some registers before (gcc refuses a %-register at the last position).
*
* Also, i386 supports the old_select syscall if newselect is not available
*/
#define __ARCH_WANT_SYS_OLD_SELECT
#define my_syscall0(num) \
({ \
long _ret; \
register long _num __asm__ ("eax") = (num); \
\
__asm__ volatile ( \
"int $0x80\n" \
: "=a" (_ret) \
: "0"(_num) \
: "memory", "cc" \
); \
_ret; \
})
#define my_syscall1(num, arg1) \
({ \
long _ret; \
register long _num __asm__ ("eax") = (num); \
register long _arg1 __asm__ ("ebx") = (long)(arg1); \
\
__asm__ volatile ( \
"int $0x80\n" \
: "=a" (_ret) \
: "r"(_arg1), \
"0"(_num) \
: "memory", "cc" \
); \
_ret; \
})
#define my_syscall2(num, arg1, arg2) \
({ \
long _ret; \
register long _num __asm__ ("eax") = (num); \
register long _arg1 __asm__ ("ebx") = (long)(arg1); \
register long _arg2 __asm__ ("ecx") = (long)(arg2); \
\
__asm__ volatile ( \
"int $0x80\n" \
: "=a" (_ret) \
: "r"(_arg1), "r"(_arg2), \
"0"(_num) \
: "memory", "cc" \
); \
_ret; \
})
#define my_syscall3(num, arg1, arg2, arg3) \
({ \
long _ret; \
register long _num __asm__ ("eax") = (num); \
register long _arg1 __asm__ ("ebx") = (long)(arg1); \
register long _arg2 __asm__ ("ecx") = (long)(arg2); \
register long _arg3 __asm__ ("edx") = (long)(arg3); \
\
__asm__ volatile ( \
"int $0x80\n" \
: "=a" (_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"0"(_num) \
: "memory", "cc" \
); \
_ret; \
})
#define my_syscall4(num, arg1, arg2, arg3, arg4) \
({ \
long _ret; \
register long _num __asm__ ("eax") = (num); \
register long _arg1 __asm__ ("ebx") = (long)(arg1); \
register long _arg2 __asm__ ("ecx") = (long)(arg2); \
register long _arg3 __asm__ ("edx") = (long)(arg3); \
register long _arg4 __asm__ ("esi") = (long)(arg4); \
\
__asm__ volatile ( \
"int $0x80\n" \
: "=a" (_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
"0"(_num) \
: "memory", "cc" \
); \
_ret; \
})
#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
({ \
long _ret; \
register long _num __asm__ ("eax") = (num); \
register long _arg1 __asm__ ("ebx") = (long)(arg1); \
register long _arg2 __asm__ ("ecx") = (long)(arg2); \
register long _arg3 __asm__ ("edx") = (long)(arg3); \
register long _arg4 __asm__ ("esi") = (long)(arg4); \
register long _arg5 __asm__ ("edi") = (long)(arg5); \
\
__asm__ volatile ( \
"int $0x80\n" \
: "=a" (_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"0"(_num) \
: "memory", "cc" \
); \
_ret; \
})
#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \
({ \
long _eax = (long)(num); \
long _arg6 = (long)(arg6); /* Always in memory */ \
__asm__ volatile ( \
"pushl %[_arg6]\n\t" \
"pushl %%ebp\n\t" \
"movl 4(%%esp),%%ebp\n\t" \
"int $0x80\n\t" \
"popl %%ebp\n\t" \
"addl $4,%%esp\n\t" \
: "+a"(_eax) /* %eax */ \
: "b"(arg1), /* %ebx */ \
"c"(arg2), /* %ecx */ \
"d"(arg3), /* %edx */ \
"S"(arg4), /* %esi */ \
"D"(arg5), /* %edi */ \
[_arg6]"m"(_arg6) /* memory */ \
: "memory", "cc" \
); \
_eax; \
})
/* startup code */
/*
* i386 System V ABI mandates:
* 1) last pushed argument must be 16-byte aligned.
* 2) The deepest stack frame should be set to zero
*
*/
__asm__ (".section .text\n"
".weak _start\n"
"_start:\n"
"pop %eax\n" // argc (first arg, %eax)
"mov %esp, %ebx\n" // argv[] (second arg, %ebx)
"lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned before
"sub $4, %esp\n" // the call instruction (args are aligned)
"push %ecx\n" // push all registers on the stack so that we
"push %ebx\n" // support both regparm and plain stack modes
"push %eax\n"
"call main\n" // main() returns the status code in %eax
"mov %eax, %ebx\n" // retrieve exit code (32-bit int)
"movl $1, %eax\n" // NR_exit == 1
"int $0x80\n" // exit now
"hlt\n" // ensure it does not
"");
#endif // _NOLIBC_ARCH_I386_H

View File

@@ -0,0 +1,215 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* MIPS specific definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_ARCH_MIPS_H
#define _NOLIBC_ARCH_MIPS_H
/* O_* macros for fcntl/open are architecture-specific */
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_APPEND 0x0008
#define O_NONBLOCK 0x0080
#define O_CREAT 0x0100
#define O_TRUNC 0x0200
#define O_EXCL 0x0400
#define O_NOCTTY 0x0800
#define O_DIRECTORY 0x10000
/* The struct returned by the stat() syscall. 88 bytes are returned by the
* syscall.
*/
struct sys_stat_struct {
unsigned int st_dev;
long st_pad1[3];
unsigned long st_ino;
unsigned int st_mode;
unsigned int st_nlink;
unsigned int st_uid;
unsigned int st_gid;
unsigned int st_rdev;
long st_pad2[2];
long st_size;
long st_pad3;
long st_atime;
long st_atime_nsec;
long st_mtime;
long st_mtime_nsec;
long st_ctime;
long st_ctime_nsec;
long st_blksize;
long st_blocks;
long st_pad4[14];
};
/* Syscalls for MIPS ABI O32 :
* - WARNING! there's always a delayed slot!
* - WARNING again, the syntax is different, registers take a '$' and numbers
* do not.
* - registers are 32-bit
* - stack is 8-byte aligned
* - syscall number is passed in v0 (starts at 0xfa0).
* - arguments are in a0, a1, a2, a3, then the stack. The caller needs to
* leave some room in the stack for the callee to save a0..a3 if needed.
* - Many registers are clobbered, in fact only a0..a2 and s0..s8 are
* preserved. See: https://www.linux-mips.org/wiki/Syscall as well as
* scall32-o32.S in the kernel sources.
* - the system call is performed by calling "syscall"
* - syscall return comes in v0, and register a3 needs to be checked to know
* if an error occurred, in which case errno is in v0.
* - the arguments are cast to long and assigned into the target registers
* which are then simply passed as registers to the asm code, so that we
* don't have to experience issues with register constraints.
*/
#define my_syscall0(num) \
({ \
register long _num __asm__ ("v0") = (num); \
register long _arg4 __asm__ ("a3"); \
\
__asm__ volatile ( \
"addiu $sp, $sp, -32\n" \
"syscall\n" \
"addiu $sp, $sp, 32\n" \
: "=r"(_num), "=r"(_arg4) \
: "r"(_num) \
: "memory", "cc", "at", "v1", "hi", "lo", \
"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \
); \
_arg4 ? -_num : _num; \
})
#define my_syscall1(num, arg1) \
({ \
register long _num __asm__ ("v0") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg4 __asm__ ("a3"); \
\
__asm__ volatile ( \
"addiu $sp, $sp, -32\n" \
"syscall\n" \
"addiu $sp, $sp, 32\n" \
: "=r"(_num), "=r"(_arg4) \
: "0"(_num), \
"r"(_arg1) \
: "memory", "cc", "at", "v1", "hi", "lo", \
"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \
); \
_arg4 ? -_num : _num; \
})
#define my_syscall2(num, arg1, arg2) \
({ \
register long _num __asm__ ("v0") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg4 __asm__ ("a3"); \
\
__asm__ volatile ( \
"addiu $sp, $sp, -32\n" \
"syscall\n" \
"addiu $sp, $sp, 32\n" \
: "=r"(_num), "=r"(_arg4) \
: "0"(_num), \
"r"(_arg1), "r"(_arg2) \
: "memory", "cc", "at", "v1", "hi", "lo", \
"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \
); \
_arg4 ? -_num : _num; \
})
#define my_syscall3(num, arg1, arg2, arg3) \
({ \
register long _num __asm__ ("v0") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg3 __asm__ ("a2") = (long)(arg3); \
register long _arg4 __asm__ ("a3"); \
\
__asm__ volatile ( \
"addiu $sp, $sp, -32\n" \
"syscall\n" \
"addiu $sp, $sp, 32\n" \
: "=r"(_num), "=r"(_arg4) \
: "0"(_num), \
"r"(_arg1), "r"(_arg2), "r"(_arg3) \
: "memory", "cc", "at", "v1", "hi", "lo", \
"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \
); \
_arg4 ? -_num : _num; \
})
#define my_syscall4(num, arg1, arg2, arg3, arg4) \
({ \
register long _num __asm__ ("v0") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg3 __asm__ ("a2") = (long)(arg3); \
register long _arg4 __asm__ ("a3") = (long)(arg4); \
\
__asm__ volatile ( \
"addiu $sp, $sp, -32\n" \
"syscall\n" \
"addiu $sp, $sp, 32\n" \
: "=r" (_num), "=r"(_arg4) \
: "0"(_num), \
"r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4) \
: "memory", "cc", "at", "v1", "hi", "lo", \
"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \
); \
_arg4 ? -_num : _num; \
})
#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
({ \
register long _num __asm__ ("v0") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg3 __asm__ ("a2") = (long)(arg3); \
register long _arg4 __asm__ ("a3") = (long)(arg4); \
register long _arg5 = (long)(arg5); \
\
__asm__ volatile ( \
"addiu $sp, $sp, -32\n" \
"sw %7, 16($sp)\n" \
"syscall\n " \
"addiu $sp, $sp, 32\n" \
: "=r" (_num), "=r"(_arg4) \
: "0"(_num), \
"r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5) \
: "memory", "cc", "at", "v1", "hi", "lo", \
"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \
); \
_arg4 ? -_num : _num; \
})
/* startup code, note that it's called __start on MIPS */
__asm__ (".section .text\n"
".weak __start\n"
".set nomips16\n"
".set noreorder\n"
".option pic0\n"
".ent __start\n"
"__start:\n"
"lw $a0,($sp)\n" // argc was in the stack
"addiu $a1, $sp, 4\n" // argv = sp + 4
"sll $a2, $a0, 2\n" // a2 = argc * 4
"add $a2, $a2, $a1\n" // envp = argv + 4*argc ...
"addiu $a2, $a2, 4\n" // ... + 4
"li $t0, -8\n"
"and $sp, $sp, $t0\n" // sp must be 8-byte aligned
"addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there!
"jal main\n" // main() returns the status code, we'll exit with it.
"nop\n" // delayed slot
"move $a0, $v0\n" // retrieve 32-bit exit code from v0
"li $v0, 4001\n" // NR_exit == 4001
"syscall\n"
".end __start\n"
"");
#endif // _NOLIBC_ARCH_MIPS_H

View File

@@ -0,0 +1,204 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* RISCV (32 and 64) specific definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_ARCH_RISCV_H
#define _NOLIBC_ARCH_RISCV_H
/* O_* macros for fcntl/open are architecture-specific */
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x100
#define O_EXCL 0x200
#define O_NOCTTY 0x400
#define O_TRUNC 0x1000
#define O_APPEND 0x2000
#define O_NONBLOCK 0x4000
#define O_DIRECTORY 0x200000
struct sys_stat_struct {
unsigned long st_dev; /* Device. */
unsigned long st_ino; /* File serial number. */
unsigned int st_mode; /* File mode. */
unsigned int st_nlink; /* Link count. */
unsigned int st_uid; /* User ID of the file's owner. */
unsigned int st_gid; /* Group ID of the file's group. */
unsigned long st_rdev; /* Device number, if device. */
unsigned long __pad1;
long st_size; /* Size of file, in bytes. */
int st_blksize; /* Optimal block size for I/O. */
int __pad2;
long st_blocks; /* Number 512-byte blocks allocated. */
long st_atime; /* Time of last access. */
unsigned long st_atime_nsec;
long st_mtime; /* Time of last modification. */
unsigned long st_mtime_nsec;
long st_ctime; /* Time of last status change. */
unsigned long st_ctime_nsec;
unsigned int __unused4;
unsigned int __unused5;
};
#if __riscv_xlen == 64
#define PTRLOG "3"
#define SZREG "8"
#elif __riscv_xlen == 32
#define PTRLOG "2"
#define SZREG "4"
#endif
/* Syscalls for RISCV :
* - stack is 16-byte aligned
* - syscall number is passed in a7
* - arguments are in a0, a1, a2, a3, a4, a5
* - the system call is performed by calling ecall
* - syscall return comes in a0
* - the arguments are cast to long and assigned into the target
* registers which are then simply passed as registers to the asm code,
* so that we don't have to experience issues with register constraints.
*
* On riscv, select() is not implemented so we have to use pselect6().
*/
#define __ARCH_WANT_SYS_PSELECT6
#define my_syscall0(num) \
({ \
register long _num __asm__ ("a7") = (num); \
register long _arg1 __asm__ ("a0"); \
\
__asm__ volatile ( \
"ecall\n\t" \
: "=r"(_arg1) \
: "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall1(num, arg1) \
({ \
register long _num __asm__ ("a7") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
\
__asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall2(num, arg1, arg2) \
({ \
register long _num __asm__ ("a7") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
\
__asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall3(num, arg1, arg2, arg3) \
({ \
register long _num __asm__ ("a7") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg3 __asm__ ("a2") = (long)(arg3); \
\
__asm__ volatile ( \
"ecall\n\t" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall4(num, arg1, arg2, arg3, arg4) \
({ \
register long _num __asm__ ("a7") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg3 __asm__ ("a2") = (long)(arg3); \
register long _arg4 __asm__ ("a3") = (long)(arg4); \
\
__asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), "r"(_arg4), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
({ \
register long _num __asm__ ("a7") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg3 __asm__ ("a2") = (long)(arg3); \
register long _arg4 __asm__ ("a3") = (long)(arg4); \
register long _arg5 __asm__ ("a4") = (long)(arg5); \
\
__asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \
({ \
register long _num __asm__ ("a7") = (num); \
register long _arg1 __asm__ ("a0") = (long)(arg1); \
register long _arg2 __asm__ ("a1") = (long)(arg2); \
register long _arg3 __asm__ ("a2") = (long)(arg3); \
register long _arg4 __asm__ ("a3") = (long)(arg4); \
register long _arg5 __asm__ ("a4") = (long)(arg5); \
register long _arg6 __asm__ ("a5") = (long)(arg6); \
\
__asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), "r"(_arg6), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
/* startup code */
__asm__ (".section .text\n"
".weak _start\n"
"_start:\n"
".option push\n"
".option norelax\n"
"lla gp, __global_pointer$\n"
".option pop\n"
"ld a0, 0(sp)\n" // argc (a0) was in the stack
"add a1, sp, "SZREG"\n" // argv (a1) = sp
"slli a2, a0, "PTRLOG"\n" // envp (a2) = SZREG*argc ...
"add a2, a2, "SZREG"\n" // + SZREG (skip null)
"add a2,a2,a1\n" // + argv
"andi sp,a1,-16\n" // sp must be 16-byte aligned
"call main\n" // main() returns the status code, we'll exit with it.
"li a7, 93\n" // NR_exit == 93
"ecall\n"
"");
#endif // _NOLIBC_ARCH_RISCV_H

View File

@@ -0,0 +1,215 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* x86_64 specific definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_ARCH_X86_64_H
#define _NOLIBC_ARCH_X86_64_H
/* O_* macros for fcntl/open are architecture-specific */
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x10000
/* The struct returned by the stat() syscall, equivalent to stat64(). The
* syscall returns 116 bytes and stops in the middle of __unused.
*/
struct sys_stat_struct {
unsigned long st_dev;
unsigned long st_ino;
unsigned long st_nlink;
unsigned int st_mode;
unsigned int st_uid;
unsigned int st_gid;
unsigned int __pad0;
unsigned long st_rdev;
long st_size;
long st_blksize;
long st_blocks;
unsigned long st_atime;
unsigned long st_atime_nsec;
unsigned long st_mtime;
unsigned long st_mtime_nsec;
unsigned long st_ctime;
unsigned long st_ctime_nsec;
long __unused[3];
};
/* Syscalls for x86_64 :
* - registers are 64-bit
* - syscall number is passed in rax
* - arguments are in rdi, rsi, rdx, r10, r8, r9 respectively
* - the system call is performed by calling the syscall instruction
* - syscall return comes in rax
* - rcx and r11 are clobbered, others are preserved.
* - the arguments are cast to long and assigned into the target registers
* which are then simply passed as registers to the asm code, so that we
* don't have to experience issues with register constraints.
* - the syscall number is always specified last in order to allow to force
* some registers before (gcc refuses a %-register at the last position).
* - see also x86-64 ABI section A.2 AMD64 Linux Kernel Conventions, A.2.1
* Calling Conventions.
*
* Link x86-64 ABI: https://gitlab.com/x86-psABIs/x86-64-ABI/-/wikis/home
*
*/
#define my_syscall0(num) \
({ \
long _ret; \
register long _num __asm__ ("rax") = (num); \
\
__asm__ volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define my_syscall1(num, arg1) \
({ \
long _ret; \
register long _num __asm__ ("rax") = (num); \
register long _arg1 __asm__ ("rdi") = (long)(arg1); \
\
__asm__ volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define my_syscall2(num, arg1, arg2) \
({ \
long _ret; \
register long _num __asm__ ("rax") = (num); \
register long _arg1 __asm__ ("rdi") = (long)(arg1); \
register long _arg2 __asm__ ("rsi") = (long)(arg2); \
\
__asm__ volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define my_syscall3(num, arg1, arg2, arg3) \
({ \
long _ret; \
register long _num __asm__ ("rax") = (num); \
register long _arg1 __asm__ ("rdi") = (long)(arg1); \
register long _arg2 __asm__ ("rsi") = (long)(arg2); \
register long _arg3 __asm__ ("rdx") = (long)(arg3); \
\
__asm__ volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define my_syscall4(num, arg1, arg2, arg3, arg4) \
({ \
long _ret; \
register long _num __asm__ ("rax") = (num); \
register long _arg1 __asm__ ("rdi") = (long)(arg1); \
register long _arg2 __asm__ ("rsi") = (long)(arg2); \
register long _arg3 __asm__ ("rdx") = (long)(arg3); \
register long _arg4 __asm__ ("r10") = (long)(arg4); \
\
__asm__ volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
({ \
long _ret; \
register long _num __asm__ ("rax") = (num); \
register long _arg1 __asm__ ("rdi") = (long)(arg1); \
register long _arg2 __asm__ ("rsi") = (long)(arg2); \
register long _arg3 __asm__ ("rdx") = (long)(arg3); \
register long _arg4 __asm__ ("r10") = (long)(arg4); \
register long _arg5 __asm__ ("r8") = (long)(arg5); \
\
__asm__ volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \
({ \
long _ret; \
register long _num __asm__ ("rax") = (num); \
register long _arg1 __asm__ ("rdi") = (long)(arg1); \
register long _arg2 __asm__ ("rsi") = (long)(arg2); \
register long _arg3 __asm__ ("rdx") = (long)(arg3); \
register long _arg4 __asm__ ("r10") = (long)(arg4); \
register long _arg5 __asm__ ("r8") = (long)(arg5); \
register long _arg6 __asm__ ("r9") = (long)(arg6); \
\
__asm__ volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_arg6), "0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
/* startup code */
/*
* x86-64 System V ABI mandates:
* 1) %rsp must be 16-byte aligned right before the function call.
* 2) The deepest stack frame should be zero (the %rbp).
*
*/
__asm__ (".section .text\n"
".weak _start\n"
"_start:\n"
"pop %rdi\n" // argc (first arg, %rdi)
"mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
"mov %eax, %edi\n" // retrieve exit code (32 bit)
"mov $60, %eax\n" // NR_exit == 60
"syscall\n" // really exit
"hlt\n" // ensure it does not return
"");
#endif // _NOLIBC_ARCH_X86_64_H

View File

@@ -0,0 +1,32 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
/* Below comes the architecture-specific code. For each architecture, we have
* the syscall declarations and the _start code definition. This is the only
* global part. On all architectures the kernel puts everything in the stack
* before jumping to _start just above us, without any return address (_start
* is not a function but an entry pint). So at the stack pointer we find argc.
* Then argv[] begins, and ends at the first NULL. Then we have envp which
* starts and ends with a NULL as well. So envp=argv+argc+1.
*/
#ifndef _NOLIBC_ARCH_H
#define _NOLIBC_ARCH_H
#if defined(__x86_64__)
#include "arch-x86_64.h"
#elif defined(__i386__) || defined(__i486__) || defined(__i586__) || defined(__i686__)
#include "arch-i386.h"
#elif defined(__ARM_EABI__)
#include "arch-arm.h"
#elif defined(__aarch64__)
#include "arch-aarch64.h"
#elif defined(__mips__) && defined(_ABIO32)
#include "arch-mips.h"
#elif defined(__riscv)
#include "arch-riscv.h"
#endif
#endif /* _NOLIBC_ARCH_H */

View File

@@ -0,0 +1,99 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* ctype function definitions for NOLIBC
* Copyright (C) 2017-2021 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_CTYPE_H
#define _NOLIBC_CTYPE_H
#include "std.h"
/*
* As much as possible, please keep functions alphabetically sorted.
*/
static __attribute__((unused))
int isascii(int c)
{
/* 0x00..0x7f */
return (unsigned int)c <= 0x7f;
}
static __attribute__((unused))
int isblank(int c)
{
return c == '\t' || c == ' ';
}
static __attribute__((unused))
int iscntrl(int c)
{
/* 0x00..0x1f, 0x7f */
return (unsigned int)c < 0x20 || c == 0x7f;
}
static __attribute__((unused))
int isdigit(int c)
{
return (unsigned int)(c - '0') < 10;
}
static __attribute__((unused))
int isgraph(int c)
{
/* 0x21..0x7e */
return (unsigned int)(c - 0x21) < 0x5e;
}
static __attribute__((unused))
int islower(int c)
{
return (unsigned int)(c - 'a') < 26;
}
static __attribute__((unused))
int isprint(int c)
{
/* 0x20..0x7e */
return (unsigned int)(c - 0x20) < 0x5f;
}
static __attribute__((unused))
int isspace(int c)
{
/* \t is 0x9, \n is 0xA, \v is 0xB, \f is 0xC, \r is 0xD */
return ((unsigned int)c == ' ') || (unsigned int)(c - 0x09) < 5;
}
static __attribute__((unused))
int isupper(int c)
{
return (unsigned int)(c - 'A') < 26;
}
static __attribute__((unused))
int isxdigit(int c)
{
return isdigit(c) || (unsigned int)(c - 'A') < 6 || (unsigned int)(c - 'a') < 6;
}
static __attribute__((unused))
int isalpha(int c)
{
return islower(c) || isupper(c);
}
static __attribute__((unused))
int isalnum(int c)
{
return isalpha(c) || isdigit(c);
}
static __attribute__((unused))
int ispunct(int c)
{
return isgraph(c) && !isalnum(c);
}
#endif /* _NOLIBC_CTYPE_H */

View File

@@ -0,0 +1,27 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* Minimal errno definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_ERRNO_H
#define _NOLIBC_ERRNO_H
#include <asm/errno.h>
/* this way it will be removed if unused */
static int errno;
#ifndef NOLIBC_IGNORE_ERRNO
#define SET_ERRNO(v) do { errno = (v); } while (0)
#else
#define SET_ERRNO(v) do { } while (0)
#endif
/* errno codes all ensure that they will not conflict with a valid pointer
* because they all correspond to the highest addressable memory page.
*/
#define MAX_ERRNO 4095
#endif /* _NOLIBC_ERRNO_H */

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,22 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* signal function definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_SIGNAL_H
#define _NOLIBC_SIGNAL_H
#include "std.h"
#include "arch.h"
#include "types.h"
#include "sys.h"
/* This one is not marked static as it's needed by libgcc for divide by zero */
__attribute__((weak,unused,section(".text.nolibc_raise")))
int raise(int signal)
{
return sys_kill(sys_getpid(), signal);
}
#endif /* _NOLIBC_SIGNAL_H */

View File

@@ -0,0 +1,49 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* Standard definitions and types for NOLIBC
* Copyright (C) 2017-2021 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_STD_H
#define _NOLIBC_STD_H
/* Declare a few quite common macros and types that usually are in stdlib.h,
* stdint.h, ctype.h, unistd.h and a few other common locations. Please place
* integer type definitions and generic macros here, but avoid OS-specific and
* syscall-specific stuff, as this file is expected to be included very early.
*/
/* note: may already be defined */
#ifndef NULL
#define NULL ((void *)0)
#endif
/* stdint types */
typedef unsigned char uint8_t;
typedef signed char int8_t;
typedef unsigned short uint16_t;
typedef signed short int16_t;
typedef unsigned int uint32_t;
typedef signed int int32_t;
typedef unsigned long long uint64_t;
typedef signed long long int64_t;
typedef unsigned long size_t;
typedef signed long ssize_t;
typedef unsigned long uintptr_t;
typedef signed long intptr_t;
typedef signed long ptrdiff_t;
/* those are commonly provided by sys/types.h */
typedef unsigned int dev_t;
typedef unsigned long ino_t;
typedef unsigned int mode_t;
typedef signed int pid_t;
typedef unsigned int uid_t;
typedef unsigned int gid_t;
typedef unsigned long nlink_t;
typedef signed long off_t;
typedef signed long blksize_t;
typedef signed long blkcnt_t;
typedef signed long time_t;
#endif /* _NOLIBC_STD_H */

View File

@@ -0,0 +1,306 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* minimal stdio function definitions for NOLIBC
* Copyright (C) 2017-2021 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_STDIO_H
#define _NOLIBC_STDIO_H
#include <stdarg.h>
#include "std.h"
#include "arch.h"
#include "errno.h"
#include "types.h"
#include "sys.h"
#include "stdlib.h"
#include "string.h"
#ifndef EOF
#define EOF (-1)
#endif
/* just define FILE as a non-empty type */
typedef struct FILE {
char dummy[1];
} FILE;
/* We define the 3 common stdio files as constant invalid pointers that
* are easily recognized.
*/
static __attribute__((unused)) FILE* const stdin = (FILE*)-3;
static __attribute__((unused)) FILE* const stdout = (FILE*)-2;
static __attribute__((unused)) FILE* const stderr = (FILE*)-1;
/* getc(), fgetc(), getchar() */
#define getc(stream) fgetc(stream)
static __attribute__((unused))
int fgetc(FILE* stream)
{
unsigned char ch;
int fd;
if (stream < stdin || stream > stderr)
return EOF;
fd = 3 + (long)stream;
if (read(fd, &ch, 1) <= 0)
return EOF;
return ch;
}
static __attribute__((unused))
int getchar(void)
{
return fgetc(stdin);
}
/* putc(), fputc(), putchar() */
#define putc(c, stream) fputc(c, stream)
static __attribute__((unused))
int fputc(int c, FILE* stream)
{
unsigned char ch = c;
int fd;
if (stream < stdin || stream > stderr)
return EOF;
fd = 3 + (long)stream;
if (write(fd, &ch, 1) <= 0)
return EOF;
return ch;
}
static __attribute__((unused))
int putchar(int c)
{
return fputc(c, stdout);
}
/* fwrite(), puts(), fputs(). Note that puts() emits '\n' but not fputs(). */
/* internal fwrite()-like function which only takes a size and returns 0 on
* success or EOF on error. It automatically retries on short writes.
*/
static __attribute__((unused))
int _fwrite(const void *buf, size_t size, FILE *stream)
{
ssize_t ret;
int fd;
if (stream < stdin || stream > stderr)
return EOF;
fd = 3 + (long)stream;
while (size) {
ret = write(fd, buf, size);
if (ret <= 0)
return EOF;
size -= ret;
buf += ret;
}
return 0;
}
static __attribute__((unused))
size_t fwrite(const void *s, size_t size, size_t nmemb, FILE *stream)
{
size_t written;
for (written = 0; written < nmemb; written++) {
if (_fwrite(s, size, stream) != 0)
break;
s += size;
}
return written;
}
static __attribute__((unused))
int fputs(const char *s, FILE *stream)
{
return _fwrite(s, strlen(s), stream);
}
static __attribute__((unused))
int puts(const char *s)
{
if (fputs(s, stdout) == EOF)
return EOF;
return putchar('\n');
}
/* fgets() */
static __attribute__((unused))
char *fgets(char *s, int size, FILE *stream)
{
int ofs;
int c;
for (ofs = 0; ofs + 1 < size;) {
c = fgetc(stream);
if (c == EOF)
break;
s[ofs++] = c;
if (c == '\n')
break;
}
if (ofs < size)
s[ofs] = 0;
return ofs ? s : NULL;
}
/* minimal vfprintf(). It supports the following formats:
* - %[l*]{d,u,c,x,p}
* - %s
* - unknown modifiers are ignored.
*/
static __attribute__((unused))
int vfprintf(FILE *stream, const char *fmt, va_list args)
{
char escape, lpref, c;
unsigned long long v;
unsigned int written;
size_t len, ofs;
char tmpbuf[21];
const char *outstr;
written = ofs = escape = lpref = 0;
while (1) {
c = fmt[ofs++];
if (escape) {
/* we're in an escape sequence, ofs == 1 */
escape = 0;
if (c == 'c' || c == 'd' || c == 'u' || c == 'x' || c == 'p') {
char *out = tmpbuf;
if (c == 'p')
v = va_arg(args, unsigned long);
else if (lpref) {
if (lpref > 1)
v = va_arg(args, unsigned long long);
else
v = va_arg(args, unsigned long);
} else
v = va_arg(args, unsigned int);
if (c == 'd') {
/* sign-extend the value */
if (lpref == 0)
v = (long long)(int)v;
else if (lpref == 1)
v = (long long)(long)v;
}
switch (c) {
case 'c':
out[0] = v;
out[1] = 0;
break;
case 'd':
i64toa_r(v, out);
break;
case 'u':
u64toa_r(v, out);
break;
case 'p':
*(out++) = '0';
*(out++) = 'x';
/* fall through */
default: /* 'x' and 'p' above */
u64toh_r(v, out);
break;
}
outstr = tmpbuf;
}
else if (c == 's') {
outstr = va_arg(args, char *);
if (!outstr)
outstr="(null)";
}
else if (c == '%') {
/* queue it verbatim */
continue;
}
else {
/* modifiers or final 0 */
if (c == 'l') {
/* long format prefix, maintain the escape */
lpref++;
}
escape = 1;
goto do_escape;
}
len = strlen(outstr);
goto flush_str;
}
/* not an escape sequence */
if (c == 0 || c == '%') {
/* flush pending data on escape or end */
escape = 1;
lpref = 0;
outstr = fmt;
len = ofs - 1;
flush_str:
if (_fwrite(outstr, len, stream) != 0)
break;
written += len;
do_escape:
if (c == 0)
break;
fmt += ofs;
ofs = 0;
continue;
}
/* literal char, just queue it */
}
return written;
}
static __attribute__((unused))
int fprintf(FILE *stream, const char *fmt, ...)
{
va_list args;
int ret;
va_start(args, fmt);
ret = vfprintf(stream, fmt, args);
va_end(args);
return ret;
}
static __attribute__((unused))
int printf(const char *fmt, ...)
{
va_list args;
int ret;
va_start(args, fmt);
ret = vfprintf(stdout, fmt, args);
va_end(args);
return ret;
}
static __attribute__((unused))
void perror(const char *msg)
{
fprintf(stderr, "%s%serrno=%d\n", (msg && *msg) ? msg : "", (msg && *msg) ? ": " : "", errno);
}
#endif /* _NOLIBC_STDIO_H */

View File

@@ -0,0 +1,423 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* stdlib function definitions for NOLIBC
* Copyright (C) 2017-2021 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_STDLIB_H
#define _NOLIBC_STDLIB_H
#include "std.h"
#include "arch.h"
#include "types.h"
#include "sys.h"
#include "string.h"
struct nolibc_heap {
size_t len;
char user_p[] __attribute__((__aligned__));
};
/* Buffer used to store int-to-ASCII conversions. Will only be implemented if
* any of the related functions is implemented. The area is large enough to
* store "18446744073709551615" or "-9223372036854775808" and the final zero.
*/
static __attribute__((unused)) char itoa_buffer[21];
/*
* As much as possible, please keep functions alphabetically sorted.
*/
/* must be exported, as it's used by libgcc for various divide functions */
__attribute__((weak,unused,noreturn,section(".text.nolibc_abort")))
void abort(void)
{
sys_kill(sys_getpid(), SIGABRT);
for (;;);
}
static __attribute__((unused))
long atol(const char *s)
{
unsigned long ret = 0;
unsigned long d;
int neg = 0;
if (*s == '-') {
neg = 1;
s++;
}
while (1) {
d = (*s++) - '0';
if (d > 9)
break;
ret *= 10;
ret += d;
}
return neg ? -ret : ret;
}
static __attribute__((unused))
int atoi(const char *s)
{
return atol(s);
}
static __attribute__((unused))
void free(void *ptr)
{
struct nolibc_heap *heap;
if (!ptr)
return;
heap = container_of(ptr, struct nolibc_heap, user_p);
munmap(heap, heap->len);
}
/* getenv() tries to find the environment variable named <name> in the
* environment array pointed to by global variable "environ" which must be
* declared as a char **, and must be terminated by a NULL (it is recommended
* to set this variable to the "envp" argument of main()). If the requested
* environment variable exists its value is returned otherwise NULL is
* returned. getenv() is forcefully inlined so that the reference to "environ"
* will be dropped if unused, even at -O0.
*/
static __attribute__((unused))
char *_getenv(const char *name, char **environ)
{
int idx, i;
if (environ) {
for (idx = 0; environ[idx]; idx++) {
for (i = 0; name[i] && name[i] == environ[idx][i];)
i++;
if (!name[i] && environ[idx][i] == '=')
return &environ[idx][i+1];
}
}
return NULL;
}
static inline __attribute__((unused,always_inline))
char *getenv(const char *name)
{
extern char **environ;
return _getenv(name, environ);
}
static __attribute__((unused))
void *malloc(size_t len)
{
struct nolibc_heap *heap;
/* Always allocate memory with size multiple of 4096. */
len = sizeof(*heap) + len;
len = (len + 4095UL) & -4096UL;
heap = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE,
-1, 0);
if (__builtin_expect(heap == MAP_FAILED, 0))
return NULL;
heap->len = len;
return heap->user_p;
}
static __attribute__((unused))
void *calloc(size_t size, size_t nmemb)
{
void *orig;
size_t res = 0;
if (__builtin_expect(__builtin_mul_overflow(nmemb, size, &res), 0)) {
SET_ERRNO(ENOMEM);
return NULL;
}
/*
* No need to zero the heap, the MAP_ANONYMOUS in malloc()
* already does it.
*/
return malloc(res);
}
static __attribute__((unused))
void *realloc(void *old_ptr, size_t new_size)
{
struct nolibc_heap *heap;
size_t user_p_len;
void *ret;
if (!old_ptr)
return malloc(new_size);
heap = container_of(old_ptr, struct nolibc_heap, user_p);
user_p_len = heap->len - sizeof(*heap);
/*
* Don't realloc() if @user_p_len >= @new_size, this block of
* memory is still enough to handle the @new_size. Just return
* the same pointer.
*/
if (user_p_len >= new_size)
return old_ptr;
ret = malloc(new_size);
if (__builtin_expect(!ret, 0))
return NULL;
memcpy(ret, heap->user_p, heap->len);
munmap(heap, heap->len);
return ret;
}
/* Converts the unsigned long integer <in> to its hex representation into
* buffer <buffer>, which must be long enough to store the number and the
* trailing zero (17 bytes for "ffffffffffffffff" or 9 for "ffffffff"). The
* buffer is filled from the first byte, and the number of characters emitted
* (not counting the trailing zero) is returned. The function is constructed
* in a way to optimize the code size and avoid any divide that could add a
* dependency on large external functions.
*/
static __attribute__((unused))
int utoh_r(unsigned long in, char *buffer)
{
signed char pos = (~0UL > 0xfffffffful) ? 60 : 28;
int digits = 0;
int dig;
do {
dig = in >> pos;
in -= (uint64_t)dig << pos;
pos -= 4;
if (dig || digits || pos < 0) {
if (dig > 9)
dig += 'a' - '0' - 10;
buffer[digits++] = '0' + dig;
}
} while (pos >= 0);
buffer[digits] = 0;
return digits;
}
/* converts unsigned long <in> to an hex string using the static itoa_buffer
* and returns the pointer to that string.
*/
static inline __attribute__((unused))
char *utoh(unsigned long in)
{
utoh_r(in, itoa_buffer);
return itoa_buffer;
}
/* Converts the unsigned long integer <in> to its string representation into
* buffer <buffer>, which must be long enough to store the number and the
* trailing zero (21 bytes for 18446744073709551615 in 64-bit, 11 for
* 4294967295 in 32-bit). The buffer is filled from the first byte, and the
* number of characters emitted (not counting the trailing zero) is returned.
* The function is constructed in a way to optimize the code size and avoid
* any divide that could add a dependency on large external functions.
*/
static __attribute__((unused))
int utoa_r(unsigned long in, char *buffer)
{
unsigned long lim;
int digits = 0;
int pos = (~0UL > 0xfffffffful) ? 19 : 9;
int dig;
do {
for (dig = 0, lim = 1; dig < pos; dig++)
lim *= 10;
if (digits || in >= lim || !pos) {
for (dig = 0; in >= lim; dig++)
in -= lim;
buffer[digits++] = '0' + dig;
}
} while (pos--);
buffer[digits] = 0;
return digits;
}
/* Converts the signed long integer <in> to its string representation into
* buffer <buffer>, which must be long enough to store the number and the
* trailing zero (21 bytes for -9223372036854775808 in 64-bit, 12 for
* -2147483648 in 32-bit). The buffer is filled from the first byte, and the
* number of characters emitted (not counting the trailing zero) is returned.
*/
static __attribute__((unused))
int itoa_r(long in, char *buffer)
{
char *ptr = buffer;
int len = 0;
if (in < 0) {
in = -in;
*(ptr++) = '-';
len++;
}
len += utoa_r(in, ptr);
return len;
}
/* for historical compatibility, same as above but returns the pointer to the
* buffer.
*/
static inline __attribute__((unused))
char *ltoa_r(long in, char *buffer)
{
itoa_r(in, buffer);
return buffer;
}
/* converts long integer <in> to a string using the static itoa_buffer and
* returns the pointer to that string.
*/
static inline __attribute__((unused))
char *itoa(long in)
{
itoa_r(in, itoa_buffer);
return itoa_buffer;
}
/* converts long integer <in> to a string using the static itoa_buffer and
* returns the pointer to that string. Same as above, for compatibility.
*/
static inline __attribute__((unused))
char *ltoa(long in)
{
itoa_r(in, itoa_buffer);
return itoa_buffer;
}
/* converts unsigned long integer <in> to a string using the static itoa_buffer
* and returns the pointer to that string.
*/
static inline __attribute__((unused))
char *utoa(unsigned long in)
{
utoa_r(in, itoa_buffer);
return itoa_buffer;
}
/* Converts the unsigned 64-bit integer <in> to its hex representation into
* buffer <buffer>, which must be long enough to store the number and the
* trailing zero (17 bytes for "ffffffffffffffff"). The buffer is filled from
* the first byte, and the number of characters emitted (not counting the
* trailing zero) is returned. The function is constructed in a way to optimize
* the code size and avoid any divide that could add a dependency on large
* external functions.
*/
static __attribute__((unused))
int u64toh_r(uint64_t in, char *buffer)
{
signed char pos = 60;
int digits = 0;
int dig;
do {
if (sizeof(long) >= 8) {
dig = (in >> pos) & 0xF;
} else {
/* 32-bit platforms: avoid a 64-bit shift */
uint32_t d = (pos >= 32) ? (in >> 32) : in;
dig = (d >> (pos & 31)) & 0xF;
}
if (dig > 9)
dig += 'a' - '0' - 10;
pos -= 4;
if (dig || digits || pos < 0)
buffer[digits++] = '0' + dig;
} while (pos >= 0);
buffer[digits] = 0;
return digits;
}
/* converts uint64_t <in> to an hex string using the static itoa_buffer and
* returns the pointer to that string.
*/
static inline __attribute__((unused))
char *u64toh(uint64_t in)
{
u64toh_r(in, itoa_buffer);
return itoa_buffer;
}
/* Converts the unsigned 64-bit integer <in> to its string representation into
* buffer <buffer>, which must be long enough to store the number and the
* trailing zero (21 bytes for 18446744073709551615). The buffer is filled from
* the first byte, and the number of characters emitted (not counting the
* trailing zero) is returned. The function is constructed in a way to optimize
* the code size and avoid any divide that could add a dependency on large
* external functions.
*/
static __attribute__((unused))
int u64toa_r(uint64_t in, char *buffer)
{
unsigned long long lim;
int digits = 0;
int pos = 19; /* start with the highest possible digit */
int dig;
do {
for (dig = 0, lim = 1; dig < pos; dig++)
lim *= 10;
if (digits || in >= lim || !pos) {
for (dig = 0; in >= lim; dig++)
in -= lim;
buffer[digits++] = '0' + dig;
}
} while (pos--);
buffer[digits] = 0;
return digits;
}
/* Converts the signed 64-bit integer <in> to its string representation into
* buffer <buffer>, which must be long enough to store the number and the
* trailing zero (21 bytes for -9223372036854775808). The buffer is filled from
* the first byte, and the number of characters emitted (not counting the
* trailing zero) is returned.
*/
static __attribute__((unused))
int i64toa_r(int64_t in, char *buffer)
{
char *ptr = buffer;
int len = 0;
if (in < 0) {
in = -in;
*(ptr++) = '-';
len++;
}
len += u64toa_r(in, ptr);
return len;
}
/* converts int64_t <in> to a string using the static itoa_buffer and returns
* the pointer to that string.
*/
static inline __attribute__((unused))
char *i64toa(int64_t in)
{
i64toa_r(in, itoa_buffer);
return itoa_buffer;
}
/* converts uint64_t <in> to a string using the static itoa_buffer and returns
* the pointer to that string.
*/
static inline __attribute__((unused))
char *u64toa(uint64_t in)
{
u64toa_r(in, itoa_buffer);
return itoa_buffer;
}
#endif /* _NOLIBC_STDLIB_H */

View File

@@ -0,0 +1,285 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* string function definitions for NOLIBC
* Copyright (C) 2017-2021 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_STRING_H
#define _NOLIBC_STRING_H
#include "std.h"
static void *malloc(size_t len);
/*
* As much as possible, please keep functions alphabetically sorted.
*/
static __attribute__((unused))
int memcmp(const void *s1, const void *s2, size_t n)
{
size_t ofs = 0;
char c1 = 0;
while (ofs < n && !(c1 = ((char *)s1)[ofs] - ((char *)s2)[ofs])) {
ofs++;
}
return c1;
}
static __attribute__((unused))
void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
{
size_t pos = 0;
while (pos < len) {
((char *)dst)[pos] = ((const char *)src)[pos];
pos++;
}
return dst;
}
static __attribute__((unused))
void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
{
while (len) {
len--;
((char *)dst)[len] = ((const char *)src)[len];
}
return dst;
}
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memmove")))
void *memmove(void *dst, const void *src, size_t len)
{
size_t dir, pos;
pos = len;
dir = -1;
if (dst < src) {
pos = -1;
dir = 1;
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
len--;
}
return dst;
}
/* must be exported, as it's used by libgcc on ARM */
__attribute__((weak,unused,section(".text.nolibc_memcpy")))
void *memcpy(void *dst, const void *src, size_t len)
{
return _nolibc_memcpy_up(dst, src, len);
}
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
char *p = dst;
while (len--)
*(p++) = b;
return dst;
}
static __attribute__((unused))
char *strchr(const char *s, int c)
{
while (*s) {
if (*s == (char)c)
return (char *)s;
s++;
}
return NULL;
}
static __attribute__((unused))
int strcmp(const char *a, const char *b)
{
unsigned int c;
int diff;
while (!(diff = (unsigned char)*a++ - (c = (unsigned char)*b++)) && c)
;
return diff;
}
static __attribute__((unused))
char *strcpy(char *dst, const char *src)
{
char *ret = dst;
while ((*dst++ = *src++));
return ret;
}
/* this function is only used with arguments that are not constants or when
* it's not known because optimizations are disabled.
*/
static __attribute__((unused))
size_t nolibc_strlen(const char *str)
{
size_t len;
for (len = 0; str[len]; len++);
return len;
}
/* do not trust __builtin_constant_p() at -O0, as clang will emit a test and
* the two branches, then will rely on an external definition of strlen().
*/
#if defined(__OPTIMIZE__)
#define strlen(str) ({ \
__builtin_constant_p((str)) ? \
__builtin_strlen((str)) : \
nolibc_strlen((str)); \
})
#else
#define strlen(str) nolibc_strlen((str))
#endif
static __attribute__((unused))
size_t strnlen(const char *str, size_t maxlen)
{
size_t len;
for (len = 0; (len < maxlen) && str[len]; len++);
return len;
}
static __attribute__((unused))
char *strdup(const char *str)
{
size_t len;
char *ret;
len = strlen(str);
ret = malloc(len + 1);
if (__builtin_expect(ret != NULL, 1))
memcpy(ret, str, len + 1);
return ret;
}
static __attribute__((unused))
char *strndup(const char *str, size_t maxlen)
{
size_t len;
char *ret;
len = strnlen(str, maxlen);
ret = malloc(len + 1);
if (__builtin_expect(ret != NULL, 1)) {
memcpy(ret, str, len);
ret[len] = '\0';
}
return ret;
}
static __attribute__((unused))
size_t strlcat(char *dst, const char *src, size_t size)
{
size_t len;
char c;
for (len = 0; dst[len]; len++)
;
for (;;) {
c = *src;
if (len < size)
dst[len] = c;
if (!c)
break;
len++;
src++;
}
return len;
}
static __attribute__((unused))
size_t strlcpy(char *dst, const char *src, size_t size)
{
size_t len;
char c;
for (len = 0;;) {
c = src[len];
if (len < size)
dst[len] = c;
if (!c)
break;
len++;
}
return len;
}
static __attribute__((unused))
char *strncat(char *dst, const char *src, size_t size)
{
char *orig = dst;
while (*dst)
dst++;
while (size && (*dst = *src)) {
src++;
dst++;
size--;
}
*dst = 0;
return orig;
}
static __attribute__((unused))
int strncmp(const char *a, const char *b, size_t size)
{
unsigned int c;
int diff = 0;
while (size-- &&
!(diff = (unsigned char)*a++ - (c = (unsigned char)*b++)) && c)
;
return diff;
}
static __attribute__((unused))
char *strncpy(char *dst, const char *src, size_t size)
{
size_t len;
for (len = 0; len < size; len++)
if ((dst[len] = *src))
src++;
return dst;
}
static __attribute__((unused))
char *strrchr(const char *s, int c)
{
const char *ret = NULL;
while (*s) {
if (*s == (char)c)
ret = s;
s++;
}
return (char *)ret;
}
#endif /* _NOLIBC_STRING_H */

1247
tools/include/nolibc/sys.h Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,28 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* time function definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_TIME_H
#define _NOLIBC_TIME_H
#include "std.h"
#include "arch.h"
#include "types.h"
#include "sys.h"
static __attribute__((unused))
time_t time(time_t *tptr)
{
struct timeval tv;
/* note, cannot fail here */
sys_gettimeofday(&tv, NULL);
if (tptr)
*tptr = tv.tv_sec;
return tv.tv_sec;
}
#endif /* _NOLIBC_TIME_H */

View File

@@ -0,0 +1,205 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* Special types used by various syscalls for NOLIBC
* Copyright (C) 2017-2021 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_TYPES_H
#define _NOLIBC_TYPES_H
#include "std.h"
#include <linux/time.h>
/* Only the generic macros and types may be defined here. The arch-specific
* ones such as the O_RDONLY and related macros used by fcntl() and open(), or
* the layout of sys_stat_struct must not be defined here.
*/
/* stat flags (WARNING, octal here) */
#define S_IFDIR 0040000
#define S_IFCHR 0020000
#define S_IFBLK 0060000
#define S_IFREG 0100000
#define S_IFIFO 0010000
#define S_IFLNK 0120000
#define S_IFSOCK 0140000
#define S_IFMT 0170000
#define S_ISDIR(mode) (((mode) & S_IFDIR) == S_IFDIR)
#define S_ISCHR(mode) (((mode) & S_IFCHR) == S_IFCHR)
#define S_ISBLK(mode) (((mode) & S_IFBLK) == S_IFBLK)
#define S_ISREG(mode) (((mode) & S_IFREG) == S_IFREG)
#define S_ISFIFO(mode) (((mode) & S_IFIFO) == S_IFIFO)
#define S_ISLNK(mode) (((mode) & S_IFLNK) == S_IFLNK)
#define S_ISSOCK(mode) (((mode) & S_IFSOCK) == S_IFSOCK)
/* dirent types */
#define DT_UNKNOWN 0x0
#define DT_FIFO 0x1
#define DT_CHR 0x2
#define DT_DIR 0x4
#define DT_BLK 0x6
#define DT_REG 0x8
#define DT_LNK 0xa
#define DT_SOCK 0xc
/* commonly an fd_set represents 256 FDs */
#ifndef FD_SETSIZE
#define FD_SETSIZE 256
#endif
/* PATH_MAX and MAXPATHLEN are often used and found with plenty of different
* values.
*/
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif
#ifndef MAXPATHLEN
#define MAXPATHLEN (PATH_MAX)
#endif
/* Special FD used by all the *at functions */
#ifndef AT_FDCWD
#define AT_FDCWD (-100)
#endif
/* whence values for lseek() */
#define SEEK_SET 0
#define SEEK_CUR 1
#define SEEK_END 2
/* cmd for reboot() */
#define LINUX_REBOOT_MAGIC1 0xfee1dead
#define LINUX_REBOOT_MAGIC2 0x28121969
#define LINUX_REBOOT_CMD_HALT 0xcdef0123
#define LINUX_REBOOT_CMD_POWER_OFF 0x4321fedc
#define LINUX_REBOOT_CMD_RESTART 0x01234567
#define LINUX_REBOOT_CMD_SW_SUSPEND 0xd000fce2
/* Macros used on waitpid()'s return status */
#define WEXITSTATUS(status) (((status) & 0xff00) >> 8)
#define WIFEXITED(status) (((status) & 0x7f) == 0)
/* waitpid() flags */
#define WNOHANG 1
/* standard exit() codes */
#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1
/* for select() */
typedef struct {
uint32_t fd32[(FD_SETSIZE + 31) / 32];
} fd_set;
#define FD_CLR(fd, set) do { \
fd_set *__set = (set); \
int __fd = (fd); \
if (__fd >= 0) \
__set->fd32[__fd / 32] &= ~(1U << (__fd & 31)); \
} while (0)
#define FD_SET(fd, set) do { \
fd_set *__set = (set); \
int __fd = (fd); \
if (__fd >= 0) \
__set->fd32[__fd / 32] |= 1U << (__fd & 31); \
} while (0)
#define FD_ISSET(fd, set) ({ \
fd_set *__set = (set); \
int __fd = (fd); \
int __r = 0; \
if (__fd >= 0) \
__r = !!(__set->fd32[__fd / 32] & 1U << (__fd & 31)); \
__r; \
})
#define FD_ZERO(set) do { \
fd_set *__set = (set); \
int __idx; \
for (__idx = 0; __idx < (FD_SETSIZE+31) / 32; __idx ++) \
__set->fd32[__idx] = 0; \
} while (0)
/* for poll() */
#define POLLIN 0x0001
#define POLLPRI 0x0002
#define POLLOUT 0x0004
#define POLLERR 0x0008
#define POLLHUP 0x0010
#define POLLNVAL 0x0020
struct pollfd {
int fd;
short int events;
short int revents;
};
/* for getdents64() */
struct linux_dirent64 {
uint64_t d_ino;
int64_t d_off;
unsigned short d_reclen;
unsigned char d_type;
char d_name[];
};
/* needed by wait4() */
struct rusage {
struct timeval ru_utime;
struct timeval ru_stime;
long ru_maxrss;
long ru_ixrss;
long ru_idrss;
long ru_isrss;
long ru_minflt;
long ru_majflt;
long ru_nswap;
long ru_inblock;
long ru_oublock;
long ru_msgsnd;
long ru_msgrcv;
long ru_nsignals;
long ru_nvcsw;
long ru_nivcsw;
};
/* The format of the struct as returned by the libc to the application, which
* significantly differs from the format returned by the stat() syscall flavours.
*/
struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* inode number */
mode_t st_mode; /* protection */
nlink_t st_nlink; /* number of hard links */
uid_t st_uid; /* user ID of owner */
gid_t st_gid; /* group ID of owner */
dev_t st_rdev; /* device ID (if special file) */
off_t st_size; /* total size, in bytes */
blksize_t st_blksize; /* blocksize for file system I/O */
blkcnt_t st_blocks; /* number of 512B blocks allocated */
time_t st_atime; /* time of last access */
time_t st_mtime; /* time of last modification */
time_t st_ctime; /* time of last status change */
};
/* WARNING, it only deals with the 4096 first majors and 256 first minors */
#define makedev(major, minor) ((dev_t)((((major) & 0xfff) << 8) | ((minor) & 0xff)))
#define major(dev) ((unsigned int)(((dev) >> 8) & 0xfff))
#define minor(dev) ((unsigned int)(((dev) & 0xff))
#ifndef offsetof
#define offsetof(TYPE, FIELD) ((size_t) &((TYPE *)0)->FIELD)
#endif
#ifndef container_of
#define container_of(PTR, TYPE, FIELD) ({ \
__typeof__(((TYPE *)0)->FIELD) *__FIELD_PTR = (PTR); \
(TYPE *)((char *) __FIELD_PTR - offsetof(TYPE, FIELD)); \
})
#endif
#endif /* _NOLIBC_TYPES_H */

View File

@@ -0,0 +1,54 @@
/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
/*
* unistd function definitions for NOLIBC
* Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu>
*/
#ifndef _NOLIBC_UNISTD_H
#define _NOLIBC_UNISTD_H
#include "std.h"
#include "arch.h"
#include "types.h"
#include "sys.h"
static __attribute__((unused))
int msleep(unsigned int msecs)
{
struct timeval my_timeval = { msecs / 1000, (msecs % 1000) * 1000 };
if (sys_select(0, 0, 0, 0, &my_timeval) < 0)
return (my_timeval.tv_sec * 1000) +
(my_timeval.tv_usec / 1000) +
!!(my_timeval.tv_usec % 1000);
else
return 0;
}
static __attribute__((unused))
unsigned int sleep(unsigned int seconds)
{
struct timeval my_timeval = { seconds, 0 };
if (sys_select(0, 0, 0, 0, &my_timeval) < 0)
return my_timeval.tv_sec + !!my_timeval.tv_usec;
else
return 0;
}
static __attribute__((unused))
int usleep(unsigned int usecs)
{
struct timeval my_timeval = { usecs / 1000000, usecs % 1000000 };
return sys_select(0, 0, 0, 0, &my_timeval);
}
static __attribute__((unused))
int tcsetpgrp(int fd, pid_t pid)
{
return ioctl(fd, TIOCSPGRP, &pid);
}
#endif /* _NOLIBC_UNISTD_H */

View File

@@ -54,7 +54,8 @@ klitmus7 Compatibility Table
-- 4.14 7.48 --
4.15 -- 4.19 7.49 --
4.20 -- 5.5 7.54 --
5.6 -- 7.56 --
5.6 -- 5.16 7.56 --
5.17 -- 7.56.1 --
============ ==========

View File

@@ -301,7 +301,7 @@ specify_qemu_cpus () {
echo $2 -smp $3
;;
qemu-system-ppc64)
nt="`lscpu | grep '^NUMA node0' | sed -e 's/^[^,]*,\([0-9]*\),.*$/\1/'`"
nt="`lscpu | sed -n 's/^Thread(s) per core:\s*//p'`"
echo $2 -smp cores=`expr \( $3 + $nt - 1 \) / $nt`,threads=$nt
;;
esac

View File

@@ -36,7 +36,7 @@ do
then
egrep "error:|warning:|^ld: .*undefined reference to" < $i > $i.diags
files="$files $i.diags $i"
elif ! test -f ${scenariobasedir}/vmlinux
elif ! test -f ${scenariobasedir}/vmlinux && ! test -f "${rundir}/re-run"
then
echo No ${scenariobasedir}/vmlinux file > $i.diags
files="$files $i.diags $i"

View File

@@ -33,7 +33,12 @@ do
TORTURE_SUITE="`cat $i/../torture_suite`"
configfile=`echo $i | sed -e 's,^.*/,,'`
rm -f $i/console.log.*.diags
kvm-recheck-${TORTURE_SUITE}.sh $i
case "${TORTURE_SUITE}" in
X*)
;;
*)
kvm-recheck-${TORTURE_SUITE}.sh $i
esac
if test -f "$i/qemu-retval" && test "`cat $i/qemu-retval`" -ne 0 && test "`cat $i/qemu-retval`" -ne 137
then
echo QEMU error, output:

View File

@@ -138,14 +138,14 @@ chmod +x $T/bin/kvm-remote-*.sh
# Check first to avoid the need for cleanup for system-name typos
for i in $systems
do
ncpus="`ssh $i getconf _NPROCESSORS_ONLN 2> /dev/null`"
echo $i: $ncpus CPUs " " `date` | tee -a "$oldrun/remote-log"
ncpus="`ssh -o BatchMode=yes $i getconf _NPROCESSORS_ONLN 2> /dev/null`"
ret=$?
if test "$ret" -ne 0
then
echo System $i unreachable, giving up. | tee -a "$oldrun/remote-log"
exit 4
fi
echo $i: $ncpus CPUs " " `date` | tee -a "$oldrun/remote-log"
done
# Download and expand the tarball on all systems.
@@ -153,14 +153,14 @@ echo Build-products tarball: `du -h $T/binres.tgz` | tee -a "$oldrun/remote-log"
for i in $systems
do
echo Downloading tarball to $i `date` | tee -a "$oldrun/remote-log"
cat $T/binres.tgz | ssh $i "cd /tmp; tar -xzf -"
cat $T/binres.tgz | ssh -o BatchMode=yes $i "cd /tmp; tar -xzf -"
ret=$?
tries=0
while test "$ret" -ne 0
do
echo Unable to download $T/binres.tgz to system $i, waiting and then retrying. $tries prior retries. | tee -a "$oldrun/remote-log"
sleep 60
cat $T/binres.tgz | ssh $i "cd /tmp; tar -xzf -"
cat $T/binres.tgz | ssh -o BatchMode=yes $i "cd /tmp; tar -xzf -"
ret=$?
if test "$ret" -ne 0
then
@@ -185,7 +185,7 @@ checkremotefile () {
while :
do
ssh $1 "test -f \"$2\""
ssh -o BatchMode=yes $1 "test -f \"$2\""
ret=$?
if test "$ret" -eq 255
then
@@ -228,7 +228,7 @@ startbatches () {
then
continue # System still running last test, skip.
fi
ssh "$i" "cd \"$resdir/$ds\"; touch remote.run; PATH=\"$T/bin:$PATH\" nohup kvm-remote-$curbatch.sh > kvm-remote-$curbatch.sh.out 2>&1 &" 1>&2
ssh -o BatchMode=yes "$i" "cd \"$resdir/$ds\"; touch remote.run; PATH=\"$T/bin:$PATH\" nohup kvm-remote-$curbatch.sh > kvm-remote-$curbatch.sh.out 2>&1 &" 1>&2
ret=$?
if test "$ret" -ne 0
then
@@ -267,7 +267,7 @@ do
sleep 30
done
echo " ---" Collecting results from $i `date` | tee -a "$oldrun/remote-log"
( cd "$oldrun"; ssh $i "cd $rundir; tar -czf - kvm-remote-*.sh.out */console.log */kvm-test-1-run*.sh.out */qemu[_-]pid */qemu-retval */qemu-affinity; rm -rf $T > /dev/null 2>&1" | tar -xzf - )
( cd "$oldrun"; ssh -o BatchMode=yes $i "cd $rundir; tar -czf - kvm-remote-*.sh.out */console.log */kvm-test-1-run*.sh.out */qemu[_-]pid */qemu-retval */qemu-affinity; rm -rf $T > /dev/null 2>&1" | tar -xzf - )
done
( kvm-end-run-stats.sh "$oldrun" "$starttime"; echo $? > $T/exitcode ) | tee -a "$oldrun/remote-log"

View File

@@ -44,6 +44,7 @@ TORTURE_KCONFIG_KASAN_ARG=""
TORTURE_KCONFIG_KCSAN_ARG=""
TORTURE_KMAKE_ARG=""
TORTURE_QEMU_MEM=512
torture_qemu_mem_default=1
TORTURE_REMOTE=
TORTURE_SHUTDOWN_GRACE=180
TORTURE_SUITE=rcu
@@ -86,7 +87,7 @@ usage () {
echo " --remote"
echo " --results absolute-pathname"
echo " --shutdown-grace seconds"
echo " --torture lock|rcu|rcuscale|refscale|scf"
echo " --torture lock|rcu|rcuscale|refscale|scf|X*"
echo " --trust-make"
exit 1
}
@@ -180,6 +181,10 @@ do
;;
--kasan)
TORTURE_KCONFIG_KASAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KASAN=y"; export TORTURE_KCONFIG_KASAN_ARG
if test -n "$torture_qemu_mem_default"
then
TORTURE_QEMU_MEM=2G
fi
;;
--kconfig|--kconfigs)
checkarg --kconfig "(Kconfig options)" $# "$2" '^CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\( CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\)*$' '^error$'
@@ -202,6 +207,7 @@ do
--memory)
checkarg --memory "(memory size)" $# "$2" '^[0-9]\+[MG]\?$' error
TORTURE_QEMU_MEM=$2
torture_qemu_mem_default=
shift
;;
--no-initrd)
@@ -231,7 +237,7 @@ do
shift
;;
--torture)
checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuscale\|refscale\|scf\)$' '^--'
checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuscale\|refscale\|scf\|X.*\)$' '^--'
TORTURE_SUITE=$2
TORTURE_MOD="`echo $TORTURE_SUITE | sed -e 's/^\(lock\|rcu\|scf\)$/\1torture/'`"
shift

View File

@@ -54,6 +54,7 @@ do_kvfree=yes
do_kasan=yes
do_kcsan=no
do_clocksourcewd=yes
do_rt=yes
# doyesno - Helper function for yes/no arguments
function doyesno () {
@@ -82,6 +83,7 @@ usage () {
echo " --do-rcuscale / --do-no-rcuscale"
echo " --do-rcutorture / --do-no-rcutorture"
echo " --do-refscale / --do-no-refscale"
echo " --do-rt / --do-no-rt"
echo " --do-scftorture / --do-no-scftorture"
echo " --duration [ <minutes> | <hours>h | <days>d ]"
echo " --kcsan-kmake-arg kernel-make-arguments"
@@ -118,6 +120,7 @@ do
do_scftorture=yes
do_rcuscale=yes
do_refscale=yes
do_rt=yes
do_kvfree=yes
do_kasan=yes
do_kcsan=yes
@@ -148,6 +151,7 @@ do
do_scftorture=no
do_rcuscale=no
do_refscale=no
do_rt=no
do_kvfree=no
do_kasan=no
do_kcsan=no
@@ -162,6 +166,9 @@ do
--do-refscale|--do-no-refscale)
do_refscale=`doyesno "$1" --do-refscale`
;;
--do-rt|--do-no-rt)
do_rt=`doyesno "$1" --do-rt`
;;
--do-scftorture|--do-no-scftorture)
do_scftorture=`doyesno "$1" --do-scftorture`
;;
@@ -322,6 +329,7 @@ then
echo " --- make clean" > "$amcdir/Make.out" 2>&1
make -j$MAKE_ALLOTED_CPUS clean >> "$amcdir/Make.out" 2>&1
echo " --- make allmodconfig" >> "$amcdir/Make.out" 2>&1
cp .config $amcdir
make -j$MAKE_ALLOTED_CPUS allmodconfig >> "$amcdir/Make.out" 2>&1
echo " --- make " >> "$amcdir/Make.out" 2>&1
make -j$MAKE_ALLOTED_CPUS >> "$amcdir/Make.out" 2>&1
@@ -350,8 +358,19 @@ fi
if test "$do_scftorture" = "yes"
then
torture_bootargs="scftorture.nthreads=$HALF_ALLOTED_CPUS torture.disable_onoff_at_boot"
torture_set "scftorture" tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration "$duration_scftorture" --configs "$configs_scftorture" --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --memory 1G --trust-make
torture_bootargs="scftorture.nthreads=$HALF_ALLOTED_CPUS torture.disable_onoff_at_boot csdlock_debug=1"
torture_set "scftorture" tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration "$duration_scftorture" --configs "$configs_scftorture" --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --memory 2G --trust-make
fi
if test "$do_rt" = "yes"
then
# With all post-boot grace periods forced to normal.
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_normal=1"
torture_set "rcurttorture" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --trust-make
# With all post-boot grace periods forced to expedited.
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_expedited=1"
torture_set "rcurttorture-exp" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --trust-make
fi
if test "$do_refscale" = yes
@@ -363,7 +382,7 @@ fi
for prim in $primlist
do
torture_bootargs="refscale.scale_type="$prim" refscale.nreaders=$HALF_ALLOTED_CPUS refscale.loops=10000 refscale.holdoff=20 torture.disable_onoff_at_boot"
torture_set "refscale-$prim" tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --bootargs "verbose_batched=$VERBOSE_BATCH_CPUS torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=$VERBOSE_BATCH_CPUS" --trust-make
torture_set "refscale-$prim" tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --kconfig "CONFIG_TASKS_TRACE_RCU=y CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --bootargs "verbose_batched=$VERBOSE_BATCH_CPUS torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=$VERBOSE_BATCH_CPUS" --trust-make
done
if test "$do_rcuscale" = yes
@@ -375,13 +394,13 @@ fi
for prim in $primlist
do
torture_bootargs="rcuscale.scale_type="$prim" rcuscale.nwriters=$HALF_ALLOTED_CPUS rcuscale.holdoff=20 torture.disable_onoff_at_boot"
torture_set "rcuscale-$prim" tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 5 --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --trust-make
torture_set "rcuscale-$prim" tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 5 --kconfig "CONFIG_TASKS_TRACE_RCU=y CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --trust-make
done
if test "$do_kvfree" = "yes"
then
torture_bootargs="rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot"
torture_set "rcuscale-kvfree" tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --memory 1G --trust-make
torture_set "rcuscale-kvfree" tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --memory 2G --trust-make
fi
if test "$do_clocksourcewd" = "yes"

View File

@@ -8,3 +8,5 @@ CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_FORCE_TASKS_RUDE_RCU=y
#CHECK#CONFIG_TASKS_RUDE_RCU=y

View File

@@ -6,3 +6,5 @@ CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
#CHECK#CONFIG_RCU_EXPERT=n
CONFIG_KPROBES=n
CONFIG_FTRACE=n

View File

@@ -7,4 +7,5 @@ CONFIG_PREEMPT=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_EXPERT=y

View File

@@ -2,3 +2,7 @@ CONFIG_SMP=n
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
CONFIG_PREEMPT_DYNAMIC=n
#CHECK#CONFIG_TASKS_RCU=y
CONFIG_FORCE_TASKS_RCU=y
CONFIG_RCU_EXPERT=y

View File

@@ -1 +1,2 @@
rcutorture.torture_type=tasks
rcutorture.stat_interval=60

View File

@@ -7,3 +7,5 @@ CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=n
CONFIG_NO_HZ_FULL=y
#CHECK#CONFIG_RCU_EXPERT=n
CONFIG_TASKS_RCU=y
CONFIG_RCU_EXPERT=y

View File

@@ -4,8 +4,11 @@ CONFIG_HOTPLUG_CPU=y
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
CONFIG_PREEMPT_DYNAMIC=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PROVE_LOCKING=n
#CHECK#CONFIG_PROVE_RCU=n
CONFIG_FORCE_TASKS_TRACE_RCU=y
#CHECK#CONFIG_TASKS_TRACE_RCU=y
CONFIG_TASKS_TRACE_RCU_READ_MB=y
CONFIG_RCU_EXPERT=y

View File

@@ -7,5 +7,7 @@ CONFIG_PREEMPT=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_FORCE_TASKS_TRACE_RCU=y
#CHECK#CONFIG_TASKS_TRACE_RCU=y
CONFIG_TASKS_TRACE_RCU_READ_MB=n
CONFIG_RCU_EXPERT=y

View File

@@ -1,8 +1,9 @@
CONFIG_SMP=y
CONFIG_NR_CPUS=8
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_PREEMPT=n
CONFIG_PREEMPT_DYNAMIC=n
#CHECK#CONFIG_TREE_RCU=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=n

View File

@@ -3,6 +3,7 @@ CONFIG_NR_CPUS=16
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
CONFIG_PREEMPT_DYNAMIC=n
#CHECK#CONFIG_TREE_RCU=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=n

View File

@@ -13,3 +13,5 @@ CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
#CHECK#CONFIG_RCU_EXPERT=n
CONFIG_KPROBES=n
CONFIG_FTRACE=n

View File

@@ -3,6 +3,7 @@ CONFIG_NR_CPUS=56
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
CONFIG_PREEMPT_DYNAMIC=n
#CHECK#CONFIG_TREE_RCU=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=y

View File

@@ -9,7 +9,7 @@
# rcutorture_param_n_barrier_cbs bootparam-string
#
# Adds n_barrier_cbs rcutorture module parameter to kernels having it.
# Adds n_barrier_cbs rcutorture module parameter if not already specified.
rcutorture_param_n_barrier_cbs () {
if echo $1 | grep -q "rcutorture\.n_barrier_cbs"
then
@@ -30,13 +30,25 @@ rcutorture_param_onoff () {
fi
}
# rcutorture_param_stat_interval bootparam-string
#
# Adds stat_interval rcutorture module parameter if not already specified.
rcutorture_param_stat_interval () {
if echo $1 | grep -q "rcutorture\.stat_interval"
then
:
else
echo rcutorture.stat_interval=15
fi
}
# per_version_boot_params bootparam-string config-file seconds
#
# Adds per-version torture-module parameters to kernels supporting them.
per_version_boot_params () {
echo $1 `rcutorture_param_onoff "$1" "$2"` \
`rcutorture_param_n_barrier_cbs "$1"` \
rcutorture.stat_interval=15 \
`rcutorture_param_stat_interval "$1"` \
rcutorture.shutdown_secs=$3 \
rcutorture.test_no_idle_hz=1 \
rcutorture.verbose=1

Some files were not shown because too many files have changed in this diff Show More