By default all SME operations in userspace will trap. When this happens
we allocate storage space for the SME register state, set up the SVE
registers and disable traps. We do not need to initialize ZA since the
architecture guarantees that it will be zeroed when enabled and when we
trap ZA is disabled.
On syscall we exit streaming mode if we were previously in it and ensure
that all but the lower 128 bits of the registers are zeroed while
preserving the state of ZA. This follows the aarch64 PCS for SME, ZA
state is preserved over a function call and streaming mode is exited.
Since the traps for SME do not distinguish between streaming mode SVE
and ZA usage if ZA is in use rather than reenabling traps we instead
zero the parts of the SVE registers not shared with FPSIMD and leave SME
enabled, this simplifies handling SME traps. If ZA is not in use then we
reenable SME traps and fall through to normal handling of SVE.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-17-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 8bd7f91c03)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I98fc449f3310a91bb42e5c1b44f6ff5929d61e4a
Allocate space for storing ZA on first access to SME and use that to save
and restore ZA state when context switching. We do this by using the vector
form of the LDR and STR ZA instructions, these do not require streaming
mode and have implementation recommendations that they avoid contention
issues in shared SMCU implementations.
Since ZA is architecturally guaranteed to be zeroed when enabled we do not
need to explicitly zero ZA, either we will be restoring from a saved copy
or trapping on first use of SME so we know that ZA must be disabled.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-16-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 0033cd9339)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I95f189d064fb54f6d7d632e3b6c07d11aef9c6f0
When in streaming mode we need to save and restore the streaming mode
SVE register state rather than the regular SVE register state. This uses
the streaming mode vector length and omits FFR but is otherwise identical,
if TIF_SVE is enabled when we are in streaming mode then streaming mode
takes precedence.
This does not handle use of streaming SVE state with KVM, ptrace or
signals. This will be updated in further patches.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-15-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit af7167d6d2)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I3070deb80c9ca0a387ea1da8235e459cdb58ecf0
In SME the use of both streaming SVE mode and ZA are tracked through
PSTATE.SM and PSTATE.ZA, visible through the system register SVCR. In
order to context switch the floating point state for SME we need to
context switch the contents of this register as part of context
switching the floating point state.
Since changing the vector length exits streaming SVE mode and disables
ZA we also make sure we update SVCR appropriately when setting vector
length, and similarly ensure that new threads have streaming SVE mode
and ZA disabled.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-14-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit b40c559b45)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: If9fae0de37b347114fc7f0450120bc5bbf0a6757
The Scalable Matrix Extension introduces support for a new thread specific
data register TPIDR2 intended for use by libc. The kernel must save the
value of TPIDR2 on context switch and should ensure that all new threads
start off with a default value of 0. Add a field to the thread_struct to
store TPIDR2 and context switch it with the other thread specific data.
In case there are future extensions which also use TPIDR2 we introduce
system_supports_tpidr2() and use that rather than system_supports_sme()
for TPIDR2 handling.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-13-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit a9d6915859)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ib5944d3519b12a2898cd32e2af4863f9062b9bee
The vector lengths used for SME are controlled through a similar set of
registers to those for SVE and enumerated using a similar algorithm with
some slight differences due to the fact that unlike SVE there are no
restrictions on which combinations of vector lengths can be supported
nor any mandatory vector lengths which must be implemented. Add a new
vector type and implement support for enumerating it.
One slightly awkward feature is that we need to read the current vector
length using a different instruction (or enter streaming mode which
would have the same issue and be higher cost). Rather than add an ops
structure we add special cases directly in the otherwise generic
vec_probe_vqs() function, this is a bit inelegant but it's the only
place where this is an issue.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-10-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit b42990d3bf)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I7970dc4fa69b88c4db262abc55552b3a82072cd4
SME requires similar setup to that for SVE: disable traps to EL2 and
make sure that the maximum vector length is available to EL1, for SME we
have two traps - one for SME itself and one for TPIDR2.
In addition since we currently make no active use of priority control
for SCMUs we map all SME priorities lower ELs may configure to 0, the
architecture specified minimum priority, to ensure that nothing we
manage is able to configure itself to consume excessive resources. This
will need to be revisited should there be a need to manage SME
priorities at runtime.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-8-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit b2cf6a2328)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I320fddaaaa1e267d4f4f5e7b14d9e91de5b5998f
As with SVE rather than impose ambitious toolchain requirements for SME
we manually encode the few instructions which we require in order to
perform the work the kernel needs to do. The instructions used to save
and restore context are provided as assembler macros while those for
entering and leaving streaming mode are done in asm volatile blocks
since they are expected to be used from C.
We could do the SMSTART and SMSTOP operations with read/modify/write
cycles on SVCR but using the aliases provided for individual field
accesses should be slightly faster. These instructions are aliases for
MSR but since our minimum toolchain requirements are old enough to mean
that we can't use the sX_X_cX_cX_X form and they always use xzr rather
than taking a value like write_sysreg_s() wants we just use .inst.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-7-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit ca8a4ebcff)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Iefe76395cf940518d7ef245c2c0c434f8ea28d51
The arm64 Scalable Matrix Extension (SME) adds some new system registers,
fields in existing system registers and exception syndromes. This patch
adds definitions for these for use in future patches implementing support
for this extension.
Since SME will be the first user of FEAT_HCX in the kernel also include
the definitions for enumerating it and the HCRX system register it adds.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit b4adc83b07)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ic1698ac8d3b7935d95a52d439c4e3bccde09b91b
For WFxT instructions used with very small delays, it is not
unlikely that the deadline is already expired by the time we
reach the WFx handling code.
Check for this condition as soon as possible, and return to the
guest immediately if we can.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-7-maz@kernel.org
(cherry picked from commit a3fb596514)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I03090182ab73793c54601d8beb09be3ec090a053
When trapping a blocking WFIT instruction, take it into account when
computing the deadline of the background timer.
The state is tracked with a new vcpu flag, and is gated by a new
CPU capability, which isn't currently enabled.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-6-maz@kernel.org
(cherry picked from commit 89f5074c50)
[willdeacon@: Use kvm_vcpu_block() instead of kvm_vcpu_halt() in context]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9819233e1cb0df8a4bfb5018a47249b16fecae49
Refactor kvm_timer_compute_delta() and extract a helper that
compute the delta (in ns) between a given timer and an arbitrary
value.
No functional change expected.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-5-maz@kernel.org
(cherry picked from commit daf85a5f6b)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I5082092730945b808bd0bfb051c071340beb0266
kvm_cpu_has_pending_timer() ends up checking all the possible
timers for a wake-up cause. However, we already check for
pending interrupts whenever we try to wake-up a vcpu, including
the timer interrupts.
Obviously, doing the same work twice is once too many. Reduce
this helper to almost nothing, but keep it around, as we are
going to make use of it soon.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-4-maz@kernel.org
(cherry picked from commit b57de4ffd7)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I051492d81a822d21c9434d5a09e8f670fe1d788d
The ISS field exposed by ESR_ELx contain two additional subfields
with FEAT_WFxT:
- RN, the register number containing the timeout
- RV, indicating if the register number is valid
Describe these two fields according to the arch spec.
No functional change.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419182755.601427-3-maz@kernel.org
(cherry picked from commit bdcc2f2803)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ia52d9e0985e113afb1eaa75d7b51047e80348477
Starting with FEAT_WFXT in ARMv8.7, the TI field in the ISS
that is reported on a WFx trap is expanded by one bit to
allow the description of WFET and WFIT.
Special care is taken to exclude the WFxT bit from the mask
used to match WFI so that it also matches WFIT when trapped from
EL0.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419182755.601427-2-maz@kernel.org
(cherry picked from commit 6a437208cb)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I000829397e606b68f067e5433efd339a1b4d15db
Since all the fields in the main ID registers are 4 bits wide we have up
until now not bothered specifying the width in the code. Since we now
wish to use this mechanism to enumerate features from the floating point
feature registers which do not follow this pattern add a width to the
table. This means updating all the existing table entries but makes it
less likely that we run into issues in future due to implicitly assuming
a 4 bit width.
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220207152109.197566-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 0a2eec83c2)
[willdeacon@: Add missing .field_width initialisers per upstream merge commit
8d93b7a242 ("Merge branch 'for-next/fpsimd' into for-next/core")]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ic2bd609d701fbdcc53b9d60cb402b4a6a102bfe7
CPACR_EL1 has several bitfields for controlling traps for floating point
features to EL1, each of which has a separate bits for EL0 and EL1. Marc
Zyngier noted that we are not consistent in our use of defines to
manipulate these, sometimes using a define covering the whole field and
sometimes using defines for the individual bits. Make this consistent by
expanding the whole field defines where they are used (currently only in
the KVM code) and deleting them so that no further uses can be
introduced.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220207152109.197566-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 3bb72d86d8)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I2e42e5c18bfdcbeac99eec5121b1ef5117c05ec8
The base floating point, SVE and SME all have enable controls for EL0 and
EL1 in CPACR_EL1 which have a similar layout and function. Currently the
basic floating point enable FPEN is defined differently to the SVE control,
specified as a single define in kvm_arm.h rather than in sysreg.h. Move the
define to sysreg.h and provide separate EL0 and EL1 control bits so code
managing the different floating point enables can look consistent.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220207152109.197566-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 879358fc67)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I622d3bee721ed51bd776324493757e2823465c62
The vector length configuration for SME is very similar to that for SVE
so in order to allow reuse refactor the SVE configuration so that it takes
the vector type from the struct ctl_table. Since there's no dedicated space
for this we repurpose the extra1 field to store the vector type, this is
otherwise unused for integer sysctls.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 97bcbee404)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I5f29b5c91e812fb723b3fd84fc681ccb7bbd87ef
As for SVE we will track a per task SME vector length for tasks. Convert
the existing storage for the vector length into an array and update
fpsimd_flush_task() to initialise this in a function.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-10-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 5838a15579)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I6cd068a6cb7508b84e3f202750569a20c5f2bd77
Currently when restoring the SVE state we supply the SVE vector length
as an argument to sve_load_state() and the underlying macros. This becomes
inconvenient with the addition of SME since we may need to restore any
combination of SVE and SME vector lengths, and we already separately
restore the vector length in the KVM code. We don't need to know the vector
length during the actual register load since the SME load instructions can
index into the data array for us.
Refactor the interface so we explicitly set the vector length separately
to restoring the SVE registers in preparation for adding SME support, no
functional change should be involved.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-9-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit ddc806b5c4)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I6b546ddda30845a1296084e8a5867e2a2c30f1b6
With the introduction of SME we will have a second vector length in the
system, enumerated and configured in a very similar fashion to the
existing SVE vector length. While there are a few differences in how
things are handled this is a relatively small portion of the overall
code so in order to avoid code duplication we factor out
We create two structs, one vl_info for the static hardware properties
and one vl_config for the runtime configuration, with an array
instantiated for each and update all the users to reference these. Some
accessor functions are provided where helpful for readability, and the
write to set the vector length is put into a function since the system
register being updated needs to be chosen at compile time.
This is a mostly mechanical replacement, further work will be required
to actually make things generic, ensuring that we handle those places
where there are differences properly.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-8-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit b5bc00ffdd)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ief5d1a8373f5eedcc75c3435b103383fac1f77f2
In a system with SME there are parallel vector length controls for SVE and
SME vectors which function in much the same way so it is desirable to
share the code for handling them as much as possible. In order to prepare
for doing this add a layer of accessor functions for the various VL related
operations on tasks.
Since almost all current interactions are actually via task->thread rather
than directly with the thread_info the accessors use that. Accessors are
provided for both generic and SVE specific usage, the generic accessors
should be used for cases where register state is being manipulated since
the registers are shared between streaming and regular SVE so we know that
when SME support is implemented we will always have to be in the appropriate
mode already and hence can generalise now.
Since we are using task_struct and we don't want to cause widespread
inclusion of sched.h the acessors are all out of line, it is hoped that
none of the uses are in a sufficiently critical path for this to be an
issue. Those that are most likely to present an issue are in the same
translation unit so hopefully the compiler may be able to inline anyway.
This is purely adding the layer of abstraction, additional work will be
needed to support tasks using SME.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-7-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 0423eedcf4)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I08f2a81d1082f1ebd128fe1cda82883e38f3cd39
SME introduces streaming SVE mode in which FFR is not present and the
instructions for accessing it UNDEF. In preparation for handling this
update the low level SVE state access functions to take a flag specifying
if FFR should be handled. When saving the register state we store a zero
for FFR to guard against uninitialized data being read. No behaviour change
should be introduced by this patch.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 9f58486657)
[willdeacon@: Dropped hunk from removed __sve_save_state() function]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I105e81a3283bdaeb06291dc9f249998613a39cc3
Following optimisations of the SVE register handling we no longer load the
SVE state from a saved copy of the FPSIMD registers, we convert directly
in registers or from one saved state to another. Remove the function so we
don't need to update it during further refactoring.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit b53223e0a4)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ifa825d452d32d0721e42d58fb615a10c0ef705c0
QARMA3 is relaxed version of the QARMA5 algorithm which expected to
reduce the latency of calculation while still delivering a suitable
level of security.
Support for QARMA3 can be discovered via ID_AA64ISAR2_EL1
APA3, bits [15:12] Indicates whether the QARMA3 algorithm is
implemented in the PE for address
authentication in AArch64 state.
GPA3, bits [11:8] Indicates whether the QARMA3 algorithm is
implemented in the PE for generic code
authentication in AArch64 state.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220224124952.119612-4-vladimir.murzin@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit def8c222f0)
[willdeacon@: Fix context sysreg/cpufeature context conflicts due to CLEARBHB
mitigation backport]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I7829af89cc97efe7ef2c8763c8fee8fffe54b7c9
In case, both boot_val and sec_val have value below min_field_value we
would wrongly report that address authentication is supported. It is
not a big issue because we enable address authentication based on boot
cpu (and check there is correct).
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220224124952.119612-2-vladimir.murzin@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit da844beb6d)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ib03730d7573dc9bea4732eaf930912441439c13c
When adding support for the slightly wonky Apple M1, we had to
populate ID_AA64PFR0_EL1.GIC==1 to present something to the guest,
as the HW itself doesn't advertise the feature.
However, we gated this on the in-kernel irqchip being created.
This causes some trouble for QEMU, which snapshots the state of
the registers before creating a virtual GIC, and then tries to
restore these registers once the GIC has been created. Obviously,
between the two stages, ID_AA64PFR0_EL1.GIC has changed value,
and the write fails.
The fix is to actually emulate the HW, and always populate the
field if the HW is capable of it.
Fixes: 562e530fd7 ("KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3")
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Oliver Upton <oupton@google.com>
Link: https://lore.kernel.org/r/20220503211424.3375263-1-maz@kernel.org
(cherry picked from commit 5163373af1)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I6888e25ed0ac5b96727ae1f8f728bf523aac5650
When taking a translation fault for an IPA that is outside of
the range defined by the hypervisor (between the HW PARange and
the IPA range), we stupidly treat it as an IO and forward the access
to userspace. Of course, userspace can't do much with it, and things
end badly.
Arguably, the guest is braindead, but we should at least catch the
case and inject an exception.
Check the faulting IPA against:
- the sanitised PARange: inject an address size fault
- the IPA size: inject an abort
Reported-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 85ea6b1ec9)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Iaad029b26f5ae91ed6ff06db53f23107beffc573
kvm->arch.arm_pmu is set when userspace attempts to set the first PMU
attribute. As certain attributes are mandatory, arm_pmu ends up always
being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU.
However, this only happens if the VCPU has the PMU feature. If the VCPU
doesn't have the feature bit set, kvm->arch.arm_pmu will be left
uninitialized and equal to NULL.
KVM doesn't do ID register emulation for 32-bit guests and accesses to the
PMU registers aren't gated by the pmu_visibility() function. This is done
to prevent injecting unexpected undefined exceptions in guests which have
detected the presence of a hardware PMU. But even though the VCPU feature
is missing, KVM still attempts to emulate certain aspects of the PMU when
PMU registers are accessed. This leads to a NULL pointer dereference like
this one, which happens on an odroid-c4 board when running the
kvm-unit-tests pmu-cycle-counter test with kvmtool and without the PMU
feature being set:
[ 454.402699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000150
[ 454.405865] Mem abort info:
[ 454.408596] ESR = 0x96000004
[ 454.411638] EC = 0x25: DABT (current EL), IL = 32 bits
[ 454.416901] SET = 0, FnV = 0
[ 454.419909] EA = 0, S1PTW = 0
[ 454.423010] FSC = 0x04: level 0 translation fault
[ 454.427841] Data abort info:
[ 454.430687] ISV = 0, ISS = 0x00000004
[ 454.434484] CM = 0, WnR = 0
[ 454.437404] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000c924000
[ 454.443800] [0000000000000150] pgd=0000000000000000, p4d=0000000000000000
[ 454.450528] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 454.456036] Modules linked in:
[ 454.459053] CPU: 1 PID: 267 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc4 #113
[ 454.465697] Hardware name: Hardkernel ODROID-C4 (DT)
[ 454.470612] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 454.477512] pc : kvm_pmu_event_mask.isra.0+0x14/0x74
[ 454.482427] lr : kvm_pmu_set_counter_event_type+0x2c/0x80
[ 454.487775] sp : ffff80000a9839c0
[ 454.491050] x29: ffff80000a9839c0 x28: ffff000000a83a00 x27: 0000000000000000
[ 454.498127] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000a510000
[ 454.505198] x23: ffff000000a83a00 x22: ffff000003b01000 x21: 0000000000000000
[ 454.512271] x20: 000000000000001f x19: 00000000000003ff x18: 0000000000000000
[ 454.519343] x17: 000000008003fe98 x16: 0000000000000000 x15: 0000000000000000
[ 454.526416] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 454.533489] x11: 000000008003fdbc x10: 0000000000009d20 x9 : 000000000000001b
[ 454.540561] x8 : 0000000000000000 x7 : 0000000000000d00 x6 : 0000000000009d00
[ 454.547633] x5 : 0000000000000037 x4 : 0000000000009d00 x3 : 0d09000000000000
[ 454.554705] x2 : 000000000000001f x1 : 0000000000000000 x0 : 0000000000000000
[ 454.561779] Call trace:
[ 454.564191] kvm_pmu_event_mask.isra.0+0x14/0x74
[ 454.568764] kvm_pmu_set_counter_event_type+0x2c/0x80
[ 454.573766] access_pmu_evtyper+0x128/0x170
[ 454.577905] perform_access+0x34/0x80
[ 454.581527] kvm_handle_cp_32+0x13c/0x160
[ 454.585495] kvm_handle_cp15_32+0x1c/0x30
[ 454.589462] handle_exit+0x70/0x180
[ 454.592912] kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0
[ 454.597485] kvm_vcpu_ioctl+0x23c/0x940
[ 454.601280] __arm64_sys_ioctl+0xa8/0xf0
[ 454.605160] invoke_syscall+0x48/0x114
[ 454.608869] el0_svc_common.constprop.0+0xd4/0xfc
[ 454.613527] do_el0_svc+0x28/0x90
[ 454.616803] el0_svc+0x34/0xb0
[ 454.619822] el0t_64_sync_handler+0xa4/0x130
[ 454.624049] el0t_64_sync+0x18c/0x190
[ 454.627675] Code: a9be7bfd 910003fd f9000bf3 52807ff3 (b9415001)
[ 454.633714] ---[ end trace 0000000000000000 ]---
In this particular case, Linux hasn't detected the presence of a hardware
PMU because the PMU node is missing from the DTB, so userspace would have
been unable to set the VCPU PMU feature even if it attempted it. What
happens is that the 32-bit guest reads ID_DFR0, which advertises the
presence of the PMU, and when it tries to program a counter, it triggers
the NULL pointer dereference because kvm->arch.arm_pmu is NULL.
kvm-arch.arm_pmu was introduced by commit 46b1878214 ("KVM: arm64:
Keep a per-VM pointer to the default PMU"). Until that commit, this
error would be triggered instead:
[ 73.388140] ------------[ cut here ]------------
[ 73.388189] Unknown PMU version 0
[ 73.390420] WARNING: CPU: 1 PID: 264 at arch/arm64/kvm/pmu-emul.c:36 kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.399821] Modules linked in:
[ 73.402835] CPU: 1 PID: 264 Comm: kvm-vcpu-0 Not tainted 5.17.0 #114
[ 73.409132] Hardware name: Hardkernel ODROID-C4 (DT)
[ 73.414048] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 73.420948] pc : kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.425863] lr : kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.430779] sp : ffff80000a8db9b0
[ 73.434055] x29: ffff80000a8db9b0 x28: ffff000000dbaac0 x27: 0000000000000000
[ 73.441131] x26: ffff000000dbaac0 x25: 00000000c600000d x24: 0000000000180720
[ 73.448203] x23: ffff800009ffbe10 x22: ffff00000b612000 x21: 0000000000000000
[ 73.455276] x20: 000000000000001f x19: 0000000000000000 x18: ffffffffffffffff
[ 73.462348] x17: 000000008003fe98 x16: 0000000000000000 x15: 0720072007200720
[ 73.469420] x14: 0720072007200720 x13: ffff800009d32488 x12: 00000000000004e6
[ 73.476493] x11: 00000000000001a2 x10: ffff800009d32488 x9 : ffff800009d32488
[ 73.483565] x8 : 00000000ffffefff x7 : ffff800009d8a488 x6 : ffff800009d8a488
[ 73.490638] x5 : ffff0000f461a9d8 x4 : 0000000000000000 x3 : 0000000000000001
[ 73.497710] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000000dbaac0
[ 73.504784] Call trace:
[ 73.507195] kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.511768] kvm_pmu_set_counter_event_type+0x2c/0x80
[ 73.516770] access_pmu_evtyper+0x128/0x16c
[ 73.520910] perform_access+0x34/0x80
[ 73.524532] kvm_handle_cp_32+0x13c/0x160
[ 73.528500] kvm_handle_cp15_32+0x1c/0x30
[ 73.532467] handle_exit+0x70/0x180
[ 73.535917] kvm_arch_vcpu_ioctl_run+0x20c/0x6e0
[ 73.540489] kvm_vcpu_ioctl+0x2b8/0x9e0
[ 73.544283] __arm64_sys_ioctl+0xa8/0xf0
[ 73.548165] invoke_syscall+0x48/0x114
[ 73.551874] el0_svc_common.constprop.0+0xd4/0xfc
[ 73.556531] do_el0_svc+0x28/0x90
[ 73.559808] el0_svc+0x28/0x80
[ 73.562826] el0t_64_sync_handler+0xa4/0x130
[ 73.567054] el0t_64_sync+0x1a0/0x1a4
[ 73.570676] ---[ end trace 0000000000000000 ]---
[ 73.575382] kvm: pmu event creation failed -2
The root cause remains the same: kvm->arch.pmuver was never set to
something sensible because the VCPU feature itself was never set.
The odroid-c4 is somewhat of a special case, because Linux doesn't probe
the PMU. But the above errors can easily be reproduced on any hardware,
with or without a PMU driver, as long as userspace doesn't set the PMU
feature.
Work around the fact that KVM advertises a PMU even when the VCPU feature
is not set by gating all PMU emulation on the feature. The guest can still
access the registers without KVM injecting an undefined exception.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425145530.723858-1-alexandru.elisei@arm.com
(cherry picked from commit 8f6379e207)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I6840d8312a60673344e08ab99527d0a2106384d1
When pKVM is enabled, host memory accesses are translated by an identity
mapping at stage-2, which is populated lazily in response to synchronous
exceptions from 64-bit EL1 and EL0.
Extend this handling to cover exceptions originating from 32-bit EL0 as
well. Although these are very unlikely to occur in practice, as the
kernel typically ensures that user pages are initialised before mapping
them in, drivers could still map previously untouched device pages into
userspace and expect things to work rather than panic the system.
Cc: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220427171332.13635-1-will@kernel.org
(cherry picked from commit 2a50fc5fd0)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ibc6ffba7e8b01f940dbe39aefee10be1a7135b5e
Unfortunately, there is no guarantee that KVM was able to instantiate a
debugfs directory for a particular VM. To that end, KVM shouldn't even
attempt to create new debugfs files in this case. If the specified
parent dentry is NULL, debugfs_create_file() will instantiate files at
the root of debugfs.
For arm64, it is possible to create the vgic-state file outside of a
VM directory, the file is not cleaned up when a VM is destroyed.
Nonetheless, the corresponding struct kvm is freed when the VM is
destroyed.
Nip the problem in the bud for all possible errant debugfs file
creations by initializing kvm->debugfs_dentry to -ENOENT. In so doing,
debugfs_create_file() will fail instead of creating the file in the root
directory.
Cc: stable@kernel.org
Fixes: 929f45e324 ("kvm: no need to check return value of debugfs_create functions")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220406235615.1447180-2-oupton@google.com
(cherry picked from commit a44a4cc1c9)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I1926ef85a8efb3ff6f4490454f5ce336c1d4e443
KVM allows userspace to configure either all EL1 32bit or 64bit vCPUs
for a guest. At vCPU reset, vcpu_allowed_register_width() checks
if the vcpu's register width is consistent with all other vCPUs'.
Since the checking is done even against vCPUs that are not initialized
(KVM_ARM_VCPU_INIT has not been done) yet, the uninitialized vCPUs
are erroneously treated as 64bit vCPU, which causes the function to
incorrectly detect a mixed-width VM.
Introduce KVM_ARCH_FLAG_EL1_32BIT and KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED
bits for kvm->arch.flags. A value of the EL1_32BIT bit indicates that
the guest needs to be configured with all 32bit or 64bit vCPUs, and
a value of the REG_WIDTH_CONFIGURED bit indicates if a value of the
EL1_32BIT bit is valid (already set up). Values in those bits are set at
the first KVM_ARM_VCPU_INIT for the guest based on KVM_ARM_VCPU_EL1_32BIT
configuration for the vCPU.
Check vcpu's register width against those new bits at the vcpu's
KVM_ARM_VCPU_INIT (instead of against other vCPUs' register width).
Fixes: 66e94d5caf ("KVM: arm64: Prevent mixed-width VM creation")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220329031924.619453-2-reijiw@google.com
(cherry picked from commit 26bf74bd9f)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ife1dd629e091b9bf8ee62d1c3a998f363e3fd05d