Expose a new capability, KVM_CAP_ARM_PROTECTED_VM, for protected VMs
which allows the size of the PVM firmware region to be discovered from
userspace and for the firmware load address to be specified if it is
required.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I819b9b2cfa227f1a0607a8f683aa01d4ae50704f
Signed-off-by: Quentin Perret <qperret@google.com>
When a PVM firmware image is present for a protected VM, treat the first
running vCPU as the "primary" vCPU and reset its registers accordingly,
in particular by initialising its PC to enter the firmware at startup.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I26676637145c7d809c5dc5ac0ad0e1fadaf275d2
Signed-off-by: Quentin Perret <qperret@google.com>
When the host donates a page to a protected guest at an IPA which
coincides with the PVM firmware load address, copy-in the relevant
firmware page after unmapping it from the host but before mapping it
into the guest.
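The ordering described above can be sketched as a small user-space model. This is only an illustration: the `page_owner` states, the `pvmfw_region` structure and `donate_page_to_guest()` are hypothetical names, and the sketch assumes page-aligned IPAs and a firmware size that is a multiple of the page size.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical ownership states for a single page. */
enum page_owner { OWNER_HOST, OWNER_HYP, OWNER_GUEST };

/* Hypothetical description of the firmware image and its load address. */
struct pvmfw_region {
	uint64_t load_ipa;	/* guest IPA at which the firmware is loaded */
	size_t size;		/* assumed to be a multiple of PAGE_SIZE */
	const uint8_t *image;	/* firmware contents held by the hypervisor */
};

static void donate_page_to_guest(enum page_owner *owner, uint8_t *page,
				 uint64_t ipa, const struct pvmfw_region *fw)
{
	/* 1. Unmap from the host first, so it cannot observe the copy. */
	*owner = OWNER_HYP;

	/* 2. Copy in the firmware page if the IPA falls inside the region. */
	if (fw->size && ipa >= fw->load_ipa && ipa - fw->load_ipa < fw->size)
		memcpy(page, fw->image + (ipa - fw->load_ipa), PAGE_SIZE);

	/* 3. Only now make the page visible to the guest. */
	*owner = OWNER_GUEST;
}
```

Pages donated at IPAs outside the firmware region pass through steps 1 and 3 with their contents untouched.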
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I8cec813fa52938945f3122655deb785523a96ec8
Signed-off-by: Quentin Perret <qperret@google.com>
When the host shuts down cleanly under pKVM, it is EL2's responsibility
to clear the pvmfw pages before forwarding the PSCI call onto EL3.
Wipe the pvmfw pages on SYSTEM_OFF, SYSTEM_RESET and SYSTEM_RESET2 calls
from the host, cleaning the zeroed memory to the PoC for good measure.
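As a rough illustration of the dispatch, the sketch below wipes a buffer standing in for the pvmfw pages when one of the three calls is seen. The function IDs are the values from the Arm PSCI specification; `handle_host_psci_call()` and its arguments are hypothetical, and the cache clean to the PoC and the forwarding to EL3 are only indicated by a comment.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* PSCI function IDs as defined by the Arm PSCI specification. */
#define PSCI_SYSTEM_OFF		0x84000008u
#define PSCI_SYSTEM_RESET	0x84000009u
#define PSCI_SYSTEM_RESET2_32	0x84000012u
#define PSCI_SYSTEM_RESET2_64	0xC4000012u

static bool psci_call_needs_pvmfw_wipe(uint32_t func_id)
{
	switch (func_id) {
	case PSCI_SYSTEM_OFF:
	case PSCI_SYSTEM_RESET:
	case PSCI_SYSTEM_RESET2_32:
	case PSCI_SYSTEM_RESET2_64:
		return true;
	default:
		return false;
	}
}

/* Hypothetical handler: 'pvmfw' stands in for the firmware pages. */
static void handle_host_psci_call(uint32_t func_id, uint8_t *pvmfw, size_t size)
{
	if (psci_call_needs_pvmfw_wipe(func_id) && size)
		memset(pvmfw, 0, size);

	/*
	 * A real implementation would clean the zeroed range to the PoC
	 * here and then forward the call to EL3.
	 */
}
```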
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I0dd2757e355f384813319034c6eed0fa2c2328c2
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_flush_dcache_to_poc() converts its (start,len) parameters into
(start,end) parameters for dcache_clean_inval_poc(). This mostly works
out except for the case when 'len == 0', where dcache_clean_inval_poc()
will still issue cache maintenance for the cache line containing 'start'.
If 'start' is not mapped, then this can generate an unexpected fault.
In preparation for cleaning the pvmfw memory pages to the PoC on
system reset, tweak kvm_flush_dcache_to_poc() to act as a no-op when
the supplied length is 0 and avoid having to check for this corner case
in the caller.
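A minimal sketch of the zero-length check, assuming a stub in place of the real cache maintenance routine. `dcache_clean_inval_poc_stub()` and `flush_dcache_to_poc()` are hypothetical stand-ins that only count calls so that the behaviour can be observed outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts maintenance operations instead of touching the caches. */
static unsigned int maintenance_calls;

static void dcache_clean_inval_poc_stub(uintptr_t start, uintptr_t end)
{
	(void)start;
	(void)end;
	maintenance_calls++;
}

/* Convert (start, len) into (start, end), skipping empty ranges. */
static void flush_dcache_to_poc(const void *addr, size_t len)
{
	uintptr_t start = (uintptr_t)addr;

	if (!len)
		return;	/* no-op: never touch the line containing 'start' */

	dcache_clean_inval_poc_stub(start, start + len);
}
```

With the check in place, a caller can pass an empty range, even one whose start address is unmapped, without special-casing it.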
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Idae2b22289398e941938821d1d3b3a5a1da3fd8f
Signed-off-by: Quentin Perret <qperret@google.com>
Unmap the PVM firmware memory from the pKVM host by transferring
ownership of the pages to the hypervisor when the host deprivileges
itself during boot.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I311642f543c0c73d0e0cf2ec051e8e2d9759c5d1
Signed-off-by: Quentin Perret <qperret@google.com>
Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
region, which can be used to identify a firmware image for protected
VMs. If pKVM fails to initialise and a firmware region is advertised,
then the memory is cleared during boot.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Ibfcc0ff00d4b8a42747452047856cb9ba8def4c4
Signed-off-by: Quentin Perret <qperret@google.com>
Add some initial documentation for the Protected KVM (pKVM) feature on
arm64, describing the user ABI for creating protected VMs as well as
their limitations.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I152af404f24b9aba3cc9be6acd8e26afcfa4b0a5
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce a new virtual machine type, KVM_VM_TYPE_ARM_PROTECTED, which
specifies that the guest memory pages are to be unmapped from the host
stage-2 by the hypervisor.
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Align KVM_VM_TYPE_ARM_PROTECTED value with android13 kernels]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iabcd03865aed4a41637597ac247897fd185bfc4d
Signed-off-by: Quentin Perret <qperret@google.com>
Extend our KVM "vendor" hypercalls to expose three new hypercalls to
protected guests for the purpose of opening and closing shared memory
windows with the host:

  MEMINFO:      Query the stage-2 page size (i.e. the minimum granule at
                which memory can be shared)

  MEM_SHARE:    Share a page RWX with the host, faulting the page in if
                necessary.

  MEM_UNSHARE:  Unshare a page with the host. Subsequent host accesses
                to the page will result in a fault being injected by the
                hypervisor.

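The semantics of the three calls can be modelled in a few lines of user-space C. Everything here is hypothetical: the function names, the 4KiB granule, and the choice to reject a repeated share or unshare with an error are assumptions made for illustration, not the hypercall ABI.

```c
#include <stdbool.h>

#define STAGE2_PAGE_SIZE 4096UL	/* assumed granule */

enum page_state { PAGE_GUEST_PRIVATE, PAGE_SHARED_WITH_HOST };

/* MEMINFO: report the granule at which memory can be shared. */
static unsigned long hyp_meminfo(void)
{
	return STAGE2_PAGE_SIZE;
}

/* MEM_SHARE: open a window onto one guest page for the host. */
static int hyp_mem_share(enum page_state *page)
{
	if (*page == PAGE_SHARED_WITH_HOST)
		return -1;	/* assumption: double share is an error */
	*page = PAGE_SHARED_WITH_HOST;
	return 0;
}

/* MEM_UNSHARE: close the window again. */
static int hyp_mem_unshare(enum page_state *page)
{
	if (*page != PAGE_SHARED_WITH_HOST)
		return -1;	/* assumption: nothing to unshare */
	*page = PAGE_GUEST_PRIVATE;
	return 0;
}

/* Host accesses fault unless the page is currently shared. */
static bool host_access_faults(const enum page_state *page)
{
	return *page != PAGE_SHARED_WITH_HOST;
}
```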
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I80fe8af0bc0b3a40460c5065eabe26b1d9f634f2
Signed-off-by: Quentin Perret <qperret@google.com>
The PTP hypercall documentation doesn't produce the best-looking table
when formatting in HTML as all of the return value definitions end up
on the same line.
Reformat the PTP hypercall documentation to follow the formatting used
by hypercalls.rst.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic77cea5a621a9278d098afd80ef4c0e125760814
Signed-off-by: Quentin Perret <qperret@google.com>
KVM/arm64 makes use of the SMCCC "Vendor Specific Hypervisor Service
Call Range" to expose KVM-specific hypercalls to guests in a
discoverable and extensible fashion.
Document the existence of this interface and the discovery hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5754589b1b695828eab7cb41c7aa6a0fb55ad273
Signed-off-by: Quentin Perret <qperret@google.com>
In preparation for describing the guest view of KVM/arm64 hypercalls in
hypercalls.rst, move the existing contents of the file concerning the
firmware pseudo-registers elsewhere.
Cc: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ie8931290b291c0ffd2f1f11265babe2475972868
Signed-off-by: Quentin Perret <qperret@google.com>
A guest that can only operate on private memory is pretty useless, as it
has no way to share buffers with the host for things like virtio.
Extend our memory protection mechanisms to support the sharing and
unsharing of guest pages from the guest to the host. For now, this
functionality is unused but will later be exposed to the guest via
hypercalls.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I6b0d6f63348f3a2a847acf4d7bb87bd6e9742af0
Signed-off-by: Quentin Perret <qperret@google.com>
Break-before-make (BBM) can be expensive, as transitioning via an
invalid mapping (i.e. the "break" step) requires the completion of TLB
invalidation and can also cause other agents to fault concurrently on
the invalid mapping.
Since BBM is not required when changing only the software bits of a PTE,
avoid the sequence in this case and just update the PTE directly.
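The check can be sketched as a comparison of the old and new descriptors with the software bits masked out. The mask below relies on bits [58:55] of a VMSAv8-64 descriptor being reserved for software use; `pte_needs_bbm()` is a hypothetical helper name, not the function touched by this change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits [58:55] of a VMSAv8-64 descriptor are reserved for software use. */
#define PTE_SW_MASK	(UINT64_C(0xf) << 55)

/*
 * Return true if replacing 'old_pte' with 'new_pte' changes anything the
 * hardware can observe, in which case break-before-make is still needed.
 */
static bool pte_needs_bbm(uint64_t old_pte, uint64_t new_pte)
{
	return ((old_pte ^ new_pte) & ~PTE_SW_MASK) != 0;
}
```

When the helper returns false, the new descriptor can be written in place without going via an invalid entry.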
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I91ec043a75154fa2ca732f5269c6ae1bceea4a93
Signed-off-by: Quentin Perret <qperret@google.com>
Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.
With guest-to-host memory sharing and unsharing hypercalls originating
from the guest under pKVM, there is now a need to support both guest
and host VMID invalidations issued from guest context.
Replace the __tlb_switch_to_{guest,host}() functions with a more general
{enter,exit}_vmid_context() implementation which supports being invoked
from guest context and acts as a no-op if the target context matches the
running context.
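A simplified model of the enter/exit pairing is sketched below. The `s2_mmu` and `vmid_context` structures are hypothetical, and a plain pointer stands in for the stage-2 context that the real code would install in hardware.

```c
#include <stdbool.h>
#include <stddef.h>

struct s2_mmu {
	unsigned int vmid;
};

/* State saved by enter_vmid_context() for the matching exit. */
struct vmid_context {
	const struct s2_mmu *saved;
	bool switched;
};

/* Stands in for the stage-2 context currently live on this CPU. */
static const struct s2_mmu *running_mmu;

static void enter_vmid_context(const struct s2_mmu *target,
			       struct vmid_context *cxt)
{
	cxt->saved = running_mmu;
	cxt->switched = (running_mmu != target);

	/* No-op when the target context is already the running one. */
	if (cxt->switched)
		running_mmu = target;
}

static void exit_vmid_context(const struct vmid_context *cxt)
{
	if (cxt->switched)
		running_mmu = cxt->saved;
}
```

Because the saved context is recorded on entry rather than assumed to be the host, the same pair works whether the invalidation is issued from host or guest context.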
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I92c6f48eb4c4b6286b930c2f0cda245bccc1927b
Signed-off-by: Quentin Perret <qperret@google.com>
The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.
To prevent this, inject the abort back into the host with S1PTW set in
the ESR, allowing the host to differentiate this abort from normal
userspace faults and deliver a SIGSEGV cleanly.
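The marking scheme can be illustrated with two small helpers. S1PTW is bit 7 of the data abort ISS encoding in the ESR; the helper names are hypothetical, and the sketch deliberately ignores everything else a real injection has to set up (ELR, SPSR, FAR and the vector entry).

```c
#include <stdbool.h>
#include <stdint.h>

/* S1PTW is bit 7 of the ISS for instruction and data aborts. */
#define ESR_S1PTW	(UINT64_C(1) << 7)

/* Hypervisor side: tag the syndrome before reflecting it to the host. */
static uint64_t esr_for_host_injection(uint64_t esr_el2)
{
	return esr_el2 | ESR_S1PTW;
}

/* Host side: recognise an abort that was injected by the hypervisor. */
static bool host_abort_is_from_hyp(uint64_t esr_el1)
{
	return (esr_el1 & ESR_S1PTW) != 0;
}
```

The bit is a convenient marker because it is only meaningful for stage-2 faults, so hardware is not expected to set it in a syndrome reported to the host at EL1.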
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I09ee54fbf4c202dc3ac2e1b5eea264d4dc84f613
Signed-off-by: Quentin Perret <qperret@google.com>
To simplify the injection of exceptions into the host in pkvm context,
factor the code calculating the exception offset from VBAR_EL1 and the
cpsr out of enter_exception64().
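For reference, the offset half of that calculation follows the AArch64 vector table layout: four groups of 0x200 bytes selected by where the exception came from, each holding four 0x80-byte entries selected by the exception type. The enum and function names below are hypothetical, and the cpsr computation is not shown.

```c
#include <stdint.h>

/* Offset of each entry within a vector group. */
enum exception_type {
	EXCEPT_SYNC	= 0x000,
	EXCEPT_IRQ	= 0x080,
	EXCEPT_FIQ	= 0x100,
	EXCEPT_SERROR	= 0x180,
};

/* Offset of each vector group from the table base. */
enum exception_source {
	FROM_CURRENT_EL_SP0	= 0x000,
	FROM_CURRENT_EL_SPX	= 0x200,
	FROM_LOWER_EL_AARCH64	= 0x400,
	FROM_LOWER_EL_AARCH32	= 0x600,
};

/* Address the PC is set to when the exception is taken. */
static uint64_t exception_vector(uint64_t vbar, enum exception_source src,
				 enum exception_type type)
{
	return vbar + (uint64_t)src + (uint64_t)type;
}
```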
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I50a2510b59311717c6e17ea4e45fc634b4b43073
Signed-off-by: Quentin Perret <qperret@google.com>
Add a helper to check whether the pkvm static key is enabled, easing the
introduction of pkvm hooks in other parts of the code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8995021768def73bd7636a84059bdc43fa7ab2fc
Signed-off-by: Quentin Perret <qperret@google.com>
Now that TLB invalidation is handled entirely at EL2 for both protected
and non-protected guests when protected KVM has initialised, unplug the
unused TLBI hypercalls.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I50ad4cb930c43f88e00320e47b358613224dd1cc
Signed-off-by: Quentin Perret <qperret@google.com>
Factor out logic that resets a vcpu's core registers, including
additional PSCI handling. This code will be reused when resetting
VMs in protected mode.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I22468be1d382e05e39557e32ea09a023173dbf48
Signed-off-by: Quentin Perret <qperret@google.com>
Move some PSCI functions and macros to a shared header to be used
by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibe84564f423cd0281f3dc33d9801b474fe8f2db9
Signed-off-by: Quentin Perret <qperret@google.com>
Move the macro defines of the pstate reset values to a shared
header to be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ib98734d2ced07a958427c6552a9c22d159b85ad1
Signed-off-by: Quentin Perret <qperret@google.com>
Rather than forwarding guest hypercalls back to the host for handling,
implement some basic handling at EL2 which will later be extended to
provide additional functionality such as PSCI.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I14613c416078818b25bb29ed8899d7b71f8c40cc
Signed-off-by: Quentin Perret <qperret@google.com>
When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the hypervisor vCPU state is
correctly initialised (and then unpinned on teardown).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic4d0ef9a6124701026cd56f6725ab4737857ed5b
Signed-off-by: Quentin Perret <qperret@google.com>
Do not rely on the state of the vm as provided by the host, but
initialize it instead at EL2 to a known good and safe state.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8e0e9fd7cdf0b5b4d422260be06920d0550d5f91
Signed-off-by: Quentin Perret <qperret@google.com>
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by
hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Idb90ae3228fc3acb1fe310227a4f606f47b026a5
Signed-off-by: Quentin Perret <qperret@google.com>
Protected vCPUs always run with a virtual counter offset of 0, so don't
bother trying to update it from the host.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I02a30687e36886aa5c97439874e3e4cf066fe6e7
Signed-off-by: Quentin Perret <qperret@google.com>
Since the world switch vgic code operates on the hypervisor data
structure, move the state back and forth between the host and
hypervisor vcpu.
This is currently limited to the VMCR and APR registers, but further
patches will deal with the rest of the state.
Note that some of the control settings (such as SRE) are always
set to the same value. This will eventually be moved to initialisation
time for the hypervisor structures.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8a3a9009ce3408fe06ea272504f4f71c3d47b7bf
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce separate EL2 entry/exit handlers for protected and
non-protected guests under pKVM and hook up the protected handlers to
expose the minimum amount of data to the host required for EL1 handling.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I6788edabb3849b661c05c4ce63ab17198f4ed1cd
Signed-off-by: Quentin Perret <qperret@google.com>
Instead of sharing memory with protected guests, which still leaves the
host with r/w access, donate the underlying pages so that they are
unmapped from the host stage-2.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I3e0d1d31877acf3978e82350ebbe92136919507c
Signed-off-by: Quentin Perret <qperret@google.com>
If a vcpu exits for a data abort with an invalid syndrome, the
expectation is that userspace has a chance to save the day if
it has requested to see such exits.
However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.
This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.
Finally, hide the RETURN_NISV_IO_ABORT_TO_USER cap from userspace on
protected VMs, and document this tweak to the API.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ie081cf0b2fdd1ab374d479e3e355ab3cb536c960
Signed-off-by: Quentin Perret <qperret@google.com>
Advertise the system register GICv3 CPU interface to protected guests
as that is the only supported configuration under pKVM.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iea2aeaae7776424727f6833c21597b6236284796
Signed-off-by: Quentin Perret <qperret@google.com>
The values of the trapping registers for protected VMs should be
computed from the ground up, and not depend on potentially
preexisting values.
Moreover, non-protected VMs should not be restricted in protected
mode in the same manner as protected VMs.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I689c6d48e8ebb533a86b78ebd6e1a1416cb8729b
Signed-off-by: Quentin Perret <qperret@google.com>
Move the initialization of traps to the initialization of the
hyp vcpu, and remove the associated hypercall.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I2e79a6cb494d9a778b46e481206d5c8fde6890fe
Signed-off-by: Quentin Perret <qperret@google.com>
Create a framework for resetting protected VM system registers to
their architecturally defined reset values.
No functional change intended as these are not hooked in yet.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Id812d1bbe81c7c0a544aba91b35831f486c208ba
Signed-off-by: Quentin Perret <qperret@google.com>
Move the computation of the mpidr to its own function in a shared
header, as the computation will be used by hyp in protected mode.
No functional change intended.
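As an illustration of the kind of computation being moved, the sketch below packs a vCPU index into the MPIDR affinity fields. The split used here (4 bits in Aff0, then 8 bits each in Aff1 and Aff2, with bit 31 set) is an assumption about the existing scheme, and `vcpu_id_to_mpidr()` is a hypothetical name.

```c
#include <stdint.h>

#define MPIDR_AFF0_SHIFT	0
#define MPIDR_AFF1_SHIFT	8
#define MPIDR_AFF2_SHIFT	16
#define MPIDR_RES1_BIT31	(UINT64_C(1) << 31)

/* Assumed mapping: vcpu_id[3:0] -> Aff0, [11:4] -> Aff1, [19:12] -> Aff2. */
static uint64_t vcpu_id_to_mpidr(unsigned int vcpu_id)
{
	uint64_t mpidr;

	mpidr  = (uint64_t)(vcpu_id & 0x0f) << MPIDR_AFF0_SHIFT;
	mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << MPIDR_AFF1_SHIFT;
	mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << MPIDR_AFF2_SHIFT;
	mpidr |= MPIDR_RES1_BIT31;	/* bit 31 reads as one */

	return mpidr;
}
```

Keeping the mapping in one shared function means the host and the hypervisor cannot drift apart on which MPIDR a given vCPU index corresponds to.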
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I48c36ebb430c3322a6991eeb391d617903525304
Signed-off-by: Quentin Perret <qperret@google.com>
Return an error (-EINVAL) if trying to enable MTE on a protected
vm.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I036282854169a341253869d67a3e55e6cec8f040
Signed-off-by: Quentin Perret <qperret@google.com>
Restrict protected VM capabilities based on the
fixed-configuration for protected VMs.
No functional change intended in current KVM-supported modes
(nVHE, VHE).
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I1df873d512754207decf9eedb50135ee2ae76b29
Signed-off-by: Quentin Perret <qperret@google.com>
Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI.
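RAZ/WI ("read as zero, write ignored") emulation amounts to very little code. The sketch below uses a hypothetical `sysreg_access` structure to describe a trapped access and is not tied to the actual trap handling tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a trapped system register access. */
struct sysreg_access {
	bool is_write;
	uint64_t regval;	/* value written, or value to return on read */
};

/* Emulate the register as RAZ/WI; returns true if the access was handled. */
static bool access_raz_wi(struct sysreg_access *p)
{
	if (!p->is_write)
		p->regval = 0;	/* reads observe zero */

	/* Writes are dropped: no state is kept for the register. */
	return true;
}
```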
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I910be963754c7d98e4f1270d05427e65d4c1b253
Signed-off-by: Quentin Perret <qperret@google.com>
When running with pKVM enabled, protected guests run with a fixed CPU
configuration and therefore features such as hardware debug and SVE are
unavailable and their state does not need to be copied from the host
structures on each flush operation. Although non-protected guests do
require the host and hyp structures to be kept in-sync with each
other, we can defer writing back to the host to an explicit sync
hypercall, rather than doing it after every vCPU run.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ia80ae7bf8e374a50fda4ed5637abdfb82bcf3715
Signed-off-by: Quentin Perret <qperret@google.com>
Implement lazy save/restore of the host FPSIMD register state at EL2.
This allows us to save/restore guest FPSIMD registers without involving
the host and means that we can avoid having to repopulate the hyp vCPU
register state on every flush.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I7e9827d7bf52656df69ece1844fc1b8bd7884175
Signed-off-by: Quentin Perret <qperret@google.com>
Now that the hypervisor is handling the guest state in protected
mode, it needs to be able to save the guest state.
This reverts commit e66425fc9b ("KVM: arm64: Remove unused
__sve_save_state").
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iada80e9355082e5576d016221fabc7d30ffde46b
Signed-off-by: Quentin Perret <qperret@google.com>
Rather than blindly copying the register state between the hyp and host
vCPU structures, abstract this code into some helpers which are called
only for non-protected VMs running under pKVM. To facilitate host access
to guest registers within a get/put sequence, introduce a new
'sync_state' hypercall to provide access to the registers of a
non-protected VM when handling traps.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5b0d874d2d2184c4da95a91c0b9b57af500cbce3
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce per-EC entry/exit handlers at EL2 and provide initial
implementations to manage the 'flags' and fault information registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I402a48c77602da969fc04c393d0624d3b2f837df
Signed-off-by: Quentin Perret <qperret@google.com>
Guarantee that both TLBs and I-cache are private to each vcpu.
Flush the CPU context if a different vcpu from the same vm is
loaded on the same physical CPU.
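One way to detect that situation is to remember, per VM and per physical CPU, which vCPU ran there last. The sketch below is a hypothetical model of that bookkeeping: the `vm_model` structure and helper names are assumptions, and it conservatively reports a flush on the first load as well.

```c
#include <stdbool.h>

#define NR_CPUS	4
#define NO_VCPU	(-1)

/* Hypothetical per-VM record of the last vCPU seen on each physical CPU. */
struct vm_model {
	int last_vcpu_ran[NR_CPUS];
};

static void vm_init(struct vm_model *vm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		vm->last_vcpu_ran[cpu] = NO_VCPU;
}

/*
 * Called when 'vcpu_id' is loaded on 'cpu'. Returns true if the TLBs and
 * I-cache for this VM must be flushed on that CPU before running.
 */
static bool vcpu_load_needs_flush(struct vm_model *vm, int cpu, int vcpu_id)
{
	bool flush = vm->last_vcpu_ran[cpu] != vcpu_id;

	vm->last_vcpu_ran[cpu] = vcpu_id;
	return flush;
}
```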
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I870e3994c3094b43e1cc6fcaebdd167ebe1de394
Signed-off-by: Quentin Perret <qperret@google.com>
Prevent the host from issuing arbitrary PC adjustments for protected
vCPUs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I28815d1c6782abf2654ae3e931548014c842d760
Signed-off-by: Quentin Perret <qperret@google.com>