Do not rely on the state of the vm as provided by the host, but
initialize it instead at EL2 to a known good and safe state.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8e0e9fd7cdf0b5b4d422260be06920d0550d5f91
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by
hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Idb90ae3228fc3acb1fe310227a4f606f47b026a5
Protected vCPUs always run with a virtual counter offset of 0, so don't
bother trying to update it from the host.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I02a30687e36886aa5c97439874e3e4cf066fe6e7
Since the world switch vgic code operates on the hypervisor data
structure, move the state back and forth between the host and
hypervisor vcpu.
This is currently limited to the VMCR and APR registers, but further
patches will deal with the rest of the state.
Note that some of the control settings (such as SRE) are always
set to the same value. This will eventually be moved to initialisation
time for the hypervisor structures.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8a3a9009ce3408fe06ea272504f4f71c3d47b7bf
Introduce separate El2 entry/exit handlers for protected and
non-protected guests under pKVM and hook up the protected handlers to
expose the minimum amount of data to the host required for EL1 handling.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I6788edabb3849b661c05c4ce63ab17198f4ed1cd
Instead of sharing memory with protected guests, which still leaves the
host with r/w access, donate the underlying pages so that they are
unmapped from the host stage-2.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I3e0d1d31877acf3978e82350ebbe92136919507c
If a vcpu exits for a data abort with an invalid syndrome, the
expectations are that userspace has a chance to save the day if
it has requested to see such exits.
However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.
This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.
Finally, hide the RETURN_NISV_IO_ABORT_TO_USER cap from userspace on
protected VMs, and document this tweak to the API.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ie081cf0b2fdd1ab374d479e3e355ab3cb536c960
Advertise the system register GICv3 CPU interface to protected guests
as that is the only supported configuration under pKVM.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iea2aeaae7776424727f6833c21597b6236284796
The values of the trapping registers for protected VMs should be
computed from the ground up, and not depend on potentially
preexisting values.
Moreover, non-protected VMs should not be restricted in protected
mode in the same manner as protected VMs.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I689c6d48e8ebb533a86b78ebd6e1a1416cb8729b
Move the initialization of traps to the initialization of the
hyp vcpu, and remove the associated hypercall.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I2e79a6cb494d9a778b46e481206d5c8fde6890fe
Create a framework for resetting protected VM system registers to
their architecturally defined reset values.
No functional change intended as these are not hooked in yet.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Id812d1bbe81c7c0a544aba91b35831f486c208ba
Move the computation of the mpidr to its own function in a shared
header, as the computation will be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I48c36ebb430c3322a6991eeb391d617903525304
Return an error (-EINVAL) if trying to enable MTE on a protected
vm.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I036282854169a341253869d67a3e55e6cec8f040
Restrict protected VM capabilities based on the
fixed-configuration for protected VMs.
No functional change intended in current KVM-supported modes
(nVHE, VHE).
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I1df873d512754207decf9eedb50135ee2ae76b29
Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I910be963754c7d98e4f1270d05427e65d4c1b253
When running with pKVM enabled, protected guests run with a fixed CPU
configuration and therefore features such as hardware debug and SVE are
unavailable and their state does not need to be copied from the host
structures on each flush operation. Although non-protected guests do
require the host and hyp structures to be kept in-sync with each
other, we can defer writing back to the host to an explicit sync
hypercall, rather than doing it after every vCPU run.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ia80ae7bf8e374a50fda4ed5637abdfb82bcf3715
Implement lazy save/restore of the host FPSIMD register state at EL2.
This allows us to save/restore guest FPSIMD registers without involving
the host and means that we can avoid having to repopulate the hyp vCPU
register state on every flush.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I7e9827d7bf52656df69ece1844fc1b8bd7884175
Now that the hypervisor is handling the guest state in protected
mode, it needs to be able to save the guest state.
This reverts commit e66425fc9b ("KVM: arm64: Remove unused
__sve_save_state").
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iada80e9355082e5576d016221fabc7d30ffde46b
Rather than blindly copying the register state between the hyp and host
vCPU structures, abstract this code into some helpers which are called
only for non-protected VMs running under pKVM. To faciliate host access
to guest registers within a get/put sequence, introduce a new
'sync_state' hypercall to provide access to the registers of a
non-protected VM when handling traps.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5b0d874d2d2184c4da95a91c0b9b57af500cbce3
Introduce per-EC entry/exit handlers at EL2 and provide initial
implementations to manage the 'flags' and fault information registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I402a48c77602da969fc04c393d0624d3b2f837df
Guarantee that both TLBs and I-cache are private to each vcpu.
Flush the CPU context if a different vcpu from the same vm is
loaded on the same physical CPU.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I870e3994c3094b43e1cc6fcaebdd167ebe1de394
Prevent the host from issuing arbitrary PC adjustments for protected
vCPUs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I28815d1c6782abf2654ae3e931548014c842d760
In order to be able to safely manipulate the loaded vCPU state,
add a helper that always return the vcpu as mapped in the EL2 S1
address space as well as the pointer to the hyp vCPU if it exists.
In case of failure, both pointers are returned as NULL values.
Convert handle___kvm_vcpu_run() over to the new helper.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I90ba58c0e73a0544878f6b8514e3f91a9f83083d
Rather than look-up the hyp vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated
by a pair of load/put hypercalls called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic640cb805d0f9610059713ff19918dcffc477d44
In preparation for save/restore of the timer state at EL2 for protected
VMs, introduce a couple of sync/flush primitives for the architected
timer, in much the same way as we have for the GIC.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I51fd848f12c71e2c6cb14d3db834a12f1a3226d8
In order to determine whether or not a VM or (hyp) vCPU are protected,
introduce a helper function to query this state. For now, these will
always return 'false' as the underlying field is never configured.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ib39d510d56b5d96d97526d725c7768d4fe5cf752
Rather than blindly copying the vGIC state to/from the host at EL2,
introduce a couple of helpers to copy only what is needed and to
sanitise untrusted data passed by the host kernel.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibab19f638a7d0646c4d17ce5dbd2d3c0be474eac
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore
hypercalls so that all of the EL2 GICv3 state is covered by a single pair
of hypercalls.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ifb109d1592a82d0858d5040482d5cf686f9e74e2
Allow vcpu_{read,write}_sys_reg() to be called from EL2 so that nVHE hyp
code can reuse existing helper functions for operations such as
resetting the vCPU state.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5509ae1cc8d3fd9479fbe0b662bb62e31636eb77
In preparation for using some of the pKVM fixed configuration register
definitions to filter the available VM CAPs in the host, split the
nvhe/fixed_config.h header so that the definitions can be shared
with the host, while keeping the hypervisor function prototypes in
the nvhe/ namespace.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I33894868e7652f7b79caa91a007dccad997ef4ab
In preparation for supporting protected guests, where guest memory
defaults to being inaccessible to the host, extend our memory protection
mechanisms to support donation of pages from the host to a specific
guest.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic397b6fd0f7b5f0911ddd8f457e40c6e6689673c
Now that EL2 is able to manage guest stage-2 page-tables, avoid
allocating a separate MMU structure in the host and instead introduce a
new fault handler which responds to guest stage-2 faults by sharing
GUP-pinned pages with the guest via a hypercall. These pages are
recovered (and unpinned) on guest teardown via the page reclaim
hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Dropped 'bkt' arg from kvm_for_each_memslot()]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibbddc97cee322bf2db258b4f0848733e2efb1126
The current implementation of pKVM doesn't support dirty logging or
read-only memslots. Although support for these features is desirable,
this will require future work, so let's cleanly report the limitations
to userspace by failing the ioctls until then.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Refer to 'memslot' arg to kvm_arch_prepare_memory_region()
instead of non-existent 'new']
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ifc434c234ac58b46a244fdd44114bc9a51f53e19
As the guest stage-2 page-tables will soon be managed entirely by EL2
when pKVM is enabled, guest memory will be pinned and the MMU notifiers
in the host will be unable to reconfigure mappings at EL2 other than
destrroying the guest and reclaiming all of the memory.
Forbid memslot move/delete operations for VMs that have run under pKVM,
returning -EPERM to userspace if such an operation is requested.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I73650c1ac79d8c116a3f31d17ef2a4ef1b30a844
Don't blindly assume that the PTE is valid when checking whether
it describes an executable or cacheable mapping.
This makes sure that we don't issue CMOs for invalid mappings.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I6cbcdb97033ec7b2ed2c9dce0cfc91491e573908
In preparation for handling guest stage-2 mappings at EL2, extend our
memory protection mechanisms to support sharing of pages from the host
to a specific guest.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8e1d7cf4db70ad55a29d935f60e6335fc83490eb
Implement a new hypercall, __pkvm_host_reclaim_page(), so that the host
at EL1 can reclaim pages that were previously donated to EL2. This
allows EL2 to defer clearing of guest memory on teardown and allows
preemption in the host after reclaiming each page.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ifbeafc5ed3e930307f9a9ae04d05ee06cb4451ac
In order to deal with PC updates (such as INCREMENT_PC and the
collection of flags that come with PENDING_EXCEPTION), add a single
mask that covers them all.
This will be used to manipulate these flags as a single entity.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Id24f79f482911efe3374abbead8a70e46cf12725
Contrary to vanilla KVM, pKVM not only deals with flags in a vcpu,
but also synchronises them across host and hypervisor views of the same
vcpu.
Most of the time, this is about copying flags from one vcpu structure
to another, so let's offer a primitive that does this.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Icd67b617c1cd69706ccd99739756458864b422bb
In preparation for poisoning guest memory pages in the pKVM hypervisor
when being reclaimed by the host, introduce a new 'flags' field in
'struct hyp_page' so that we will be able to track on a per-page basis
whether or not poisoning is required.
Rather than increase the total size of the structure, shrink the 16-bit
'order' field to a single byte and use the recovered space for the new
field.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8eb1f7ed8da0374b878a315eb1a6f867d0f379a9
As a stepping stone towards deprivileging the host's access to the
guest's vCPU structures, introduce some naive flush/sync routines to
copy most of the host vCPU into the hyp vCPU on vCPU run and back
again on return to EL1.
This allows us to run using the pKVM hyp structures when KVM is
initialised in protected mode.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-26-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iaf1c07cbf58eaff8a2968e9dc6457d36dcef83cf
We no longer need to map the host's '.rodata' and '.bss' sections in the
stage-1 page-table of the pKVM hypervisor at EL2, so remove those
mappings and avoid creating any future dependencies at EL2 on
host-controlled data structures.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-25-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iddb819bc1ef4006b6ed7490476f28f0e880e1d8c
The pkvm hypervisor at EL2 may need to read the 'kvm_vgic_global_state'
variable from the host, for example when saving and restoring the state
of the virtual GIC.
Explicitly map 'kvm_vgic_global_state' in the stage-1 page-table of the
pKVM hypervisor rather than relying on mapping all of the host '.rodata'
section.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-24-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I13a190d4164a4e0fd68dd5ec88ab5647dd4e73fc
Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to
modify the variable arbitrarily, potentially leading to all sorts of
shenanians as this is used to configure the VTTBR register for the
guest stage-2.
In preparation for unmapping host sections entirely from EL2, maintain
a copy of 'kvm_arm_vmid_bits' in the pKVM hypervisor and initialise it
from the host value while it is still trusted.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-23-will@kernel.org
[willdeacon@: Fixed context conflict with CPU cap symbols in image-vars.h]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I613e7c0ef747324e73cf4d4f543354a3641c7505
When pKVM is enabled, the hypervisor at EL2 does not trust the host at
EL1 and must therefore prevent it from having unrestricted access to
internal hypervisor state.
The 'kvm_arm_hyp_percpu_base' array holds the offsets for hypervisor
per-cpu allocations, so move this this into the nVHE code where it
cannot be modified by the untrusted host at EL1.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-22-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8d67b2905ac97e15f4252d45c36b97e53d3072ce
Rather than relying on the host to free the previously-donated pKVM
hypervisor VM pages explicitly on teardown, introduce a dedicated
teardown memcache which allows the host to reclaim guest memory
resources without having to keep track of all of the allocations made by
the pKVM hypervisor at EL2.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-21-will@kernel.org
[willdeacon@: Fix GCC compat error due to variable declaration in for loop
initializer prior to C99]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ib43b68f36fdaf5aac578f177ab8260c72acc6ed5
Extend the initialisation of guest data structures within the pKVM
hypervisor at EL2 so that we instantiate a memory pool and a full
'struct kvm_s2_mmu' structure for each VM, with a stage-2 page-table
entirely independent from the one managed by the host at EL1.
The 'struct kvm_pgtable_mm_ops' used by the page-table code is populated
with a set of callbacks that can manage guest pages in the hypervisor
without any direct intervention from the host, allocating page-table
pages from the provided pool and returning these to the host on VM
teardown. To keep things simple, the stage-2 MMU for the guest is
configured identically to the host stage-2 in the VTCR register and so
the IPA size of the guest must match the PA size of the host.
For now, the new page-table is unused as there is no way for the host
to map anything into it. Yet.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-20-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I2fc61f4f16d16b4786a50f4e2e4c5b06ed357b22
The initialisation of guest stage-2 page-tables is currently split
across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2().
That is presumably for historical reasons as kvm_arm_setup_stage2()
originates from the (now defunct) KVM port for 32-bit Arm.
Simplify this code path by merging both functions into one, taking care
to map the 'struct kvm' into the hypervisor stage-1 early on in order to
simplify the failure path.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-19-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I47433ed294fb5329b17b5e8b290e41b8e6d5b4a9