The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.
To prevent this, inject the abort back in the host with S1PTW set in the
ESR, hence allowing the host to differentiate this abort from normal
userspace faults and inject a SIGSEGV cleanly.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I09ee54fbf4c202dc3ac2e1b5eea264d4dc84f613
In order to simplify the injection of exceptions in the host in pkvm
context, let's factor out of enter_exception64() the code calculating
the exception offset from VBAR_EL1 and the cpsr.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I50a2510b59311717c6e17ea4e45fc634b4b43073
Add a helper allowing to check when the pkvm static key is enabled to
ease the introduction of pkvm hooks in other parts of the code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8995021768def73bd7636a84059bdc43fa7ab2fc
Now that TLBI invalidation is handled entirely at EL2 for both protected
and non-protected guests when protected KVM has initialised, unplug the
unused TLBI hypercalls.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I50ad4cb930c43f88e00320e47b358613224dd1cc
Factor out logic that resets a vcpu's core registers, including
additional PSCI handling. This code will be reused when resetting
VMs in protected mode.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I22468be1d382e05e39557e32ea09a023173dbf48
Move some PSCI functions and macros to a shared header to be used
by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibe84564f423cd0281f3dc33d9801b474fe8f2db9
Move the macro defines of the pstate reset values to a shared
header to be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ib98734d2ced07a958427c6552a9c22d159b85ad1
Rather than forwarding guest hypercalls back to the host for handling,
implement some basic handling at EL2 which will later be extending to
provide additional functionality such as PSCI.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I14613c416078818b25bb29ed8899d7b71f8c40cc
When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the hypervisor vCPU state is
correctly initialised (and then unpinned on teardown).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic4d0ef9a6124701026cd56f6725ab4737857ed5b
Do not rely on the state of the vm as provided by the host, but
initialize it instead at EL2 to a known good and safe state.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8e0e9fd7cdf0b5b4d422260be06920d0550d5f91
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by
hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Idb90ae3228fc3acb1fe310227a4f606f47b026a5
Protected vCPUs always run with a virtual counter offset of 0, so don't
bother trying to update it from the host.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I02a30687e36886aa5c97439874e3e4cf066fe6e7
Since the world switch vgic code operates on the hypervisor data
structure, move the state back and forth between the host and
hypervisor vcpu.
This is currently limited to the VMCR and APR registers, but further
patches will deal with the rest of the state.
Note that some of the control settings (such as SRE) are always
set to the same value. This will eventually be moved to initialisation
time for the hypervisor structures.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8a3a9009ce3408fe06ea272504f4f71c3d47b7bf
Introduce separate El2 entry/exit handlers for protected and
non-protected guests under pKVM and hook up the protected handlers to
expose the minimum amount of data to the host required for EL1 handling.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I6788edabb3849b661c05c4ce63ab17198f4ed1cd
Instead of sharing memory with protected guests, which still leaves the
host with r/w access, donate the underlying pages so that they are
unmapped from the host stage-2.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I3e0d1d31877acf3978e82350ebbe92136919507c
If a vcpu exits for a data abort with an invalid syndrome, the
expectations are that userspace has a chance to save the day if
it has requested to see such exits.
However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.
This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.
Finally, hide the RETURN_NISV_IO_ABORT_TO_USER cap from userspace on
protected VMs, and document this tweak to the API.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ie081cf0b2fdd1ab374d479e3e355ab3cb536c960
Advertise the system register GICv3 CPU interface to protected guests
as that is the only supported configuration under pKVM.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iea2aeaae7776424727f6833c21597b6236284796
The values of the trapping registers for protected VMs should be
computed from the ground up, and not depend on potentially
preexisting values.
Moreover, non-protected VMs should not be restricted in protected
mode in the same manner as protected VMs.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I689c6d48e8ebb533a86b78ebd6e1a1416cb8729b
Move the initialization of traps to the initialization of the
hyp vcpu, and remove the associated hypercall.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I2e79a6cb494d9a778b46e481206d5c8fde6890fe
Create a framework for resetting protected VM system registers to
their architecturally defined reset values.
No functional change intended as these are not hooked in yet.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Id812d1bbe81c7c0a544aba91b35831f486c208ba
Move the computation of the mpidr to its own function in a shared
header, as the computation will be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I48c36ebb430c3322a6991eeb391d617903525304
Return an error (-EINVAL) if trying to enable MTE on a protected
vm.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I036282854169a341253869d67a3e55e6cec8f040
Restrict protected VM capabilities based on the
fixed-configuration for protected VMs.
No functional change intended in current KVM-supported modes
(nVHE, VHE).
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I1df873d512754207decf9eedb50135ee2ae76b29
Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I910be963754c7d98e4f1270d05427e65d4c1b253
When running with pKVM enabled, protected guests run with a fixed CPU
configuration and therefore features such as hardware debug and SVE are
unavailable and their state does not need to be copied from the host
structures on each flush operation. Although non-protected guests do
require the host and hyp structures to be kept in-sync with each
other, we can defer writing back to the host to an explicit sync
hypercall, rather than doing it after every vCPU run.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ia80ae7bf8e374a50fda4ed5637abdfb82bcf3715
Implement lazy save/restore of the host FPSIMD register state at EL2.
This allows us to save/restore guest FPSIMD registers without involving
the host and means that we can avoid having to repopulate the hyp vCPU
register state on every flush.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I7e9827d7bf52656df69ece1844fc1b8bd7884175
Now that the hypervisor is handling the guest state in protected
mode, it needs to be able to save the guest state.
This reverts commit e66425fc9b ("KVM: arm64: Remove unused
__sve_save_state").
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iada80e9355082e5576d016221fabc7d30ffde46b
Rather than blindly copying the register state between the hyp and host
vCPU structures, abstract this code into some helpers which are called
only for non-protected VMs running under pKVM. To faciliate host access
to guest registers within a get/put sequence, introduce a new
'sync_state' hypercall to provide access to the registers of a
non-protected VM when handling traps.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5b0d874d2d2184c4da95a91c0b9b57af500cbce3
Introduce per-EC entry/exit handlers at EL2 and provide initial
implementations to manage the 'flags' and fault information registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I402a48c77602da969fc04c393d0624d3b2f837df
Guarantee that both TLBs and I-cache are private to each vcpu.
Flush the CPU context if a different vcpu from the same vm is
loaded on the same physical CPU.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I870e3994c3094b43e1cc6fcaebdd167ebe1de394
Prevent the host from issuing arbitrary PC adjustments for protected
vCPUs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I28815d1c6782abf2654ae3e931548014c842d760
In order to be able to safely manipulate the loaded vCPU state,
add a helper that always return the vcpu as mapped in the EL2 S1
address space as well as the pointer to the hyp vCPU if it exists.
In case of failure, both pointers are returned as NULL values.
Convert handle___kvm_vcpu_run() over to the new helper.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I90ba58c0e73a0544878f6b8514e3f91a9f83083d
Rather than look-up the hyp vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated
by a pair of load/put hypercalls called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic640cb805d0f9610059713ff19918dcffc477d44
In preparation for save/restore of the timer state at EL2 for protected
VMs, introduce a couple of sync/flush primitives for the architected
timer, in much the same way as we have for the GIC.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I51fd848f12c71e2c6cb14d3db834a12f1a3226d8
In order to determine whether or not a VM or (hyp) vCPU are protected,
introduce a helper function to query this state. For now, these will
always return 'false' as the underlying field is never configured.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ib39d510d56b5d96d97526d725c7768d4fe5cf752
Rather than blindly copying the vGIC state to/from the host at EL2,
introduce a couple of helpers to copy only what is needed and to
sanitise untrusted data passed by the host kernel.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibab19f638a7d0646c4d17ce5dbd2d3c0be474eac
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore
hypercalls so that all of the EL2 GICv3 state is covered by a single pair
of hypercalls.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ifb109d1592a82d0858d5040482d5cf686f9e74e2
Allow vcpu_{read,write}_sys_reg() to be called from EL2 so that nVHE hyp
code can reuse existing helper functions for operations such as
resetting the vCPU state.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5509ae1cc8d3fd9479fbe0b662bb62e31636eb77
In preparation for using some of the pKVM fixed configuration register
definitions to filter the available VM CAPs in the host, split the
nvhe/fixed_config.h header so that the definitions can be shared
with the host, while keeping the hypervisor function prototypes in
the nvhe/ namespace.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I33894868e7652f7b79caa91a007dccad997ef4ab
In preparation for supporting protected guests, where guest memory
defaults to being inaccessible to the host, extend our memory protection
mechanisms to support donation of pages from the host to a specific
guest.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic397b6fd0f7b5f0911ddd8f457e40c6e6689673c
Now that EL2 is able to manage guest stage-2 page-tables, avoid
allocating a separate MMU structure in the host and instead introduce a
new fault handler which responds to guest stage-2 faults by sharing
GUP-pinned pages with the guest via a hypercall. These pages are
recovered (and unpinned) on guest teardown via the page reclaim
hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Dropped 'bkt' arg from kvm_for_each_memslot()]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibbddc97cee322bf2db258b4f0848733e2efb1126
The current implementation of pKVM doesn't support dirty logging or
read-only memslots. Although support for these features is desirable,
this will require future work, so let's cleanly report the limitations
to userspace by failing the ioctls until then.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Refer to 'memslot' arg to kvm_arch_prepare_memory_region()
instead of non-existent 'new']
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ifc434c234ac58b46a244fdd44114bc9a51f53e19
As the guest stage-2 page-tables will soon be managed entirely by EL2
when pKVM is enabled, guest memory will be pinned and the MMU notifiers
in the host will be unable to reconfigure mappings at EL2 other than
destrroying the guest and reclaiming all of the memory.
Forbid memslot move/delete operations for VMs that have run under pKVM,
returning -EPERM to userspace if such an operation is requested.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I73650c1ac79d8c116a3f31d17ef2a4ef1b30a844
Don't blindly assume that the PTE is valid when checking whether
it describes an executable or cacheable mapping.
This makes sure that we don't issue CMOs for invalid mappings.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I6cbcdb97033ec7b2ed2c9dce0cfc91491e573908
In preparation for handling guest stage-2 mappings at EL2, extend our
memory protection mechanisms to support sharing of pages from the host
to a specific guest.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8e1d7cf4db70ad55a29d935f60e6335fc83490eb
Implement a new hypercall, __pkvm_host_reclaim_page(), so that the host
at EL1 can reclaim pages that were previously donated to EL2. This
allows EL2 to defer clearing of guest memory on teardown and allows
preemption in the host after reclaiming each page.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ifbeafc5ed3e930307f9a9ae04d05ee06cb4451ac
In order to deal with PC updates (such as INCREMENT_PC and the
collection of flags that come with PENDING_EXCEPTION), add a single
mask that covers them all.
This will be used to manipulate these flags as a single entity.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Id24f79f482911efe3374abbead8a70e46cf12725