When a protected guest is torn down, the hypervisor currently
transitions the ownership of all the guest pages back to the host
without further intervention, which might leak guest secrets. To prevent
this, let's have the hypervisor zero the pages before they can be
reclaimed by the host.
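A minimal userspace sketch of the intended ordering (wipe first, transfer
ownership second). The struct page layout, the owner enum and
reclaim_guest_page() are hypothetical names used for illustration only;
they are not the hypervisor's actual data structures.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical ownership states, for illustration only. */
enum page_owner { OWNER_HOST, OWNER_GUEST };

struct page {
	enum page_owner owner;
	uint8_t data[PAGE_SIZE];
};

/* Wipe a guest-owned page, then hand it back to the host. */
static void reclaim_guest_page(struct page *p)
{
	if (p->owner != OWNER_GUEST)
		return;				/* not a guest page: nothing to scrub */

	memset(p->data, 0, PAGE_SIZE);		/* scrub before the host can read it */
	p->owner = OWNER_HOST;			/* only now change the owner */
}
```

The point of the sketch is the order of the two steps: the host never
observes a page that is host-owned but still holds guest data.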
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Move pkvm_host_poison to mem_protect.h]
Bug: 209580772
Change-Id: Ib39de531284fef02c1cb84b83f0819f9e6f36f9b
Signed-off-by: Will Deacon <willdeacon@google.com>
Now that we have all the infrastructure in place to allow guest-to-host
sharing of pages in protected mode, let's stop sharing the pages from
the host on guest memory aborts and switch to a proper donation instead.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Ib37625172e24950cd74913a20bff8ce29a72f45b
Signed-off-by: Will Deacon <willdeacon@google.com>
Expose MEM_SHARE and MEM_UNSHARE hypercalls to the KVM_CAP_EXIT_HYPERCALL
capability, allowing userspace (i.e. the VMM) to mprotect() its own
mapping of the pages based upon changes to the host permissions.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I95890595f8cc5493a5a67636bd22da3cc90a95fc
Signed-off-by: Will Deacon <willdeacon@google.com>
Expose the __pkvm_guest_{un,}share_host() functionality to protected
guests in the form of three new hypercalls in the KVM vendor service
range:
- HYP_MEMINFO: Query the size of the sharing granule (i.e. the stage-2
  page size).
- MEM_SHARE: Share a page back with the host, granting RWX permission.
- MEM_UNSHARE: Remove host access to a page previously shared with
  MEM_SHARE.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ie5a1f215058df6738e1d4f357497c82b8617c765
Signed-off-by: Will Deacon <willdeacon@google.com>
Advertise KVM vendor hypercalls (i.e. those hypercalls residing in the
"vendor specific" service range of the SMCCC specification and identified
with KVM's UID) to protected guests from EL2 so that memory sharing
hypercalls can later be probed and utilised without involving the host.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ic80c2aaeba236f0cbcc515d5787a1a4ad230d1d6
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce __pkvm_guest_unshare_host() to remove host access to a page
which was previously shared by a protected guest.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I0a11a2458d6ff5bf1e9b8ece871ba1383ed4611d
Signed-off-by: Will Deacon <willdeacon@google.com>
In preparation for allowing a protected guest to share individual pages
back with the host for the purposes of things like virtio buffers,
introduce __pkvm_guest_share_host() to take care of the associated
page-table updates and permission checking.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I6279b565b89a961628628aa4b1b592fdd57696e4
Signed-off-by: Will Deacon <willdeacon@google.com>
Allow the VMM to hook into and handle a subset of guest hypercalls
advertised by the host. For now, no such hypercalls exist, and so the
new capability returns 0 when queried.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I684d5cd07864887377e91cc96041916d671b2b16
[willdeacon@: Leave other unsupported CAPs commented in uapi/linux/kvm.h]
Signed-off-by: Will Deacon <willdeacon@google.com>
Loading a vCPU concurrently on multiple physical CPUs is a recipe for
disaster. Introduce a per-vCPU flag to track whether or not it is loaded
and reject a load request for a vCPU which is already loaded.
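One way to picture the flag is an atomic test-and-set, sketched below in
userspace C11. The struct and function names are hypothetical, and the use
of a compare-exchange is an assumption of this sketch rather than a
description of the actual implementation.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shadow vCPU carrying only the "loaded" flag. */
struct shadow_vcpu {
	atomic_bool loaded;
};

/* Claim the vCPU; fails if some CPU has already loaded it. */
static int sketch_vcpu_load(struct shadow_vcpu *v)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&v->loaded, &expected, true))
		return -EBUSY;
	return 0;
}

/* Release the vCPU so that it can be loaded again. */
static void sketch_vcpu_put(struct shadow_vcpu *v)
{
	atomic_store(&v->loaded, false);
}
```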
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ic72db8a0462c23a3dc2af06bf0265b586729f989
Signed-off-by: Will Deacon <willdeacon@google.com>
Nothing currently prevents the host from tearing down a shadow VM while
a vCPU is loaded, which is likely to corrupt the hypervisor state. To
prevent this, refcount the shadow vm structs on vcpu_load() and
vcpu_put() and make sure to only allow tearing down a shadow VM when
its refcount is 0.
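The rule can be sketched as below. This is a single-threaded illustration
with hypothetical names; a real implementation would need to serialise
these operations against each other, which the sketch deliberately omits.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical shadow VM: one reference per currently loaded vCPU. */
struct shadow_vm {
	unsigned int refcount;
	bool torn_down;
};

static void sketch_vm_get(struct shadow_vm *vm)	/* on vcpu load */
{
	vm->refcount++;
}

static void sketch_vm_put(struct shadow_vm *vm)	/* on vcpu put */
{
	vm->refcount--;
}

/* Teardown is refused for as long as any vCPU is loaded. */
static int sketch_vm_teardown(struct shadow_vm *vm)
{
	if (vm->refcount)
		return -EBUSY;
	vm->torn_down = true;
	return 0;
}
```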
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I2860c3297516f8af6ff4a0d4c91127af4a34b62e
Signed-off-by: Will Deacon <willdeacon@google.com>
We currently track page-ownership in nVHE protected mode with a rather
coarse granularity -- all guests share a single owner id. But
finer-grained tracking will be useful soon, e.g. to handle host stage-2
faults caused by an access to guest memory. To prepare the ground for
this, let's use the guest VMIDs as owner ids, which makes it possible to
distinguish between guests. This only works since the pKVM EL2 hypervisor
guarantees the stability of the VMIDs for the entire lifetime of a guest
VM. This will need some rework when/if we attempt to run more than 255
guests concurrently in protected mode as we'll have to handle VMID
rollovers, but there is no clear need for now, so let's keep it simple
to start.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Update constants in mem_protect.h]
Bug: 209580772
Change-Id: I5c5c8061617d7dc481ae5e25a0391b306aabbd8c
Signed-off-by: Will Deacon <willdeacon@google.com>
We will soon need more than 8 bits to encode all possible owner ids in
KVM protected mode. To prepare the ground for this, introduce a new type
for owner_ids, and make it 32 bits wide.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Move IDs to header and fix S2MPU host_stage2_set_owner() callback]
Bug: 209580772
Change-Id: I37add42a2d7f34aa110c00fd9569d81db279d765
Signed-off-by: Will Deacon <willdeacon@google.com>
kvm_pgtable_stage2_set_owner() could be generalised into a way
to store up to 63 bits in the page tables, as long as we don't
set bit 0.
Let's just do that.
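As an illustration of the idea: an invalid descriptor has bit 0 clear, so
the remaining 63 bits are free for software use. The helper names and the
choice of placing an owner id at bit 1 are assumptions of this sketch, not
the layout used by the page-table library.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID	1ULL	/* bit 0: must stay clear in an annotation */
#define OWNER_SHIFT	1	/* assumed position of the owner id */

/* Store an arbitrary annotation in an invalid PTE. */
static bool pte_annotate(uint64_t *pte, uint64_t annotation)
{
	if (annotation & PTE_VALID)
		return false;	/* would be mistaken for a valid mapping */
	*pte = annotation;
	return true;
}

/* One possible annotation: a 32-bit owner id. */
static uint64_t owner_to_annotation(uint32_t owner_id)
{
	return (uint64_t)owner_id << OWNER_SHIFT;
}

static uint32_t annotation_to_owner(uint64_t pte)
{
	return (uint32_t)(pte >> OWNER_SHIFT);
}
```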
Signed-off-by: Marc Zyngier <maz@kernel.org>
[willdeacon@: Fix S2MPU conflict in host_stage2_set_owner_locked()]
Bug: 209580772
Change-Id: I4e42d149b457870c35a5ae0f77e14c95dee16b4d
Signed-off-by: Will Deacon <willdeacon@google.com>
Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.
With guest-to-host memory sharing hypercalls originating from the guest
under pKVM, there is no need to change the context when invalidating the
TLB and restoring the host context is, in fact, harmful.
Check the currently running vCPU in __tlb_switch_to_{guest,host}() and
avoid switching the context if a vCPU is already loaded.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I4cfb36f0f88a2d50d50ea85a0d84e3e8191152a3
Signed-off-by: Will Deacon <willdeacon@google.com>
Push the memory pages used to store VM metadata in the teardown memcache
to let the host know they can be reclaimed.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Ie3b21c54093509a0cc04141bf4fc2d5feb126668
Signed-off-by: Will Deacon <willdeacon@google.com>
Now that EL2 is ready to manage guest page-tables in protected mode, use
the recently introduced hypercall to share pages with guests from the
memory abort path, instead of manipulating their page-tables directly.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I05ed8283d0eed19b2cfd6314cfcafbe3f689937c
Signed-off-by: Will Deacon <willdeacon@google.com>
Add a new hypercall for the nVHE protected mode allowing the host to
share one of its pages with a guest.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Ibfb86932e5da58c6b88448b49be6c1f994dbbd70
Signed-off-by: Will Deacon <willdeacon@google.com>
The memory pages donated by the host to the hypervisor for the creation
of guest page-tables are currently never reclaimed. Fix this by returning
those pages back to the host during guest teardown using a hyp_memcache.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I90615c34e61a11ca402c19d016bd40e3dd880637
Signed-off-by: Will Deacon <willdeacon@google.com>
In preparation for managing the stage-2 page-table of guests at EL2 in
nVHE protected mode, allocate memory for the guest PGDs and populate the
shadow kvm_s2_mmu upon shadow creation.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I39f9dec9dc1bb60fe66ec6923f9b4dedc3e37f3f
Signed-off-by: Will Deacon <willdeacon@google.com>
We will soon need to temporarily map pages into the hypervisor stage-1
in nVHE protected mode. To do this efficiently, let's introduce a
per-cpu fixmap which can map a single page without needing to take any
lock or to allocate memory.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Id5cbfd04c5c465c9c577a5fac67458331096b581
Signed-off-by: Will Deacon <willdeacon@google.com>
__pkvm_create_private_mapping() is currently responsible for allocating
VA space in the hypervisor's "private" range and creating stage-1
mappings. In order to allow reusing the VA space allocation logic from
other places, let's factor it out in a standalone function.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I9cb82937931d272f9f61297a3f5dea0e039c2595
Signed-off-by: Will Deacon <willdeacon@google.com>
We currently fix up the hypervisor stage-1 refcount only for specific
portions of the hyp stage-1 VA space. In order to allow unmapping pages
outside of these ranges, let's fix up the refcount for the entire hyp VA
space.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Fix context conflicts with IOMMU init in __pkvm_init_finalise()]
Bug: 209580772
Change-Id: I11d6374d9416bc97cf85fc33bb87c21369412c49
Signed-off-by: Will Deacon <willdeacon@google.com>
In preparation for re-using icache_inval_pou from the EL2 nVHE
hypervisor, annotate it as position-independent.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I66fbf42239a891bfd149c79bf77379fea124793e
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce helper functions easing the manipulation of a hyp_memcache at
EL2 in nVHE protected mode.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I3c1bda86eae67adfa4d0424e68aa5e842562b43c
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce helper functions easing the manipulation of a hyp_memcache
from the kernel.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I42f2718e059f0cb32b7007e390ba758ba68ddd0c
Signed-off-by: Will Deacon <willdeacon@google.com>
The host and hypervisor will need to dynamically exchange memory pages
soon. Indeed, the hypervisor will rely on the host to donate memory
pages it can use to create guest stage-2 page-tables and to store
metadata. In order to ease this process, introduce a struct hyp_memcache
which is essentially a linked list of available pages, indexed by
physical addresses.
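A userspace sketch of such a list: each free page stores the physical
address of the next one, so the list needs no storage of its own. Here
"physical memory" is modelled as a byte array and a physical address is an
offset into it; fake_ram, pa_to_va() and the mc_* helpers are hypothetical
names for the example.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE	4096u
#define NR_PAGES	4u

typedef uint64_t phys_addr_t;

/* Toy model of physical memory: a "physical address" is an offset. */
static uint8_t fake_ram[NR_PAGES * PAGE_SIZE];

struct memcache {
	phys_addr_t head;	/* physical address of the first free page */
	unsigned long nr_pages;	/* number of pages on the list */
};

static void *pa_to_va(phys_addr_t pa)
{
	return &fake_ram[pa];
}

/* Push a page: the old head is written into the start of the page. */
static void mc_push(struct memcache *mc, phys_addr_t pa)
{
	memcpy(pa_to_va(pa), &mc->head, sizeof(mc->head));
	mc->head = pa;
	mc->nr_pages++;
}

/* Pop a page: the next pointer is read back out of the page itself. */
static int mc_pop(struct memcache *mc, phys_addr_t *pa)
{
	if (!mc->nr_pages)
		return -1;
	*pa = mc->head;
	memcpy(&mc->head, pa_to_va(*pa), sizeof(mc->head));
	mc->nr_pages--;
	return 0;
}
```

Linking by physical address rather than by pointer matters because the
host and the hypervisor do not share a virtual address space; each side
translates the address with its own mapping before dereferencing it.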
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Iae46d01e4c0f75cc637adf50a7d0a879d85dff5e
Signed-off-by: Will Deacon <willdeacon@google.com>
The initialization of stage-2 page-tables of guests is currently split
across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2().
That is presumably for historical reasons as kvm_arm_setup_stage2()
originates from the (now defunct) KVM port for 32bit Arm.
Simplify this code path by merging both functions into one, and while at
it make sure to map the kvm struct into the hypervisor stage-1 early on
to simplify the failure path.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I1f9db0251f24e9712607f0c13c9f4017734d3b0f
Signed-off-by: Will Deacon <willdeacon@google.com>
All the contiguous pages used to initialize a hyp_pool are considered
coalesceable, which means that the hyp page allocator will actively
try to merge them with their buddies on the hyp_put_page() path.
However, using hyp_put_page() on a page that is not part of the initial
memory range given to a hyp_pool is currently unsupported.
In order to allow dynamically extending hyp pools at run-time, add a
check to __hyp_attach_page() to allow inserting 'external' pages into
the free-list of order 0. This will be necessary to allow lazy
donation of pages from the host to the hypervisor when allocating guest
stage-2 page-table pages at EL2.
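The decision described above boils down to a range check, sketched here
with a deliberately simplified pool. The struct fields and counters are
hypothetical and stand in for the real free lists; the sketch shows only
which path a page takes, not the buddy merging itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified pool: only the bounds of the initial range are modelled. */
struct pool {
	uint64_t range_start;		/* first address of the initial range */
	uint64_t range_end;		/* one past the last address */
	unsigned int nr_order0_external;/* pages parked on the order-0 list */
	unsigned int nr_coalescable;	/* pages eligible for buddy merging */
};

static bool page_in_initial_range(const struct pool *p, uint64_t pa)
{
	return pa >= p->range_start && pa < p->range_end;
}

static void attach_page(struct pool *p, uint64_t pa)
{
	if (!page_in_initial_range(p, pa)) {
		/* External page: order-0 free list, no merge attempted. */
		p->nr_order0_external++;
		return;
	}
	/* Page from the initial range: normal coalescing path. */
	p->nr_coalescable++;
}
```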
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I3e3a6cb513f4fcc8b9683784c41fa6d3af119ac4
Signed-off-by: Will Deacon <willdeacon@google.com>
We will soon need to donate memory to the hypervisor in protected mode
when creating guest VMs. The amount of memory to be donated will depend on
the size of the (potentially concatenated) stage-2 PGD for that guest,
among other things. To prepare the ground for this donation, introduce a
helper in the KVM page-table library which computes the size of a
stage-2 PGD given a VTCR.
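A sketch of the arithmetic involved, assuming a 4KiB granule and assuming
the input address size (ia_bits) and the start level have already been
decoded from the VTCR; that decoding is left out. The function names are
hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes a 4KiB granule */

/* Address bits resolved below a given level; level 3 maps single pages. */
static int level_shift(int level)
{
	return (PAGE_SHIFT - 3) * (4 - level) + 3;
}

/*
 * One PGD page covers 2^level_shift(start_level - 1) bytes of IPA space.
 * If the IPA space is larger than that, several pages are concatenated.
 */
static uint64_t stage2_pgd_size(int ia_bits, int start_level)
{
	int shift = level_shift(start_level - 1);
	uint64_t last_addr = (1ULL << ia_bits) - 1;
	uint64_t nr_pages = (last_addr >> shift) + 1;

	return nr_pages << PAGE_SHIFT;
}
```

For example, a 40-bit IPA space starting at level 1 needs two concatenated
4KiB pages, since a single level-1 table only covers 39 bits.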
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I266f85d17415e72eb574afdb1d8b622d87b9fcc3
Signed-off-by: Will Deacon <willdeacon@google.com>
Break-before-make (BBM) can be expensive, as transitioning via an
invalid mapping (i.e. the "break" step) requires the completion of TLB
invalidation and can also cause other agents to fault concurrently on
the invalid mapping.
Since BBM is not required when changing only the software bits of a PTE,
avoid the sequence in this case and just update the PTE directly.
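The check can be sketched as a masked comparison of the old and new
descriptors. The mask below assumes the software-reserved field sits in
bits [58:55] of the descriptor; the macro and function names are
hypothetical, and the sketch ignores other cases (such as an invalid old
PTE) where BBM is also unnecessary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed software-reserved field of a descriptor: bits [58:55]. */
#define PTE_SW_MASK (0xfULL << 55)

/* BBM is needed only if something other than the software bits changes. */
static bool pte_needs_bbm(uint64_t old, uint64_t next)
{
	return ((old ^ next) & ~PTE_SW_MASK) != 0;
}
```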
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ib4858b2491b221490b74cb5100474acaad711483
Signed-off-by: Will Deacon <willdeacon@google.com>
Add PSCI 1.0 support for protected VMs. All mandatory functions
are supported, except for SYSTEM_RESET, because pKVM doesn't have
a way of resetting protected VMs yet.
Since some VMMs issue a SYSTEM_RESET when tearing down a VM, we repaint
SYSTEM_RESET calls as SYSTEM_OFF for now.
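The repainting amounts to rewriting the function ID before it is handled,
as in this sketch. The two constants are the SMC32 function IDs from the
PSCI specification; pvm_psci_repaint() is a hypothetical helper name.

```c
#include <stdint.h>

#define PSCI_0_2_FN_SYSTEM_OFF		0x84000008u
#define PSCI_0_2_FN_SYSTEM_RESET	0x84000009u

/* Treat a reset request from a protected VM as a power-off request. */
static uint32_t pvm_psci_repaint(uint32_t fn)
{
	if (fn == PSCI_0_2_FN_SYSTEM_RESET)
		return PSCI_0_2_FN_SYSTEM_OFF;
	return fn;
}
```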
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ide47339dc4c0392b41f77e90c43ec805a0780d00
Signed-off-by: Will Deacon <willdeacon@google.com>
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by
hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I9457bb7a16e807b6c59b4b3e121b6f4715a12cab
Signed-off-by: Will Deacon <willdeacon@google.com>
Move vcpu_read_sys_reg and vcpu_write_sys_reg to a shared header
to be used by hyp in protected mode.
Refactored as macros to avoid including kvm.h, which would have
been needed for struct vcpu.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I6b949d60b1cab6aa12978e2f9b775192701990c6
Signed-off-by: Will Deacon <willdeacon@google.com>
Move some PSCI functions and macros to a shared header to be used
by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I3057eadc9c2a71c2fee36575a8215c3119bd9e36
Signed-off-by: Will Deacon <willdeacon@google.com>
Instruction and data aborts taken from a guest are very different
beasts, especially as MMIO exits are only ever triggered by the latter.
Rework the shadow handlers so that instruction and data aborts are
handled independently, making the control flow and MMIO handling
considerably easier to reason about.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I8a0d6881a13154ceaedf4b51ff224256db2e7ebf
Signed-off-by: Will Deacon <willdeacon@google.com>
Rework the sysreg entry/exit handling for protected guests so that:
- The host view of the vCPU pstate is updated on exit, as this may be
  required for emulation
- WRITE_ONCE() is used to update host vCPU members
- The esr_sys64_to_params() helper is used to decode the ESR
- A pending exception request from the host takes precedence over a PC
  increment when setting the shadow vCPU flags
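The last rule in the list can be sketched as follows. The flag names and
bit positions are hypothetical; only the precedence between the two
requests is taken from the description above.

```c
/* Hypothetical flag bits, for illustration only. */
#define FLAG_PENDING_EXCEPTION	(1u << 0)
#define FLAG_INCREMENT_PC	(1u << 1)

/*
 * Derive the shadow vCPU flags from the host's request. If the host
 * asked for an exception to be injected, the PC increment is dropped.
 */
static unsigned int sync_shadow_flags(unsigned int host_flags)
{
	if (host_flags & FLAG_PENDING_EXCEPTION)
		return FLAG_PENDING_EXCEPTION;
	return host_flags & FLAG_INCREMENT_PC;
}
```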
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Ia9b546f86a8815d777b3ff4ee3b09426cba7ad71
Signed-off-by: Will Deacon <willdeacon@google.com>
Implement lazy save/restore of the FPSIMD and SVE registers for the host
so that the guest register state remains live for a vCPU with
KVM_ARM64_FP_ENABLED set and the state is only switched in response to a
trap from the host.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I6977740727964e297d1e658e8eaec4f0125cce48
Signed-off-by: Will Deacon <willdeacon@google.com>
Restrict protected VM capabilities based on the
fixed-configuration for protected VMs.
No functional change intended in current KVM-supported modes
(nVHE, VHE).
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ic247aefac8b4046efaa69d24580f7c58aaeeb80d
Signed-off-by: Will Deacon <willdeacon@google.com>
In nVHE protected mode the hypervisor sometimes needs to read or write to
host-provided data-structures, such as vcpu structs or the kvm struct.
To ensure that the hypervisor can't be tricked by the host into writing
to pages it doesn't own, let's pin the host pages containing those data
structures during the shadow vm creation. This will ensure those pages
remain in a host-shared state for the lifetime of the VM.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Id11bd6a86754b6a3e0c504b06940df310641357d
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce two new hypercalls for the nVHE protected mode allowing the
host to load and put the vCPU state, following the usual split between
vcpu_{load,put}() and vcpu_run() in KVM.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I34a19c96221c9a54ed142803b8f18f2c63511c57
Signed-off-by: Will Deacon <willdeacon@google.com>
Similar to the vgic state, make sure to sync and flush the virtual timer
state between the host and the hyp shadow vCPU structs when running in
nVHE protected mode.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Ib90273bb670d9d815dd9f542369dde00753655cf
Signed-off-by: Will Deacon <willdeacon@google.com>
Now that protected VMs have a shadow state maintained at EL2 in nVHE
protected mode, make sure to sync and flush the vgic state between host
and hyp data structure upon entry and exit from a guest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I8d7ab950041a5cd79217c9ee0e04742a27439a99
Signed-off-by: Will Deacon <willdeacon@google.com>
Merge the functions to save and restore vmcr and apr. This can in some
cases reduce the number of hypercalls necessary to load/put the vgic
state in nVHE and will also ease its management in protected mode later
on.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Id85f0698a7a346282e55c15993c274828bd5309c
Signed-off-by: Will Deacon <willdeacon@google.com>
Move the initialization of traps to the initialization of the
shadow vcpu, and remove the associated hypercall.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I4bf45733972ac54c72c40b3ef1df32cfe7d04a70
Signed-off-by: Will Deacon <willdeacon@google.com>
Add handlers to exchange information between the host and the
protected guest on vcpu entry and exit, which most often happens
when running a vcpu.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I1716f55f5a1cb75dcde26b58af8f78ee80e4a19e
Signed-off-by: Will Deacon <willdeacon@google.com>
Create and populate a shadow table that contains the state hyp
needs for running protected VMs, i.e., struct kvm and struct
kvm_vcpu at EL2.
The memory for this is donated by the host and then unmapped from
the host at stage 1 and at stage 2 (by hyp).
This state is not used yet.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ie2d948f2a5f22a06d615d909de7a60d46944e6d8
Signed-off-by: Will Deacon <willdeacon@google.com>
Create a framework for resetting protected VM system registers to
their architecturally defined reset values.
No functional change intended as these are not hooked in yet.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Iafdab9f796897429f0fb8abd5d7df9ca576e1f91
Signed-off-by: Will Deacon <willdeacon@google.com>
Move the computation of the mpidr to its own function in a shared
header, as the computation will be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I531795b43c9747dceea485843eed114675db9354
Signed-off-by: Will Deacon <willdeacon@google.com>
Move the macro defines of the pstate reset values to a shared
header to be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Iafd31108675027a799ce9ff3c5c56b49e87ead67
Signed-off-by: Will Deacon <willdeacon@google.com>
The values of the trapping registers for protected VMs should be
computed from the ground up, and not depend on potentially
preexisting values.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Iacd3916dd1bbfc8d9cc859f94a9d879e9d456ebc
Signed-off-by: Will Deacon <willdeacon@google.com>