Add a pair of hooks (ioremap_phys_range_hook/iounmap_phys_range_hook)
that can be implemented by an architecture. Contrary to the existing
arch_sync_kernel_mappings(), these hooks track things at the physical
address level.
This is especially useful in virtualised environments where
the guest has to tell the host whether (and how) it intends to use
an MMIO device.
Signed-off-by: Marc Zyngier <maz@kernel.org>
[willdeacon@: Hook ioremap_page_range() in mm/ioremap.c]
Bug: 209580772
Change-Id: I970c2e632cb2b01060d5e66e4194fa9248188f43
Signed-off-by: Will Deacon <willdeacon@google.com>
Document the hypercalls used for the MMIO guard infrastructure.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I927bcd6c5e3ef932265d817288ff2b46b0e0db66
Signed-off-by: Will Deacon <willdeacon@google.com>
Plumb the MMIO checking code into the MMIO fault handling code.
Any fault hitting outside of an MMIO region will now report
an invalid syndrome, and won't leak any data from the guest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I68bef2d0211a804aa1e598aeaa0c85dc4098f61e
Signed-off-by: Will Deacon <willdeacon@google.com>
Plumb in the hypercall interface to allow a guest to discover,
enroll, map and unmap MMIO regions.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I0390456ffde8ceca351d3d8e82fd1dddeb747fac
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce the infrastructure required to identify an IPA region
that is expected to be used as an MMIO window.
This includes mapping, unmapping and checking the regions. Nothing
calls into it yet, so no expected functional change.
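As a rough illustration of the three operations, here is a toy model
that tracks MMIO pages in a small fixed table. Every name, the 4K page
size and the table itself are assumptions made for this sketch; the
real infrastructure works on the guest stage-2 page tables.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	(~((UINT64_C(1) << TOY_PAGE_SHIFT) - 1))
#define TOY_MAX_PAGES	8

/* Hypothetical per-VM record of IPA pages registered as MMIO. */
struct toy_mmio_guard {
	uint64_t ipa[TOY_MAX_PAGES];
	bool used[TOY_MAX_PAGES];
};

/* Return the slot tracking the page that contains @ipa, or -1. */
static inline int toy_mmio_find(const struct toy_mmio_guard *g, uint64_t ipa)
{
	ipa &= TOY_PAGE_MASK;
	for (int i = 0; i < TOY_MAX_PAGES; i++)
		if (g->used[i] && g->ipa[i] == ipa)
			return i;
	return -1;
}

/* Check: is @ipa inside a page the guest registered as MMIO? */
static inline bool toy_mmio_check(const struct toy_mmio_guard *g, uint64_t ipa)
{
	return toy_mmio_find(g, ipa) >= 0;
}

/* Map: register the page containing @ipa; registering twice is harmless. */
static inline int toy_mmio_map(struct toy_mmio_guard *g, uint64_t ipa)
{
	if (toy_mmio_check(g, ipa))
		return 0;
	for (int i = 0; i < TOY_MAX_PAGES; i++) {
		if (!g->used[i]) {
			g->ipa[i] = ipa & TOY_PAGE_MASK;
			g->used[i] = true;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Unmap: forget the page containing @ipa. */
static inline int toy_mmio_unmap(struct toy_mmio_guard *g, uint64_t ipa)
{
	int slot = toy_mmio_find(g, ipa);

	if (slot < 0)
		return -ENOENT;
	assert(g->used[slot]);
	g->used[slot] = false;
	return 0;
}
```

A fault handler built on top of this would consult toy_mmio_check()
before treating an access as MMIO.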
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I227eaa28b98e067e3daae4f9e1071eb37a6761cc
Signed-off-by: Will Deacon <willdeacon@google.com>
Add a per-VM flag indicating that the guest has bought into the
MMIO guard enforcement framework.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: If60b2b38a419a9f44ebe9029f55dd016fd2444b5
Signed-off-by: Will Deacon <willdeacon@google.com>
In order to simplify the implementation of an EL2-only version of
MMIO guard, expose topup_hyp_memcache() and simplify its usage
by only requiring a vcpu.
While we're at it, make free_hyp_memcache() visible in kvm_host.h
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I4f54c57a9693cf7a3450f99fedc15ae32af09a31
Signed-off-by: Will Deacon <willdeacon@google.com>
Define the handful of hypercalls that MMIO guard will require.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Iac312b2327c31a1532fdb38e8fa8066291d9f611
Signed-off-by: Will Deacon <willdeacon@google.com>
Don't blindly assume that the PTE is valid when checking whether
it describes an executable or cacheable mapping.
This makes sure that we don't issue CMOs for invalid mappings.
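A minimal sketch of the idea, assuming a made-up descriptor layout
(the bit positions below are illustrative and do not match the real
stage-2 format). Without the valid check, an all-zero entry would look
executable simply because its XN bit is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy layout: only "bit 0 means valid" reflects the architecture. */
#define TOY_PTE_VALID		(UINT64_C(1) << 0)
#define TOY_PTE_CACHEABLE	(UINT64_C(1) << 1)
#define TOY_PTE_XN		(UINT64_C(1) << 2)

static inline bool toy_pte_valid(uint64_t pte)
{
	return pte & TOY_PTE_VALID;
}

static inline bool toy_pte_is_cacheable(uint64_t pte)
{
	return toy_pte_valid(pte) && (pte & TOY_PTE_CACHEABLE);
}

static inline bool toy_pte_is_executable(uint64_t pte)
{
	return toy_pte_valid(pte) && !(pte & TOY_PTE_XN);
}

/* Count the entries that would receive cache maintenance. */
static inline size_t toy_count_cmo_targets(const uint64_t *ptes, size_t nr)
{
	size_t n = 0;

	assert(ptes || !nr);
	for (size_t i = 0; i < nr; i++)
		if (toy_pte_is_cacheable(ptes[i]) ||
		    toy_pte_is_executable(ptes[i]))
			n++;
	return n;
}
```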
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I5b271c91aa6ceb23f7b1e6a571e30d080866d5c9
Signed-off-by: Will Deacon <willdeacon@google.com>
We currently deal with a set of booleans for VM features,
while they could be better represented as a set of flags
contained in an unsigned long, similarly to what we are
doing on the CPU side.
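For illustration only, the conversion could look roughly like the
sketch below. The structure, the feature names and the helpers are
hypothetical and are not the identifiers used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits, named for illustration only. */
enum toy_vm_feature {
	TOY_VM_FEAT_MMIO_GUARD,
	TOY_VM_FEAT_OTHER,
	TOY_VM_NR_FEATURES
};

struct toy_vm {
	unsigned long flags;	/* one bit per enum toy_vm_feature */
};

static inline void toy_vm_set_feature(struct toy_vm *vm, enum toy_vm_feature f)
{
	assert(f < TOY_VM_NR_FEATURES);
	vm->flags |= 1UL << f;
}

static inline bool toy_vm_has_feature(const struct toy_vm *vm,
				      enum toy_vm_feature f)
{
	assert(f < TOY_VM_NR_FEATURES);
	return vm->flags & (1UL << f);
}
```

Adding a feature then only costs a new enumerator rather than a new
structure member.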
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I86be6bab12287c3eb21bbe03f255e2899edbdffb
Signed-off-by: Will Deacon <willdeacon@google.com>
Since we must still support the dreaded set/way CMOs for non-protected
VMs (as well as the equivalent operation when vcpus switch their MMU
on), perform an invalidation that will iterate over all the pages
that have been donated to the guest, one after the other.
This requires a minor change to the locking used for donation so
that all donated pages can be seen by a concurrent invalidation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I1780127722bda7bdc884bb4e68db6ae47d042822
There is no difference between protected and non-protected guests
when it comes to shadow structures, and we want these shadow
structures to have the same life cycle.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I7e9bf366aae6bd0542d0038d24e2350a9dd23cd0
Signed-off-by: Will Deacon <willdeacon@google.com>
We want the host to handle everything as usual.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Icf8ee146917e886bca258815cf948a1b12540353
Signed-off-by: Will Deacon <willdeacon@google.com>
Instead of donating memory to non-pVMs, share the memory, which
gives us a good enough approximation of the usual behaviour.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I47213754613110a6fb8157806eb96ddf92ead346
Signed-off-by: Will Deacon <willdeacon@google.com>
In order to deal with state synchronisation between EL1 and EL2,
we use the following setup:
- On exit from EL2, the state is forcefully marked clean.
- Should a trap be handled, the state is synchronised and immediately
marked dirty.
- On vcpu_put(), the state is also marked dirty, since it can be
modified by userspace.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I47a889ca5432566f236de4630d81753348632f8a
Signed-off-by: Will Deacon <willdeacon@google.com>
In order for a non-protected guest to be functional, userspace
has to be able to query its state, which means that the host view
of the vcpu has to be kept up to date.
In order to achieve this, we establish the following scheme for EL2:
- On entering vcpu_run(), we check for the KVM_ARM64_PKVM_STATE_DIRTY
flag in the host vcpu. If set, we sync the state *from* the host
to the shadow version.
- On exiting vcpu_run(), we don't do anything, but let the host
issue a sync hypercall if required.
- On vcpu_put(), we force a synchronisation *to* the host.
The EL1 host will have a complementary approach in the following
patches.
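The scheme can be modelled with a toy host/shadow pair. The sketch
below is not the hypervisor code: the structures and function names
are invented, and the EL1 half (mark clean after running, mark dirty
after handling a trap or on vcpu_put()) is an assumption about how the
complementary side behaves.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_regs {
	unsigned long x[4];
};

struct toy_host_vcpu {
	struct toy_regs regs;
	bool dirty;	/* stands in for KVM_ARM64_PKVM_STATE_DIRTY */
};

struct toy_shadow_vcpu {
	struct toy_regs regs;
	struct toy_host_vcpu *host;
};

/* EL2: entering vcpu_run(), pull state from the host only if dirty. */
static inline void toy_el2_enter_run(struct toy_shadow_vcpu *s)
{
	assert(s->host);
	if (s->host->dirty)
		s->regs = s->host->regs;
}

/* EL2: the sync hypercall and vcpu_put() both push state to the host. */
static inline void toy_el2_sync_to_host(struct toy_shadow_vcpu *s)
{
	assert(s->host);
	s->host->regs = s->regs;
}

/* EL1 (assumed): on return from EL2 the host copy is marked clean. */
static inline void toy_el1_after_run(struct toy_host_vcpu *h)
{
	h->dirty = false;
}

/* EL1 (assumed): a handled trap syncs the state and marks it dirty. */
static inline void toy_el1_handle_trap(struct toy_shadow_vcpu *s)
{
	toy_el2_sync_to_host(s);
	s->host->dirty = true;
}

/* EL1 (assumed): after vcpu_put() userspace may modify the state. */
static inline void toy_el1_vcpu_put(struct toy_shadow_vcpu *s)
{
	toy_el2_sync_to_host(s);
	s->host->dirty = true;
}
```

The point of the dirty flag is that a clean host copy is never copied
back over the shadow state, so the common run loop avoids the copy.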
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I42811a25d2e176d6c7d9a66ade6e9149a96e9256
Signed-off-by: Will Deacon <willdeacon@google.com>
A non-protected guest requires a lot less handling than a protected
one when dealing with entries/exits from/to EL2.
Since we already indirect those, introduce new entry/exit tables
for non-pVMs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I66602bc491a4a87d6482b12e4eaf7aa53a7dbfd9
Signed-off-by: Will Deacon <willdeacon@google.com>
As we're about to need to copy some state back and forth for
non-protected guests, pass the full loaded state to the flush/sync
functions.
No functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I7ad6a00a7500e91237fcc0981261c819b2224ee0
Signed-off-by: Will Deacon <willdeacon@google.com>
When pKVM is enabled, all the vcpus must have a shadow structure
managed by the hypervisor, irrespective of their protection status.
This field thus represents the wrong abstraction. Replace it with
'pkvm_loaded_state.is_protected', which tracks whether a vcpu is
part of a protected VM.
pkvm_loaded_state also gets moved around for convenience with the
following patches.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Ic9876fde543abb350fe8969d5b4661e30092f553
Signed-off-by: Will Deacon <willdeacon@google.com>
A number of KVM definitions are keyed on __KVM_NVHE_HYPERVISOR__
being defined or not. Make sure we advertise this #define when
compiling hyp-constants.o, so that we get the right stuff.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Ied191c0a18274258cffede72b06b0fb5bba5604e
Signed-off-by: Will Deacon <willdeacon@google.com>
Instead of poking into the internals of the host KVM structure,
stick to the shadow structures when trying to work out whether
a vcpu is part of a protected VM or not.
Take this opportunity to sprinkle a couple of unlikely(), just
because.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I22a096e1e3cfe34cd2658684b02d8bac486416c4
Signed-off-by: Will Deacon <willdeacon@google.com>
As we can't really rely on the host side for the protection status,
snapshot the expected status at VM creation time.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I0943eadba25e6c9fe718f29e749b9fcc8fbb79ba
Signed-off-by: Will Deacon <willdeacon@google.com>
As KVM is moving to using an xarray to hold the vcpus instead of
the fixed size array that has been the norm so far, we are faced
with two options: either teach the EL2 code to parse an xarray
when building the shadow structures, or find an alternative way
of communicating the vcpus to the EL2 code.
An easy way to deal with the second approach is to use the page
that EL1 donates to HYP to hold the VM S2 PGD. Instead of just
giving the memory, let's copy the pointers to the vcpus in this
page. The overhead is acceptable (it happens only at VM creation
time), and in most cases we only have a handful of vcpus.
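A hedged sketch of the copy step follows. The accessor callback stands
in for whatever lookup the host uses on its vcpu container; it, the 4K
page size and all names here are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_PAGE_SIZE	4096

struct toy_vcpu {
	int id;
};

/* Hypothetical accessor for the host's vcpu container. */
typedef struct toy_vcpu *(*toy_get_vcpu_fn)(void *vm, size_t idx);

/* Fill the to-be-donated page with one vcpu pointer per slot. */
static inline int toy_fill_vcpu_page(void *page, void *vm, size_t nr_vcpus,
				     toy_get_vcpu_fn get)
{
	struct toy_vcpu **slots = page;

	assert(page && get);
	if (nr_vcpus > TOY_PAGE_SIZE / sizeof(*slots))
		return -E2BIG;
	for (size_t i = 0; i < nr_vcpus; i++)
		slots[i] = get(vm, i);
	return 0;
}

/* Example accessor: the "VM" is just a flat array of vcpus. */
static inline struct toy_vcpu *toy_array_get(void *vm, size_t idx)
{
	return &((struct toy_vcpu *)vm)[idx];
}
```

The receiving side then only walks a flat array of pointers and never
needs to understand the host's container type.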
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Id0264f0960821563c4b3c0dfcbc43598b85a1f3b
Signed-off-by: Will Deacon <willdeacon@google.com>
There really isn't much point in keeping these separate.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I78b5c2d33bd4178415d51b2bccabfb5a7590d2c2
Signed-off-by: Will Deacon <willdeacon@google.com>
Move the vcpu memcache topup into its own helper, as we will
eventually need it for the MMIO guard page table updates
(which uses the exact same mechanism).
No functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I72bac5e8be91acbb696a1428fc5cc6cc84d2df66
Signed-off-by: Will Deacon <willdeacon@google.com>
Expose a new capability, KVM_CAP_ARM_PROTECTED_VM, for protected VMs
which allows the size of the PVM firmware region to be discovered from
userspace and for the firmware load address to be specified if it is
required.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I819b9b2cfa227f1a0607a8f683aa01d4ae50704f
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce a new virtual machine type, KVM_VM_TYPE_ARM_PROTECTED, which
specifies that the guest memory pages are to be unmapped from the host
stage-2 by the hypervisor.
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Changed UAPI constants to reduce chance of upstream collisions]
Bug: 209580772
Change-Id: I9de1ad96fec4f62434a81101749435f8b0596162
Signed-off-by: Will Deacon <willdeacon@google.com>
When a PVM firmware image is present for a protected VM, treat the first
running vCPU as the "primary" vCPU and reset its registers accordingly,
in particular by initialising its PC to enter the firmware at startup.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I26676637145c7d809c5dc5ac0ad0e1fadaf275d2
Signed-off-by: Will Deacon <willdeacon@google.com>
When the host donates a page to a protected guest at an IPA which
coincides with the PVM firmware load address, copy-in the relevant
firmware page after unmapping it from the host but before mapping it
into the guest.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I8cec813fa52938945f3122655deb785523a96ec8
Signed-off-by: Will Deacon <willdeacon@google.com>
Unmap the PVM firmware memory from the pKVM host by transferring
ownership of the pages to the hypervisor when the host deprivileges
itself during boot.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I311642f543c0c73d0e0cf2ec051e8e2d9759c5d1
Signed-off-by: Will Deacon <willdeacon@google.com>
Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
region, which can be used to identify a firmware image for protected
VMs. If pKVM fails to initialise and a firmware region is advertised,
then the memory is cleared during boot.
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Include linux/io.h for memremap() and friends]
Bug: 209580772
Change-Id: Ibfcc0ff00d4b8a42747452047856cb9ba8def4c4
Signed-off-by: Will Deacon <willdeacon@google.com>
has_vhe() expands to a compile-time constant when evaluated from the VHE
or nVHE code, alternatively checking a static key when called from
elsewhere in the kernel. At face value, this looks like a case of
premature optimization, but in fact this allows symbol references on
VHE-specific code paths to be dropped from the nVHE object.
Expand the comment in has_vhe() to make this clearer, hopefully
discouraging anybody from simplifying the code.
Cc: David Brazdil <dbrazdil@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Icce36e192cafa14d388cb1d0a24585b6fcf6e46e
Signed-off-by: Will Deacon <willdeacon@google.com>
Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
only returns KVM_MODE_PROTECTED on systems where the feature is available.
Cc: David Brazdil <dbrazdil@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I4854d299f4c5aebe7ed33020ac014b24154fdb52
Signed-off-by: Will Deacon <willdeacon@google.com>
Regardless of whether a given VM is protected or unprotected, when pKVM
is enabled we must create an EL2 shadow for each VM so that the stage-2
page-tables are managed by the hypervisor instead of the host.
Create the EL2 shadow for a VM on the first-run of a vCPU when a shadow
is not already present.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I3e0ec8907f46b17d74623b84a029a5b50aeaf14f
Signed-off-by: Will Deacon <willdeacon@google.com>
When running as a protected guest, the KVM host does not have access to
any pages mapped into the guest. Consequently, KVM exposes hypercalls to
the guest so that pages can be shared back with the host for the purposes
of shared memory communication such as virtio.
Detect the presence of these hypercalls when running as a guest and use
them to implement the memory encryption interfaces gated by
CONFIG_ARCH_HAS_MEM_ENCRYPT which are called from the DMA layer to share
SWIOTLB bounce buffers for virtio.
Although no encryption is actually performed, "sharing" a page is akin
to decryption, whereas "unsharing" a page maps to encryption, albeit
without destruction of the underlying page contents.
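The mapping between the two vocabularies can be sketched as below. The
toy_ helpers loosely mirror the shape of set_memory_decrypted() and
set_memory_encrypted(), the hypercalls are replaced by counting stubs,
and the 4K sharing granule is assumed rather than queried.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	UINT64_C(4096)

typedef int (*toy_page_hcall_t)(uint64_t ipa);

/* Issue one call per page in the range, stopping at the first error. */
static inline int toy_apply_to_pages(uint64_t addr, int numpages,
				     toy_page_hcall_t hcall)
{
	assert(hcall && addr % TOY_PAGE_SIZE == 0);
	for (int i = 0; i < numpages; i++) {
		int ret = hcall(addr + (uint64_t)i * TOY_PAGE_SIZE);

		if (ret)
			return ret;
	}
	return 0;
}

/* Stubs standing in for the share/unshare hypercalls. */
static int toy_shared_pages;

static inline int toy_hcall_share(uint64_t ipa)
{
	(void)ipa;
	toy_shared_pages++;
	return 0;
}

static inline int toy_hcall_unshare(uint64_t ipa)
{
	(void)ipa;
	toy_shared_pages--;
	return 0;
}

/* "Decrypting" a range shares it with the host. */
static inline int toy_set_memory_decrypted(uint64_t addr, int numpages)
{
	return toy_apply_to_pages(addr, numpages, toy_hcall_share);
}

/* "Encrypting" a range removes host access; contents are untouched. */
static inline int toy_set_memory_encrypted(uint64_t addr, int numpages)
{
	return toy_apply_to_pages(addr, numpages, toy_hcall_unshare);
}
```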
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Use asm/mem_encrypt.h instead of asm/set_memory.h; Implement mem_encrypt_active()]
Bug: 209580772
Change-Id: I5955ff0dca65561183f9a60e94be87f28fbf14ec
Signed-off-by: Will Deacon <willdeacon@google.com>
When a protected guest is torn down, the hypervisor currently
transitions the ownership of all the guest pages back to the host
without further intervention, which might leak guest secrets. To prevent
this, let's have the hypervisor zero the pages before they can be
reclaimed by the host.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Move pkvm_host_poison to mem_protect.h]
Bug: 209580772
Change-Id: Ib39de531284fef02c1cb84b83f0819f9e6f36f9b
Signed-off-by: Will Deacon <willdeacon@google.com>
Now that we have all the infrastructure in place to allow guest-to-host
sharing of pages in protected mode, let's stop sharing the pages from
the host on guest memory aborts and switch to a proper donation instead.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Ib37625172e24950cd74913a20bff8ce29a72f45b
Signed-off-by: Will Deacon <willdeacon@google.com>
Expose MEM_SHARE and MEM_UNSHARE hypercalls to the KVM_CAP_EXIT_HYPERCALL
capability, allowing userspace (i.e. the VMM) to mprotect() its own
mapping of the pages based upon changes to the host permissions.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I95890595f8cc5493a5a67636bd22da3cc90a95fc
Signed-off-by: Will Deacon <willdeacon@google.com>
Expose the __pkvm_guest_{un,}share_host() functionality to protected
guests in the form of three new hypercalls in the KVM vendor service
range:
- HYP_MEMINFO: Query the size of the sharing granule (i.e. the stage-2
page size)
- MEM_SHARE: Share a page back with the host, granting RWX permission.
- MEM_UNSHARE: Remove host access to a page previously shared with
MEM_SHARE.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ie5a1f215058df6738e1d4f357497c82b8617c765
Signed-off-by: Will Deacon <willdeacon@google.com>
Advertise KVM vendor hypercalls (i.e. those hypercalls residing in the
"vendor specific" service range of the SMCCC specification and identified
with KVM's UID) to protected guests from EL2 so that memory sharing
hypercalls can later be probed and utilised without involving the host.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ic80c2aaeba236f0cbcc515d5787a1a4ad230d1d6
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce __pkvm_guest_unshare_host() to remove host access to a page
which was previously shared by a protected guest.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I0a11a2458d6ff5bf1e9b8ece871ba1383ed4611d
Signed-off-by: Will Deacon <willdeacon@google.com>
In preparation for allowing a protected guest to share individual pages
back with the host for the purposes of things like virtio buffers,
introduce __pkvm_guest_share_host() to take care of the associated
page-table updates and permission checking.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I6279b565b89a961628628aa4b1b592fdd57696e4
Signed-off-by: Will Deacon <willdeacon@google.com>
Allow the VMM to hook into and handle a subset of guest hypercalls
advertised by the host. For now, no such hypercalls exist, and so the
new capability returns 0 when queried.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I684d5cd07864887377e91cc96041916d671b2b16
[willdeacon@: Leave other unsupported CAPs commented in uapi/linux/kvm.h]
Signed-off-by: Will Deacon <willdeacon@google.com>
Loading a vCPU concurrently on multiple physical CPUs is a recipe for
disaster. Introduce a per-vCPU flag to track whether or not it is loaded
and reject a load request for a vCPU which is already loaded.
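In its simplest form the check looks like the sketch below. The
structure and helpers are hypothetical, and the sketch assumes callers
are already serialised; real code would need a lock or an atomic
update around the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_vcpu {
	bool loaded;	/* hypothetical per-vCPU "is loaded" flag */
};

/* Refuse to load a vCPU that is already loaded somewhere. */
static inline int toy_vcpu_load(struct toy_vcpu *v)
{
	if (v->loaded)
		return -EBUSY;
	v->loaded = true;
	return 0;
}

static inline void toy_vcpu_put(struct toy_vcpu *v)
{
	assert(v->loaded);	/* putting an unloaded vCPU is a caller bug */
	v->loaded = false;
}
```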
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ic72db8a0462c23a3dc2af06bf0265b586729f989
Signed-off-by: Will Deacon <willdeacon@google.com>
Nothing currently prevents the host from tearing down a shadow VM while
a vCPU is loaded, which is likely to corrupt the hypervisor state. To
prevent this, refcount the shadow vm structs on vcpu_load() and
vcpu_put() and make sure to only allow tearing down a shadow VM when
its refcount is 0.
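A toy version of the refcounting rule is shown below. Names and error
codes are illustrative, and the sketch ignores the locking that a real
implementation would need around the counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_shadow_vm {
	unsigned int refcount;	/* number of currently loaded vCPUs */
	bool torn_down;
};

/* Called on vcpu_load(): pin the shadow VM. */
static inline int toy_shadow_vm_get(struct toy_shadow_vm *vm)
{
	if (vm->torn_down)
		return -ENOENT;
	vm->refcount++;
	return 0;
}

/* Called on vcpu_put(): drop the pin. */
static inline void toy_shadow_vm_put(struct toy_shadow_vm *vm)
{
	assert(vm->refcount > 0);
	vm->refcount--;
}

/* Teardown is only allowed once no vCPU is loaded. */
static inline int toy_shadow_vm_teardown(struct toy_shadow_vm *vm)
{
	if (vm->refcount)
		return -EBUSY;
	vm->torn_down = true;
	return 0;
}
```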
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I2860c3297516f8af6ff4a0d4c91127af4a34b62e
Signed-off-by: Will Deacon <willdeacon@google.com>
We currently track page-ownership in nVHE protected mode with a rather
coarse granularity -- all guests share a single owner id. But finer
grained tracking will be useful soon, e.g. to handle host stage-2 faults
caused by an access to guest memory. To prepare the ground for this,
let's use the guest VMIDs as owner ids, allowing us to distinguish
between all of them. This only works since the pKVM EL2 hypervisor
guarantees the stability of the VMIDs for the entire lifetime of a guest
VM. This will need some rework when/if we attempt to run more than 255
guests concurrently in protected mode as we'll have to handle VMID
rollovers, but there is no clear need for now, so let's keep it simple
to start.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Update constants in mem_protect.h]
Bug: 209580772
Change-Id: I5c5c8061617d7dc481ae5e25a0391b306aabbd8c
Signed-off-by: Will Deacon <willdeacon@google.com>
We will soon need more than 8 bits to encode all possible owner ids in
KVM protected mode. To prepare the ground for this, introduce a new type
for owner_ids, and make it 32 bits wide.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Move IDs to header and fix S2MPU host_stage2_set_owner() callback]
Bug: 209580772
Change-Id: I37add42a2d7f34aa110c00fd9569d81db279d765
Signed-off-by: Will Deacon <willdeacon@google.com>
kvm_pgtable_stage2_set_owner() could be generalised into a way
to store up to 63 bits in the page tables, as long as we don't
set bit 0.
Let's just do that.
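To make the constraint concrete, here is a small sketch of storing an
annotation in an invalid entry. The helper names and the choice of
placing an owner id in bits [63:1] are assumptions for illustration;
only "bit 0 must stay clear" comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_PTE_VALID	(UINT64_C(1) << 0)

/* Store an arbitrary annotation, refusing anything that sets bit 0. */
static inline int toy_pte_set_annotation(uint64_t *ptep, uint64_t annotation)
{
	if (annotation & TOY_PTE_VALID)
		return -EINVAL;
	*ptep = annotation;
	return 0;
}

/* Only invalid entries carry an annotation. */
static inline uint64_t toy_pte_get_annotation(uint64_t pte)
{
	assert(!(pte & TOY_PTE_VALID));
	return pte;
}

/* One possible encoding (hypothetical): owner id shifted past bit 0. */
static inline uint64_t toy_owner_to_annotation(uint32_t owner)
{
	return (uint64_t)owner << 1;
}

static inline uint32_t toy_annotation_to_owner(uint64_t annotation)
{
	return (uint32_t)(annotation >> 1);
}
```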
Signed-off-by: Marc Zyngier <maz@kernel.org>
[willdeacon@: Fix S2MPU conflict in host_stage2_set_owner_locked()]
Bug: 209580772
Change-Id: I4e42d149b457870c35a5ae0f77e14c95dee16b4d
Signed-off-by: Will Deacon <willdeacon@google.com>
Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.
With guest-to-host memory sharing hypercalls originating from the guest
under pKVM, there is no need to change the context when invalidating the
TLB and restoring the host context is, in fact, harmful.
Check the currently running vCPU in __tlb_switch_to_{guest,host}() and
avoid switching the context if a vCPU is already loaded.
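The control flow can be modelled as below. This is a toy: the context
is reduced to a VMID, the "TLBI" just reports which context it acted
on, and the sketch assumes the loaded vCPU belongs to the VM being
invalidated. All names are invented.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HOST_VMID	0

struct toy_cpu {
	int stage2_vmid;	/* stage-2 context currently installed */
	bool vcpu_loaded;	/* call originates from a running guest */
};

static inline void toy_tlb_switch_to_guest(struct toy_cpu *cpu, int vmid)
{
	if (cpu->vcpu_loaded) {
		/* Assumption: the guest context is already the right one. */
		assert(cpu->stage2_vmid == vmid);
		return;
	}
	cpu->stage2_vmid = vmid;
}

static inline void toy_tlb_switch_to_host(struct toy_cpu *cpu)
{
	if (cpu->vcpu_loaded)
		return;		/* restoring the host context would be wrong */
	cpu->stage2_vmid = TOY_HOST_VMID;
}

/* Returns the VMID the invalidation acted on. */
static inline int toy_tlb_invalidate(struct toy_cpu *cpu, int vmid)
{
	int hit;

	toy_tlb_switch_to_guest(cpu, vmid);
	hit = cpu->stage2_vmid;
	toy_tlb_switch_to_host(cpu);
	return hit;
}
```

In the host-originated case the context is swizzled and restored; in
the guest-originated case it is left alone on both sides.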
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I4cfb36f0f88a2d50d50ea85a0d84e3e8191152a3
Signed-off-by: Will Deacon <willdeacon@google.com>