Document the hypercalls used for the MMIO guard infrastructure.
Bug: 209580772
Change-Id: I927bcd6c5e3ef932265d817288ff2b46b0e0db66
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Plumb the MMIO checking code into the MMIO fault handling code.
Any fault hitting outside of an MMIO region will now report
an invalid syndrome, and won't leak any data from the guest.
Bug: 209580772
Change-Id: I68bef2d0211a804aa1e598aeaa0c85dc4098f61e
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Plumb in the hypercall interface to allow a guest to discover,
enroll, map and unmap MMIO regions.
Bug: 209580772
Change-Id: I0390456ffde8ceca351d3d8e82fd1dddeb747fac
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
[tabba@:
- use the new pkvm_hyp_* infrastructure
- move pkvm_refill_memcache() up in file to expose it to
handle_pvm_entry_hvc64()
- include asm/stage2_pgtable.h in hypercalls.c for
topup_hyp_memcache()
- fix pkvm_install_ioguard_page() retval to u64, reported in
b/253586500 and fixed in a separate patch before
- fix smccc to return success, reported in b/251426790 and fixed
in a separate patch before
]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce the infrastructure required to identify an IPA region
that is expected to be used as an MMIO window.
This includes mapping, unmapping and checking the regions. Nothing
calls into it yet, so no expected functional change.
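As a rough illustration of the map/unmap/check operations described above, the following userspace sketch tracks guarded IPA pages in a small bitmap. The function names, the 4KiB page size and the fixed-size IPA window are all hypothetical; this is a model of the idea, not the hypervisor's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12u	/* assumed 4KiB pages */
#define MAX_PAGES	1024u	/* hypothetical size of the modelled IPA space */

static uint8_t guard_bitmap[MAX_PAGES / 8];

/* Register the page containing @ipa as an expected MMIO page. */
static int ioguard_map(uint64_t ipa)
{
	uint64_t pfn = ipa >> PAGE_SHIFT;

	if (pfn >= MAX_PAGES)
		return -1;
	guard_bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
	return 0;
}

/* Remove the page containing @ipa from the set of MMIO pages. */
static int ioguard_unmap(uint64_t ipa)
{
	uint64_t pfn = ipa >> PAGE_SHIFT;

	if (pfn >= MAX_PAGES)
		return -1;
	guard_bitmap[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
	return 0;
}

/* True only if every page touched by [ipa, ipa + len) is registered. */
static bool ioguard_check(uint64_t ipa, unsigned int len)
{
	uint64_t pfn, last;

	if (!len)
		return false;
	last = (ipa + len - 1) >> PAGE_SHIFT;
	if (last >= MAX_PAGES)
		return false;
	for (pfn = ipa >> PAGE_SHIFT; pfn <= last; pfn++) {
		if (!(guard_bitmap[pfn / 8] & (1u << (pfn % 8))))
			return false;
	}
	return true;
}
```

In this model an access that straddles a registered and an unregistered page fails the check, which is the conservative choice for a guard.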
Bug: 209580772
Change-Id: I227eaa28b98e067e3daae4f9e1071eb37a6761cc
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: use the new pkvm_hyp_* infrastructure, and remove
redundant reassignment in __pkvm_remove_ioguard_page()]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Add a per-VM flag indicating that the guest has bought into the
MMIO guard enforcement framework.
Bug: 209580772
Change-Id: If60b2b38a419a9f44ebe9029f55dd016fd2444b5
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: had to assign it a new number since there are existing
flags now]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
In order to simplify the implementation of an EL2-only version of
MMIO guard, expose topup_hyp_memcache() and simplify its usage
by only requiring a vcpu.
Bug: 209580772
Change-Id: I4f54c57a9693cf7a3450f99fedc15ae32af09a31
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: original patch did the same for free_hyp_memcache(), but
it's already exposed]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Create a macro definition for the FAR_EL2 mask and use it instead
of a hard-coded value, and put it in a shared header to be used by
hyp.
No functional change intended.
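A minimal sketch of what such a mask is used for: combining a page-aligned fault IPA with the in-page offset taken from FAR_EL2. The macro name FAR_MASK, its 12-bit value and the helper name are assumptions made for illustration and are not taken from this patch.

```c
#include <stdint.h>

/* Assumed definition: low 12 bits of FAR_EL2, i.e. the offset in a 4KiB page. */
#define FAR_MASK	0xfffULL

/* Hypothetical helper: merge the page-aligned IPA with the FAR page offset. */
static uint64_t fault_ipa(uint64_t page_ipa, uint64_t far)
{
	return (page_ipa & ~FAR_MASK) | (far & FAR_MASK);
}
```

Naming the mask once keeps the host and hyp copies of this computation from drifting apart.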
Bug: 209580772
Change-Id: Ib83932d670cba6bf8f1ed45d2c0e1ed34331d98d
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_pgtable_stage2_set_owner() could be generalised into a way
to store up to 63 bits in the page tables, as long as we don't
set bit 0.
Let's just do that.
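The constraint can be sketched as follows, assuming (as on arm64) that bit 0 of a descriptor is the valid bit, so any value with bit 0 clear is ignored by the hardware walker. The type and function names here are hypothetical stand-ins, not the kernel's.

```c
#include <stdint.h>

typedef uint64_t pte_model_t;

#define PTE_VALID	1ULL	/* bit 0: descriptor is valid */

/* Store up to 63 bits of software state in an invalid PTE. */
static int pte_annotate(pte_model_t *ptep, pte_model_t annotation)
{
	if (annotation & PTE_VALID)
		return -1;	/* would be interpreted as a valid mapping */
	*ptep = annotation;
	return 0;
}

/* An invalid but non-zero PTE carries an annotation. */
static int pte_is_annotation(pte_model_t pte)
{
	return pte && !(pte & PTE_VALID);
}
```

An owner ID then becomes just one possible annotation among others.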
Bug: 209580772
Change-Id: I4e42d149b457870c35a5ae0f77e14c95dee16b4d
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: Fix conflict in host_stage2_set_owner_locked()]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Memory relinquish interface is used by both memory ballooning and
by page reporting. It must be built if either is specified.
Bug: 258944680
Change-Id: I3b949dadbfc4a2b17dba1809a46f0a7386e70ebf
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Add monitor debug support for non-protected guests in protected
mode.
Save and restore the monitor debug state when running a
non-protected guest, and propagate the monitor debug
configuration of non-protected vcpus from the host.
This patch assumes that the hyp vcpu debug iflags are kept in
sync with the host.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Ie525693a6a6f236e388b16a1af297403e729057f
Signed-off-by: Quentin Perret <qperret@google.com>
This code will be reused when supporting debug for non-protected
VMs in protected mode.
No functional change intended.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: If05dc8fdb3fff8e811f06cf5050d3eaf0ce67116
Signed-off-by: Quentin Perret <qperret@google.com>
The iflags are meant as input flags to the hypervisor, and will
be used in future patches by calls to functions that sync debug
and pmu state. Ensure that the hyp_vcpu copy is up-to-date with
the host's on entry.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Id04d65ee084c3745ddc283ff5e30348511a4a1d2
Signed-off-by: Quentin Perret <qperret@google.com>
The free-page reporting and hinting queues do not pass arrays of page
addresses (like the basic inflate queue) but instead pass the free page
ranges as buffers. This does not work well with the DMA API: the host wants
to know the GPA, not an IOVA.
For these two virtqueues, disable the DMA API and pass buffers through
untranslated.
Bug: 240239989
Change-Id: I2d13a8b7e8f6775819de7fe96f4579afa08b1300
Signed-off-by: Keir Fraser <keirf@google.com>
[ qperret@: Fixed minor context conflict in virtio.h ]
Signed-off-by: Quentin Perret <qperret@google.com>
When running as a protected VM, the hypervisor isolates the VM's
memory pages from the host. Returning ownership of a VM page therefore
requires hypervisor involvement, and acknowledgement from the
protected VM that it is voluntarily cooperating.
To this end, notify pages via the new relinquish hypercall when they
are being reported to the host as free and available for temporary
reclaim.
Bug: 240239989
Change-Id: I8718e468be63c3aacb2f79ff141fbcedd6d19b56
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When running as a protected VM, the hypervisor isolates the VM's
memory pages from the host. Returning ownership of a VM page
therefore requires hypervisor involvement, and acknowledgement from
the protected VM that it is voluntarily cooperating.
To this end, notify pages via the new relinquish hypercall when they
are entered into the memory balloon.
Bug: 240239989
Change-Id: Ic89b45312a7478ddff081a934d99e693eded92dc
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
On PKVM/ARM64 this uses the ARM SMCCC relinquish hypercall when available.
Bug: 240239989
Change-Id: Ifa85b641a48f348a2364cf8c6b06b6417f1eeedb
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
This allows a VM running on PKVM to notify the hypervisor (and host)
that it is returning pages to host ownership.
Bug: 240239989
Change-Id: I4644736db04afacd7da4c6f465130c73c2e44b93
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The kernel has an awfully complicated boot sequence in order to cope
with the various EL2 configurations, including those that "enhanced"
the architecture. We go from EL2 to EL1, then back to EL2, staying
at EL2 if VHE capable and otherwise going back to EL1.
Here's a paracetamol tablet for you.
The cpu_resume path follows the same logic, because coming up with
two versions of a square wheel is hard.
However, things aren't this straightforward with pKVM, as the host
resume path is always proxied by the hypervisor, which means that
the kernel is always entered at EL1. Which contradicts what the
__boot_cpu_mode[] array contains (it obviously says EL2).
This thus triggers a HVC call from EL1 to EL2 in a vain attempt
to upgrade from EL1 to EL2 VHE, which we are, funnily enough,
reluctant to grant to the host kernel. This is also completely
unexpected, and puzzles your average EL2 hacker.
Address it by fixing up the boot mode at the point the host gets
deprivileged. is_hyp_mode_available() and co already have a static
branch to deal with this, making it pretty safe.
Cc: <stable@vger.kernel.org> # 5.15+
Reported-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Bug: 258157858
Link: https://lore.kernel.org/all/20221108100138.3887862-1-vdonnefort@google.com/
Change-Id: I4a2269402ececa0ec47cab88343c3c623b4b2e3d
Signed-off-by: Quentin Perret <qperret@google.com>
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. Statically initialized linked lists, however,
have the head pointing to itself, which cannot live in either section.
To avoid having to work around this by initializing at runtime, add a
.hyp.data section.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7a56dc4c93e05bbef53c66837164d17c6103b6b8
Signed-off-by: Quentin Perret <qperret@google.com>
As pKVM does not trust the host, the host should not be involved in the
handling of, or be able to observe the response to, entropy requests
issued by protected guests.
When an SMC-based implementation of the ARM SMCCC TRNG interface is
present, pass any HVC-based requests directly on to the secure firmware.
Co-developed-by: Ard Biesheuvel <ardb@google.com>
Signed-off-by: Ard Biesheuvel <ardb@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ica492ce49fd059a62ecc31bb7ac13c9adb773a08
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Expose a new capability, KVM_CAP_ARM_PROTECTED_VM, for protected VMs
which allows the size of the PVM firmware region to be discovered from
userspace and for the firmware load address to be specified if it is
required.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I819b9b2cfa227f1a0607a8f683aa01d4ae50704f
Signed-off-by: Quentin Perret <qperret@google.com>
When a PVM firmware image is present for a protected VM, treat the first
running vCPU as the "primary" vCPU and reset its registers accordingly,
in particular by initialising its PC to enter the firmware at startup.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I26676637145c7d809c5dc5ac0ad0e1fadaf275d2
Signed-off-by: Quentin Perret <qperret@google.com>
When the host donates a page to a protected guest at an IPA which
coincides with the PVM firmware load address, copy-in the relevant
firmware page after unmapping it from the host but before mapping it
into the guest.
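The address check and copy can be sketched in userspace as below. The structure, its field names and the helper are hypothetical; the sketch assumes a 4KiB page size, a page-aligned load address and IPA, and a firmware size that is a multiple of the page size.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE	4096u	/* assumed page size */

/* Hypothetical description of the firmware region. */
struct pvmfw_model {
	uint64_t	load_ipa;	/* page-aligned load address */
	uint64_t	size;		/* multiple of PAGE_SIZE */
	const uint8_t	*image;		/* firmware contents */
};

/*
 * Called for a page being donated at @ipa, after it has been unmapped from
 * the host and before it is mapped into the guest. Returns 1 if the
 * matching firmware page was copied in, 0 if @ipa is outside the region.
 */
static int maybe_copy_fw_page(const struct pvmfw_model *fw, uint64_t ipa,
			      uint8_t *page)
{
	if (ipa < fw->load_ipa || ipa - fw->load_ipa >= fw->size)
		return 0;
	memcpy(page, fw->image + (ipa - fw->load_ipa), PAGE_SIZE);
	return 1;
}
```

Copying between the unmap and the map is what keeps the host from observing or modifying the firmware contents.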
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I8cec813fa52938945f3122655deb785523a96ec8
Signed-off-by: Quentin Perret <qperret@google.com>
When the host shuts down cleanly under pKVM, it is EL2's responsibility
to clear the pvmfw pages before forwarding the PSCI call onto EL3.
Wipe the pvmfw pages on SYSTEM_OFF, SYSTEM_RESET and SYSTEM_RESET2 calls
from the host, cleaning the zeroed memory to the PoC for good measure.
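A simplified model of the dispatch, using the function IDs from the PSCI specification. The buffer standing in for the pvmfw pages and the handler name are hypothetical, and the cache clean to the PoC is only noted in a comment since it has no userspace equivalent.

```c
#include <stdint.h>
#include <string.h>

/* Function IDs as defined by the PSCI specification. */
#define PSCI_SYSTEM_OFF		0x84000008u
#define PSCI_SYSTEM_RESET	0x84000009u
#define PSCI_SYSTEM_RESET2_32	0x84000012u
#define PSCI_SYSTEM_RESET2_64	0xC4000012u

static uint8_t pvmfw_pages[256];	/* hypothetical stand-in for the pvmfw memory */

/* Returns 1 if the firmware memory was wiped before forwarding the call. */
static int handle_host_psci(uint32_t func_id)
{
	switch (func_id) {
	case PSCI_SYSTEM_OFF:
	case PSCI_SYSTEM_RESET:
	case PSCI_SYSTEM_RESET2_32:
	case PSCI_SYSTEM_RESET2_64:
		memset(pvmfw_pages, 0, sizeof(pvmfw_pages));
		/* The hypervisor would also clean the zeroed range to the PoC here. */
		return 1;
	default:
		return 0;
	}
}
```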
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I0dd2757e355f384813319034c6eed0fa2c2328c2
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_flush_dcache_to_poc() converts its (start,len) parameters into
(start,end) parameters for dcache_clean_inval_poc(). This mostly works
out except for the case when 'len == 0', where dcache_clean_inval_poc()
will still issue cache maintenance for the cache line containing 'start'.
If 'start' is not mapped, then this can generate an unexpected fault.
In preparation for cleaning the pvmfw memory pages to the PoC on
system reset, tweak kvm_flush_dcache_to_poc() to act as a no-op when
the supplied length is 0 and avoid having to check for this corner case
in the caller.
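The corner case can be reproduced with a small userspace model. The 64-byte line size, the counter and the model_* names are assumptions for illustration; the stand-in for dcache_clean_inval_poc() only counts the lines it would touch.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE	64u	/* assumed cache line size */

static unsigned int lines_cleaned;

/*
 * Stand-in for dcache_clean_inval_poc(start, end): like the behaviour
 * described above, it always touches the line containing 'start'.
 */
static void model_clean_inval_poc(uintptr_t start, uintptr_t end)
{
	uintptr_t line = start & ~(uintptr_t)(CACHE_LINE - 1);

	do {
		lines_cleaned++;
		line += CACHE_LINE;
	} while (line < end);
}

/* (start, len) wrapper that treats len == 0 as a no-op. */
static void model_flush_dcache_to_poc(uintptr_t start, size_t len)
{
	if (!len)
		return;
	model_clean_inval_poc(start, start + len);
}
```

Without the early return, a zero-length call would still count one line, which in the real function means maintenance on a possibly unmapped address.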
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Idae2b22289398e941938821d1d3b3a5a1da3fd8f
Signed-off-by: Quentin Perret <qperret@google.com>
Unmap the PVM firmware memory from the pKVM host by transferring
ownership of the pages to the hypervisor when the host deprivileges
itself during boot.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I311642f543c0c73d0e0cf2ec051e8e2d9759c5d1
Signed-off-by: Quentin Perret <qperret@google.com>
Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
region, which can be used to identify a firmware image for protected
VMs. If pKVM fails to initialise and a firmware region is advertised,
then the memory is cleared during boot.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Ibfcc0ff00d4b8a42747452047856cb9ba8def4c4
Signed-off-by: Quentin Perret <qperret@google.com>
Add some initial documentation for the Protected KVM (pKVM) feature on
arm64, describing the user ABI for creating protected VMs as well as
their limitations.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I152af404f24b9aba3cc9be6acd8e26afcfa4b0a5
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce a new virtual machine type, KVM_VM_TYPE_ARM_PROTECTED, which
specifies that the guest memory pages are to be unmapped from the host
stage-2 by the hypervisor.
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Align KVM_VM_TYPE_ARM_PROTECTED value with android13 kernels]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iabcd03865aed4a41637597ac247897fd185bfc4d
Signed-off-by: Quentin Perret <qperret@google.com>
Extend our KVM "vendor" hypercalls to expose three new hypercalls to
protected guests for the purpose of opening and closing shared memory
windows with the host:
MEMINFO: Query the stage-2 page size (i.e. the minimum granule at
which memory can be shared)
MEM_SHARE: Share a page RWX with the host, faulting the page in if
necessary.
MEM_UNSHARE: Unshare a page with the host. Subsequent host accesses
to the page will result in a fault being injected by the
hypervisor.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I80fe8af0bc0b3a40460c5065eabe26b1d9f634f2
Signed-off-by: Quentin Perret <qperret@google.com>
The PTP hypercall documentation doesn't produce the best-looking table
when formatting in HTML as all of the return value definitions end up
on the same line.
Reformat the PTP hypercall documentation to follow the formatting used
by hypercalls.rst.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic77cea5a621a9278d098afd80ef4c0e125760814
Signed-off-by: Quentin Perret <qperret@google.com>
KVM/arm64 makes use of the SMCCC "Vendor Specific Hypervisor Service
Call Range" to expose KVM-specific hypercalls to guests in a
discoverable and extensible fashion.
Document the existence of this interface and the discovery hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5754589b1b695828eab7cb41c7aa6a0fb55ad273
Signed-off-by: Quentin Perret <qperret@google.com>
In preparation for describing the guest view of KVM/arm64 hypercalls in
hypercalls.rst, move the existing contents of the file concerning the
firmware pseudo-registers elsewhere.
Cc: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ie8931290b291c0ffd2f1f11265babe2475972868
Signed-off-by: Quentin Perret <qperret@google.com>
A guest that can only operate on private memory is pretty useless, as it
has no way to share buffers with the host for things like virtio.
Extend our memory protection mechanisms to support the sharing and
unsharing of guest pages from the guest to the host. For now, this
functionality is unused but will later be exposed to the guest via
hypercalls.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I6b0d6f63348f3a2a847acf4d7bb87bd6e9742af0
Signed-off-by: Quentin Perret <qperret@google.com>
Break-before-make (BBM) can be expensive, as transitioning via an
invalid mapping (i.e. the "break" step) requires the completion of TLB
invalidation and can also cause other agents to fault concurrently on
the invalid mapping.
Since BBM is not required when changing only the software bits of a PTE,
avoid the sequence in this case and just update the PTE directly.
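The test for "only the software bits changed" can be sketched like this. The mask assumes that bits [58:55] of a leaf descriptor are the ones reserved for software use; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: bits [58:55] of a leaf descriptor are reserved for software. */
#define PTE_SW_MASK	(0xfULL << 55)

/* BBM is only needed if something other than the software bits differs. */
static bool pte_update_needs_bbm(uint64_t old_pte, uint64_t new_pte)
{
	return ((old_pte ^ new_pte) & ~PTE_SW_MASK) != 0;
}
```

When this returns false the new value can be written over the old one directly, with no intermediate invalid entry and no TLB invalidation.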
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I91ec043a75154fa2ca732f5269c6ae1bceea4a93
Signed-off-by: Quentin Perret <qperret@google.com>
Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.
With guest-to-host memory sharing and unsharing hypercalls originating
from the guest under pKVM, there is now a need to support both guest
and host VMID invalidations issued from guest context.
Replace the __tlb_switch_to_{guest,host}() functions with a more general
{enter,exit}_vmid_context() implementation which supports being invoked
from guest context and acts as a no-op if the target context matches the
running context.
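The no-op behaviour can be modelled in userspace as follows. The structures, the running_mmu global and the switch counter are hypothetical stand-ins; the real functions operate on the stage-2 translation registers rather than a pointer.

```c
#include <stddef.h>

struct mmu_model {
	int vmid;
};

/* Saved state needed to undo a context switch. */
struct vmid_ctxt {
	const struct mmu_model *prev;
};

static const struct mmu_model *running_mmu;	/* currently active context */
static unsigned int context_switches;

static void enter_vmid_context(const struct mmu_model *target,
			       struct vmid_ctxt *cxt)
{
	cxt->prev = NULL;
	if (running_mmu == target)
		return;		/* already in the right context: no-op */
	cxt->prev = running_mmu;
	running_mmu = target;
	context_switches++;
}

static void exit_vmid_context(struct vmid_ctxt *cxt)
{
	if (!cxt->prev)
		return;		/* enter was a no-op, so exit is too */
	running_mmu = cxt->prev;
	context_switches++;
}
```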
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I92c6f48eb4c4b6286b930c2f0cda245bccc1927b
Signed-off-by: Quentin Perret <qperret@google.com>
The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.
To prevent this, inject the abort back in the host with S1PTW set in the
ESR, hence allowing the host to differentiate this abort from normal
userspace faults and inject a SIGSEGV cleanly.
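A minimal sketch of the marking and the host-side test, assuming S1PTW is bit 7 of the abort syndrome. The helper names are hypothetical and the real host handler considers more than this single bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_S1PTW	(1ULL << 7)	/* assumed position of S1PTW in the ISS */

/* Hypervisor side: tag the syndrome of the abort injected into the host. */
static uint64_t mark_injected_abort(uint64_t esr)
{
	return esr | ESR_S1PTW;
}

/* Host side: distinguish the injected abort from an ordinary fault. */
static bool abort_is_hyp_injected(uint64_t esr)
{
	return (esr & ESR_S1PTW) != 0;
}
```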
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I09ee54fbf4c202dc3ac2e1b5eea264d4dc84f613
Signed-off-by: Quentin Perret <qperret@google.com>
In order to simplify the injection of exceptions in the host in pkvm
context, let's factor out of enter_exception64() the code calculating
the exception offset from VBAR_EL1 and the cpsr.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I50a2510b59311717c6e17ea4e45fc634b4b43073
Signed-off-by: Quentin Perret <qperret@google.com>
Add a helper to check whether the pkvm static key is enabled, to
ease the introduction of pkvm hooks in other parts of the code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8995021768def73bd7636a84059bdc43fa7ab2fc
Signed-off-by: Quentin Perret <qperret@google.com>
Now that TLBI invalidation is handled entirely at EL2 for both protected
and non-protected guests when protected KVM has initialised, unplug the
unused TLBI hypercalls.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I50ad4cb930c43f88e00320e47b358613224dd1cc
Signed-off-by: Quentin Perret <qperret@google.com>
Factor out logic that resets a vcpu's core registers, including
additional PSCI handling. This code will be reused when resetting
VMs in protected mode.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I22468be1d382e05e39557e32ea09a023173dbf48
Signed-off-by: Quentin Perret <qperret@google.com>
Move some PSCI functions and macros to a shared header to be used
by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibe84564f423cd0281f3dc33d9801b474fe8f2db9
Signed-off-by: Quentin Perret <qperret@google.com>
Move the macro defines of the pstate reset values to a shared
header to be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ib98734d2ced07a958427c6552a9c22d159b85ad1
Signed-off-by: Quentin Perret <qperret@google.com>
Rather than forwarding guest hypercalls back to the host for handling,
implement some basic handling at EL2 which will later be extended to
provide additional functionality such as PSCI.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I14613c416078818b25bb29ed8899d7b71f8c40cc
Signed-off-by: Quentin Perret <qperret@google.com>
When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the hypervisor vCPU state is
correctly initialised (and then unpinned on teardown).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic4d0ef9a6124701026cd56f6725ab4737857ed5b
Signed-off-by: Quentin Perret <qperret@google.com>
Do not rely on the state of the vm as provided by the host, but
initialize it instead at EL2 to a known good and safe state.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8e0e9fd7cdf0b5b4d422260be06920d0550d5f91
Signed-off-by: Quentin Perret <qperret@google.com>
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by
hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Idb90ae3228fc3acb1fe310227a4f606f47b026a5
Signed-off-by: Quentin Perret <qperret@google.com>