When donating pages to the guest, we only check the first IPA in the
range against the pvmfw loading range. Although this is fine for the
page-at-a-time faulting path, it doesn't fit with the rest of the mem
protection logic, which deals with the possibility of an arbitrarily
sized contiguous address range.
Rework the logic so that we check the whole IPA range during guest
donation and trigger the pvmfw loading path if any of the pages
intersect with the pvmfw region.
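As a rough illustration of the difference between checking only the first
IPA and checking the whole range, a standalone overlap test might look like
the sketch below. The helper name and the scalar parameters are hypothetical
and do not mirror the actual hypervisor code; ranges are treated as
half-open and are assumed not to wrap around the address space.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: return true if the half-open byte ranges
 * [a_base, a_base + a_size) and [b_base, b_base + b_size) overlap.
 * Assumes neither range wraps around the end of the address space.
 */
static bool ranges_intersect(uint64_t a_base, uint64_t a_size,
			     uint64_t b_base, uint64_t b_size)
{
	/* An empty range cannot intersect anything. */
	if (!a_size || !b_size)
		return false;

	return a_base < b_base + b_size && b_base < a_base + a_size;
}
```

With a three-page donation starting at 0x1000 and a one-page firmware window
at 0x3000, a first-IPA-only check misses the overlap, whereas the range
check above reports it.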
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: I6fef9f1898e65a95cab7f6a0ffa8aa422a8d5a91
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When poisoning the pvmfw pages during system reset at EL2, ensure that we
use a writable fixmap mapping rather than the persistent read-only mapping
of the region.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: I4c8be092d3c822695afd7d03d0d64163664a9f64
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
pkvm_clear_pvmfw_pages() is used to poison the pvmfw pages during reset,
so rename it to pkvm_poison_pvmfw_pages() instead.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: Ie5b9c90f0707fa81d9099425cff35383bfb0d009
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
hyp_zero_page() is used for poisoning memory, so rename it to
hyp_poison_page() to avoid confusion with the concept of a "zero page"
and make it available outside of mem_protect.c, as it will be used to
poison the pvmfw memory in a subsequent patch.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: Ia4aec46437db3ffe466ae09bd180392fa06c0b46
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
hyp_fixmap_map() never returns NULL, so remove the redundant checks for
it and simplify the error handling in the callers.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Ie73a97cc3d9bded3750abe6e243003827393ee5e
Signed-off-by: Quentin Perret <qperret@google.com>
This essentially reverts commit e41b135550
"virtio_balloon: disable VIOMMU support".
Although the virtio_balloon driver does not translate through a
VIOMMU (or bounce buffer) the pages that it sends to the device,
it *does* need to perform these translations on the virtio rings
themselves.
This fixes virtio_balloon initialisation inside a PKVM/ARM64
protected virtual machine.
Bug: 240239989
Change-Id: I2a84eec870fd638223b231e5c4d1c27216dc40a2
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
This specifies that the driver is running on a PKVM hypervisor
and must use the memrelinquish service to cooperatively release
memory. If this service is unavailable, virtio_balloon cannot be
used.
Bug: 240239989
Change-Id: I8800c4435d8fae9df6f1ab108cc61c8f93020773
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When using nVHE in protected mode, the host donates pages through an
arch-specific memcache, which the hyp can then pour into its local vcpu
copy. The latter should be flushed on VM teardown.
Bug: 237506543
Change-Id: Ic37d794ac33e9f844fa6ae1b4943febcdad5b033
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
If the mapping is determined to be not present in an earlier walk,
attempting the unmap is pointless.
Bug: 259217067
Change-Id: I6fd939556b80d7a9a0731cab36166a652f7a7c6d
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The VM should only relinquish "normal" pages. For a protected VM, this
means PAGE_OWNED; for a normal VM, this means PAGE_SHARED_BORROWED. All
other page types are rejected and failure is reported to the caller.
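The rule can be pictured with the small sketch below. The enum is a
simplified stand-in for the hypervisor's page-state encoding, and both the
helper name and the -EPERM return value are assumptions made for
illustration rather than taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-in for the hypervisor's page states. */
enum page_state {
	PAGE_OWNED,
	PAGE_SHARED_OWNED,
	PAGE_SHARED_BORROWED,
};

/*
 * Return 0 if a page in @state may be relinquished by the VM, or a
 * negative error code (assumed to be -EPERM here) otherwise.
 */
static int check_relinquish(bool vm_is_protected, enum page_state state)
{
	enum page_state expected = vm_is_protected ? PAGE_OWNED
						   : PAGE_SHARED_BORROWED;

	return state == expected ? 0 : -EPERM;
}
```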
Bug: 259217067
Change-Id: Icff3474dc2c975a6c5befe546c5521a05b3bd575
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Fix a build failure with -Werror=missing-prototypes.
At the same time, make the header file more resilient to ordering by
declaring 'struct page'.
Bug: 240239989
Change-Id: I84d069bde5ff03d1afa55d25c01448b0d43042da
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When MMIO guard is queried, it advertises the guard granule size
it uses. Use that value.
Fixes: arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard
Bug: 251432016
Change-Id: Iff4dcb6229bf89aef832a29a98fecc041a1aec1b
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Set the MMIO guard flag for protected VMs prior to entering the guest
for the first time.
Bug: 216798684
Change-Id: I1448102ae85176d495ae7f8d6d20de4092049f0d
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Document the hypercalls used for the MMIO guard infrastructure.
Bug: 209580772
Change-Id: I927bcd6c5e3ef932265d817288ff2b46b0e0db66
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Plumb the MMIO checking code into the MMIO fault handling code.
Any fault hitting outside of an MMIO region will now report
an invalid syndrome, and won't leak any data from the guest.
Bug: 209580772
Change-Id: I68bef2d0211a804aa1e598aeaa0c85dc4098f61e
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Plumb in the hypercall interface to allow a guest to discover,
enroll, map and unmap MMIO regions.
Bug: 209580772
Change-Id: I0390456ffde8ceca351d3d8e82fd1dddeb747fac
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
[tabba@:
- use the new pkvm_hyp_* infrastructure
- move pkvm_refill_memcache() up in file to expose it to
handle_pvm_entry_hvc64()
- include asm/stage2_pgtable.h in hypercalls.c for
topup_hyp_memcache()
- fix pkvm_install_ioguard_page() retval to u64, reported in
b/253586500 and fixed in a separate patch before
- fix smccc to return success, reported in b/251426790 and fixed
in a separate patch before
]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce the infrastructure required to identify an IPA region
that is expected to be used as an MMIO window.
This includes mapping, unmapping and checking the regions. Nothing
calls into it yet, so no functional change is expected.
Bug: 209580772
Change-Id: I227eaa28b98e067e3daae4f9e1071eb37a6761cc
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: use the new pkvm_hyp_* infrastructure, and remove
redundant reassignment in __pkvm_remove_ioguard_page()]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Add a per-VM flag indicating that the guest has bought into the
MMIO guard enforcement framework.
Bug: 209580772
Change-Id: If60b2b38a419a9f44ebe9029f55dd016fd2444b5
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: had to assign it a new number since there are existing
flags now]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
In order to simplify the implementation of an EL2-only version of
MMIO guard, expose topup_hyp_memcache() and simplify its usage
by only requiring a vcpu.
Bug: 209580772
Change-Id: I4f54c57a9693cf7a3450f99fedc15ae32af09a31
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: original patch did the same for free_hyp_memcache(), but
it's already exposed]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Create a macro definition for the FAR_EL2 mask and use it instead
of a hard-coded value, and put it in a shared header to be used by
hyp.
No functional change intended.
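For context, a sketch of how such a mask is typically used is shown below.
It assumes the mask selects the page-offset bits [11:0] of FAR_EL2, which
are combined with a page-aligned IPA to form the full fault address; the
macro and helper names are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/* Assumed here: the mask covers the page-offset bits [11:0] of FAR_EL2. */
#define EXAMPLE_FAR_MASK	UINT64_C(0xfff)

/*
 * Combine a page-aligned IPA with the page offset taken from the
 * faulting virtual address to form the full fault IPA.
 */
static uint64_t example_fault_ipa(uint64_t ipa_page, uint64_t far)
{
	return (ipa_page & ~EXAMPLE_FAR_MASK) | (far & EXAMPLE_FAR_MASK);
}
```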
Bug: 209580772
Change-Id: Ib83932d670cba6bf8f1ed45d2c0e1ed34331d98d
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_pgtable_stage2_set_owner() could be generalised into a way
to store up to 63 bits in the page tables, as long as we don't
set bit 0.
Let's just do that.
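The idea, in a minimal standalone form: an invalid descriptor is any value
with bit 0 clear, so the remaining 63 bits are free to carry software
state. The names below are illustrative and do not match the page-table
API.

```c
#include <stdint.h>

#define EXAMPLE_PTE_VALID	UINT64_C(1)	/* bit 0: descriptor is valid */

/*
 * Store @annotation in an invalid PTE. The value is rejected if it would
 * set bit 0, since the entry would then look like a valid descriptor.
 * Returns 0 on success, -1 on rejection (error value is an assumption).
 */
static int example_pte_annotate(uint64_t *pte, uint64_t annotation)
{
	if (annotation & EXAMPLE_PTE_VALID)
		return -1;

	*pte = annotation;
	return 0;
}
```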
Bug: 209580772
Change-Id: I4e42d149b457870c35a5ae0f77e14c95dee16b4d
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: Fix conflict in host_stage2_set_owner_locked()]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The memory relinquish interface is used by both memory ballooning and
page reporting. It must be built if either is enabled.
Bug: 258944680
Change-Id: I3b949dadbfc4a2b17dba1809a46f0a7386e70ebf
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Add monitor debug support for non-protected guests in protected
mode.
Save and restore the monitor debug state when running a
non-protected guest, and propagate the monitor debug
configuration of non-protected vcpus from the host.
This patch assumes that the hyp vcpu debug iflags are kept in
sync with the host.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Ie525693a6a6f236e388b16a1af297403e729057f
Signed-off-by: Quentin Perret <qperret@google.com>
This code will be reused when supporting debug for non-protected
VMs in protected mode.
No functional change intended.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: If05dc8fdb3fff8e811f06cf5050d3eaf0ce67116
Signed-off-by: Quentin Perret <qperret@google.com>
The iflags are meant as input flags to the hypervisor, and will
be used in future patches by calls to functions that sync debug
and pmu state. Ensure that the hyp_vcpu copy is up-to-date with
the host's on entry.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Id04d65ee084c3745ddc283ff5e30348511a4a1d2
Signed-off-by: Quentin Perret <qperret@google.com>
The free-page reporting and hinting queues do not pass arrays of page
addresses (like the basic inflate queue) but instead pass the free page
ranges as buffers. This does not work well with the DMA API: the host
wants to know the GPA, not an IOVA.
For these two virtqueues, disable the DMA API and pass buffers through
untranslated.
Bug: 240239989
Change-Id: I2d13a8b7e8f6775819de7fe96f4579afa08b1300
Signed-off-by: Keir Fraser <keirf@google.com>
[ qperret@: Fixed minor context conflict in virtio.h ]
Signed-off-by: Quentin Perret <qperret@google.com>
When running as a protected VM, the hypervisor isolates the VM's
memory pages from the host. Returning ownership of a VM page therefore
requires hypervisor involvement, and acknowledgement from the
protected VM that it is voluntarily cooperating.
To this end, notify pages via the new relinquish hypercall when they
are being reported to the host as free and available for temporary
reclaim.
Bug: 240239989
Change-Id: I8718e468be63c3aacb2f79ff141fbcedd6d19b56
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When running as a protected VM, the hypervisor isolates the VM's
memory pages from the host. Returning ownership of a VM page
therefore requires hypervisor involvement, and acknowledgement from
the protected VM that it is voluntarily cooperating.
To this end, notify pages via the new relinquish hypercall when they
are entered into the memory balloon.
Bug: 240239989
Change-Id: Ic89b45312a7478ddff081a934d99e693eded92dc
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
On PKVM/ARM64 this uses the ARM SMCCC relinquish hypercall when available.
Bug: 240239989
Change-Id: Ifa85b641a48f348a2364cf8c6b06b6417f1eeedb
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
This allows a VM running on PKVM to notify the hypervisor (and host)
that it is returning pages to host ownership.
Bug: 240239989
Change-Id: I4644736db04afacd7da4c6f465130c73c2e44b93
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The kernel has an awfully complicated boot sequence in order to cope
with the various EL2 configurations, including those that "enhanced"
the architecture. We go from EL2 to EL1, then back to EL2, staying
at EL2 if VHE capable and otherwise going back to EL1.
Here's a paracetamol tablet for you.
The cpu_resume path follows the same logic, because coming up with
two versions of a square wheel is hard.
However, things aren't this straightforward with pKVM, as the host
resume path is always proxied by the hypervisor, which means that
the kernel is always entered at EL1. Which contradicts what the
__boot_cpu_mode[] array contains (it obviously says EL2).
This thus triggers a HVC call from EL1 to EL2 in a vain attempt
to upgrade from EL1 to EL2 VHE, which we are, funnily enough,
reluctant to grant to the host kernel. This is also completely
unexpected, and puzzles your average EL2 hacker.
Address it by fixing up the boot mode at the point the host gets
deprivileged. is_hyp_mode_available() and co already have a static
branch to deal with this, making it pretty safe.
Cc: <stable@vger.kernel.org> # 5.15+
Reported-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Bug: 258157858
Link: https://lore.kernel.org/all/20221108100138.3887862-1-vdonnefort@google.com/
Change-Id: I4a2269402ececa0ec47cab88343c3c623b4b2e3d
Signed-off-by: Quentin Perret <qperret@google.com>
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. Linked lists are initialized with the head
pointing to itself. To avoid having to work around this by initializing
at runtime, add a .hyp.data section.
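To see why a statically initialized list head cannot live in .bss, consider
the standalone sketch below. It mimics the kernel's list_head idiom with
local definitions; the variable name is hypothetical.

```c
#include <stdbool.h>

/* Local stand-ins mimicking the kernel's doubly linked list idiom. */
struct list_head {
	struct list_head *next, *prev;
};

#define EXAMPLE_LIST_HEAD_INIT(name)	{ &(name), &(name) }

/*
 * The initializer is the (non-zero) address of the object itself, so the
 * object needs initialized data storage rather than zero-filled .bss.
 */
static struct list_head example_list = EXAMPLE_LIST_HEAD_INIT(example_list);

/* An empty list is one whose head points back at itself. */
static bool example_list_empty(const struct list_head *head)
{
	return head->next == head;
}
```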
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7a56dc4c93e05bbef53c66837164d17c6103b6b8
Signed-off-by: Quentin Perret <qperret@google.com>
As pKVM does not trust the host, the host should not be involved in the
handling of, or be able to observe the response to, entropy requests
issued by protected guests.
When an SMC-based implementation of the ARM SMCCC TRNG interface is
present, pass any HVC-based requests directly on to the secure firmware.
Co-developed-by: Ard Biesheuvel <ardb@google.com>
Signed-off-by: Ard Biesheuvel <ardb@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ica492ce49fd059a62ecc31bb7ac13c9adb773a08
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Expose a new capability, KVM_CAP_ARM_PROTECTED_VM, for protected VMs
which allows the size of the PVM firmware region to be discovered from
userspace and for the firmware load address to be specified if it is
required.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I819b9b2cfa227f1a0607a8f683aa01d4ae50704f
Signed-off-by: Quentin Perret <qperret@google.com>
When a PVM firmware image is present for a protected VM, treat the first
running vCPU as the "primary" vCPU and reset its registers accordingly,
in particular by initialising its PC to enter the firmware at startup.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I26676637145c7d809c5dc5ac0ad0e1fadaf275d2
Signed-off-by: Quentin Perret <qperret@google.com>
When the host donates a page to a protected guest at an IPA which
coincides with the PVM firmware load address, copy in the relevant
firmware page after unmapping it from the host but before mapping it
into the guest.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I8cec813fa52938945f3122655deb785523a96ec8
Signed-off-by: Quentin Perret <qperret@google.com>
When the host shuts down cleanly under pKVM, it is EL2's responsibility
to clear the pvmfw pages before forwarding the PSCI call onto EL3.
Wipe the pvmfw pages on SYSTEM_OFF, SYSTEM_RESET and SYSTEM_RESET2 calls
from the host, cleaning the zeroed memory to the PoC for good measure.
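A minimal sketch of the dispatch decision follows. The function IDs are the
standard PSCI values; the helper name is hypothetical, and handling the
64-bit SYSTEM_RESET2 variant alongside the 32-bit one is an assumption of
this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PSCI function IDs (SMC32 unless noted). */
#define PSCI_0_2_FN_SYSTEM_OFF		UINT32_C(0x84000008)
#define PSCI_0_2_FN_SYSTEM_RESET	UINT32_C(0x84000009)
#define PSCI_1_1_FN_SYSTEM_RESET2	UINT32_C(0x84000012)
#define PSCI_1_1_FN64_SYSTEM_RESET2	UINT32_C(0xC4000012)	/* SMC64 */

/*
 * Return true if a host PSCI call shuts down or resets the system, in
 * which case the pvmfw pages should be wiped before forwarding to EL3.
 */
static bool example_psci_needs_pvmfw_wipe(uint32_t func_id)
{
	switch (func_id) {
	case PSCI_0_2_FN_SYSTEM_OFF:
	case PSCI_0_2_FN_SYSTEM_RESET:
	case PSCI_1_1_FN_SYSTEM_RESET2:
	case PSCI_1_1_FN64_SYSTEM_RESET2:
		return true;
	default:
		return false;
	}
}
```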
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I0dd2757e355f384813319034c6eed0fa2c2328c2
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_flush_dcache_to_poc() converts its (start,len) parameters into
(start,end) parameters for dcache_clean_inval_poc(). This mostly works
out except for the case when 'len == 0', where dcache_clean_inval_poc()
will still issue cache maintenance for the cache line containing 'start'.
If 'start' is not mapped, then this can generate an unexpected fault.
In preparation for cleaning the pvmfw memory pages to the PoC on
system reset, tweak kvm_flush_dcache_to_poc() to act as a no-op when
the supplied length is 0 and avoid having to check for this corner case
in the caller.
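The shape of the change can be sketched as follows. The range-based helper
is a counting stub standing in for dcache_clean_inval_poc(), and all names
here are hypothetical; only the early return for a zero length reflects the
behaviour described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts how often the stand-in maintenance routine is invoked. */
static unsigned int example_maintenance_calls;

/* Stub standing in for a (start, end) cache maintenance primitive. */
static void example_clean_inval_range(uintptr_t start, uintptr_t end)
{
	(void)start;
	(void)end;
	example_maintenance_calls++;
}

/* Convert (addr, len) to (start, end), doing nothing for an empty range. */
static void example_flush_dcache_to_poc(void *addr, size_t len)
{
	uintptr_t start = (uintptr_t)addr;

	if (!len)
		return;	/* avoid touching the line containing 'start' */

	example_clean_inval_range(start, start + len);
}
```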
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Idae2b22289398e941938821d1d3b3a5a1da3fd8f
Signed-off-by: Quentin Perret <qperret@google.com>
Unmap the PVM firmware memory from the pKVM host by transferring
ownership of the pages to the hypervisor when the host deprivileges
itself during boot.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: I311642f543c0c73d0e0cf2ec051e8e2d9759c5d1
Signed-off-by: Quentin Perret <qperret@google.com>
Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
region, which can be used to identify a firmware image for protected
VMs. If pKVM fails to initialise and a firmware region is advertised,
then the memory is cleared during boot.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Ibfcc0ff00d4b8a42747452047856cb9ba8def4c4
Signed-off-by: Quentin Perret <qperret@google.com>
Add some initial documentation for the Protected KVM (pKVM) feature on
arm64, describing the user ABI for creating protected VMs as well as
their limitations.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I152af404f24b9aba3cc9be6acd8e26afcfa4b0a5
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce a new virtual machine type, KVM_VM_TYPE_ARM_PROTECTED, which
specifies that the guest memory pages are to be unmapped from the host
stage-2 by the hypervisor.
Signed-off-by: Will Deacon <will@kernel.org>
[willdeacon@: Align KVM_VM_TYPE_ARM_PROTECTED value with android13 kernels]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iabcd03865aed4a41637597ac247897fd185bfc4d
Signed-off-by: Quentin Perret <qperret@google.com>
Extend our KVM "vendor" hypercalls to expose three new hypercalls to
protected guests for the purpose of opening and closing shared memory
windows with the host:
MEMINFO: Query the stage-2 page size (i.e. the minimum granule at
which memory can be shared)
MEM_SHARE: Share a page RWX with the host, faulting the page in if
necessary.
MEM_UNSHARE: Unshare a page with the host. Subsequent host accesses
to the page will result in a fault being injected by the
hypervisor.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I80fe8af0bc0b3a40460c5065eabe26b1d9f634f2
Signed-off-by: Quentin Perret <qperret@google.com>
The PTP hypercall documentation doesn't produce the best-looking table
when formatting in HTML as all of the return value definitions end up
on the same line.
Reformat the PTP hypercall documentation to follow the formatting used
by hypercalls.rst.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ic77cea5a621a9278d098afd80ef4c0e125760814
Signed-off-by: Quentin Perret <qperret@google.com>
KVM/arm64 makes use of the SMCCC "Vendor Specific Hypervisor Service
Call Range" to expose KVM-specific hypercalls to guests in a
discoverable and extensible fashion.
Document the existence of this interface and the discovery hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5754589b1b695828eab7cb41c7aa6a0fb55ad273
Signed-off-by: Quentin Perret <qperret@google.com>
In preparation for describing the guest view of KVM/arm64 hypercalls in
hypercalls.rst, move the existing contents of the file concerning the
firmware pseudo-registers elsewhere.
Cc: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ie8931290b291c0ffd2f1f11265babe2475972868
Signed-off-by: Quentin Perret <qperret@google.com>