Intercept FFA_MEM_RECLAIM calls from the host and transition the host
stage-2 page-table entries from the SHARED_OWNED state back to the OWNED
state once EL3 has confirmed that the secure mapping has been reclaimed.
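A minimal sketch of the reclaim transition for a single page, assuming a simplified model of the host stage-2 state. The enum names are modelled on pKVM's page states; host_reclaim_page(), RECLAIM_OK and the el3_ret argument are hypothetical, and real FF-A signals success with FFA_SUCCESS in w0 rather than 0. The page only returns to OWNED when EL3 has reported success.

```c
#include <assert.h>

/* Simplified host stage-2 page states (modelled on pKVM's names). */
enum pkvm_page_state {
	PKVM_PAGE_OWNED,
	PKVM_PAGE_SHARED_OWNED,
	PKVM_PAGE_SHARED_BORROWED,
};

#define RECLAIM_OK	0	/* hypothetical stand-in for FFA_SUCCESS */
#define FFA_RET_DENIED	(-6)

/*
 * Hypothetical helper, run after the host's FFA_MEM_RECLAIM has been
 * forwarded to EL3. On EL3 failure the page stays SHARED_OWNED, so the
 * host cannot treat it as exclusively owned while secure world may
 * still have it mapped.
 */
static int host_reclaim_page(enum pkvm_page_state *state, int el3_ret)
{
	if (*state != PKVM_PAGE_SHARED_OWNED)
		return FFA_RET_DENIED;	/* page was never shared via FF-A */
	if (el3_ret != RECLAIM_OK)
		return el3_ret;		/* EL3 did not confirm the reclaim */
	*state = PKVM_PAGE_OWNED;
	return RECLAIM_OK;
}
```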
Bug: 254811097
Change-Id: I58365e1b3fafa47f290a292fe57f6d2ed7f9091b
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-11-qperret@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
Extend pKVM's memory protection code so that we can update the host's
stage-2 page-table to track pages shared with secure world by the host
using FF-A and prevent those pages from being mapped into a guest.
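The tracking described above can be sketched as a two-state model: sharing with secure world moves host pages from OWNED to SHARED_OWNED, and donation to a guest is only allowed for exclusively owned pages. host_share_ffa() and host_donate_guest_check() are hypothetical names, and the per-page array stands in for the real stage-2 page-table walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum pkvm_page_state {
	PKVM_PAGE_OWNED,	/* host owns the page exclusively */
	PKVM_PAGE_SHARED_OWNED,	/* host owns it but shares it with secure world */
};

/* Hypothetical: mark a range of host pages as shared via FF-A. */
static int host_share_ffa(enum pkvm_page_state *pages, size_t nr)
{
	size_t i;

	/* Validate the whole range first so a failure shares nothing. */
	for (i = 0; i < nr; i++)
		if (pages[i] != PKVM_PAGE_OWNED)
			return -EPERM;
	for (i = 0; i < nr; i++)
		pages[i] = PKVM_PAGE_SHARED_OWNED;
	return 0;
}

/* Hypothetical: a page shared with secure world must never reach a guest. */
static int host_donate_guest_check(const enum pkvm_page_state *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pages[i] != PKVM_PAGE_OWNED)
			return -EPERM;
	return 0;
}
```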
[ qperret: BACKPORT due to context conflicts in mem_protect.c caused by
the presence of guest-related memory transitions in the android kernel
(host_donate_guest and friends) ]
Bug: 254811097
Co-developed-by: Andrew Walbran <qwandor@google.com>
Change-Id: Ib4d404cd1d4fa11d7bf8c1d0b8ec00838a8038a0
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-9-qperret@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
The FF-A proxy code needs to allocate its own buffer pair for
communication with EL3 and for forwarding calls from the host at EL1.
Reserve a couple of pages for this purpose and use them to initialise
the hypervisor's FF-A buffer structure.
Bug: 254811097
Co-developed-by: Andrew Walbran <qwandor@google.com>
Change-Id: Id72cd7f59be20eb6d1faa6f1c5e64ecc8debf929
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-7-qperret@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
Probe FF-A during pKVM initialisation so that we can detect any
inconsistencies in the version or partition ID early on.
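A sketch of what such a probe might check, assuming the FF-A version word layout (bit 31 zero, major in bits [30:16], minor in bits [15:0]) and assuming the host is expected to be partition ID 0. ffa_probe() and its arguments are hypothetical: version_ret stands for the value FFA_VERSION returns and host_id for the ID reported by FFA_ID_GET.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FFA_PACK_VERSION(major, minor)	((uint32_t)(((major) << 16) | (minor)))
#define FFA_VERSION_MAJOR(v)		(((v) >> 16) & 0x7fff)
#define FFA_VERSION_1_0			FFA_PACK_VERSION(1, 0)
#define FFA_RET_NOT_SUPPORTED		(-1)
#define HOST_PARTITION_ID		0	/* assumption for this sketch */

/*
 * Hypothetical probe. Returns 0 if FF-A is usable, 1 if firmware does
 * not implement FF-A at all, or a negative errno on an inconsistency.
 */
static int ffa_probe(int64_t version_ret, int64_t host_id)
{
	if (version_ret == FFA_RET_NOT_SUPPORTED)
		return 1;
	/* Only the major version must match; newer minors stay compatible. */
	if (FFA_VERSION_MAJOR((uint32_t)version_ret) !=
	    FFA_VERSION_MAJOR(FFA_VERSION_1_0))
		return -EOPNOTSUPP;
	if (host_id != HOST_PARTITION_ID)
		return -EINVAL;
	return 0;
}
```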
[ qperret: BACKPORT due to trivial conflict with header includes in
setup.c ]
Bug: 254811097
Change-Id: I7def4c2c497017ba86621bc98298bc65ffdeefae
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-5-qperret@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
When KVM is initialised in protected mode, we must take care to filter
certain FF-A calls from the host kernel so that the integrity of guest
and hypervisor memory is maintained and that memory is not made
available to the secure world.
As a first step, intercept and block all memory-related FF-A SMC calls
from the host to EL3. This puts the framework in place for handling them
properly.
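A minimal sketch of such a filter, assuming the FF-A v1.0 numbering in which the memory-management calls occupy function numbers 0x71 (FFA_MEM_DONATE) through 0x7B (FFA_MEM_FRAG_TX), with SMC32 IDs of the form 0x84000000 | n and SMC64 IDs of the form 0xC4000000 | n. ffa_call_allowed() is a hypothetical name; a blocked call would be answered by the hypervisor instead of being forwarded to EL3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FFA_SMC_32(n)		(0x84000000u | (n))
#define FFA_SMC_64(n)		(0xC4000000u | (n))
#define FFA_FN_MEM_DONATE	0x71u	/* first memory-management call */
#define FFA_FN_MEM_FRAG_TX	0x7Bu	/* last memory-management call */

/* Hypothetical: true if the host's SMC may be forwarded to EL3 as-is. */
static bool ffa_call_allowed(uint32_t func_id)
{
	uint32_t conv = func_id & 0xFF000000u;	/* fast call, convention, owner */
	uint32_t fn = func_id & 0xFFFFu;	/* function number */

	if (conv != 0x84000000u && conv != 0xC4000000u)
		return true;	/* not a standard-service fast call */
	/* Block every memory-related call in both calling conventions. */
	return fn < FFA_FN_MEM_DONATE || fn > FFA_FN_MEM_FRAG_TX;
}
```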
Bug: 254811097
Co-developed-by: Andrew Walbran <qwandor@google.com>
Change-Id: I5279bce56956c590862a68e8c4803dd2205e3f81
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-4-qperret@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
FF-A function IDs and error codes will be needed in the hypervisor too,
so move them to the header file where they can be shared. Rename the
version constants with an "FFA_" prefix so that they are less likely
to clash with other code in the tree.
Bug: 254811097
Co-developed-by: Andrew Walbran <qwandor@google.com>
Change-Id: I00ed487279fdfb61ea34ae99140c6fac8ee89187
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-2-qperret@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
On the guest teardown path, pKVM will zero the pages used to back the
guest shadow data structures before returning them to the host as they
may contain secrets (e.g. in the vCPU registers). However, the zeroing
is done using a cacheable alias and no cache maintenance operations
(CMOs) are issued afterwards, so the host may be able to read the
original contents of the shadow structs from memory.
Fix this by issuing CMOs after zeroing the pages.
[ qperret@: moved the CMOs to __unmap_donated_memory() to cover all
callers, including the __pkvm_init_vm() error path ]
Bug: 259551298
Change-Id: Id696d47d16e4c3fd870cb70b792eeb7f2282fc78
Signed-off-by: Quentin Perret <qperret@google.com>
If a malicious/compromised host issues a PSCI SYSTEM_RESET call in the
presence of guest-owned pages then the contents of those pages may be
susceptible to cold-reboot attacks.
Use the PSCI MEM_PROTECT call to ensure that volatile memory is wiped by
the firmware if a SYSTEM_RESET occurs while unpoisoned guest pages exist
in the system. Since this call does not offer protection for a "warm"
reset initiated by SYSTEM_RESET2, detect this case in the PSCI relay and
repaint the call to a standard SYSTEM_RESET instead.
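The repainting step can be sketched as a pure function on the PSCI function ID, assuming the PSCI numbering SYSTEM_RESET = 0x84000009 and SYSTEM_RESET2 = 0x84000012 (SMC32) / 0xC4000012 (SMC64). psci_repaint_reset() and guest_pages_present are hypothetical, and this sketch rewrites every SYSTEM_RESET2 regardless of its reset_type argument, which is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSCI_FN_SYSTEM_RESET		0x84000009u
#define PSCI_FN_SYSTEM_RESET2		0x84000012u
#define PSCI_FN64_SYSTEM_RESET2		0xC4000012u

/*
 * Hypothetical relay hook: while unpoisoned guest pages exist, turn a
 * SYSTEM_RESET2 (whose warm reset MEM_PROTECT does not cover) into a
 * plain SYSTEM_RESET so that firmware wipes memory on the way down.
 */
static uint32_t psci_repaint_reset(uint32_t func_id, bool guest_pages_present)
{
	if (!guest_pages_present)
		return func_id;
	if (func_id == PSCI_FN_SYSTEM_RESET2 || func_id == PSCI_FN64_SYSTEM_RESET2)
		return PSCI_FN_SYSTEM_RESET;
	return func_id;
}
```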
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254821051
Change-Id: I5c3dd93bc83ebcd0b6cea2ec734f6e3a77f0064e
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When donating pages to the guest, we only check the first IPA in the
range against the pvmfw loading range. Although this is fine for the
page-at-a-time faulting path, it doesn't fit with the rest of the mem
protection logic, which deals with the possibility of an arbitrarily
sized contiguous address range.
Rework the logic so that we check the whole IPA range during guest
donation and trigger the pvmfw loading path if any of the pages
intersect with the pvmfw region.
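The whole-range check amounts to a half-open interval overlap test. A minimal sketch, where ipa_range_intersects() is a hypothetical helper and address overflow at the top of the IPA space is ignored for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: does [ipa, ipa + size) overlap [base, base + fw_size)?
 * Two half-open ranges overlap iff each one starts before the other ends.
 */
static bool ipa_range_intersects(uint64_t ipa, uint64_t size,
				 uint64_t base, uint64_t fw_size)
{
	if (!size || !fw_size)
		return false;
	return ipa < base + fw_size && base < ipa + size;
}
```

For example, a donation of [0x1000, 0x3000) against a pvmfw region at [0x2000, 0x3000) overlaps even though its first IPA lies outside the region, which is exactly the case a first-IPA-only check misses.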
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: I6fef9f1898e65a95cab7f6a0ffa8aa422a8d5a91
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When poisoning the pvmfw pages during system reset at EL2, ensure that we
use a writable fixmap mapping rather than the persistent read-only mapping
of the region.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: I4c8be092d3c822695afd7d03d0d64163664a9f64
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
pkvm_clear_pvmfw_pages() is used to poison the pvmfw pages during reset,
so rename it to pkvm_poison_pvmfw_pages() instead.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: Ie5b9c90f0707fa81d9099425cff35383bfb0d009
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
hyp_zero_page() is used for poisoning memory, so rename it to
hyp_poison_page() to avoid confusion with the concept of a "zero page"
and make it available outside of mem_protect.c as it will be used to
poison the pvmfw memory in a subsequent patch.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: Ia4aec46437db3ffe466ae09bd180392fa06c0b46
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
hyp_fixmap_map() never returns NULL, so remove the redundant checks for
it and simplify the error handling in the callers.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Ie73a97cc3d9bded3750abe6e243003827393ee5e
Signed-off-by: Quentin Perret <qperret@google.com>
This essentially reverts commit e41b135550
"virtio_balloon: disable VIOMMU support".
Although the virtio_balloon driver does not translate through a
VIOMMU (or bounce buffer) the pages that it sends to the device,
it *does* need to perform these translations on the virtio rings
themselves.
This fixes virtio_balloon initialisation inside a PKVM/ARM64
protected virtual machine.
Bug: 240239989
Change-Id: I2a84eec870fd638223b231e5c4d1c27216dc40a2
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
This specifies that the driver is running on a PKVM hypervisor
and must use the memrelinquish service to cooperatively release
memory. If this service is unavailable, virtio_balloon cannot be
used.
Bug: 240239989
Change-Id: I8800c4435d8fae9df6f1ab108cc61c8f93020773
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When using nVHE in protected mode, the host donates pages through an
arch-specific memcache, which the hypervisor then pours into its local
vCPU copy. The latter should be flushed on VM teardown.
Bug: 237506543
Change-Id: Ic37d794ac33e9f844fa6ae1b4943febcdad5b033
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
If the mapping is determined to be not present in an earlier walk,
attempting the unmap is pointless.
Bug: 259217067
Change-Id: I6fd939556b80d7a9a0731cab36166a652f7a7c6d
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The VM should only relinquish "normal" pages. For a protected VM, this
means PAGE_OWNED; for a normal VM, this means PAGE_SHARED_BORROWED. All
other page types are rejected and failure is reported to the caller.
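The rule reduces to one expected state per VM type. A sketch, where check_relinquish() is a hypothetical name and the enum is a simplified stand-in for the guest stage-2 page states:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum pkvm_page_state {
	PKVM_PAGE_OWNED,
	PKVM_PAGE_SHARED_OWNED,
	PKVM_PAGE_SHARED_BORROWED,
	PKVM_NOPAGE,
};

/* Hypothetical: 0 if the guest may relinquish a page in 'state'. */
static int check_relinquish(bool protected_vm, enum pkvm_page_state state)
{
	/* Protected VMs own their pages; normal VMs borrow them from the host. */
	enum pkvm_page_state expected =
		protected_vm ? PKVM_PAGE_OWNED : PKVM_PAGE_SHARED_BORROWED;

	return state == expected ? 0 : -EPERM;
}
```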
Bug: 259217067
Change-Id: Icff3474dc2c975a6c5befe546c5521a05b3bd575
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Fixes build failure on -Werror=missing-prototypes.
At the same time, make the header file more resilient to ordering by
declaring 'struct page'.
Bug: 240239989
Change-Id: I84d069bde5ff03d1afa55d25c01448b0d43042da
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When MMIO guard is queried, it advertises the guard granule size
it uses. Use that value.
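Using the advertised granule means widening the ioremap range to granule boundaries before issuing one map call per granule. A sketch, assuming a power-of-two granule; mmio_guard_nr_granules() is a hypothetical helper that only counts the calls:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: number of MMIO guard map calls needed to cover
 * [phys, phys + size) with the granule advertised by the hypervisor.
 */
static size_t mmio_guard_nr_granules(uint64_t phys, uint64_t size,
				     uint64_t granule)
{
	assert(granule && !(granule & (granule - 1)));	/* power of two */

	uint64_t start = phys & ~(granule - 1);				/* round down */
	uint64_t end = (phys + size + granule - 1) & ~(granule - 1);	/* round up */

	return (size_t)((end - start) / granule);
}
```

If the hypervisor advertises a 64K granule but the guest assumes 4K PAGE_SIZE, the guest's step size and alignment no longer match what the hypervisor enforces, which is why the advertised value is used.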
Fixes: arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard
Bug: 251432016
Change-Id: Iff4dcb6229bf89aef832a29a98fecc041a1aec1b
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Set the MMIO guard flag for protected VMs prior to entering the guest
for the first time.
Bug: 216798684
Change-Id: I1448102ae85176d495ae7f8d6d20de4092049f0d
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Document the hypercalls used by the MMIO guard infrastructure.
Bug: 209580772
Change-Id: I927bcd6c5e3ef932265d817288ff2b46b0e0db66
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Plumb the MMIO checking code into the MMIO fault handling code.
Any fault hitting outside of an MMIO region will now report
an invalid syndrome, and won't leak any data from the guest.
Bug: 209580772
Change-Id: I68bef2d0211a804aa1e598aeaa0c85dc4098f61e
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Plumb in the hypercall interface to allow a guest to discover,
enroll, map and unmap MMIO regions.
Bug: 209580772
Change-Id: I0390456ffde8ceca351d3d8e82fd1dddeb747fac
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
[tabba@:
- use the new pkvm_hyp_* infrastructure
- move pkvm_refill_memcache() up in file to expose it to
handle_pvm_entry_hvc64()
- include asm/stage2_pgtable.h in hypercalls.c for
topup_hyp_memcache()
- fix pkvm_install_ioguard_page() retval to u64, reported in
b/253586500 and fixed in a separate patch before
- fix smccc to return success, reported in b/251426790 and fixed
in a separate patch before
]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce the infrastructure required to identify an IPA region
that is expected to be used as an MMIO window.
This includes mapping, unmapping and checking the regions. Nothing
calls into it yet, so no expected functional change.
Bug: 209580772
Change-Id: I227eaa28b98e067e3daae4f9e1071eb37a6761cc
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: use the new pkvm_hyp_* infrastructure, and remove
redundant reassignment in __pkvm_remove_ioguard_page()]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Add a per-VM flag indicating that the guest has bought into the
MMIO guard enforcement framework.
Bug: 209580772
Change-Id: If60b2b38a419a9f44ebe9029f55dd016fd2444b5
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: had to assign it a new number since there are existing
flags now]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
In order to simplify the implementation of an EL2-only version of
MMIO guard, expose topup_hyp_memcache() and simplify its usage
by only requiring a vcpu.
Bug: 209580772
Change-Id: I4f54c57a9693cf7a3450f99fedc15ae32af09a31
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: original patch did the same for free_hyp_memcache(), but
it's already exposed]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Create a macro definition for the FAR_EL2 mask and use it instead
of a hard-coded value, and put it in a shared header to be used by
hyp.
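For context, the mask keeps the in-page offset bits of FAR_EL2, which are combined with the page-aligned IPA derived from HPFAR_EL2 to form the faulting IPA. A sketch, assuming the names FAR_MASK and HPFAR_MASK (used here for illustration) and the HPFAR_EL2 layout in which FIPA, starting at bit 4, holds IPA bits from 12 upwards:

```c
#include <assert.h>
#include <stdint.h>

#define FAR_MASK	0xfffULL	/* GENMASK_ULL(11, 0): offset within the page */
#define HPFAR_MASK	(~0xfULL)	/* clear the bits below HPFAR_EL2.FIPA */

/* Reconstruct the faulting IPA from the two fault address registers. */
static uint64_t fault_ipa(uint64_t hpfar_el2, uint64_t far_el2)
{
	uint64_t ipa = (hpfar_el2 & HPFAR_MASK) << 8;	/* page-aligned IPA */

	return ipa | (far_el2 & FAR_MASK);		/* add the page offset */
}
```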
No functional change intended.
Bug: 209580772
Change-Id: Ib83932d670cba6bf8f1ed45d2c0e1ed34331d98d
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_pgtable_stage2_set_owner() could be generalised into a way
to store up to 63 bits in the page tables, as long as we don't
set bit 0.
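The idea can be sketched as follows: a stage-2 descriptor whose bit 0 (the valid bit) is clear is ignored by the hardware walker, so the remaining 63 bits are free for software annotations. pte_annotate() is a hypothetical helper, not the kvm_pgtable API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PTE_VALID	(1ULL << 0)	/* bit 0 set => hardware-valid descriptor */

/* Hypothetical: store a software annotation in an invalid PTE. */
static int pte_annotate(uint64_t *ptep, uint64_t annotation)
{
	/* Setting bit 0 would turn the annotation into a live mapping. */
	if (annotation & PTE_VALID)
		return -EINVAL;
	*ptep = annotation;
	return 0;
}
```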
Let's just do that.
Bug: 209580772
Change-Id: I4e42d149b457870c35a5ae0f77e14c95dee16b4d
Signed-off-by: Marc Zyngier <maz@kernel.org>
[tabba@: Fix conflict in host_stage2_set_owner_locked()]
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The memory relinquish interface is used by both memory ballooning and
page reporting, so it must be built if either is enabled.
Bug: 258944680
Change-Id: I3b949dadbfc4a2b17dba1809a46f0a7386e70ebf
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Add monitor debug support for non-protected guests in protected
mode.
Save and restore the monitor debug state when running a
non-protected guest, and propagate the monitor debug
configuration of non-protected vcpus from the host.
This patch assumes that the hyp vcpu debug iflags are kept in
sync with the host.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Ie525693a6a6f236e388b16a1af297403e729057f
Signed-off-by: Quentin Perret <qperret@google.com>
This code will be reused when supporting debug for non-protected
VMs in protected mode.
No functional change intended.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: If05dc8fdb3fff8e811f06cf5050d3eaf0ce67116
Signed-off-by: Quentin Perret <qperret@google.com>
The iflags are meant as input flags to the hypervisor, and will
be used in future patches by calls to functions that sync debug
and pmu state. Ensure that the hyp_vcpu copy is up-to-date with
the host's on entry.
Bug: 228011917
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Id04d65ee084c3745ddc283ff5e30348511a4a1d2
Signed-off-by: Quentin Perret <qperret@google.com>
The free-page reporting and hinting queues do not pass arrays of page
addresses (like the basic inflate queue) but instead pass the free page
ranges as buffers. This does not work well with the DMA API: the host
wants to know the GPA, not an IOVA.
For these two virtqueues, disable the DMA API and pass the buffers
through untranslated.
Bug: 240239989
Change-Id: I2d13a8b7e8f6775819de7fe96f4579afa08b1300
Signed-off-by: Keir Fraser <keirf@google.com>
[ qperret@: Fixed minor context conflict in virtio.h ]
Signed-off-by: Quentin Perret <qperret@google.com>
When running as a protected VM, the hypervisor isolates the VM's
memory pages from the host. Returning ownership of a VM page therefore
requires hypervisor involvement, and acknowledgement from the
protected VM that it is voluntarily cooperating.
To this end, notify pages via the new relinquish hypercall when they
are being reported to the host as free and available for temporary
reclaim.
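A sketch of the reporting side, assuming a per-page relinquish primitive: before a free range is reported, every page in it is relinquished, stopping at the first failure so that memory the guest still owns is not reported. relinquish_range(), relinquish_fn and the counting stub are hypothetical illustrations, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-page relinquish hypercall. */
typedef int (*relinquish_fn)(uint64_t phys);

/* Relinquish each page of [phys, phys + len) before reporting it. */
static int relinquish_range(uint64_t phys, uint64_t len, uint64_t page_size,
			    relinquish_fn relinquish)
{
	assert(page_size != 0);

	for (uint64_t off = 0; off < len; off += page_size) {
		int ret = relinquish(phys + off);

		if (ret)
			return ret;	/* do not report this range */
	}
	return 0;
}

/* Demo stub that just counts the hypercalls it would have made. */
static unsigned int relinquished;

static int count_relinquish(uint64_t phys)
{
	(void)phys;
	relinquished++;
	return 0;
}
```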
Bug: 240239989
Change-Id: I8718e468be63c3aacb2f79ff141fbcedd6d19b56
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
When running as a protected VM, the hypervisor isolates the VM's
memory pages from the host. Returning ownership of a VM page
therefore requires hypervisor involvement, and acknowledgement from
the protected VM that it is voluntarily cooperating.
To this end, notify pages via the new relinquish hypercall when they
are entered into the memory balloon.
Bug: 240239989
Change-Id: Ic89b45312a7478ddff081a934d99e693eded92dc
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
On PKVM/ARM64 this uses the ARM SMCCC relinquish hypercall when available.
Bug: 240239989
Change-Id: Ifa85b641a48f348a2364cf8c6b06b6417f1eeedb
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
This allows a VM running on PKVM to notify the hypervisor (and host)
that it is returning pages to host ownership.
Bug: 240239989
Change-Id: I4644736db04afacd7da4c6f465130c73c2e44b93
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
The kernel has an awfully complicated boot sequence in order to cope
with the various EL2 configurations, including those that "enhanced"
the architecture. We go from EL2 to EL1, then back to EL2, staying
at EL2 if VHE capable and otherwise go back to EL1.
Here's a paracetamol tablet for you.
The cpu_resume path follows the same logic, because coming up with
two versions of a square wheel is hard.
However, things aren't this straightforward with pKVM, as the host
resume path is always proxied by the hypervisor, which means that
the kernel is always entered at EL1. Which contradicts what the
__boot_cpu_mode[] array contains (it obviously says EL2).
This thus triggers an HVC call from EL1 to EL2 in a vain attempt
to upgrade from EL1 to EL2 VHE, which we are, funnily enough,
reluctant to grant to the host kernel. This is also completely
unexpected, and puzzles your average EL2 hacker.
Address it by fixing up the boot mode at the point the host gets
deprivileged. is_hyp_mode_available() and co already have a static
branch to deal with this, making it pretty safe.
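A simplified model of the fix, assuming the arm64 boot-mode markers BOOT_CPU_MODE_EL1 = 0xe11 and BOOT_CPU_MODE_EL2 = 0xe12. The boolean stands in for the static branch, and resume_wants_el2() and deprivilege_host() are hypothetical names for "would the resume path try to upgrade" and "the point the host gets deprivileged".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOOT_CPU_MODE_EL1	0xe11
#define BOOT_CPU_MODE_EL2	0xe12

static uint32_t boot_cpu_mode[2] = { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL2 };
static bool protected_mode_initialized;	/* models the static branch */

static bool hyp_mode_available(void)
{
	/* Once pKVM is up, CPUs resume at EL1 but EL2 is still "available". */
	if (protected_mode_initialized)
		return true;
	return boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
	       boot_cpu_mode[1] == BOOT_CPU_MODE_EL2;
}

/* Hypothetical: would the resume path issue an HVC to reach EL2? */
static bool resume_wants_el2(void)
{
	return boot_cpu_mode[0] == BOOT_CPU_MODE_EL2;
}

/* Hypothetical: fix up the recorded boot mode when deprivileging. */
static void deprivilege_host(void)
{
	protected_mode_initialized = true;
	boot_cpu_mode[0] = boot_cpu_mode[1] = BOOT_CPU_MODE_EL1;
}
```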
Cc: <stable@vger.kernel.org> # 5.15+
Reported-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Bug: 258157858
Link: https://lore.kernel.org/all/20221108100138.3887862-1-vdonnefort@google.com/
Change-Id: I4a2269402ececa0ec47cab88343c3c623b4b2e3d
Signed-off-by: Quentin Perret <qperret@google.com>
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. However, a statically initialized linked
list head points to itself, so it cannot live in .bss. To avoid having
to work around this by initializing such lists at runtime, add a
.hyp.data section.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7a56dc4c93e05bbef53c66837164d17c6103b6b8
Signed-off-by: Quentin Perret <qperret@google.com>
As pKVM does not trust the host, the host should not be involved in the
handling of, or be able to observe the response to, entropy requests
issued by protected guests.
When an SMC-based implementation of the ARM SMCCC TRNG interface is
present, pass any HVC-based requests directly on to the secure firmware.
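A sketch of the routing decision, assuming the SMCCC TRNG function IDs TRNG_VERSION 0x84000050, TRNG_FEATURES 0x84000051, TRNG_GET_UUID 0x84000052, TRNG_RND 0x84000053 (SMC32) and 0xC4000053 (SMC64). trng_forward_to_firmware() and smc_trng_present are hypothetical; a true result means the hypervisor reissues the guest's HVC as an SMC to secure firmware without exiting to the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRNG_VERSION	0x84000050u
#define TRNG_FEATURES	0x84000051u
#define TRNG_GET_UUID	0x84000052u
#define TRNG_RND32	0x84000053u
#define TRNG_RND64	0xC4000053u

/* Hypothetical: should this guest HVC go straight to secure firmware? */
static bool trng_forward_to_firmware(uint32_t func_id, bool smc_trng_present)
{
	if (!smc_trng_present)
		return false;	/* no SMC-based TRNG to pass the request to */
	switch (func_id) {
	case TRNG_VERSION:
	case TRNG_FEATURES:
	case TRNG_GET_UUID:
	case TRNG_RND32:
	case TRNG_RND64:
		return true;	/* host never sees the request or the entropy */
	default:
		return false;
	}
}
```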
Co-developed-by: Ard Biesheuvel <ardb@google.com>
Signed-off-by: Ard Biesheuvel <ardb@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ica492ce49fd059a62ecc31bb7ac13c9adb773a08
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>