Add a new hypercall for the nVHE protected mode allowing the host to
share one of its pages with a guest.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Ibfb86932e5da58c6b88448b49be6c1f994dbbd70
Signed-off-by: Will Deacon <willdeacon@google.com>
The memory pages donated by the host to the hypervisor for the creation
of guest page-tables are currently never reclaimed. Fix this by returning
those pages to the host during guest teardown using a hyp_memcache.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I90615c34e61a11ca402c19d016bd40e3dd880637
Signed-off-by: Will Deacon <willdeacon@google.com>
In preparation for managing the stage-2 page-table of guests at EL2 in
nVHE protected mode, allocate memory for the guest PGDs and populate the
shadow kvm_s2_mmu upon shadow creation.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I39f9dec9dc1bb60fe66ec6923f9b4dedc3e37f3f
Signed-off-by: Will Deacon <willdeacon@google.com>
We will soon need to temporarily map pages into the hypervisor stage-1
in nVHE protected mode. To do this efficiently, let's introduce a
per-cpu fixmap that can map a single page without taking any lock or
allocating memory.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Id5cbfd04c5c465c9c577a5fac67458331096b581
Signed-off-by: Will Deacon <willdeacon@google.com>
__pkvm_create_private_mapping() is currently responsible for allocating
VA space in the hypervisor's "private" range and creating stage-1
mappings. In order to allow reusing the VA space allocation logic from
other places, let's factor it out in a standalone function.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I9cb82937931d272f9f61297a3f5dea0e039c2595
Signed-off-by: Will Deacon <willdeacon@google.com>
We currently fix up the hypervisor stage-1 refcount only for specific
portions of the hyp stage-1 VA space. In order to allow unmapping pages
outside of these ranges, let's fix up the refcount for the entire hyp VA
space.
Signed-off-by: Quentin Perret <qperret@google.com>
[willdeacon@: Fix context conflicts with IOMMU init in __pkvm_init_finalise()]
Bug: 209580772
Change-Id: I11d6374d9416bc97cf85fc33bb87c21369412c49
Signed-off-by: Will Deacon <willdeacon@google.com>
In preparation for re-using icache_inval_pou from the EL2 nVHE
hypervisor, annotate it as position-independent.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I66fbf42239a891bfd149c79bf77379fea124793e
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce helper functions easing the manipulation of a hyp_memcache at
EL2 in nVHE protected mode.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I3c1bda86eae67adfa4d0424e68aa5e842562b43c
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce helper functions easing the manipulation of a hyp_memcache
from the kernel.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I42f2718e059f0cb32b7007e390ba758ba68ddd0c
Signed-off-by: Will Deacon <willdeacon@google.com>
The host and hypervisor will need to dynamically exchange memory pages
soon. Indeed, the hypervisor will rely on the host to donate memory
pages it can use to create guest stage-2 page-tables and to store
metadata. In order to ease this process, introduce a struct hyp_memcache
which is essentially a linked list of available pages, indexed by
physical addresses.
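
As a rough illustration of the idea (not the code added by this patch),
a list of free pages linked by physical address could look like the
sketch below. All sketch_* names are hypothetical, and the VA/PA
translation is faked with a fixed offset so the example runs in
userspace; each free page stores the physical address of the next one
in its first bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Assumption: a fixed offset stands in for the real VA<->PA translation. */
#define SKETCH_VA_PA_OFFSET 0x1000u

static phys_addr_t sketch_to_pa(void *va)
{
	return (phys_addr_t)(uintptr_t)va - SKETCH_VA_PA_OFFSET;
}

static void *sketch_to_va(phys_addr_t pa)
{
	return (void *)(uintptr_t)(pa + SKETCH_VA_PA_OFFSET);
}

/* Hypothetical memcache: head holds the physical address of the first page. */
struct sketch_memcache {
	phys_addr_t head;
	unsigned long nr_pages;
};

/* Link a free page at the front; the old head is stored inside the page. */
static void sketch_mc_push(struct sketch_memcache *mc, void *page)
{
	*(phys_addr_t *)page = mc->head;
	mc->head = sketch_to_pa(page);
	mc->nr_pages++;
}

/* Unlink and return the first page, or NULL if the cache is empty. */
static void *sketch_mc_pop(struct sketch_memcache *mc)
{
	phys_addr_t *page;

	if (!mc->nr_pages)
		return NULL;

	page = sketch_to_va(mc->head);
	mc->head = *page;
	mc->nr_pages--;
	return page;
}
```

Indexing by physical address means each side can walk the list using
its own translation helper, which is why the sketch routes every access
through sketch_to_pa()/sketch_to_va().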
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Iae46d01e4c0f75cc637adf50a7d0a879d85dff5e
Signed-off-by: Will Deacon <willdeacon@google.com>
The initialization of stage-2 page-tables of guests is currently split
across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2().
That is presumably for historical reasons as kvm_arm_setup_stage2()
originates from the (now defunct) KVM port for 32bit Arm.
Simplify this code path by merging both functions into one and, while at
it, make sure to map the kvm struct into the hypervisor stage-1 early on
to simplify the failure path.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I1f9db0251f24e9712607f0c13c9f4017734d3b0f
Signed-off-by: Will Deacon <willdeacon@google.com>
All the contiguous pages used to initialize a hyp_pool are considered
coalesceable, which means that the hyp page allocator will actively
try to merge them with their buddies on the hyp_put_page() path.
However, using hyp_put_page() on a page that is not part of the initial
memory range given to a hyp_pool is currently unsupported.
In order to allow dynamically extending hyp pools at run-time, add a
check to __hyp_attach_page() to allow inserting 'external' pages into
the free-list of order 0. This will be necessary to allow lazy
donation of pages from the host to the hypervisor when allocating guest
stage-2 page-table pages at EL2.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I3e3a6cb513f4fcc8b9683784c41fa6d3af119ac4
Signed-off-by: Will Deacon <willdeacon@google.com>
We will soon need to donate memory to the hypervisor in protected mode
when creating guest VMs. The amount of memory to be donated will depend on
the size of the (potentially concatenated) stage-2 PGD for that guest,
among other things. To prepare the ground for this donation, introduce a
helper in the KVM page-table library that computes the size of a
stage-2 PGD for a given VTCR.
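
To give an idea of the arithmetic involved, here is a minimal sketch,
assuming a 4KiB granule and the basic VTCR_EL2 encoding (T0SZ in bits
[5:0], SL0 in bits [7:6], start level = 2 - SL0). The sketch_* names are
hypothetical and this is not the helper added by the patch; it only
shows how the number of concatenated PGD pages falls out of the IPA size
and the start level.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12u	/* assumption: 4KiB granule */
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)

/* Number of address bits covered by a single table at 'level'. */
static unsigned int sketch_table_span_bits(int level)
{
	return (SKETCH_PAGE_SHIFT - 3) * (4 - level) + SKETCH_PAGE_SHIFT;
}

/* Size in bytes of the (possibly concatenated) stage-2 PGD. */
static size_t sketch_stage2_pgd_size(uint64_t vtcr)
{
	unsigned int ia_bits = 64 - (unsigned int)(vtcr & 0x3f);
	int start_level = 2 - (int)((vtcr >> 6) & 0x3);
	unsigned int span = sketch_table_span_bits(start_level);
	size_t nr_pages;

	/* One table unless the IPA space exceeds what one table covers. */
	nr_pages = ia_bits > span ? (size_t)1 << (ia_bits - span) : 1;

	return nr_pages * SKETCH_PAGE_SIZE;
}
```

For example, a 40-bit IPA space starting at level 1 needs two
concatenated pages, since a single level-1 table only covers 39 bits.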
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I266f85d17415e72eb574afdb1d8b622d87b9fcc3
Signed-off-by: Will Deacon <willdeacon@google.com>
Break-before-make (BBM) can be expensive, as transitioning via an
invalid mapping (i.e. the "break" step) requires the completion of TLB
invalidation and can also cause other agents to fault concurrently on
the invalid mapping.
Since BBM is not required when changing only the software bits of a PTE,
avoid the sequence in this case and just update the PTE directly.
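
The check can be pictured roughly as follows. This is a sketch only,
assuming the software bits are descriptor bits [58:55]; the sketch_*
names are hypothetical and do not match the identifiers used in the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: descriptor bits [58:55] are the software-defined bits. */
#define SKETCH_PTE_SW_MASK	(UINT64_C(0xf) << 55)

/*
 * Return true if replacing old_pte with new_pte changes anything other
 * than the software bits, i.e. if the update cannot simply be written
 * in place.
 */
static bool sketch_needs_bbm(uint64_t old_pte, uint64_t new_pte)
{
	return ((old_pte ^ new_pte) & ~SKETCH_PTE_SW_MASK) != 0;
}
```

When sketch_needs_bbm() returns false the caller would store the new
descriptor directly and skip the invalidate/TLBI step.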
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: Ib4858b2491b221490b74cb5100474acaad711483
Signed-off-by: Will Deacon <willdeacon@google.com>
Add PSCI 1.0 support for protected VMs. All mandatory functions
are supported, except for SYSTEM_RESET, because pKVM doesn't have
a way of resetting protected VMs yet.
Some VMMs issue a SYSTEM_RESET when tearing down a VM, therefore,
for now, we repaint SYSTEM_RESET calls as SYSTEM_OFF.
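
The repainting amounts to something like the sketch below. The function
IDs are the SMC32 values from the PSCI specification; the sketch_* names
are hypothetical and the real handler does considerably more than this.

```c
#include <stdint.h>

/* PSCI SMC32 function IDs. */
#define SKETCH_PSCI_FN_SYSTEM_OFF	UINT32_C(0x84000008)
#define SKETCH_PSCI_FN_SYSTEM_RESET	UINT32_C(0x84000009)

/* Treat a guest SYSTEM_RESET as SYSTEM_OFF; pass everything else through. */
static uint32_t sketch_repaint_psci_fn(uint32_t fn)
{
	if (fn == SKETCH_PSCI_FN_SYSTEM_RESET)
		return SKETCH_PSCI_FN_SYSTEM_OFF;

	return fn;
}
```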
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ide47339dc4c0392b41f77e90c43ec805a0780d00
Signed-off-by: Will Deacon <willdeacon@google.com>
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by
hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I9457bb7a16e807b6c59b4b3e121b6f4715a12cab
Signed-off-by: Will Deacon <willdeacon@google.com>
Move vcpu_read_sys_reg and vcpu_write_sys_reg to a shared header
to be used by hyp in protected mode.
Refactored as macros to avoid including kvm.h, which would have
been needed for struct vcpu.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I6b949d60b1cab6aa12978e2f9b775192701990c6
Signed-off-by: Will Deacon <willdeacon@google.com>
Move some PSCI functions and macros to a shared header to be used
by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I3057eadc9c2a71c2fee36575a8215c3119bd9e36
Signed-off-by: Will Deacon <willdeacon@google.com>
Instruction and data aborts taken from a guest are very different
beasts, especially as MMIO exits are only ever triggered by the latter.
Rework the shadow handlers so that instruction and data aborts are
handled independently, making the control flow and MMIO handling
considerably easier to reason about.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I8a0d6881a13154ceaedf4b51ff224256db2e7ebf
Signed-off-by: Will Deacon <willdeacon@google.com>
Rework the sysreg entry/exit handling for protected guests so that:
- The host view of the vCPU pstate is updated on exit, as this may be
  required for emulation
- WRITE_ONCE() is used to update host vCPU members
- The esr_sys64_to_params() helper is used to decode the ESR
- A pending exception request from the host takes precedence over a PC
  increment when setting the shadow vCPU flags
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Ia9b546f86a8815d777b3ff4ee3b09426cba7ad71
Signed-off-by: Will Deacon <willdeacon@google.com>
Implement lazy save/restore of the FPSIMD and SVE registers for the host
so that the guest register state remains live for a vCPU with
KVM_ARM64_FP_ENABLED set and the state is only switched in response to a
trap from the host.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I6977740727964e297d1e658e8eaec4f0125cce48
Signed-off-by: Will Deacon <willdeacon@google.com>
Restrict protected VM capabilities based on the
fixed-configuration for protected VMs.
No functional change intended in current KVM-supported modes
(nVHE, VHE).
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ic247aefac8b4046efaa69d24580f7c58aaeeb80d
Signed-off-by: Will Deacon <willdeacon@google.com>
In nVHE protected mode the hypervisor sometimes needs to read or write to
host-provided data-structures, such as vcpu structs or the kvm struct.
To ensure that the hypervisor can't be tricked by the host into writing
to pages it doesn't own, let's pin the host pages containing those data
structures during the shadow vm creation. This will ensure those pages
remain in a host-shared state for the lifetime of the VM.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Id11bd6a86754b6a3e0c504b06940df310641357d
Signed-off-by: Will Deacon <willdeacon@google.com>
Introduce two new hypercalls for the nVHE protected mode allowing the
host to load and put the vCPU state, following the usual split between
vcpu_{load,put}() and vcpu_run() in KVM.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I34a19c96221c9a54ed142803b8f18f2c63511c57
Signed-off-by: Will Deacon <willdeacon@google.com>
Similar to the vgic state, make sure to sync and flush the virtual timer
state between the host and the hyp shadow vCPU structs when running in
nVHE protected mode.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Ib90273bb670d9d815dd9f542369dde00753655cf
Signed-off-by: Will Deacon <willdeacon@google.com>
Now that protected VMs have a shadow state maintained at EL2 in nVHE
protected mode, make sure to sync and flush the vgic state between host
and hyp data structure upon entry and exit from a guest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I8d7ab950041a5cd79217c9ee0e04742a27439a99
Signed-off-by: Will Deacon <willdeacon@google.com>
Merge the functions to save and restore vmcr and apr. This can in some
cases reduce the number of hypercalls necessary to load/put the vgic
state in nVHE and will also ease its management in protected mode later
on.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: Id85f0698a7a346282e55c15993c274828bd5309c
Signed-off-by: Will Deacon <willdeacon@google.com>
Move the initialization of traps to the initialization of the
shadow vcpu, and remove the associated hypercall.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I4bf45733972ac54c72c40b3ef1df32cfe7d04a70
Signed-off-by: Will Deacon <willdeacon@google.com>
Add handlers to exchange information between the
host and the protected guest on vcpu entry and exit, which
most often would happen on running a vcpu.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I1716f55f5a1cb75dcde26b58af8f78ee80e4a19e
Signed-off-by: Will Deacon <willdeacon@google.com>
Create and populate a shadow table that contains the state hyp
needs for running protected VMs, i.e., struct kvm and struct
kvm_vcpu at EL2.
The memory for this is donated by the host and then unmapped from
the host at stage 1 and at stage 2 (by hyp).
This state is not used yet.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ie2d948f2a5f22a06d615d909de7a60d46944e6d8
Signed-off-by: Will Deacon <willdeacon@google.com>
Create a framework for resetting protected VM system registers to
their architecturally defined reset values.
No functional change intended as these are not hooked in yet.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Iafdab9f796897429f0fb8abd5d7df9ca576e1f91
Signed-off-by: Will Deacon <willdeacon@google.com>
Move the computation of the mpidr to its own function in a shared
header, as the computation will be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: I531795b43c9747dceea485843eed114675db9354
Signed-off-by: Will Deacon <willdeacon@google.com>
Move the macro defines of the pstate reset values to a shared
header to be used by hyp in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Iafd31108675027a799ce9ff3c5c56b49e87ead67
Signed-off-by: Will Deacon <willdeacon@google.com>
The values of the trapping registers for protected VMs should be
computed from the ground up, and not depend on potentially
preexisting values.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Iacd3916dd1bbfc8d9cc859f94a9d879e9d456ebc
Signed-off-by: Will Deacon <willdeacon@google.com>
Having a static initializer for hyp_spinlock_t simplifies its
use when there isn't an initializing function.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ib1eabe03f49013955a7afcbfcc6a7d3c4a31a736
Signed-off-by: Will Deacon <willdeacon@google.com>
Create a macro definition for the FAR_EL2 mask and use it instead
of a hard-coded value, and put it in a shared header to be used by
hyp.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: Ib83932d670cba6bf8f1ed45d2c0e1ed34331d98d
Signed-off-by: Will Deacon <willdeacon@google.com>
Debug and trace are not currently supported for protected guests.
Trap related exceptions and restrict access to related registers.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 209580772
Change-Id: If7483e5b38837d6e7d83c47657a94f16a34ba856
Signed-off-by: Will Deacon <willdeacon@google.com>
In preparation for using some of the pKVM fixed configuration register
definitions to filter the available VM CAPs in the host, split the
nvhe/fixed_config.h header so that the definitions can be shared
with the host, while keeping the hypervisor function prototypes in
the nvhe/ namespace.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 209580772
Change-Id: I587bbcfebcc89633695fde9a5cfa1546fdca1018
Signed-off-by: Will Deacon <willdeacon@google.com>
Add helpers allowing the hypervisor to check whether a range of pages
is currently shared by the host, and 'pin' them if so by blocking host
unshare operations until the memory has been unpinned. This will allow
the hypervisor to take references on host-provided data-structures
(struct kvm and such) and be guaranteed these pages will remain in a
stable state until it decides to release them, e.g. during guest
teardown.
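
The intended semantics can be modelled with the toy sketch below. The
page states, the per-page refcount and all sketch_* names are
assumptions made for illustration; they are not the data structures
used by the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-page tracking state. */
enum sketch_page_state { SKETCH_HOST_OWNED, SKETCH_HOST_SHARED };

struct sketch_page {
	enum sketch_page_state state;
	unsigned int refcount;	/* number of outstanding pins */
};

/* Pin a range only if every page in it is currently shared. */
static int sketch_pin_shared(struct sketch_page *pages, size_t nr)
{
	size_t i;

	/* Check the whole range first so a failure leaves no partial pins. */
	for (i = 0; i < nr; i++) {
		if (pages[i].state != SKETCH_HOST_SHARED)
			return -EPERM;
	}

	for (i = 0; i < nr; i++)
		pages[i].refcount++;

	return 0;
}

/* Drop the pins taken by sketch_pin_shared() on the same range. */
static void sketch_unpin_shared(struct sketch_page *pages, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		pages[i].refcount--;
}

/* Host unshare is refused while the page is pinned. */
static int sketch_host_unshare(struct sketch_page *page)
{
	if (page->state != SKETCH_HOST_SHARED)
		return -EPERM;
	if (page->refcount)
		return -EBUSY;

	page->state = SKETCH_HOST_OWNED;
	return 0;
}
```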
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I60ff204bd11e78e3e2ce21defc0d94ae916f5097
Signed-off-by: Will Deacon <willdeacon@google.com>
The EL2 vmemmap in nVHE Protected mode is currently very sparse: only
memory pages owned by the hypervisor itself have a matching struct
hyp_page. But since the size of these structs has been reduced
significantly, it appears that we can afford backing the vmemmap for all
of memory.
This will simplify memory tracking a lot, as the hypervisor will have a
place to store metadata (e.g. refcounts) that wouldn't otherwise fit in
the 4 SW bits we have in the host stage-2 page-table for instance.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Idaaf67ae6401765143fd7fe4b12f8f53e9cbf64b
Signed-off-by: Will Deacon <willdeacon@google.com>
We will soon need to manipulate struct hyp_page refcounts from outside
page_alloc.c, so move the helpers to a header file.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I5cfeeb1e3e6a61cbba70c242cf25e035b26149e7
Signed-off-by: Will Deacon <willdeacon@google.com>
The hypervisor will soon need to donate memory pages to the host to
return pages backing guest VM metadata during guest teardown, so provide
a helper allowing hyp-to-host memory donations.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I3013e8f69e9d26fae751bb81cc1e66253f0f5039
Signed-off-by: Will Deacon <willdeacon@google.com>
The host will soon need to donate memory pages to the hypervisor to
store VM metadata, so provide a helper function allowing host-to-hyp
memory donations.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I246978d81bd5301dae13c1f9d3e546334ecd88ad
Signed-off-by: Will Deacon <willdeacon@google.com>
Returning memory ownership of KVM metadata pages to the host once it is
no longer required (i.e. after VM teardown) can be achieved using a
series of memory donations from the hypervisor to the host.
Implement hyp-to-host memory donation.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I7c77bf6dae0ee7f96cd032d06b1ced5502530786
Signed-off-by: Will Deacon <willdeacon@google.com>
Transferring ownership information of a memory region from one component
to another can be achieved using a "donate" operation, which results
in the previous owner losing access to the underlying pages entirely.
Implement a do_donate() helper, along the same lines as do_{un,}share,
to provide this functionality for the host-to-hyp case.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I426f8b068450e7e6b93ba05a0aea6ce8f93e6bf7
Signed-off-by: Will Deacon <willdeacon@google.com>
CMOs issued from EL2 cannot directly use the kernel helpers,
as EL2 doesn't have a mapping of the guest pages. Oops.
Instead, use the mm_ops indirection to use helpers that will
perform a mapping at EL2 and allow the CMO to be effective.
Fixes: 25aa28691b ("KVM: arm64: Move guest CMOs to the fault handlers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209777660
Link: https://lore.kernel.org/r/20220114125038.1336965-1-maz@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8cd221f7c89a20de28f0bea422641622b8320c1f
The S2MPU must wait for a v9 device to finish invalidation before
accessing its SFRs. Failure to do so can result in memory transaction
timeouts.
Add a loop that polls the STATUS register while the return value has
the BUSY and ON_INVALIDATING bits set.
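
The shape of such a loop, as a sketch only: the bit positions, the
read callback and the fake device below are all hypothetical and exist
so the example can run outside the driver; the real code reads the
S2MPU STATUS register directly.

```c
#include <stdint.h>

/* Assumption: illustrative bit positions, not the real register layout. */
#define SKETCH_STATUS_BUSY		(UINT32_C(1) << 0)
#define SKETCH_STATUS_ON_INVALIDATING	(UINT32_C(1) << 1)

typedef uint32_t (*sketch_read_status_fn)(void *ctx);

/*
 * Poll STATUS until BUSY and ON_INVALIDATING are no longer both set.
 * Returns the number of reads performed.
 */
static unsigned int sketch_wait_while_invalidating(sketch_read_status_fn read_status,
						   void *ctx)
{
	const uint32_t mask = SKETCH_STATUS_BUSY | SKETCH_STATUS_ON_INVALIDATING;
	unsigned int reads = 0;
	uint32_t status;

	do {
		status = read_status(ctx);
		reads++;
	} while ((status & mask) == mask);

	return reads;
}

/* Fake device: reports an ongoing invalidation for *ctx more reads. */
static uint32_t sketch_fake_read_status(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 0;

	(*remaining)--;
	return SKETCH_STATUS_BUSY | SKETCH_STATUS_ON_INVALIDATING;
}
```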
Test: builds, boots
Bug: 190463801
Bug: 206761586
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I00891dc3a8ad185d29757b8622a053a96237b803
Comments in S2MPU driver code were mistakenly prefixed with /**,
denoting a kernel-doc comment. Since these do not match kernel-doc
syntax, replace them with regular /* comments.
Test: n/a
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I81ed57f22c2cf9eaa5761f11b4f3b8ce1800f457
The previous version would miss if FS_OPEN_EXEC was set.
Change-Id: I52d55bed2ca029f8fae8576f831a0621f2d02804
Fixes: b99f858e42 ("ANDROID: fsnotify: Notify lower fs of open")
Bug: 70706497
Signed-off-by: Daniel Rosenberg <drosen@google.com>
If the filesystem being watched supports d_canonical_path,
notify the lower filesystem of the open as well.
Fixes: f37e05049b ("ANDROID: vfs: d_canonical_path for stacked FS")
Bug: 70706497
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I7c9d210e8e6ee99928ad9db0b41ffc3ac3371dc0