Create 'struct pkvm_iommu_ops' for the S2MPU and add a new driver ID to the
list of IOMMU drivers. Implement the 'init' callback, accepting donated
memory from the host to back SMPTs. If the donation is successful,
the SMPTs are assigned to 'host_mpt'.
Export 'pkvm_iommu_s2mpu_register' for a kernel module to call to
register an S2MPU device. The first call to this function will also
run the global S2MPU driver initializer.
Bug: 190463801
Change-Id: Icad06379e5cf695fba4f3a18a0773e302f3ead06
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 41707102f4)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The function is superseded by the generic
pkvm_iommu_host_stage2_adjust_range. Remove it.
Bug: 190463801
Change-Id: If42b40357f1d9a046ff20815215f927ac2a0d765
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 3da3f51b33)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Replace all uses of 'struct s2mpu' with the generic 'struct pkvm_iommu'.
'struct s2mpu_drv_data' is created to accommodate driver-specific values
associated with 'struct pkvm_iommu' and allocated by the generic code.
These changes are safe because the S2MPU code is currently unused.
The EL1 code that initialized it has been removed.
Bug: 190463801
Change-Id: Ia634bac9b7dda333d87f7da0a02768df01d6bbd6
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit a1ed8a1881)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The EL2 S2MPU driver relied on EL1 code which parsed the DT and populated
EL2 driver data before deprivileging of the host. The driver is now
moving to later initialization from kernel modules, which will take over
the role of parsing the DT and power management. Remove the unused code.
Bug: 190463801
Change-Id: Ie6e21ba02b84494e5066c7681f85612a09f93f6d
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 167332a9fa)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
S2MPU code previously assumed that all S2MPUs were powered on at boot
and would check the version register and precompute the value of
S2MPU.CONTEXT_CFG_VALID_VID.
With EL1 S2MPU code being removed, and to allow for S2MPUs not powered
at boot, move the code to EL2 and run it on resume.
Bug: 190463801
Change-Id: Icaccfd125a6be7bab336ca3ffee52f2a33cf43b2
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit c823243791)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
IOMMU drivers may need to keep their own state of the host stage-2
mappings, eg. because they cannot share the PTs with the CPU. To this
end, walk the host stage-2 at driver init time and pass the current
state of host stage-2 mappings to the driver.
The driver initialization lock is released together with host_kvm
lock. That way the driver starts receiving stage-2 updates immediately
after the snapshot is taken.
Bug: 190463801
Change-Id: I5a5b0e064c5c88e210e28e343314318a2a1bffda
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 1ec4b346d0)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add IOMMU callbacks for host stage-2 idmap changes.
'host_stage2_idmap_prepare' is called first and is expected to apply
the changes on the driver level, eg. update driver-specific page table
information. If successful, the generic code invokes
'host_stage2_idmap_apply' on each currently powered IOMMU device
associated with the driver to apply the changes.
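The two-phase flow above can be sketched in user-space C. This is only an illustration under stated assumptions: the struct layouts, the 'idmap_update' helper, the 'int' return type of 'prepare' and the demo callbacks are hypothetical and do not reproduce the actual 'pkvm_iommu_ops' definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record: only what the sketch needs. */
struct iommu_dev {
	bool powered;
	int applied;	/* number of apply calls seen, for illustration */
};

/* Hypothetical two-phase ops, loosely modelled on the description. */
struct iommu_ops {
	int (*prepare)(unsigned long start, unsigned long end, int prot);
	void (*apply)(struct iommu_dev *dev, unsigned long start,
		      unsigned long end);
};

/* Phase 1 updates driver-level state once; phase 2 runs per powered device. */
static int idmap_update(const struct iommu_ops *ops, struct iommu_dev *devs,
			size_t nr_devs, unsigned long start, unsigned long end,
			int prot)
{
	size_t i;
	int ret = ops->prepare(start, end, prot);

	if (ret)
		return ret;	/* nothing is applied if prepare fails */

	for (i = 0; i < nr_devs; i++) {
		if (devs[i].powered)
			ops->apply(&devs[i], start, end);
	}
	return 0;
}

/* Demo callbacks: reject empty ranges, count applications. */
static int demo_prepare(unsigned long start, unsigned long end, int prot)
{
	(void)prot;
	return start < end ? 0 : -1;
}

static void demo_apply(struct iommu_dev *dev, unsigned long start,
		       unsigned long end)
{
	(void)start;
	(void)end;
	dev->applied++;
}
```

Splitting the work this way keeps the shared driver state consistent even when some devices are powered off: in this sketch they are simply skipped in the second phase.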
Bug: 190463801
Change-Id: Ifcc063896f6e8967c332dbaa5b7e7f2ba138abbf
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 4395ddff4b)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Replace the 'host_mmio_dabt_handler' hook in kvm_iommu_ops with
an equivalent callback in the new pkvm_iommu_ops. The generic portion
of the code finds the IOMMU device at the faulted address and invokes
the callback on it.
Bug: 190463801
Change-Id: I0ca008c3e1ae0ec12a259fa4ddac1aee65aaac5c
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 5df451f35e)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add suspend/resume callbacks for IOMMU devices. The EL1 kernel driver
is expected to call these when the IOMMU device is powered on and about
to be used, or when it is about to stop being used.
pkvm_iommu_suspend/resume are exported for use by kernel modules.
Bug: 190463801
Change-Id: Ia4ab37fe96879d451ce82f4278b3ff33a0b9685b
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit ca47ae70c7)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add '__pkvm_iommu_register' hypcall for registering a new IOMMU device.
The handler allocates a linked-list entry for the device from a memory
pool provided by the host. If the pool has run out, the handler returns
-ENOMEM and expects the host to call it again with a fresh mem pool.
The inputs are validated, eg. ID is unique and memory region does not
overlap with existing IOMMUs. The driver can also implement a 'validate'
callback for driver-specific input validation.
If successful, the handler creates a private EL2 mapping for the device,
forces the memory region to be unmapped from host stage-2 and inserts the
device into the linked list. Future attempts to map the MMIO region will
fail because of pkvm_iommu_host_stage2_adjust_range.
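The allocation and validation steps can be sketched as follows. This is a simplified user-space model, not the hypercall handler: the array-backed pool stands in for the host-provided memory pool and linked list, and the -EEXIST and -EBUSY return codes for a duplicate ID and an overlapping region are assumptions (only -ENOMEM is stated above).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry entry: device ID plus its MMIO region. */
struct iommu_entry {
	unsigned long id;
	uint64_t pa;
	uint64_t size;
};

/* Stand-in for the host-donated pool backing the device list. */
struct iommu_pool {
	struct iommu_entry *slots;
	size_t capacity;
	size_t used;
};

static int iommu_register(struct iommu_pool *pool, unsigned long id,
			  uint64_t pa, uint64_t size)
{
	size_t i;

	/* Pool exhausted: the caller is expected to top up and retry. */
	if (pool->used == pool->capacity)
		return -ENOMEM;

	/* Reject empty regions and regions that wrap around. */
	if (size == 0 || pa + size < pa)
		return -EINVAL;

	for (i = 0; i < pool->used; i++) {
		const struct iommu_entry *e = &pool->slots[i];

		if (e->id == id)
			return -EEXIST;	/* IDs must be unique */
		if (pa < e->pa + e->size && e->pa < pa + size)
			return -EBUSY;	/* half-open ranges intersect */
	}

	pool->slots[pool->used++] = (struct iommu_entry){ id, pa, size };
	return 0;
}
```

Two half-open ranges [a, a + n) and [b, b + m) overlap exactly when a < b + m and b < a + n, which is the test used in the loop.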
Bug: 190463801
Change-Id: If6f707555c80ac164ff995f42260872896a84e3d
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 78e0b7722c)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Introduce a linked list of IOMMU devices and
'pkvm_iommu_host_stage2_adjust_range' called from host DABT handler.
The function will adjust the memory range that is about to be mapped
to avoid MMIO regions of all devices in the linked list. If the host
tries to access a device MMIO region, the access is declined.
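A minimal sketch of the range adjustment, assuming the devices are kept in a plain array rather than a linked list and that a declined access is reported as -EPERM. The 'mmio_region' type and the 'adjust_range' name are illustrative only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one protected MMIO region: [pa, pa + size). */
struct mmio_region {
	uint64_t pa;
	uint64_t size;
};

/*
 * Shrink [*start, *end) around the faulting address 'addr' so that it
 * does not intersect any protected region. Fails if 'addr' itself lies
 * inside a protected region.
 */
static int adjust_range(const struct mmio_region *devs, size_t nr_devs,
			uint64_t addr, uint64_t *start, uint64_t *end)
{
	uint64_t new_start = *start;
	uint64_t new_end = *end;
	size_t i;

	for (i = 0; i < nr_devs; i++) {
		uint64_t dev_start = devs[i].pa;
		uint64_t dev_end = dev_start + devs[i].size;

		if (addr < dev_start) {
			/* Device is above the fault: cap the end. */
			if (dev_start < new_end)
				new_end = dev_start;
		} else if (addr >= dev_end) {
			/* Device is below the fault: raise the start. */
			if (dev_end > new_start)
				new_start = dev_end;
		} else {
			return -EPERM;	/* fault hits the device itself */
		}
	}

	*start = new_start;
	*end = new_end;
	return 0;
}
```

Because every device either lies entirely below or entirely above the faulting address, one pass over the list is enough to clip both ends of the range.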
The function replaces the existing call to
'kvm_iommu.ops.host_stage2_adjust_mmio_range' callback.
Bug: 190463801
Change-Id: Iacd6b74147fea2fef04846a91f0a5e550daaf074
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit d7adab5f9f)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add '__pkvm_iommu_driver_init' hypcall and 'struct pkvm_iommu_ops' with
an 'init' callback implemented by an EL2 driver. Driver-specific data
can be passed to 'init' from the host. The memory is pinned while
the callback processes it.
Bug: 190463801
Change-Id: I1185350bb46d41ff060a207af8e6d1f2f8a3d32d
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 1d9ae14c92)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The data abort fault IPA obtained from HPFAR_EL2 has the bottom 12 bits
zeroed out. This broke the host MMIO DABT handler because the offsets
of accessed MMIO registers were rounded down to the nearest page.
Include FAR_EL2 in the address to fix the issue.
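The address computation can be illustrated with plain integers. In this sketch the register values are passed in as arguments; the function and macro names are hypothetical. It relies on the arm64 layout in which the IPA field of the stage-2 fault address register (HPFAR_EL2) starts at bit 4 and holds the IPA from bit 12 upwards, so the page offset has to come from the low 12 bits of FAR_EL2.

```c
#include <stdint.h>

/* The low 4 bits of HPFAR_EL2 are not part of the IPA field. */
#define HPFAR_FIPA_MASK		(~UINT64_C(0xf))
/* FAR_EL2 supplies the offset within the 4K page. */
#define FAR_PAGE_OFFSET_MASK	UINT64_C(0xfff)

static uint64_t fault_ipa(uint64_t hpfar_el2, uint64_t far_el2)
{
	/* Field at bit 4 holds IPA[..:12], so shift left by 12 - 4 = 8. */
	uint64_t ipa = (hpfar_el2 & HPFAR_FIPA_MASK) << 8;

	return ipa | (far_el2 & FAR_PAGE_OFFSET_MASK);
}
```

For example, a faulting page of 0x12345000 combined with a FAR_EL2 value ending in 0xabc yields 0x12345abc, so the register offset within the MMIO page is preserved.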
Bug: 220194478
Change-Id: I6473e2dfbe189c58c15c0e5647d695d07f88c5e0
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 346987baf5)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The S2MPU driver must wait for a v9 device to finish invalidation before
accessing its SFRs. Failure to do so can result in memory transaction
timeouts.
Add a loop that polls the STATUS register while the return value has
the BUSY and ON_INVALIDATING bits set.
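The polling loop can be sketched against a simulated register. The bit positions, the 'fake_status_reg' type and the helper names below are hypothetical, and the loop reads the description as "keep polling while both bits are set"; a real driver would read the STATUS register through an MMIO accessor instead.

```c
#include <stdint.h>

/* Hypothetical bit positions; the real STATUS layout is not given here. */
#define STATUS_BUSY		(UINT32_C(1) << 0)
#define STATUS_ON_INVALIDATING	(UINT32_C(1) << 1)

/* Simulated register: reports "busy" for a fixed number of reads. */
struct fake_status_reg {
	unsigned int busy_reads;
};

static uint32_t fake_read_status(struct fake_status_reg *reg)
{
	if (reg->busy_reads > 0) {
		reg->busy_reads--;
		return STATUS_BUSY | STATUS_ON_INVALIDATING;
	}
	return 0;
}

/* Spin while both bits are set; returns how many busy reads were seen. */
static unsigned int wait_while_invalidating(struct fake_status_reg *reg)
{
	const uint32_t mask = STATUS_BUSY | STATUS_ON_INVALIDATING;
	unsigned int polls = 0;

	while ((fake_read_status(reg) & mask) == mask)
		polls++;

	return polls;
}
```

The sketch has no timeout, so it assumes the hardware always completes the invalidation eventually.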
Test: builds, boots
Bug: 190463801
Bug: 206761586
Change-Id: Ie8755bd3466b2c76ca05d6f3f2dd6e8e7bce592c
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit e149939df2)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Comments in S2MPU driver code were mistakenly prefixed with /**,
denoting a kernel-doc comment. Since these do not match kernel-doc
syntax, replace them with regular /* comments.
Test: n/a
Bug: 190463801
Change-Id: I0c68bb5d1c843caeb4d535430bdfc866ba8d119c
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 4377d9dea9)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The `init` callback of an IOMMU driver is called just before
`finalize_host_mappings` so that EL2 mappings created by drivers are
subsequently unmapped from host stage-2. However, at this point hyp has
already switched to the buddy allocator, having reserved pages allocated
by the early allocator, but `pkvm_pgtable.mm_ops` have not been switched
to buddy allocator callbacks. As a result, pages allocated for EL2
mappings of the IOMMU driver are allocated by the obsoleted early
allocator and remain treated as free by the buddy allocator. This likely
leads to a corruption in the free page lists and a later hyp panic.
Move the initialization of `pkvm_pgtable.mm_ops` before
`finalize_host_mappings` and the call to IOMMU's `init`.
Test: run a VM
Test: adb shell cmd jobscheduler run -f android 5132250
Bug: 190463801
Bug: 209004831
Change-Id: I1f6e00bca087d889b0cad4bd43d044895e37006c
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 395d045123)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The function is only used in the compilation unit where it is defined.
Silence a warning by marking it static.
Test: builds
Bug: 190463801
Change-Id: I296cffefdef4639ef2bab644d42f1374ee1a2f60
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 91abc8ece2)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The S2MPU driver needs to protect its MMIO registers from the host.
Implement the host_stage2_adjust_mmio_range callback and restrict
the address range that is about to be mapped in to avoid the known
S2MPU MMIO regions.
Test: builds, boots
Bug: 190463801
Change-Id: Ib46f5dd651b9368c31940035e4c28a7324fc4160
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 8f23406153)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The host should not have access to the vast majority of S2MPU MMIO
registers. Currently it only needs access to fault information, in
the future maybe also performance registers.
Implement an MMIO trap handler for the S2MPU, allowing read-only
access to FAULT_* registers, and a write-only access to
INTERRUPT_CLEAR.
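The access policy can be sketched as a small filter. All register offsets below are made up for illustration and do not reflect the real S2MPU register map; only the policy itself (FAULT_* read-only, INTERRUPT_CLEAR write-only, everything else denied) comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offsets, chosen only to make the sketch concrete. */
#define REG_INTERRUPT_CLEAR	UINT32_C(0x5c)
#define REG_FAULT_FIRST		UINT32_C(0x2000)	/* start of FAULT_* window */
#define REG_FAULT_END		UINT32_C(0x2100)	/* one past the window */

/* Decide whether a trapped host access may be forwarded to the device. */
static bool host_access_allowed(uint32_t offset, bool is_write)
{
	if (offset == REG_INTERRUPT_CLEAR)
		return is_write;	/* write-only */

	if (offset >= REG_FAULT_FIRST && offset < REG_FAULT_END)
		return !is_write;	/* read-only */

	return false;			/* everything else stays private */
}
```

A default-deny policy like this means any register added later stays hidden from the host until it is explicitly allowed.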
Test: builds, boots
Bug: 190463801
Change-Id: Ia482cc65642ba9ec303f443591e8f0fe192d4d27
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 81e70911d6)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The 'host_stage2_set_owner' callback indicates that a range of
PA-contiguous pages changed owner. With all devices owned by the host,
the driver sets the protection bits in the corresponding FMPT/SMPT to
either MPT_PROT_RW if owned by the host or MPT_PROT_NONE otherwise.
For each gigabyte region, the implementation will select between 1G and
4K/64K (depending on PAGE_SIZE) mappings and populate the L1ENTRY_ATTR
register or SMPT bitmap, respectively.
The driver never dynamically switches between two granularities which
both require a SMPT. This is because the L1ENTRY_ATTR and
L1ENTRY_L2TABLE_ADDR registers would need to be set atomically.
Test: builds, boots
Bug: 190463801
Change-Id: Ifb0bdcaa143ef8eb213ba4133ac86d8b610a4bcf
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 4475d993aa)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
S2MPU Second-level Memory Protection Table is a PA-contiguous buffer
containing an array of 2-bit read/write entries at given granularity
for a given gigabyte physical address space region. The size of SMPT
varies per granularity but at the finest 4K granularity it is 64KB
PA-contiguous, aligned to 64KB.
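The size figure follows from simple arithmetic: a gigabyte holds 2^30 / 2^12 = 262144 pages of 4K, and at 2 bits per entry that is 65536 bytes (64KB). A small sketch, assuming entries are packed four per byte with the lowest index in the least significant bits (the packing order and helper names are assumptions, not taken from the hardware documentation):

```c
#include <stddef.h>
#include <stdint.h>

#define SZ_1G		(UINT64_C(1) << 30)
#define MPT_PROT_BITS	2	/* one read bit and one write bit per entry */

/* Bytes needed to cover one gigabyte at the given granule. */
static size_t smpt_size(uint64_t granule)
{
	return (size_t)(SZ_1G / granule * MPT_PROT_BITS / 8);
}

/* Set the 2-bit entry covering 'gb_offset' (offset within the GB region). */
static void smpt_set(uint8_t *smpt, uint64_t granule, uint64_t gb_offset,
		     unsigned int prot)
{
	uint64_t idx = gb_offset / granule;
	unsigned int shift = (unsigned int)(idx % 4) * MPT_PROT_BITS;
	unsigned int byte = smpt[idx / 4];

	byte &= ~(3u << shift);
	byte |= (prot & 3u) << shift;
	smpt[idx / 4] = (uint8_t)byte;
}

/* Read back the 2-bit entry covering 'gb_offset'. */
static unsigned int smpt_get(const uint8_t *smpt, uint64_t granule,
			     uint64_t gb_offset)
{
	uint64_t idx = gb_offset / granule;
	unsigned int shift = (unsigned int)(idx % 4) * MPT_PROT_BITS;

	return (smpt[idx / 4] >> shift) & 3u;
}
```

With a 64K granule the same formula gives 4096 bytes per gigabyte, which is why the buffer size depends on the chosen granularity.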
Allocate a sufficient number of SMPT buffers for the S2MPU driver assuming
4K granularity for 4K/16K PAGE_SIZE, and 64K granularity for 64K
PAGE_SIZE. We also assume that all S2MPUs share SMPTs for a given
gigabyte region. There are 34 gigabyte regions that can be set by the
driver (GBs 4-33 always block all traffic).
Hyp takes ownership of the memory in s2mpu_init and assigns pointers to
the buffers to L1ENTRY_L2TABLE_ADDR registers on init and power-on
events. The pointers remain static as the driver will only change
granularity between 1G and 4K/64K (depending on PAGE_SIZE).
Test: builds, boots
Bug: 190463801
Change-Id: I3fcad8b3ce5d194a987b09d042bd56d59bb35e5e
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit f0e1de52ef)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Intercept SMCs known to be used by the host to inform EL3 about power
events, either powering SoC blocks on or off.
Test: builds, boots
Bug: 190463801
Change-Id: I306433c8c1b712df24569cbd4dc346f72b4c9650
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 8ca0b34fe4)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Initialize the S2MPU driver in __pkvm_init_stage2_iommu if requested by
the host. The driver sets kvm_iommu_ops and configures all S2MPUs which
are powered on at that point (ie. all S2MPUs on currently supported
devices).
The S2MPU L1ENTRY registers are set to 1G granularity and R/W access.
CTRL0/CTRL1/CFG are set to reasonable defaults, though the code relies on
the reset state blocking all traffic as well.
On fault the S2MPUs are configured to return SLVERR/DECERR (v8/9) to the
master. Interrupts are enabled for all VIDs and trigger an IRQ handler
if EL1 init registered a handler as a result of a DT interrupts entry.
Because the host can configure the SSMTs freely, all permission bits are
configured for all VIDs. For v9 CONTEXT_CFG_VALID_VIDS is set to the
value precomputed at EL1, allocating a context ID to each VID.
Test: builds, boots
Bug: 190463801
Change-Id: I4a824e90b5d474dd83c97ef53e4df3c8b68da6ba
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 8aa6c440da)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Create variables in hyp that will hold the DT information about S2MPUs
for use by hyp at runtime. Copy the information from EL1 to EL2.
The EL1 code computes the size of the data and allocates a sufficient
number of pages, which hyp will later take ownership of.
Test: builds, boots
Bug: 190463801
Change-Id: Ic3d4bfa3ec11f7c2e1b4474910e2f57a62139a75
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit bc80f81582)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
The S2MPU can be configured to trigger an interrupt on faults: access
permission (both regular and during page table walks) and if no matching
context ID is found for request's VID (v9 only).
When interrupt information is provided in the S2MPU's DT node, parse the
information and enable an IRQ handler. A later patch will enable the
functionality in the S2MPU.
Test: builds, boots
Bug: 190463801
Change-Id: I11d1a896406011cff1506ee1bd124bfc66ffa914
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 2517c4e5f0)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
S2MPU_CONTEXT_CFG_VALID_VID register must be configured on v9,
allocating a context ID in range 0 to S2MPU_NUM_CONTEXT to each valid
VID. For now assume that all 8 VIDs are valid. This will change once
the hypervisor takes control over SSMT configuration as well.
If there are more VIDs than available context IDs, the driver prints
a warning that DMA may be blocked and continues.
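The allocation policy can be sketched without any register encoding. The function below hands out context IDs in ascending order to the valid VIDs and reports how many were left without one, which is the case where the driver would warn. The names, the array-based result and the first-come-first-served order are assumptions; the packing into S2MPU_CONTEXT_CFG_VALID_VID is deliberately left out.

```c
#include <stdint.h>

#define NR_VIDS		8
#define CTX_NONE	(-1)

/*
 * Assign context IDs 0..nr_ctx-1 to the VIDs set in 'valid_vids'.
 * ctx_of_vid[vid] receives the context ID, or CTX_NONE.
 * Returns the number of valid VIDs that did not get a context.
 */
static unsigned int assign_contexts(uint8_t valid_vids, unsigned int nr_ctx,
				    int ctx_of_vid[NR_VIDS])
{
	unsigned int next_ctx = 0;
	unsigned int unassigned = 0;
	unsigned int vid;

	for (vid = 0; vid < NR_VIDS; vid++) {
		ctx_of_vid[vid] = CTX_NONE;

		if (!(valid_vids & (1u << vid)))
			continue;

		if (next_ctx < nr_ctx)
			ctx_of_vid[vid] = (int)next_ctx++;
		else
			unassigned++;	/* DMA from this VID may be blocked */
	}

	return unassigned;
}
```

A non-zero return value corresponds to the "more VIDs than available context IDs" case, where the driver warns and carries on.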
Test: builds, boots
Bug: 190463801
Change-Id: I0c9e0a5c9470b27debaade2c4e02e16c6577fbfe
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 923353be1e)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Read S2MPU_VERSION during driver init and check it against the list of
supported versions. The register fields are as follows:
- MAJOR_ARCH_VER,
- MINOR_ARCH_VER,
- REV_ARCH_VER,
- RTL_VER.
Their exact use is not documented. For now, we mask out RTL_VER and
expect a match on MAJOR_, MINOR_ and REV_ARCH_VER. This may be tweaked
in the future.
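The masked comparison can be sketched as below. The field position of RTL_VER (assumed here to be the low 16 bits) and the entries in the supported list are placeholders for illustration, since the exact layout and values are not given above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: RTL_VER in the low 16 bits, ARCH_VER fields above it. */
#define VERSION_RTL_VER_MASK	UINT32_C(0x0000ffff)
#define VERSION_CHECK_MASK	(~VERSION_RTL_VER_MASK)

/* Made-up supported versions, with RTL_VER already zeroed. */
static const uint32_t supported_versions[] = {
	UINT32_C(0x11000000),
	UINT32_C(0x20000000),
};

/* True if MAJOR_, MINOR_ and REV_ARCH_VER match a supported entry. */
static bool version_supported(uint32_t version_reg)
{
	uint32_t masked = version_reg & VERSION_CHECK_MASK;
	size_t i;

	for (i = 0; i < sizeof(supported_versions) / sizeof(supported_versions[0]); i++) {
		if (masked == supported_versions[i])
			return true;
	}
	return false;
}
```

Masking before the comparison means two parts that differ only in RTL_VER are treated as the same version.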
Test: builds, boots
Bug: 190463801
Change-Id: I9709fde5f4d3ca4c23f84919c37b081302846917
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 4a7da93bdb)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Start the EL1 portion of the S2MPU driver with an init function which
probes the device tree for nodes compatible with 'google,s2mpu'.
Parse and check the base, size and power domain ID.
Test: builds, boots
Bug: 190463801
Change-Id: I5f0b32febb4e922fdfdfe10a9a9c823e20b8e26f
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 4e91a00153)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Create a skeleton driver for the S2MPU - an EL1 portion called during
KVM init which will parse the DT and configure the kernel, and an EL2
portion which will program the S2MPUs later at runtime. The code is
behind CONFIG_KVM_S2MPU.
Test: builds, boots
Bug: 190463801
Change-Id: I58206535f3493e1d989576a9db2112d370a1cb4d
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit b2de5483b7)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add a new kvm_iommu_ops hook to the lower-EL instruction/data abort
handler, which allows the IOMMU driver to restrict the region of device
memory that is about to be mapped in the host stage-2.
This can be used by the IOMMU driver to restrict access to the MMIO
registers of the IOMMU itself.
Test: builds, boots
Bug: 190463801
Change-Id: I51cf3cfd84c889627e290d74579657447964ca16
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit cc1ad46fb2)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add a new kvm_iommu_ops hook which allows the IOMMU driver to handle
data aborts in unmapped device memory regions. If the abort is handled
by the driver, the global abort handler will not attempt to map in the
page.
For example, this enables the IOMMU driver to virtualize access to
the underlying IOMMU hardware, or to allow access to a subset of the
functionality, eg. performance counters.
Test: builds, boots
Bug: 190463801
Change-Id: I84adbc992e577ac6ceb09f4856e1c648df580f76
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 25f81ec77b)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add a new hook to kvm_iommu_ops that is invoked whenever a range of
pages changes their owner in the host stage2. This is currently limited
to finalize_host_mappings, which changes the owner of EL2-mapped pages
from host to hyp.
The driver is expected to apply corresponding changes in the IOMMU it
controls, so that only the new owner can access the page range.
Test: builds, boots
Bug: 190463801
Change-Id: I0809f4859a9117d1a37506b7aa9e19c6bd25ffdb
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 3cd8b5b00b)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
IOMMU drivers need to intercept power management SMCs between the host
and EL3. Add a hook to hyp's 'handle_host_smc'.
Test: builds, boots
Bug: 190463801
Change-Id: Ied34b60d4bb0e5ae0fbf03f8ce1dc22a09679e37
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit d2efcdcb2b)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Bootstrap infrastructure for IOMMU drivers by introducing kvm_iommu_ops
struct in EL2 that is populated based on an iommu_driver parameter to
__pkvm_init hypercall and selected in EL1 early init.
An 'init' operation is called in __pkvm_init_finalise, giving the driver
an opportunity to initialize itself in EL2 and create any EL2 mappings
that it will need. 'init' is specifically called before
'finalize_host_mappings' so that:
(a) pages mapped by the driver change owner to hyp,
(b) ownership changes in 'finalize_host_mappings' get reflected in
IOMMU mappings (added in a future patch).
Test: builds, boots
Bug: 190463801
Change-Id: I04c9f32c6eda846e6e377cb3d23330eb143b6242
Signed-off-by: David Brazdil <dbrazdil@google.com>
(cherry picked from commit 79775d0225)
Signed-off-by: Mostafa Saleh <smostafa@google.com>
aosp/2257747 merged v5 of the pKVM hypervisor state series as FROMLIST.
Since then, version 6 was posted and queued by the upstream maintainer:
https://lore.kernel.org/r/166819337067.3836113.13147674500457473286.b4-ty@kernel.org
Rather than revert v5 from android (and the dozens of dependent patches),
snap to v6 so that we're in-sync with upstream.
Bug: 233587962
[willdeacon@: Fix conflicts with 'stage2_mc' introduced by accounting work]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I137bbd611c180cbe03e63a55705150f8f9c2ae31
The optimizations [1] and [2] to reset vma->anon_vma during
MREMAP_DONTUNMAP can affect the speculative page fault handler. If
vma->anon_vma reset happens after do_anonymous_page verified no
changes to the vma and obtained the ptl lock but before it calls
page_add_new_anon_rmap() then __page_set_anon_rmap() will stumble
on BUG_ON(!anon_vma). Disable these optimizations if SPF is enabled
to avoid such situations. As a result the reverse map walk will
consider the old VMA as it did before these optimizations were
introduced.
[1] 1583aa278f ("mm: mremap: unlink anon_vmas when mremap with MREMAP_DONTUNMAP success")
[2] ee8ab1903e ("mm: rmap: explicitly reset vma->anon_vma in unlink_anon_vmas()")
Bug: 257443051
Change-Id: I4e7611137f4a49c94bfe73532b4b06cbb0d2405b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
do_numa_page() uses pte_offset_map() directly and needs to implement
additional mechanisms to ensure the mempolicy object used in
numa_migrate_prep() is not destroyed from under it when speculating.
Rather than fixing this, just disable speculation for CONFIG_NUMA
for now and fix it if it's ever needed in Android.
Bug: 257443051
Change-Id: Ib5750b9809979a69a42ebfa6c130e123f416f1aa
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Speculative page fault handling expects MMU_GATHER_RCU_TABLE_FREE to
guarantee that page tables are stable, however tlb_remove_table() has
a slow-path fall-back case when __get_free_page() returns NULL and
tlb_remove_table_one() gets called. The way synchronization is
implemented in that function is not RCU-safe and requires IRQs to be
disabled (see the comment in tlb_remove_table_sync_one()).
Fix the invalid assumption by disabling IRQs even when
MMU_GATHER_RCU_TABLE_FREE=y.
Bug: 257443051
Change-Id: I227f351607cf73022cb31f6f7a232cab41cf6a5a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When searching vma under RCU protection vmcache should be avoided because
a race with munmap() might result in finding a vma and placing it into
vmcache after munmap() removed that vma and called vmcache_invalidate.
Once that vma is freed, vmcache will be left with an invalid vma pointer.
Bug: 257443051
Change-Id: I62438305fcf5139974f4f7d3bae5b22c74084a59
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
move_page_tables() can move entire pmd or pud without locking individual
ptes. This is problematic for speculative page faults which do not take
mmap_lock because they rely on ptl lock when writing new pte value. To
avoid possible race, disable move_page_tables() optimization when
CONFIG_SPECULATIVE_PAGE_FAULT is enabled.
Bug: 257443051
Change-Id: Ib48dda08ecad1abc60d08fc089a6566a63393c13
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Speculative page fault checks pmd to be valid before starting to handle
the page fault and pte_alloc() should do nothing if pmd stays valid.
If pmd gets changed during speculative page fault, we will detect the
change later and retry with mmap_lock. Therefore pte_alloc() can be
safely skipped and this prevents the racy pmd_lock() call which can
access pmd->ptl after pmd was cleared.
Bug: 257443051
Change-Id: Iec57df5530dba6e0e0bdf9f7500f910851c3d3fd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Current mechanism to stabilize a vma during speculative page fault
handling makes a copy of the faulting vma under RCU protection. This
makes it hard to protect elements which do not belong to the vma but
are used by the page fault handler like vma->vm_file.
The problem is that a copy of the vma can't be used to safely
protect the file attached to the original vma unless the file is
also released after RCU grace period (which is how SPF was designed
originally but that caused performance regression and had to be
changed).
To avoid these complications, introduce vma refcounting to stabilize
and operate on the original vma during page fault handling. Page
fault handler finds the vma and increases its refcount under RCU
protection, vma is freed after RCU grace period, vma->vm_file is
released only after refcount indicates no users. This mechanism
guarantees that once get_vma returns a vma, both the vma itself and
vma->vm_file are stable.
Additional benefits of this patch are: we don't need to copy the vma
and no additional logic is needed to stabilize vma->vm_file.
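The refcounting scheme described above can be illustrated with C11 atomics. This is a user-space sketch of the "take a reference only if the object is still live" pattern, not the kernel implementation: 'vma_sketch', 'get_vma_sketch' and 'put_vma_sketch' are hypothetical names, RCU is not modelled, and releasing the file is reduced to setting a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vma with an attached file. */
struct vma_sketch {
	atomic_int refcount;	/* 0 means teardown has started */
	bool file_released;	/* set once the last user is gone */
};

/* Take a reference unless the count has already dropped to zero. */
static bool get_vma_sketch(struct vma_sketch *vma)
{
	int old = atomic_load(&vma->refcount);

	while (old > 0) {
		/* On failure 'old' is reloaded and the check repeats. */
		if (atomic_compare_exchange_weak(&vma->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last user releases the file. */
static void put_vma_sketch(struct vma_sketch *vma)
{
	if (atomic_fetch_sub(&vma->refcount, 1) == 1)
		vma->file_released = true;
}
```

The compare-and-swap loop is what prevents a fault handler from reviving an object whose count has already reached zero; in the scheme described above, the memory itself stays valid for such a late lookup because the vma is only freed after an RCU grace period.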
Bug: 257443051
Change-Id: I59d373926d687fcbd56847a8c3500c43bf1844c8
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Use vma->vm_file refcounting to protect the file during speculative page
fault handling.
Bug: 258731892
Change-Id: I222c23785391bea7d95c4506d70d6f68029ec45f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This reverts commit a3fe25d92303739a0515c92cb1febb46a920d4d9.
File refcounting implemented in this patch is broken and needs to be
redone.
The change in include/linux/mm_types.h which adds file_ref_count into
vm_area_struct is left untouched to keep ABI intact.
Bug: 258731892
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I37984eb2f0981a989f74bcaaa6be42040a2f241e
This reverts commit 0f4ea1e59394908a0c1c7619c7a24fd7f790586f.
File refcounting implemented in this patch is broken and needs to be
redone.
Bug: 258731892
Change-Id: I3ae5a78b871edaf655d1c9a7868c8543e27f39e5
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This reverts commit 4fc18576ca94ca9620bd03e0fc7a64467c1ea0c2.
File refcounting implemented in this patch is broken and needs to be
redone.
Bug: 258731892
Change-Id: Ibcefaf6aa72c60c9627d0ea7d473a3ec806535f4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This reverts commit 6551a55c4dc5492dcae3dc340c376ed160ab9928.
File refcounting implemented in this patch is broken and needs to be
redone.
Bug: 258731892
Change-Id: I425517a07d1fdcf5cd1842733a4c6c70ef0608b4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When using nVHE in protected mode, protected memory can be shared between
host and a guest. Tracking this value is interesting from a debug
perspective, to identify potential leaks.
Keeping the count of memory sharing is easy: each share/unshare will return
to the host where the accounting will take place.
Bug: 222044477
Change-Id: I43dcd258789f79dbfe489e5bf721e606c5e6e022
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
When using the nVHE protected mode, the stage-2 page tables are handled by
the hypervisor, but are backed by memory donated by the host. That memory
is accounted during the donation (added to the vCPU's hyp_memcache) under
secondary pagetable stats.
On VM teardown, those pages are mixed with others in the teardown_mc, so use
a separate teardown_stage2_mc to deduct them from accounting after
reclaim.
Bug: 222044477
Change-Id: I2a45ce65c5ce9cf96aabd1b66d6f83ffe4808a0c
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>