Add three new symbols to the aarch64 kernel ABI. These are to be
called from vendor modules to register an IOMMU with pKVM and
notify the hypervisor about its PM events.
New symbols:
- pkvm_iommu_s2mpu_register
- pkvm_iommu_suspend
- pkvm_iommu_resume
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I9797326a54cba6abd1b233682379de10139c2303
With new generic IOMMU code in place, and with all S2MPU code
having been migrated to the new pkvm_iommu_ops callbacks, remove
all the now unused code.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I6abc7ef0f400250cbb38a673feb1db35116c3f69
Remove the existing 's2mpu_host_stage2_set_owner' hook implementation
and refactor the code to match the prepare/apply split of the generic
IOMMU callbacks for updating host stage-2 mappings.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: If550fe2c41198c320559c8125ec9ecc0479eb249
Core SMPT manipulation code returns mpt_update_flags, signalling whether
the caller should flush the dcache (MPT_UPDATE_L2) or write new L1ATTR
values to S2MPU MMIO registers (MPT_UPDATE_L1).
In preparation for splitting the code into driver-global and
per-device portions, store the value in the corresponding FMPT.
As long as the two code portions are called from a single critical
section, the FMPT value is guaranteed to not change.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Iec06697e8826b0dba682476b39cf64acd6337166
Previously the S2MPU DABT handler would be called directly from the host
DABT handler and it would look up the corresponding S2MPU device. Now the
lookup is done in the generic IOMMU DABT handler and only the actual
S2MPU register access is left to the driver itself.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I5236cf01b9e1dcc65a00081797a13ee92a4e263b
The host is now expected to notify EL2 about PM state changes of
individual IOMMU devices. Remove the old code that intercepted SMCs
and instead rely on callbacks from the core IOMMU code.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I2dd49836c01e405562ba62c00efc711f084d5963
Create 'struct pkvm_iommu_ops' for the S2MPU and add a new driver ID to the
list of IOMMU drivers. Implement the 'init' callback, accepting donated
memory from the host to back SMPTs. If the donation is successful,
the SMPTs are assigned to 'host_mpt'.
Export 'pkvm_iommu_s2mpu_register' for a kernel module to call to
register an S2MPU device. First call to this function will also
run the global S2MPU driver initializer.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I3d1aaf8535114beae956993674a2b436414f07c4
The function is superseded by the generic
pkvm_iommu_host_stage2_adjust_range, so remove it.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7d138a3c2e2497bdc19e6e6e95b7870ac48d890e
Replace all uses of 'struct s2mpu' with the generic 'struct pkvm_iommu'.
'struct s2mpu_drv_data' is created to accommodate driver-specific values
associated with 'struct pkvm_iommu' and allocated by the generic code.
These changes are safe because the S2MPU code is currently unused.
The EL1 code that initialized it had been removed.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Ib12b64ffa3281be83440e33cdd032d80df6e4868
EL2 S2MPU driver relied on EL1 code which parsed the DT and populated
EL2 driver data before deprivileging of the host. The driver is now
moving to later initialization from kernel modules, which will take over
the role of parsing the DT and power management. Remove the unused code.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I96542ceeec4fcf1040658779a922363b1e41e976
S2MPU code previously assumed that all S2MPUs were powered on at boot
and would check the version register and precompute the value of
S2MPU.CONTEXT_CFG_VALID_VID.
With EL1 S2MPU code being removed, and to allow for S2MPUs not powered
at boot, move the code to EL2 and run it on resume.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Ib3c3926c1ed3b78fe39769758a8d66490963d4c5
IOMMU drivers may need to keep their own state of the host stage-2
mappings, e.g. because they cannot share the PTs with the CPU. To this
end, walk the host stage-2 at driver init time and pass the current
state of host stage-2 mappings to the driver.
The driver initialization lock is released together with host_kvm
lock. That way the driver starts receiving stage-2 updates immediately
after the snapshot is taken.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I62c54c43d2e165d4abab5efbe14e6ea2589c9ed0
Add IOMMU callbacks for host stage-2 idmap changes.
'host_stage2_idmap_prepare' is called first and is expected to apply
the changes on the driver level, e.g. update driver-specific page table
information. If successful, the generic code invokes
'host_stage2_idmap_apply' on each currently powered IOMMU device
associated with the driver to apply the changes.
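The prepare/apply flow described above can be sketched in plain C as follows. This is a minimal user-space illustration, not the hypervisor code: the 'sketch_*' and 'demo_*' names, the structure layouts and the callback signatures are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real EL2 structures. */
struct sketch_iommu {
	bool powered;
	int nr_applied;		/* times 'apply' reached this device */
};

struct sketch_iommu_ops {
	/* Driver-level step: update shared page-table state, may fail. */
	int (*prepare)(unsigned long start, unsigned long end, int prot);
	/* Per-device step: push the prepared state to one device. */
	void (*apply)(struct sketch_iommu *dev, unsigned long start,
		      unsigned long end);
};

/* Generic code: prepare once, then apply on every powered device. */
int sketch_idmap_update(const struct sketch_iommu_ops *ops,
			struct sketch_iommu *devs, size_t nr_devs,
			unsigned long start, unsigned long end, int prot)
{
	size_t i;
	int ret;

	assert(ops && ops->prepare && ops->apply);

	ret = ops->prepare(start, end, prot);
	if (ret)
		return ret;	/* nothing is applied if prepare fails */

	for (i = 0; i < nr_devs; i++) {
		if (devs[i].powered)
			ops->apply(&devs[i], start, end);
	}
	return 0;
}

/* Example callbacks used to exercise the flow. */
int demo_prepare(unsigned long start, unsigned long end, int prot)
{
	(void)prot;
	return start < end ? 0 : -1;
}

void demo_apply(struct sketch_iommu *dev, unsigned long start,
		unsigned long end)
{
	(void)start;
	(void)end;
	dev->nr_applied++;
}
```

In this sketch a powered-off device is simply skipped; it is assumed to pick up the driver-level state when it is next resumed.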
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Icf0d7b9c4b5b7219074b54c961db2fe85561d114
Replace the 'host_mmio_dabt_handler' hook in kvm_iommu_ops with
an equivalent callback in the new pkvm_iommu_ops. The generic portion
of the code finds the IOMMU device at the faulted address and invokes
the callback on it.
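A rough sketch of the split between the generic lookup and the driver callback, written as stand-alone C. The structure, the function names and the callback signature are hypothetical and only illustrate the idea of resolving the device by fault address before handing the access to the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device descriptor; not the real 'struct pkvm_iommu'. */
struct sketch_mmio_dev {
	uint64_t base;
	uint64_t size;
	/* Driver callback: receives the offset into the device window. */
	bool (*dabt_handler)(struct sketch_mmio_dev *dev, uint64_t offset,
			     bool is_write);
};

/* Generic part: find the device whose MMIO window contains 'addr'. */
struct sketch_mmio_dev *sketch_find_dev(struct sketch_mmio_dev *devs,
					size_t nr_devs, uint64_t addr)
{
	size_t i;

	for (i = 0; i < nr_devs; i++) {
		if (addr >= devs[i].base && addr - devs[i].base < devs[i].size)
			return &devs[i];
	}
	return NULL;
}

/* Generic part: returns true only if a driver handled the access. */
bool sketch_host_dabt(struct sketch_mmio_dev *devs, size_t nr_devs,
		      uint64_t fault_addr, bool is_write)
{
	struct sketch_mmio_dev *dev;

	assert(devs || nr_devs == 0);

	dev = sketch_find_dev(devs, nr_devs, fault_addr);
	if (!dev || !dev->dabt_handler)
		return false;

	return dev->dabt_handler(dev, fault_addr - dev->base, is_write);
}

/* Example driver callback: permit reads, reject writes. */
bool demo_readonly_handler(struct sketch_mmio_dev *dev, uint64_t offset,
			   bool is_write)
{
	(void)dev;
	(void)offset;
	return !is_write;
}
```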
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I44147ceb7877dc1999fd10f4db55659bbbec5bb7
Add suspend/resume callbacks for IOMMU devices. The EL1 kernel driver
is expected to call these when the IOMMU device is powered on and is
about to be used, or is about to stop being used.
pkvm_iommu_suspend/resume are exported for use by kernel modules.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I5cd38aaeb685bcdae0368453138cc099055adb27
Add '__pkvm_iommu_register' hypcall for registering a new IOMMU device.
The handler allocates a linked-list entry for the device from a memory
pool provided by the host. If the pool has run out, the handler returns
-ENOMEM and expects the host to call it again with a fresh mem pool.
The inputs are validated, e.g. ID is unique and memory region does not
overlap with existing IOMMUs. The driver can also implement a 'validate'
callback for driver-specific input validation.
If successful, the handler creates a private EL2 mapping for the device,
forces the memory region to be unmapped from host stage-2 and inserts the
device into the linked list. Future attempts to map the MMIO region will
fail because of pkvm_iommu_host_stage2_adjust_range.
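The registration steps above (validate, allocate from a host-provided pool, insert into the list) might look roughly like the following stand-alone C sketch. Only -ENOMEM is taken from the description; the other error codes, the pool layout, the validate-before-allocate ordering and all 'sketch_*' names are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list entry and memory pool; illustration only. */
struct sketch_dev {
	unsigned long id;
	uint64_t pa;
	uint64_t size;
	struct sketch_dev *next;
};

struct sketch_pool {
	struct sketch_dev *slots;	/* memory donated by the host */
	size_t nr_slots;
	size_t used;
};

static struct sketch_dev *sketch_pool_alloc(struct sketch_pool *pool)
{
	if (!pool || pool->used == pool->nr_slots)
		return NULL;
	return &pool->slots[pool->used++];
}

static int sketch_overlap(uint64_t a, uint64_t a_size,
			  uint64_t b, uint64_t b_size)
{
	return a < b + b_size && b < a + a_size;
}

int sketch_register(struct sketch_dev **list, struct sketch_pool *pool,
		    unsigned long id, uint64_t pa, uint64_t size)
{
	struct sketch_dev *cur, *entry;

	assert(list);

	if (!size || pa + size < pa)
		return -EINVAL;		/* empty or wrapping region */

	/* Validate before allocating so a rejected call uses no slot. */
	for (cur = *list; cur; cur = cur->next) {
		if (cur->id == id)
			return -EEXIST;
		if (sketch_overlap(pa, size, cur->pa, cur->size))
			return -EINVAL;
	}

	entry = sketch_pool_alloc(pool);
	if (!entry)
		return -ENOMEM;		/* caller retries with a fresh pool */

	entry->id = id;
	entry->pa = pa;
	entry->size = size;
	entry->next = *list;
	*list = entry;
	return 0;
}
```

The retry path is the caller's responsibility here: on -ENOMEM it repeats the same call with a new pool, mirroring the "call it again with a fresh mem pool" contract.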
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: If54ba41cd0b219c6e63508b542d526703ab5b97e
Introduce a linked list of IOMMU devices and
'pkvm_iommu_host_stage2_adjust_range' called from host DABT handler.
The function will adjust the memory range that is about to be mapped
to avoid MMIO regions of all devices in the linked list. If the host
tries to access a device MMIO region, the access is declined.
The function replaces the existing call to
'kvm_iommu.ops.host_stage2_adjust_mmio_range' callback.
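One way to picture the range adjustment is the stand-alone C sketch below. It assumes the range is shrunk around the faulting address so that it stops short of any device region, and that -EPERM is the "declined" result; both the error code and the 'sketch_*' names are assumptions, and an array stands in for the linked list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one device MMIO window. */
struct sketch_region {
	uint64_t pa;
	uint64_t size;
};

/*
 * Shrink [*start, *end) so that it still contains 'addr' but no longer
 * touches any device region. Fails if 'addr' itself is inside a device.
 */
int sketch_adjust_range(const struct sketch_region *regs, size_t nr_regs,
			uint64_t addr, uint64_t *start, uint64_t *end)
{
	uint64_t new_start, new_end;
	size_t i;

	assert(start && end);

	new_start = *start;
	new_end = *end;

	for (i = 0; i < nr_regs; i++) {
		uint64_t dev_start = regs[i].pa;
		uint64_t dev_end = dev_start + regs[i].size;

		if (addr < dev_start) {
			/* Device is above the fault: clamp the end. */
			if (dev_start < new_end)
				new_end = dev_start;
		} else if (addr >= dev_end) {
			/* Device is below the fault: clamp the start. */
			if (dev_end > new_start)
				new_start = dev_end;
		} else {
			return -EPERM;	/* host touched device MMIO */
		}
	}

	*start = new_start;
	*end = new_end;
	return 0;
}
```

The output range is left untouched on failure, so a caller can safely ignore it when the access is declined.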
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Ib38256f0005588810a4400efd9a85380d354be59
Add '__pkvm_iommu_driver_init' hypcall and 'struct pkvm_iommu_ops' with
an 'init' callback implemented by an EL2 driver. Driver-specific data
can be passed to 'init' from the host. The memory is pinned while
the callback processes it.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7cfe51de553e07083747467e1e3ca8bc51737035
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. Linked lists are initialized with the head
pointing to itself. To avoid having to work around this by initializing
at runtime, add a .hyp.data section.
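To illustrate why a statically initialized list head cannot live in .bss, here is a small stand-alone C sketch modelled on the kernel's list idiom. The type and macro are re-declared locally for the example and the variable names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points to itself, not a zeroed head. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Non-zero initializer: the object has to be emitted into a data section. */
struct list_head sketch_iommu_list = LIST_HEAD_INIT(sketch_iommu_list);

/* Zero-initialized: would sit in .bss and need a runtime init call. */
struct list_head sketch_zeroed_list;

bool sketch_list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert 'entry' right after 'head'; the head must be initialized. */
void sketch_list_add(struct list_head *entry, struct list_head *head)
{
	assert(head->next && head->prev);

	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}
```

Calling sketch_list_add() on the zeroed head would trip the assertion, which is the situation the runtime-initialization workaround would otherwise have to prevent.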
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7a56dc4c93e05bbef53c66837164d17c6103b6b8
The pKVM hypervisor currently zeroes all the pages mapped into guests
when tearing them down for confidentiality reasons. However, for pages
that are shared with the host this is unnecessary at best as the content
of memory is already visible. This is particularly bad for non-protected
guests as all their memory is shared with the host by definition.
Add a new flag to distinguish pages that solely need to be updated from
an ownership perspective and those that need to be zeroed.
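The idea of the extra flag can be sketched in stand-alone C as below. The flag names, the tiny page size and the two-step mark/reclaim split are assumptions made for illustration; the commit message does not name the real flags.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE	64	/* tiny page, for illustration only */

/* Hypothetical flag names. */
#define SKETCH_PENDING_RECLAIM	(1u << 0)	/* ownership returns to host */
#define SKETCH_NEED_ZEROING	(1u << 1)	/* contents were guest-private */

struct sketch_page {
	unsigned int flags;
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* At teardown: only pages the host could not already see need zeroing. */
void sketch_mark_for_teardown(struct sketch_page *pg, int shared_with_host)
{
	assert(pg);

	pg->flags = SKETCH_PENDING_RECLAIM;
	if (!shared_with_host)
		pg->flags |= SKETCH_NEED_ZEROING;
}

/* At reclaim: zero only when flagged, then clear the bookkeeping. */
void sketch_reclaim(struct sketch_page *pg)
{
	assert(pg);

	if (pg->flags & SKETCH_NEED_ZEROING)
		memset(pg->data, 0, sizeof(pg->data));
	pg->flags = 0;
}
```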
NOTE: We should probably overhaul the teardown procedure at some point
to avoid the proliferation of those flags, but that would require
significant changes so we might not want that in Android 13.
Bug: 223678931
Change-Id: Icefc85a0bdcdf9958e9eb6871c794f68b06a007f
Signed-off-by: Quentin Perret <qperret@google.com>
The pKVM shadow table is protected by 'shadow_lock', however this lock
is only taken across relatively fine-grained calls when inserting and
removing entries from the table. This poses a problem for higher-level
functions such as __pkvm_init_shadow(), where a partially-initialised
shadow entry is made transiently visible to get_shadow_vcpu() and could
potentially be loaded in an inconsistent state by another CPU.
Push the locking out of the insert/remove functions and up into
__pkvm_{init,teardown}_shadow() so that the shadow state always appears
to be consistent as long as the lock is held.
Bug: 216808671
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I74c563a539c1ce35f5da86a8281e47c7d435bd27
There's no reason to make the internal shadow table data directly
accessible outside of pkvm.c, so make it all static and provide an
initialisation function to install the initial pages.
Bug: 216808671
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Idc0908796ebbd2b620494f5d4d6b6055455c8013
Add hooks to gather data of unusual aborts and summarize it with
other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I74eb36b8551ed9a5e6dc87507939a7f4d81c9c18
Add hooks to gather data of kernel fault and summarize it with
other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I7d6a66837f2e896a413bd8d878f26928669d96e6
Add hooks to gather data of unfrozen tasks and summarize it
with other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I61da3d253bd9959c6f06e09c9a35c4b242cedafe
Add hook to gather data of softlockup and summarize it with
other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I5263bbd573c3fa4b4c981ac26c943721ce09506d
Add hook to gather data of bug trap and summarize it with other
information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I1f347c20629786f9bf0b9c50c7f96b50b4360504
Export cpuidle_driver_state_disabled() so that CPU idle states may be
disabled at runtime for debugging CPU and cluster idle states.
Bug: 175718935
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Change-Id: Id9038074d64fb6c0444d9aca68420414c3223e93
(cherry picked from commit de93734e22)
With these hooks, printk can provide more information, such as the
processor ID.
Bug: 223302138
Signed-off-by: Ben Dai <ben.dai@unisoc.com>
Change-Id: Iac60ffd49640d8badf5c5dd446c211d37bbbc6a6
Commit d43583b890 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the
guest") hooked up the SYSTEM_RESET2 PSCI call for guests but failed to
preserve its arguments for userspace, instead overwriting them with
zeroes via smccc_set_retval(). As Linux only passes zeroes for these
arguments, this appeared to be working for Linux guests. Oh well.
Don't call smccc_set_retval() for a SYSTEM_RESET2 heading to userspace
and instead set X0 (and only X0) explicitly to PSCI_RET_INTERNAL_FAILURE
just in case the vCPU re-enters the guest.
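The difference between the old and the fixed behaviour can be modelled with a few lines of stand-alone C. The register file, the 'sketch_*' helpers and the way the old path is modelled are assumptions for illustration; only the PSCI_RET_INTERNAL_FAILURE value (-6) and the "X0 only" rule come from the PSCI definitions and the description above.

```c
#include <assert.h>
#include <stdint.h>

#define PSCI_RET_INTERNAL_FAILURE	(-6)

/* Hypothetical view of the first four guest argument registers. */
struct sketch_vcpu {
	uint64_t x[4];
};

/* Models an smccc_set_retval()-style helper: it writes X0..X3. */
void sketch_set_retval(struct sketch_vcpu *v, uint64_t a0, uint64_t a1,
		       uint64_t a2, uint64_t a3)
{
	v->x[0] = a0;
	v->x[1] = a1;
	v->x[2] = a2;
	v->x[3] = a3;
}

/* Old behaviour: the SYSTEM_RESET2 arguments in X1/X2 are zeroed. */
void sketch_reset2_exit_old(struct sketch_vcpu *v)
{
	assert(v);
	sketch_set_retval(v, (uint64_t)PSCI_RET_INTERNAL_FAILURE, 0, 0, 0);
}

/* Fixed behaviour: touch X0 only so userspace still sees the arguments. */
void sketch_reset2_exit_fixed(struct sketch_vcpu *v)
{
	assert(v);
	v->x[0] = (uint64_t)PSCI_RET_INTERNAL_FAILURE;
}
```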
Fixes: d43583b890 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest")
Reported-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220309181308.982-1-will@kernel.org
(cherry picked from commit 9d3e7b7c82
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 216801012
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ieead1a813e6b4dfee1aa89e42ee1926efcd5f590
Set KMI_GENERATION=1 for 3/9 KMI update
Leaf changes summary: 2579 artifacts changed (1 filtered out)
Changed leaf types summary: 9 (1 filtered out) leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 2521 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 49 Changed, 0 Added variable
2521 functions with some sub-type change:
[C] 'function void* PDE_DATA(const inode*)' at generic.c:794:1 has some sub-type changes:
CRC (modversions) changed from 0x17465176 to 0x1c3e2a86
[C] 'function void __ClearPageMovable(page*)' at compaction.c:138:1 has some sub-type changes:
CRC (modversions) changed from 0x8331b3e3 to 0x734edab3
[C] 'function void __SetPageMovable(page*, address_space*)' at compaction.c:130:1 has some sub-type changes:
CRC (modversions) changed from 0xe56f361 to 0x891f9c1d
... 2518 omitted; 2521 symbols have only CRC changes
49 Changed variables:
[C] 'bus_type amba_bustype' was changed at bus.c:313:1:
CRC (modversions) changed from 0xe555ebeb to 0x517f2d17
[C] 'const address_space_operations balloon_aops' was changed at balloon_compaction.c:253:1:
CRC (modversions) changed from 0xa9866f1a to 0x89a77b8c
[C] 'const clk_ops clk_divider_ops' was changed at clk-divider.c:522:1:
CRC (modversions) changed from 0xca4154fa to 0x5a75cc1
... 46 omitted; 49 symbols have only CRC changes
'enum nl80211_attrs at nl80211.h:2666:1' changed:
type size hasn't changed
1 enumerator insertion:
'nl80211_attrs::NL80211_ATTR_EHT_CAPABILITY' value '310'
3 enumerator changes:
'nl80211_attrs::NL80211_ATTR_MAX' from value '309' to '310' at nl80211.h:2670:1
'nl80211_attrs::NUM_NL80211_ATTR' from value '310' to '311' at nl80211.h:2670:1
'nl80211_attrs::__NL80211_ATTR_AFTER_LAST' from value '310' to '311' at nl80211.h:2670:1
2 impacted interfaces
'struct ieee80211_sband_iftype_data at cfg80211.h:378:1' changed:
type size changed from 640 to 1024 (in bits)
1 data member insertion:
'ieee80211_sta_eht_cap eht_cap', at offset 472 (in bits) at cfg80211.h:431:1
there are data member changes:
'struct {const u8* data; unsigned int len;} vendor_elems' offset changed (by +384 bits)
3084 impacted interfaces
'struct iommu_dma_cookie at dma-iommu.c:41:1' changed (indirectly):
type size changed from 15360 to 15424 (in bits)
there are data member changes:
type 'union {iova_domain iovad; dma_addr_t msi_iova;}' of 'anonymous data member' changed:
type size changed from 15104 to 15168 (in bits)
there are data member changes:
type 'struct iova_domain' of '__anonymous_union__::iovad' changed:
type size changed from 15104 to 15168 (in bits)
1 data member insertion:
'bool best_fit', at offset 15104 (in bits) at iova.h:99:1
3086 impacted interfaces
2 ('list_head msi_page_list' .. 'iommu_domain* fq_domain') offsets changed (by +64 bits)
3084 impacted interfaces
'struct iova_domain at iova.h:68:1' changed:
details were reported earlier
'struct module at module.h:364:1' changed:
type size hasn't changed
2 data member insertions:
'unsigned int btf_data_size', at offset 6016 (in bits) at module.h:477:1
'void* btf_data', at offset 6080 (in bits) at module.h:478:1
there are data member changes:
18 ('jump_entry* jump_entries' .. 'unsigned int num_ei_funcs') offsets changed (by +128 bits)
3084 impacted interfaces
'struct rate_info at cfg80211.h:1580:1' changed:
type size changed from 80 to 96 (in bits)
2 data member insertions:
'u8 eht_gi', at offset 80 (in bits) at cfg80211.h:1673:1
'u8 eht_ru_alloc', at offset 88 (in bits) at cfg80211.h:1674:1
5 impacted interfaces
'struct station_info at cfg80211.h:1743:1' changed (indirectly):
type size changed from 1792 to 1856 (in bits)
there are data member changes:
type 'struct rate_info' of 'station_info::txrate' changed, as reported earlier
type 'struct rate_info' of 'station_info::rxrate' changed, as reported earlier
and offset changed from 528 to 544 (in bits) (by +16 bits)
8 ('u32 rx_packets' .. 'int generation') offsets changed (by +32 bits)
21 ('const u8* assoc_req_ies' .. 'u8 connected_to_as') offsets changed (by +64 bits)
4 impacted interfaces
'struct station_parameters at cfg80211.h:1421:1' changed:
type size changed from 1280 to 1408 (in bits)
2 data member insertions:
'const ieee80211_eht_cap_elem* eht_capa', at offset 1280 (in bits) at cfg80211.h:1525:1
'u8 eht_capa_len', at offset 1344 (in bits) at cfg80211.h:1526:1
one impacted interface
'struct virtio_config_ops at virtio_config.h:77:1' changed:
type size changed from 896 to 960 (in bits)
1 data member insertion:
'void (virtio_device*)* enable_cbs', at offset 0 (in bits) at virtio_config.h:80:1
there are data member changes:
14 ('void (virtio_device*, unsigned int, void*, unsigned int)* get' .. 'typedef bool (virtio_device*, virtio_shm_region*, typedef u8)* get_shm_region') offsets changed (by +64 bits)
35 impacted interfaces
Bug: 222115076
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I1aac74111756444ff6bff92b843a5133f3c7541c
This patch tries to make sure the virtio interrupt handler for INTX
won't be called after a reset and before virtio_device_ready(). We
can't use IRQF_NO_AUTOEN since we're using shared interrupt
(IRQF_SHARED). So this patch tracks the INTX enabling status in a new
intx_soft_enabled variable and toggles it in
vp_disable/enable_vectors(). The INTX interrupt handler will check
intx_soft_enabled before processing the actual interrupt.
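A simplified, single-threaded model of the software gate is sketched below in stand-alone C. The 'sketch_*' types and functions are hypothetical, and the sketch deliberately leaves out the memory-ordering and IRQ synchronization that kernel code needs around such a flag.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_irqreturn { SKETCH_IRQ_NONE, SKETCH_IRQ_HANDLED };

/* Hypothetical device state; not the real virtio-pci structure. */
struct sketch_vp_dev {
	bool intx_soft_enabled;	/* software gate for the shared INTX line */
	bool isr_pending;	/* stands in for the device ISR status */
	int nr_handled;
};

void sketch_disable_vectors(struct sketch_vp_dev *dev)
{
	dev->intx_soft_enabled = false;
}

void sketch_enable_vectors(struct sketch_vp_dev *dev)
{
	dev->intx_soft_enabled = true;
}

/* Shared-line handler: ignore the interrupt while the gate is closed. */
enum sketch_irqreturn sketch_vp_interrupt(struct sketch_vp_dev *dev)
{
	assert(dev);

	if (!dev->intx_soft_enabled)
		return SKETCH_IRQ_NONE;
	if (!dev->isr_pending)
		return SKETCH_IRQ_NONE;	/* another device on the line */

	dev->isr_pending = false;
	dev->nr_handled++;
	return SKETCH_IRQ_HANDLED;
}
```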
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-6-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 080cd7c3ac)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: If90814df2859e742df050d406f2d67547bd6dbb3
We used to synchronize pending MSI-X irq handlers via
synchronize_irq(). This may not work for an untrusted device, which
may keep sending interrupts after reset and lead to unexpected
results. Similarly, we should not enable the MSI-X interrupt until the
device is ready. So this patch fixes those two issues by:
1) switching to use disable_irq() to prevent the virtio interrupt
handlers from being called after the device is reset.
2) using IRQF_NO_AUTOEN and enabling the MSI-X irq during .ready()
This can make sure the virtio interrupt handler won't be called before
virtio_device_ready() and after reset.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-5-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9e35276a53)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I63832b87a567c4447064143fa62386c59481d43b
This patch introduces a new method to enable the callbacks for config
and virtqueues. This will be used for making sure the virtqueue
callbacks are only enabled after virtio_device_ready() if transport
implements this method.
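The optional-method pattern can be sketched in stand-alone C as follows. The 'sketch_*' structures are simplified stand-ins for the virtio ones and the field names are assumptions; the status bit value 4 matches VIRTIO_CONFIG_S_DRIVER_OK.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_S_DRIVER_OK	4	/* value of VIRTIO_CONFIG_S_DRIVER_OK */

struct sketch_vdev;

/* Hypothetical, cut-down config ops with one optional method. */
struct sketch_config_ops {
	void (*enable_cbs)(struct sketch_vdev *dev);	/* may be NULL */
};

struct sketch_vdev {
	const struct sketch_config_ops *config;
	unsigned int status;
	int cbs_enabled;
};

/* Enable callbacks (if the transport supports it), then set DRIVER_OK. */
void sketch_device_ready(struct sketch_vdev *dev)
{
	assert(dev && dev->config);
	assert(!(dev->status & SKETCH_S_DRIVER_OK));

	if (dev->config->enable_cbs)
		dev->config->enable_cbs(dev);

	dev->status |= SKETCH_S_DRIVER_OK;
}

/* Example transport implementation of the method. */
void demo_enable_cbs(struct sketch_vdev *dev)
{
	dev->cbs_enabled = 1;
}
```

Transports that leave the method NULL keep their previous behaviour, which is why the call is guarded rather than mandatory.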
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d50497eb4e)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I17ea164aa100d690ebde3b2f6c2e5514a9b5cfd9
Build BTF type info into the kernel to enable use of BPF-based tools
such as BCC's libbpf-tools.
By default, modules whose split BTF is inconsistent with vmlinux BTF
will fail to load, which can prevent loading compatible but separately
built modules. Instead, enable MODULE_ALLOW_BTF_MISMATCH to ignore
such modules' BTF rather than refusing to load the module.
Bug: 203823368
Bug: 218515241
Test: build
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I8efaab5f1a5c6ad6e9e6ccf1e78088d81a880480