Replace the 'host_mmio_dabt_handler' hook in kvm_iommu_ops with
an equivalent callback in the new pkvm_iommu_ops. The generic portion
of the code finds the IOMMU device at the faulted address and invokes
the callback on it.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I44147ceb7877dc1999fd10f4db55659bbbec5bb7
Add suspend/resume callbacks for IOMMU devices. The EL1 kernel driver
is expected to call these when the IOMMU device is powered on but is
about to be used or about to stop being used.
pkvm_iommu_suspend/resume are exported for use by kernel modules.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I5cd38aaeb685bcdae0368453138cc099055adb27
Add '__pkvm_iommu_register' hypcall for registering a new IOMMU device.
The handler allocates a linked-list entry for the device from a memory
pool provided by the host. If the pool has run out, the handler returns
-ENOMEM and expects the host to call it again with a fresh mem pool.
The inputs are validated, eg. ID is unique and memory region does not
overlap with existing IOMMUs. The driver can also implement a 'validate'
callback for driver-specific input validation.
If successful, the handler creates a private EL2 mapping for the device,
forces the memory region is unmapped from host stage-2 and inserts the
device into the linked list. Future attempts to map the MMIO region will
fail because of pkvm_iommu_host_stage2_adjust_range.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: If54ba41cd0b219c6e63508b542d526703ab5b97e
Introduce a linked list of IOMMU devices and
'pkvm_iommu_host_stage2_adjust_range' called from host DABT handler.
The function will adjust the memory range that is about to be mapped
to avoid MMIO regions of all devices in the linked list. If the host
tried to access a device MMIO region, the access is declined.
The function replaces the existing call to
'kvm_iommu.ops.host_stage2_adjust_mmio_range' callback.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Ib38256f0005588810a4400efd9a85380d354be59
Add '__pkvm_iommu_driver_init' hypcall and 'struct pkvm_iommu_ops' with
an 'init' callback implemented by an EL2 driver. Driver-specific data
can be passed to 'init' from the host. The memory is pinned while
the callback processed it.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7cfe51de553e07083747467e1e3ca8bc51737035
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. Linked lists are initialized with the head
pointing to itself. To avoid having to work around this by initializing
at runtime, add a .hyp.data section.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7a56dc4c93e05bbef53c66837164d17c6103b6b8
The pKVM hypervisor currently zeroes all the pages mapped into guests
when tearing them down for confidentiality reasons. However, for pages
that are shared with the host this is unecessary at best as the content
of memory is already visible. This is particularly bad for non-protected
guests as all their memory is shared with the host by definition.
Add a new flag to distingish pages that solely need to be updated from
an ownership perspective and those that need to be zeroed.
NOTE: We should probably overhaul the teardown procedure at some point
to avoid the proliferation of those flags, but that would require
significant changes so we might not want that in Android 13.
Bug: 223678931
Change-Id: Icefc85a0bdcdf9958e9eb6871c794f68b06a007f
Signed-off-by: Quentin Perret <qperret@google.com>
The pKVM shadow table is protected by 'shadow_lock', however this lock
is only taken across relatively fine-grained calls when inserting and
removing entries from the table. This poses a problem for higher-level
functions such as __pkvm_init_shadow(), where a partially-initialised
shadow entry is made transiently visibly to get_shadow_vcpu() and could
potentially be loaded in an inconsistent state by another CPU.
Push the locking out of the insert/remove functions and up into
__pkvm_{init,teardown}_shadow() so that the shadow state always appears
to be consistent as long as the lock is held.
Bug: 216808671
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I74c563a539c1ce35f5da86a8281e47c7d435bd27
There's no reason to make the internal shadow table data directly
accessible outside of pkvm.c, so make it all static and provide an
initialisation function to install the initial pages.
Bug: 216808671
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Idc0908796ebbd2b620494f5d4d6b6055455c8013
Add hooks to gather data of unsual aborts and summarize it with
other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I74eb36b8551ed9a5e6dc87507939a7f4d81c9c18
Add hooks to gather data of kernel fault and summarize it with
other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I7d6a66837f2e896a413bd8d878f26928669d96e6
Add hooks to gather data of unfrozen tasks and summarize it
with other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I61da3d253bd9959c6f06e09c9a35c4b242cedafe
Add hook to gather data of softlockup and summarize it with
other information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I5263bbd573c3fa4b4c981ac26c943721ce09506d
Add hook to gather data of bug trap and summarize it with other
information.
Bug: 222638752
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I1f347c20629786f9bf0b9c50c7f96b50b4360504
Export cpuidle_driver_state_disabled() so that CPU idle states may be
disabled at runtime for debugging CPU and cluster idle states.
Bug: 175718935
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Change-Id: Id9038074d64fb6c0444d9aca68420414c3223e93
(cherry picked from commit de93734e22)
With these hooks, printk can provide more information, such as the
processor ID.
Bug: 223302138
Signed-off-by: Ben Dai <ben.dai@unisoc.com>
Change-Id: Iac60ffd49640d8badf5c5dd446c211d37bbbc6a6
Commit d43583b890 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the
guest") hooked up the SYSTEM_RESET2 PSCI call for guests but failed to
preserve its arguments for userspace, instead overwriting them with
zeroes via smccc_set_retval(). As Linux only passes zeroes for these
arguments, this appeared to be working for Linux guests. Oh well.
Don't call smccc_set_retval() for a SYSTEM_RESET2 heading to userspace
and instead set X0 (and only X0) explicitly to PSCI_RET_INTERNAL_FAILURE
just in case the vCPU re-enters the guest.
Fixes: d43583b890 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest")
Reported-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220309181308.982-1-will@kernel.org
(cherry picked from commit 9d3e7b7c82
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 216801012
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ieead1a813e6b4dfee1aa89e42ee1926efcd5f590
Set KMI_GENERATION=1 for 3/9 KMI update
Leaf changes summary: 2579 artifacts changed (1 filtered out)
Changed leaf types summary: 9 (1 filtered out) leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 2521 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 49 Changed, 0 Added variable
2521 functions with some sub-type change:
[C] 'function void* PDE_DATA(const inode*)' at generic.c:794:1 has some sub-type changes:
CRC (modversions) changed from 0x17465176 to 0x1c3e2a86
[C] 'function void __ClearPageMovable(page*)' at compaction.c:138:1 has some sub-type changes:
CRC (modversions) changed from 0x8331b3e3 to 0x734edab3
[C] 'function void __SetPageMovable(page*, address_space*)' at compaction.c:130:1 has some sub-type changes:
CRC (modversions) changed from 0xe56f361 to 0x891f9c1d
... 2518 omitted; 2521 symbols have only CRC changes
49 Changed variables:
[C] 'bus_type amba_bustype' was changed at bus.c:313:1:
CRC (modversions) changed from 0xe555ebeb to 0x517f2d17
[C] 'const address_space_operations balloon_aops' was changed at balloon_compaction.c:253:1:
CRC (modversions) changed from 0xa9866f1a to 0x89a77b8c
[C] 'const clk_ops clk_divider_ops' was changed at clk-divider.c:522:1:
CRC (modversions) changed from 0xca4154fa to 0x5a75cc1
... 46 omitted; 49 symbols have only CRC changes
'enum nl80211_attrs at nl80211.h:2666:1' changed:
type size hasn't changed
1 enumerator insertion:
'nl80211_attrs::NL80211_ATTR_EHT_CAPABILITY' value '310'
3 enumerator changes:
'nl80211_attrs::NL80211_ATTR_MAX' from value '309' to '310' at nl80211.h:2670:1
'nl80211_attrs::NUM_NL80211_ATTR' from value '310' to '311' at nl80211.h:2670:1
'nl80211_attrs::__NL80211_ATTR_AFTER_LAST' from value '310' to '311' at nl80211.h:2670:1
2 impacted interfaces
'struct ieee80211_sband_iftype_data at cfg80211.h:378:1' changed:
type size changed from 640 to 1024 (in bits)
1 data member insertion:
'ieee80211_sta_eht_cap eht_cap', at offset 472 (in bits) at cfg80211.h:431:1
there are data member changes:
'struct {const u8* data; unsigned int len;} vendor_elems' offset changed (by +384 bits)
3084 impacted interfaces
'struct iommu_dma_cookie at dma-iommu.c:41:1' changed (indirectly):
type size changed from 15360 to 15424 (in bits)
there are data member changes:
type 'union {iova_domain iovad; dma_addr_t msi_iova;}' of 'anonymous data member' changed:
type size changed from 15104 to 15168 (in bits)
there are data member changes:
type 'struct iova_domain' of '__anonymous_union__::iovad' changed:
type size changed from 15104 to 15168 (in bits)
1 data member insertion:
'bool best_fit', at offset 15104 (in bits) at iova.h:99:1
3086 impacted interfaces
2 ('list_head msi_page_list' .. 'iommu_domain* fq_domain') offsets changed (by +64 bits)
3084 impacted interfaces
'struct iova_domain at iova.h:68:1' changed:
details were reported earlier
'struct module at module.h:364:1' changed:
type size hasn't changed
2 data member insertions:
'unsigned int btf_data_size', at offset 6016 (in bits) at module.h:477:1
'void* btf_data', at offset 6080 (in bits) at module.h:478:1
there are data member changes:
18 ('jump_entry* jump_entries' .. 'unsigned int num_ei_funcs') offsets changed (by +128 bits)
3084 impacted interfaces
'struct rate_info at cfg80211.h:1580:1' changed:
type size changed from 80 to 96 (in bits)
2 data member insertions:
'u8 eht_gi', at offset 80 (in bits) at cfg80211.h:1673:1
'u8 eht_ru_alloc', at offset 88 (in bits) at cfg80211.h:1674:1
5 impacted interfaces
'struct station_info at cfg80211.h:1743:1' changed (indirectly):
type size changed from 1792 to 1856 (in bits)
there are data member changes:
type 'struct rate_info' of 'station_info::txrate' changed, as reported earlier
type 'struct rate_info' of 'station_info::rxrate' changed, as reported earlier
and offset changed from 528 to 544 (in bits) (by +16 bits)
8 ('u32 rx_packets' .. 'int generation') offsets changed (by +32 bits)
21 ('const u8* assoc_req_ies' .. 'u8 connected_to_as') offsets changed (by +64 bits)
4 impacted interfaces
'struct station_parameters at cfg80211.h:1421:1' changed:
type size changed from 1280 to 1408 (in bits)
2 data member insertions:
'const ieee80211_eht_cap_elem* eht_capa', at offset 1280 (in bits) at cfg80211.h:1525:1
'u8 eht_capa_len', at offset 1344 (in bits) at cfg80211.h:1526:1
one impacted interface
'struct virtio_config_ops at virtio_config.h:77:1' changed:
type size changed from 896 to 960 (in bits)
1 data member insertion:
'void (virtio_device*)* enable_cbs', at offset 0 (in bits) at virtio_config.h:80:1
there are data member changes:
14 ('void (virtio_device*, unsigned int, void*, unsigned int)* get' .. 'typedef bool (virtio_device*, virtio_shm_region*, typedef u8)* get_shm_region') offsets changed (by +64 bits)
35 impacted interfaces
Bug: 222115076
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I1aac74111756444ff6bff92b843a5133f3c7541c
This patch tries to make sure the virtio interrupt handler for INTX
won't be called after a reset and before virtio_device_ready(). We
can't use IRQF_NO_AUTOEN since we're using shared interrupt
(IRQF_SHARED). So this patch tracks the INTX enabling status in a new
intx_soft_enabled variable and toggle it during in
vp_disable/enable_vectors(). The INTX interrupt handler will check
intx_soft_enabled before processing the actual interrupt.
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-6-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 080cd7c3ac)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: If90814df2859e742df050d406f2d67547bd6dbb3
We used to synchronize pending MSI-X irq handlers via
synchronize_irq(), this may not work for the untrusted device which
may keep sending interrupts after reset which may lead unexpected
results. Similarly, we should not enable MSI-X interrupt until the
device is ready. So this patch fixes those two issues by:
1) switching to use disable_irq() to prevent the virtio interrupt
handlers to be called after the device is reset.
2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
This can make sure the virtio interrupt handler won't be called before
virtio_device_ready() and after reset.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-5-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9e35276a53)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I63832b87a567c4447064143fa62386c59481d43b
This patch introduces a new method to enable the callbacks for config
and virtqueues. This will be used for making sure the virtqueue
callbacks are only enabled after virtio_device_ready() if transport
implements this method.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d50497eb4e)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I17ea164aa100d690ebde3b2f6c2e5514a9b5cfd9
Build BTF type info into the kernel to enable use of BPF-based tools
such as BCC's libbpf-tools.
By default, modules whose split BTF is inconsistent with vmlinux BTF
will fail to load, which can prevent loading compatible but separately
built modules. Instead, enable MODULE_ALLOW_BTF_MISMATCH to ignore
such modules' BTF rather than refusing to load the module.
Bug: 203823368
Bug: 218515241
Test: build
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I8efaab5f1a5c6ad6e9e6ccf1e78088d81a880480
IOVAs are aligned to the smallest PAGE_SIZE order, where the requested
IOVA can fit. But this might not work for all use-cases. It can cause
IOVA fragmentation in some multimedia and 8K video use-cases that may
require larger buffers to be allocated and mapped.
When the above allocation pattern is used with the current alignment
scheme, the IOVA space could be quickly exhausted for 32bit devices.
In order to get better IOVA space utilization and reduce fragmentation,
a new kernel command line parameter is introduced to make the alignment
limit configurable by the user during boot.
Bug: 190519428
Change-Id: I0c8e72370fc3266a5a242837d82aae4f9831aef3
Link: https://lore.kernel.org/r/1634148667-409263-1-git-send-email-quic_c_gdjako@quicinc.com/
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
This reverts three increment-fs commits:
d5faa13b5910412e10c67ad88c9349
This is to fix the incrementalinstall test.
Can now install the same apk twice, and repeated installs are stable.
Bug: 217661925
Bug: 219731048
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: Ia8488d728218881ed17e4d68cab21b0b152e3ca4
During teardown, we currently walk the guest stage-2 page-table and
annotate all of its pages as 'pending poisoning' in the host stage-2.
Sadly, this requires a host stage-2 walk for every guest page, which is
rather inefficient and can lead to a long non-preemptible amount of time
spent at EL2. This gets particularly bad with IOMMUs as, in its current
form, the host stage-2 annotation triggers IOMMU updates.
To avoid the host stage-2 walks, let's annotate the pages pending
poisoning using a flag in the hyp_vmemmap instead.
Bug: 219180169
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I8894bd8e0b10ea8817763479412b540c0291e8f5
Add a 'flags' field to struct hyp_page, and reduce the size of the order
field to u8 to avoid growing the struct size.
Bug: 219180169
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: If629935bb6fa7d832c595685083f7985cfcfa221
We calculate nr_ports based on the max_nr_ports:
nr_queues = use_multiport(portdev) ? (nr_ports + 1) * 2 : 2;
If the device advertises a large max_nr_ports, we will end up with a
integer overflow. Fixing this by validating the max_nr_ports and fail
the probe for invalid max_nr_ports in this case.
Cc: Amit Shah <amit@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 28962ec595)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: Idb5462a1268d2bde5f867f5455da0957ca68035a
Although FF-A claims to require version v1.2 of SMCCC, in reality the
current set of calls work just fine with v1.1 and some devices ship with
EL3 firmware that advertises this configuration.
Allow pKVM to proxy FF-A calls for these devices by relaxing our SMCCC
version check to permit SMCCC v1.1+
Reported-by: Alan Stokes <alanstokes@google.com>
Bug: 222663556
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I41e9ff35f169df3609acee7bbc67999c1d11c9d1
Currently, the trace hook for is_cpu_allowed only executes if the
cpu is not a kthread. Modules need to be able to reject cpus
regardless of whether the task is a kthread or not. Modules also
need to have the flexibility to execute, or not, the remainder of
is_cpu_allowed.
Move the tracepoint for is_cpu_allowed so that it is invoked
regardless of task's kthread status, but do not interfere with
per-cpu-kthread cpu assignment.
Bug: 222550772
Change-Id: Ide48a82a33129448bb22be28814267b0b76535a2
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
Document the functionality of disable_dma32 as introduced in commit
c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make
ZONE_DMA32 empty").
Bug: 199917449
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I32ab2969f59fcc49e9ac49e7e6b545f816d120f9
zone_dma32_is_empty() currently lacks the proper validation to ensure
that the NUMA node ID it receives as an argument is valid. This has no
effect on kernels with CONFIG_NUMA=n as NODE_DATA() will return the
same pglist_data on these devices, but on kernels with CONFIG_NUMA=y,
this is not the case, and the node passed to NODE_DATA must be
validated.
Rather than trying to find the node containing ZONE_DMA32, replace
calls of zone_dma32_is_empty() with zone_dma32_are_empty() (which
iterates over all nodes and returns false if one of the nodes holds
DMA32 and it is non-empty).
Bug: 199917449
Fixes: c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty")
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I850fb9213b71a1ef29106728bfda0cc6de46fdbb