Use depends on instead of select for DMA_RESTRICTED_POOL; otherwise it
will make SWIOTLB user configurable and cause compile errors for some
arch (e.g. mips).
Fixes: 0b84e4f8b7 ("swiotlb: Add restricted DMA pool initialization")
Acked-by: Will Deacon <will@kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Claire Chang <tientzu@chromium.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit f3c4b1341e)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190591509
Change-Id: I56503151fc4e9695a2a9a2c0ffaf92b0c50880ac
If CONFIG_DMA_RESTRICTED_POOL=n then probing a device with a reference
to a "restricted-dma-pool" will fail with a reasonably cryptic error:
| pci-host-generic: probe of 10000.pci failed with error -22
Rework of_dma_set_restricted_buffer() so that it does not cause probing
failure and instead either returns early if CONFIG_DMA_RESTRICTED_POOL=n
or emits a diagnostic if the reserved DMA pool fails to initialise.
Cc: Claire Chang <tientzu@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit f3cfd136ae)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190591509
Change-Id: Iaa5624abce46a7da19ba4a0e57591228370d74ad
The switch-case for handling guest debug exception covers
all the debug exception classes, but functionally, doesn't
do anything with them other than ESR_ELx_EC_WATCHPT_LOW.
Moreover, even though handled well, the 'default' case
could be confusing from a security point of view, stating
that the guests' actions can potentially flood the syslog.
But in reality, the code is unreachable.
Hence, trim down the function to only handle the case with
ESR_ELx_EC_WATCHPT_LOW with a simple 'if' check.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210823223940.1878930-1-rananta@google.com
(cherry picked from commit 8ce8a6fce9)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I45882454b843fe5ed0bab002fdc0af760de2033d
Currently range_is_memory finds the corresponding struct memblock_region
for both the lower and upper bounds of the given address range with two
rounds of binary search, and then checks that the two memblocks are the
same. Simplify this by only doing binary search on the lower bound and
then checking that the upper bound is in the same memblock.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210728153232.1018911-3-dbrazdil@google.com
(cherry picked from commit 14ecf075fe)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: If8b21f208aecec8d45cffd945cba70cbad851f44
A number of registers pased to trace_kvm_arm_set_dreg32() are
actually 64bit. Upgrade the tracepoint to take a 64bit value,
despite the name...
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 411d63d8c6)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Ic6a07cdc9943aff6e4b7a9381062e0b01c8291b9
Add feature register flag definitions to clarify which features
might be supported.
Consolidate the various ID_AA64PFR0_ELx flags for all ELs.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-10-tabba@google.com
(cherry picked from commit 95b54c3e4c)
[willdeacon@: Resolve context conflict with ID_AA64MMFR0_TGRAN_ definitions]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I0cf05479c3b858136584a30687d36742558f5b81
Track the baseline guest value for cptr_el2 in struct
kvm_vcpu_arch, similar to the other registers that control traps.
Use this value when setting cptr_el2 for the guest.
Currently this value is unchanged (CPTR_EL2_DEFAULT), but future
patches will set trapping bits based on features supported for
the guest.
No functional change intended.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-9-tabba@google.com
(cherry picked from commit cd496228fd)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I88b6c918d0eb626c8a508593f60569fd8f5073d3
On deactivating traps, restore the value of mdcr_el2 from the
newly created and preserved host value vcpu context, rather than
directly reading the hardware register.
Up until and including this patch the two values are the same,
i.e., the hardware register and the vcpu one. A future patch will
be changing the value of mdcr_el2 on activating traps, and this
ensures that its value will be restored.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-7-tabba@google.com
(cherry picked from commit 1460b4b25f)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I893688d65c5025a2c657df9cc0333cde6a4c70f7
Refactor sys_regs.h and sys_regs.c to make it easier to reuse
common code. It will be used in nVHE in a later patch.
Note that the refactored code uses __inline_bsearch for find_reg
instead of bsearch to avoid copying the bsearch code for nVHE.
No functional change intended.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-6-tabba@google.com
(cherry picked from commit f76f89e2f7)
[willdeacon@: Resolve context conflict with GIC ICH_* register definitions]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Id2dd04b633a0e6f7a311cc7c8b495715d66cf1c6
Change the names of hcr_el2 register fields to match the Arm
Architecture Reference Manual. Easier for cross-referencing and
for grepping.
Also, change the name of CPTR_EL2_RES1 to CPTR_NVHE_EL2_RES1,
because res1 bits are different for VHE.
No functional change intended.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-5-tabba@google.com
(cherry picked from commit dabb1667d8)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I3c9720b31385ccc0a32a92efc2760be667688822
When a mapped level interrupt (a timer, for example) is deactivated
by the guest, the corresponding host interrupt is equally deactivated.
However, the fate of the pending state still needs to be dealt
with in SW.
This is specially true when the interrupt was in the active+pending
state in the virtual distributor at the point where the guest
was entered. On exit, the pending state is potentially stale
(the guest may have put the interrupt in a non-pending state).
If we don't do anything, the interrupt will be spuriously injected
in the guest. Although this shouldn't have any ill effect (spurious
interrupts are always possible), we can improve the emulation by
detecting the deactivation-while-pending case and resample the
interrupt.
While we're at it, move the logic into a common helper that can
be shared between the two GIC implementations.
Fixes: e40cc57bac ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts")
Reported-by: Raghavendra Rao Ananta <rananta@google.com>
Tested-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org
(cherry picked from commit 3134cc8beb)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I0cce2841be395633313a182db64ee258de0f9af5
vgic_get_irq(intid) is used all over the vgic code in order to get a
reference to a struct irq. It warns whenever intid is not a valid number
(like when it's a reserved IRQ number). The issue is that this warning
can be triggered from userspace (e.g., KVM_IRQ_LINE for intid 1020).
Drop the WARN call from vgic_get_irq.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210818213205.598471-1-ricarkol@google.com
(cherry picked from commit b9a51949ce)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I413e6059a75dcf2794a34e8be3a9a9c3404a9b1f
According to the PSCI specification, ARM DEN 0022D, 5.1.4 "CPU_ON", the
CPU_ON function takes a target_cpu argument that is bit-compatible with
the affinity fields in MPIDR_EL1. All other bits in the argument are
RES0. Note that the same constraints apply to the target_affinity
argument for the AFFINITY_INFO call.
Enforce the spec by returning INVALID_PARAMS if a guest incorrectly sets
a RES0 bit.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210818202133.1106786-4-oupton@google.com
(cherry picked from commit e10ecb4d6c)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Ib07086826c08d7a7b1018b89d445cd33d387d01b
The CPU_ON PSCI call takes a payload that KVM uses to configure a
destination vCPU to run. This payload is non-architectural state and not
exposed through any existing UAPI. Effectively, we have a race between
CPU_ON and userspace saving/restoring a guest: if the target vCPU isn't
ran again before the VMM saves its state, the requested PC and context
ID are lost. When restored, the target vCPU will be runnable and start
executing at its old PC.
We can avoid this race by making sure the reset payload is serviced
before userspace can access a vCPU's state.
Fixes: 358b28f09f ("arm/arm64: KVM: Allow a VCPU to fully reset itself")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210818202133.1106786-3-oupton@google.com
(cherry picked from commit 6826c6849b)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Ic163ea6f91941a161913fa495e9a677d5e3589df
KVM correctly serializes writes to a vCPU's reset state, however since
we do not take the KVM lock on the read side it is entirely possible to
read state from two different reset requests.
Cure the race for now by taking the KVM lock when reading the
reset_state structure.
Fixes: 358b28f09f ("arm/arm64: KVM: Allow a VCPU to fully reset itself")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210818202133.1106786-2-oupton@google.com
(cherry picked from commit 6654f9dfcb)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I8b563f223a77b8bd6ec486c6243f02d3d65d6cd2
When protected mode is enabled, the host is unable to access most parts
of the EL2 hypervisor image, including 'hyp_physvirt_offset' and the
contents of the hypervisor's '.rodata.str' section. Unfortunately,
nvhe_hyp_panic_handler() tries to read from both of these locations when
handling a BUG() triggered at EL2; the former for converting the ELR to
a physical address and the latter for displaying the name of the source
file where the BUG() occurred.
Hack the EL2 panic asm to pass both physical and virtual ELR values to
the host and utilise the newly introduced CONFIG_NVHE_EL2_DEBUG so that
we disable stage-2 protection for the host before returning to the EL1
panic handler. If the debug option is not enabled, display the address
instead of the source file:line information.
Cc: Andrew Scull <ascull@google.com>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210813130336.8139-1-will@kernel.org
(cherry picked from commit ccac969772)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I1fa87b3fea827b03ca9c8711c70680c26dba1e4c
KVM/arm64 is the sole user of perf_num_counters(), and really
could do without it. Stop using the obsolete API by relying on
the existing probing code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210414134409.1266357-2-maz@kernel.org
(cherry picked from commit 5421db1be3)
[willdeacon@: Resolve 'is_protected_kvm_enabled()' context conflict]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Iea7438030314455002f76eefaa6cb7d945dd8e08
The host kernel is currently able to change EL2 stage-1 mappings without
restrictions thanks to the __pkvm_create_mappings() hypercall. But in a
world where the host is no longer part of the TCB, this clearly poses a
problem.
To fix this, introduce a new hypercall to allow the host to share a
physical memory page with the hypervisor, and remove the
__pkvm_create_mappings() variant. The new hypercall implements
ownership and permission checks before allowing the sharing operation,
and it annotates the shared page in the hypervisor stage-1 and host
stage-2 page-tables.
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-21-qperret@google.com
(cherry picked from commit 66c57edd3b)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Iee84b10f898f00e2f82169a7a31c748e38b20f33
Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
a new pkvm_create_mappings_locked() function. This will be used in later
patches to allow walking and changing the hypervisor stage-1 without
releasing the lock.
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-20-qperret@google.com
(cherry picked from commit f9370010e9)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Ifd3fffbed53356f33bf1fdb4e4a0ac5b98f5b11f
Now that we mark memory owned by the hypervisor in the host stage-2
during __pkvm_init(), we no longer need to rely on the host to
explicitly mark the hyp sections later on.
Remove the __pkvm_mark_hyp() hypercall altogether.
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-19-qperret@google.com
(cherry picked from commit ad0e0139a8)
[willdeacon@: Resolve conflict with kmemleak fix]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I046c26613dea9406e07bd0e2a83aff1715a56991
As the hypervisor maps the host's .bss and .rodata sections in its
stage-1, make sure to tag them as shared in hyp and host page-tables.
But since the hypervisor relies on the presence of these mappings, we
cannot let the host in complete control of the memory regions -- it
must not unshare or donate them to another entity for example. To
prevent this, let's transfer the ownership of those ranges to the
hypervisor itself, and share the pages back with the host.
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-18-qperret@google.com
(cherry picked from commit 2c50166c62)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Ie89c6656848e4056e75071be57dcd9aad885b11d
We will soon start annotating shared pages in page-tables in nVHE
protected mode. Define all the states in which a page can be (owned,
shared and owned, shared and borrowed), and provide helpers allowing to
convert this into SW bits annotations using the matching prot
attributes.
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-13-qperret@google.com
(cherry picked from commit ec250a67ea)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Iffc5738d7c8cd8731a976a8bb74c029cc09f9975
Much of the stage-2 manipulation logic relies on being able to destroy
block mappings if e.g. installing a smaller mapping in the range. The
rationale for this behaviour is that stage-2 mappings can always be
re-created lazily. However, this gets more complicated when the stage-2
page-table is used to store metadata about the underlying pages. In such
cases, destroying a block mapping may lead to losing part of the state,
and confuse the user of those metadata (such as the hypervisor in nVHE
protected mode).
To avoid this, introduce a callback function in the pgtable struct which
is called during all map operations to determine whether the mappings
can use blocks, or should be forced to page granularity. This is used by
the hypervisor when creating the host stage-2 to force page-level
mappings when using non-default protection attributes.
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-11-qperret@google.com
(cherry picked from commit 5651311941)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I16a35b501d73ae1c58604a60fd271fc79361eb49