Commit Graph

378696 Commits

Author SHA1 Message Date
Christoffer Dall
944b8a6f9f KVM: arm-vgic: Support unqueueing of LRs to the dist
To properly access the VGIC state from user space it is very unpractical
to have to loop through all the LRs in all register access functions.
Instead, support moving all pending state from LRs to the distributor,
but leave active state LRs alone.

Note that to accurately present the active and pending state to VCPUs
reading these distributor registers from a live VM, we would have to
stop all other VPUs than the calling VCPU and ask each CPU to unqueue
their LR state onto the distributor and add fields to track active state
on the distributor side as well.  We don't have any users of such
functionality yet and there are other inaccuracies of the GIC emulation,
so don't provide accurate synchronized access to this state just yet.
However, when the time comes, having this function should help.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit cbd333a4bf)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:26 +02:00
Christoffer Dall
0b3a540dcc KVM: arm-vgic: Add vgic reg access from dev attr
Add infrastructure to handle distributor and cpu interface register
accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the
KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups
and defining the semantics of the attr field to be the MMIO offset as
specified in the GICv2 specs.

Missing register accesses or other changes in individual register access
functions to support save/restore of the VGIC state is added in
subsequent patches.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit c07a0191ef)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:26 +02:00
Christoffer Dall
f05c65c6da arm/arm64: kvm: Set vcpu->cpu to -1 on vcpu_put
The arch-generic KVM code expects the cpu field of a vcpu to be -1 if
the vcpu is no longer assigned to a cpu.  This is used for the optimized
make_all_cpus_request path and will be used by the vgic code to check
that no vcpus are running.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit e9b152cb95)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:25 +02:00
Christoffer Dall
405003b808 KVM: arm-vgic: Make vgic mmio functions more generic
Rename the vgic_ranges array to vgic_dist_ranges to be more specific and
to prepare for handling CPU interface register access as well (for
save/restore of VGIC state).

Pass offset from distributor or interface MMIO base to
find_matching_range function instead of the physical address of the
access in the VM memory map.  This allows other callers unaware of the
VM specifics, but with generic VGIC knowledge to reuse the function.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 1006e8cb22)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:25 +02:00
Christoffer Dall
10b316b725 irqchip: arm-gic: Define additional MMIO offsets and masks
Define CPU interface offsets for the GICC_ABPR, GICC_APR, and GICC_IIDR
registers.  Define distributor registers for the GICD_SPENDSGIR and the
GICD_CPENDSGIR.  KVM/ARM needs to know about these definitions to fully
support save/restore of the VGIC.

Also define some masks and shifts for the various GICH_VMCR fields.

Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 0307e1770f)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:25 +02:00
Christoffer Dall
c4ad31ff7d KVM: arm-vgic: Set base addr through device API
Support setting the distributor and cpu interface base addresses in the
VM physical address space through the KVM_{SET,GET}_DEVICE_ATTR API
in addition to the ARM specific API.

This has the added benefit of being able to share more code in user
space and do things in a uniform manner.

Also deprecate the older API at the same time, but backwards
compatibility will be maintained.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit ce01e4e887)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:24 +02:00
Christoffer Dall
1032acb686 KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC
Support creating the ARM VGIC device through the KVM_CREATE_DEVICE
ioctl, which can then later be leveraged to use the
KVM_{GET/SET}_DEVICE_ATTR, which is useful both for setting addresses in
a more generic API than the ARM-specific one and is useful for
save/restore of VGIC state.

Adds KVM_CAP_DEVICE_CTRL to ARM capabilities.

Note that we change the check for creating a VGIC from bailing out if
any VCPUs were created, to bailing out if any VCPUs were ever run.  This
is an important distinction that shouldn't break anything, but allows
creating the VGIC after the VCPUs have been created.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 7330672bef)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:24 +02:00
Christoffer Dall
d90651fa17 ARM: KVM: Allow creating the VGIC after VCPUs
Rework the VGIC initialization slightly to allow initialization of the
vgic cpu-specific state even if the irqchip (the VGIC) hasn't been
created by user space yet.  This is safe, because the vgic data
structures are already allocated when the CPU is allocated if VGIC
support is compiled into the kernel.  Further, the init process does not
depend on any other information and the sacrifice is a slight
performance degradation for creating VMs in the no-VGIC case.

The reason is that the new device control API doesn't mandate creating
the VGIC before creating the VCPU and it is unreasonable to require user
space to create the VGIC before creating the VCPUs.

At the same time move the irqchip_in_kernel check out of
kvm_vcpu_first_run_init and into the init function to make the per-vcpu
and global init functions symmetric and add comments on the exported
functions making it a bit easier to understand the init flow by only
looking at vgic.c.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit e1ba0207a1)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:23 +02:00
Andre Przywara
764ee33977 ARM/KVM: save and restore generic timer registers
For migration to work we need to save (and later restore) the state of
each core's virtual generic timer.
Since this is per VCPU, we can use the [gs]et_one_reg ioctl and export
the three needed registers (control, counter, compare value).
Though they live in cp15 space, we don't use the existing list, since
they need special accessor functions and the arch timer is optional.

Acked-by: Marc Zynger <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 39735a3a39)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:23 +02:00
Christoffer Dall
b5a94dd48d arm/arm64: KVM: arch_timer: Initialize cntvoff at kvm_init
Initialize the cntvoff at kvm_init_vm time, not before running the VCPUs
at the first time because that will overwrite any potentially restored
values from user space.

Cc: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Marc Zynger <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit a1a64387ad)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:23 +02:00
Christoffer Dall
7e2c9ce019 arm: KVM: Don't return PSCI_INVAL if waitqueue is inactive
The current KVM implementation of PSCI returns INVALID_PARAMETERS if the
waitqueue for the corresponding CPU is not active.  This does not seem
correct, since KVM should not care what the specific thread is doing,
for example, user space may not have called KVM_RUN on this VCPU yet or
the thread may be busy looping to user space because it received a
signal; this is really up to the user space implementation.  Instead we
should check specifically that the CPU is marked as being turned off,
regardless of the VCPU thread state, and if it is, we shall
simply clear the pause flag on the CPU and wake up the thread if it
happens to be blocked for us.

Further, the implementation seems to be racy when executing multiple
VCPU threads.  There really isn't a reasonable user space programming
scheme to ensure all secondary CPUs have reached kvm_vcpu_first_run_init
before turning on the boot CPU.

Therefore, set the pause flag on the vcpu at VCPU init time (which can
reasonably be expected to be completed for all CPUs by user space before
running any VCPUs) and clear both this flag and the feature (in case the
feature can somehow get set again in the future) and ping the waitqueue
on turning on a VCPU using PSCI.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 478a8237f6)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:22 +02:00
Masanari Iida
90715c6a82 treewide: Fix typos in printk
Correct spelling typo in various part of kernel

[ cdall: Pickes KVM/arm64 specific part not already merged into LSK ]

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit 77d84ff87e)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:22 +02:00
Lorenzo Pieralisi
8215ed10b4 arm: kvm: implement CPU PM notifier
Upon CPU shutdown and consequent warm-reboot, the hypervisor CPU state
must be re-initialized. This patch implements a CPU PM notifier that
upon warm-boot calls a KVM hook to reinitialize properly the hypervisor
state so that the CPU can be safely resumed.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
(cherry picked from commit 1fcf7ce0c6)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:22 +02:00
Takuya Yoshikawa
580ab46133 KVM: Use cond_resched() directly and remove useless kvm_resched()
Since the commit 15ad7146 ("KVM: Use the scheduler preemption notifiers
to make kvm preemptible"), the remaining stuff in this function is a
simple cond_resched() call with an extra need_resched() check which was
there to avoid dropping VCPUs unnecessarily.  Now it is meaningless.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit c08ac06ab3)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:21 +02:00
Andy Honig
bc071b4cc7 KVM: Improve create VCPU parameter (CVE-2013-4587)
In multiple functions the vcpu_id is used as an offset into a bitfield.  Ag
malicious user could specify a vcpu_id greater than 255 in order to set or
clear bits in kernel memory.  This could be used to elevate priveges in the
kernel.  This patch verifies that the vcpu_id provided is less than 255.
The api documentation already specifies that the vcpu_id must be less than
max_vcpus, but this is currently not checked.

Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 338c7dbadd)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:21 +02:00
Santosh Shilimkar
4479f3b006 arm/arm64: kvm: Use virt_to_idmap instead of virt_to_phys for idmap mappings
KVM initialisation fails on architectures implementing virt_to_idmap()
because virt_to_phys() on such architectures won't fetch you the correct
idmap page.

So update the KVM ARM code to use the virt_to_idmap() to fix the issue.
Since the KVM code is shared between arm and arm64, we create
kvm_virt_to_phys() and handle the redirection in respective headers.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 4fda342cc7)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:20 +02:00
Heiko Carstens
4ae68e1ee6 KVM: kvm_clear_guest_page(): fix empty_zero_page usage
Using the address of 'empty_zero_page' as source address in order to
clear a page is wrong. On some architectures empty_zero_page is only the
pointer to the struct page of the empty_zero_page.  Therefore the clear
page operation would copy the contents of a couple of struct pages instead
of clearing a page.  For kvm only arm/arm64 are affected by this bug.

To fix this use the ZERO_PAGE macro instead which will return the struct
page address of the empty_zero_page on all architectures.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit 8a3caa6d74)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:20 +02:00
Christoffer Dall
52031ff601 arm/arm64: KVM: Fix hyp mappings of vmalloc regions
Using virt_to_phys on percpu mappings is horribly wrong as it may be
backed by vmalloc.  Introduce kvm_kaddr_to_phys which translates both
types of valid kernel addresses to the corresponding physical address.

At the same time resolves a typing issue where we were storing the
physical address as a 32 bit unsigned long (on arm), truncating the
physical address for addresses above the 4GB limit.  This caused
breakage on Keystone.

Cc: <stable@vger.kernel.org>	[3.10+]
Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 40c2729bab)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:20 +02:00
Marc Zyngier
53e3896440 arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
When booting a vcpu using PSCI, make sure we start it with the
endianness of the caller. Otherwise, secondaries can be pretty
unhappy to execute a BE kernel in LE mode...

This conforms to PSCI spec Rev B, 5.13.3.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit ce94fe93d5)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:19 +02:00
Marc Zyngier
dd8858820e arm/arm64: KVM: MMIO support for BE guest
Do the necessary byteswap when host and guest have different
views of the universe. Actually, the only case we need to take
care of is when the guest is BE. All the other cases are naturally
handled.

Also be careful about endianness when the data is being memcopy-ed
from/to the run buffer.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 6d89d2d9b5)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:19 +02:00
Marc Zyngier
128a021aa8 arm64: KVM: vgic: byteswap GICv2 access on world switch if BE
Ensure that accesses to the GICH_* registers are byteswapped
when the kernel is compiled as big-endian.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit c5b2c0f520)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:19 +02:00
Marc Zyngier
60979ebbbb arm64: KVM: initialize HYP mode following the kernel endianness
Force SCTLR_EL2.EE to 1 if the kernel is compiled as BE.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 18ea3dbc9e)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:18 +02:00
Gleb Natapov
d49c7a4473 KVM: remove vm mmap method
It was used in conjunction with KVM_SET_MEMORY_REGION ioctl which was
removed by b74a07beed in 2010, QEMU stopped using it in 2008, so
it is time to remove the code finally.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit 80f5b5e700)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:18 +02:00
Michael S. Tsirkin
7f17a13bdd kvm_host: typo fix
fix up typo in comment.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 81e87e2679)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:17 +02:00
Alex Williamson
a7dc5f5535 kvm: Add VFIO device
So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but areas are cropping up where a connection beyond eventfds
and irqfds needs to be made.  This patch introduces a KVM-VFIO device
that is meant to be a gateway for such interaction.  The user creates
the device and can add and remove VFIO groups to it via file
descriptors.  When a group is added, KVM verifies the group is valid
and gets a reference to it via the VFIO external user interface.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ec53500fae)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:17 +02:00
Borislav Petkov
b4fe3057a0 kvm: Add KVM_GET_EMULATED_CPUID
Add a kvm ioctl which states which system functionality kvm emulates.
The format used is that of CPUID and we return the corresponding CPUID
bits set for which we do emulate functionality.

Make sure ->padding is being passed on clean from userspace so that we
can use it for something in the future, after the ioctl gets cast in
stone.

s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 9c15bb1d0a)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:17 +02:00
Paolo Bonzini
e845f9d367 KVM: use a more sensible error number when debugfs directory creation fails
I don't know if this was due to cut and paste, or somebody was really
using a D20 to pick the error code for kvm_init_debugfs as suggested by
Linus (EFAULT is 14, so the possibility cannot be entirely ruled out).

In any case, this patch fixes it.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0c8eb04a62)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:16 +02:00
Marc Zyngier
c0cdef185a arm64: KVM: Yield CPU when vcpu executes a WFE
On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.

The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommited.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit d241aac798)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:16 +02:00
Yang Zhang
b695de8341 KVM: Mapping IOMMU pages after updating memslot
In kvm_iommu_map_pages(), we need to know the page size via call
kvm_host_page_size(). And it will check whether the target slot
is valid before return the right page size.
Currently, we will map the iommu pages when creating a new slot.
But we call kvm_iommu_map_pages() during preparing the new slot.
At that time, the new slot is not visible by domain(still in preparing).
So we cannot get the right page size from kvm_host_page_size() and
this will break the IOMMU super page logic.
The solution is to map the iommu pages after we insert the new slot
into domain.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e0230e1327)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:16 +02:00
Marc Zyngier
3d6b7ab302 arm/arm64: KVM: PSCI: use MPIDR to identify a target CPU
The KVM PSCI code blindly assumes that vcpu_id and MPIDR are
the same thing. This is true when vcpus are organized as a flat
topology, but is wrong when trying to emulate any other topology
(such as A15 clusters).

Change the KVM PSCI CPU_ON code to look at the MPIDR instead
of the vcpu_id to pick a target CPU.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 79c648806f)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:15 +02:00
Marc Zyngier
c93a79267f ARM: KVM: drop limitation to 4 CPU VMs
Now that the KVM/arm code knows about affinity, remove the hard
limit of 4 vcpus per VM.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 7999b4d182)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:15 +02:00
Marc Zyngier
8c02aa5001 ARM: KVM: fix L2CTLR to be per-cluster
The L2CTLR register contains the number of CPUs in this cluster.

Make sure the register content is actually relevant to the vcpu
that is being configured by computing the number of cores that are
part of its cluster.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 9cbb6d969c)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:14 +02:00
Marc Zyngier
c0acda6fd7 ARM: KVM: Fix MPIDR computing to support virtual clusters
In order to be able to support more than 4 A7 or A15 CPUs,
we need to fix the MPIDR computing to reflect the fact that
both A15 and A7 can only exist in clusters of at most 4 CPUs.

Fix the MPIDR computing to allow virtual clusters to be exposed
to the guest.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 2d1d841bd4)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:14 +02:00
Christoffer Dall
ad8b3ca795 KVM: ARM: Transparent huge page (THP) support
Support transparent huge pages in KVM/ARM and KVM/ARM64.  The
transparent_hugepage_adjust is not very pretty, but this is also how
it's solved on x86 and seems to be simply an artifact on how THPs
behave.  This should eventually be shared across architectures if
possible, but that can always be changed down the road.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 9b5fdb9781)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:14 +02:00
Christoffer Dall
cd709bb9fb KVM: ARM: Support hugetlbfs backed huge pages
Support huge pages in KVM/ARM and KVM/ARM64.  The pud_huge checking on
the unmap path may feel a bit silly as the pud_huge check is always
defined to false, but the compiler should be smart about this.

Note: This deals only with VMAs marked as huge which are allocated by
users through hugetlbfs only.  Transparent huge pages can only be
detected by looking at the underlying pages (or the page tables
themselves) and this patch so far simply maps these on a page-by-page
level in the Stage-2 page tables.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit ad361f093c)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:13 +02:00
Christoffer Dall
1391c263d2 KVM: ARM: Update comments for kvm_handle_wfi
Update comments to reflect what is really going on and add the TWE bit
to the comments in kvm_arm.h.

Also renames the function to kvm_handle_wfx like is done on arm64 for
consistency and uber-correctness.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 86ed81aa2e)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:13 +02:00
Marc Zyngier
46a037fd36 ARM: KVM: Yield CPU when vcpu executes a WFE
On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.

This creates contention, and the observed slowdown is 40x for
hackbench. No, this isn't a typo.

The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommited.

Quick test to estimate the performance: hackbench 1 process 1000

2xA15 host (baseline):	1.843s

2xA15 guest w/o patch:	2.083s
4xA15 guest w/o patch:	80.212s
8xA15 guest w/o patch:	Could not be bothered to find out

2xA15 guest w/ patch:	2.102s
4xA15 guest w/ patch:	3.205s
8xA15 guest w/ patch:	6.887s

So we go from a 40x degradation to 1.5x in the 2x overcommit case,
which is vaguely more acceptable.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 58d5ec8f8e)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:12 +02:00
Aneesh Kumar K.V
8413726807 kvm: Add struct kvm arg to memslot APIs
We will use that in the later patch to find the kvm ops handler

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
(cherry picked from commit 5587027ce9)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:12 +02:00
chai wen
0a2e6af7be KVM: Drop FOLL_GET in GUP when doing async page fault
Page pinning is not mandatory in kvm async page fault processing since
after async page fault event is delivered to a guest it accesses page once
again and does its own GUP.  Drop the FOLL_GET flag in GUP in async_pf
code, and do some simplifying in check/clear processing.

Suggested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit f2e106692d)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:12 +02:00
Christoffer Dall
1feff9299a KVM: arm64: Get rid of KVM_HPAGE defines
Now when the main kvm code relying on these defines has been moved to
the x86 specific part of the world, we can get rid of these.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit ef0cfe71c2)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:11 +02:00
Christoffer Dall
4b0745341e KVM: ARM: Get rid of KVM_HPAGE defines
The KVM_HPAGE_DEFINES are a little artificial on ARM, since the huge
page size is statically defined at compile time and there is only a
single huge page size.

Now when the main kvm code relying on these defines has been moved to
the x86 specific part of the world, we can get rid of these.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit dc6f6763df)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:11 +02:00
Christoffer Dall
c7a9a5f3f0 KVM: Move gfn_to_index to x86 specific code
The gfn_to_index function relies on huge page defines which either may
not make sense on systems that don't support huge pages or are defined
in an unconvenient way for other architectures.  Since this is
x86-specific, move the function to arch/x86/include/asm/kvm_host.h.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit 6d9d41e574)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:11 +02:00
Jonathan Austin
5e1ddf60b6 KVM: ARM: Add support for Cortex-A7
This patch adds support for running Cortex-A7 guests on Cortex-A7 hosts.

As Cortex-A7 is architecturally compatible with A15, this patch is largely just
generalising existing code. Areas where 'implementation defined' behaviour
is identical for A7 and A15 is moved to allow it to be used by both cores.

The check to ensure that coprocessor register tables are sorted correctly is
also moved in to 'common' code to avoid each new cpu doing its own check
(and possibly forgetting to do so!)

Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit e8c2d99f82)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:10 +02:00
Jonathan Austin
e82866a69b KVM: ARM: fix the size of TTBCR_{T0SZ,T1SZ} masks
The T{0,1}SZ fields of TTBCR are 3 bits wide when using the long descriptor
format. Likewise, the T0SZ field of the HTCR is 3-bits. KVM currently
defines TTBCR_T{0,1}SZ as 3, not 7.

The T0SZ mask is used to calculate the value for the HTCR, both to pick out
TTBCR.T0SZ and mask off the equivalent field in the HTCR during
read-modify-write. The incorrect mask size causes the (UNKNOWN) reset value
of HTCR.T0SZ to leak in to the calculated HTCR value. Linux will hang when
initializing KVM if HTCR's reset value has bit 2 set (sometimes the case on
A7/TC2)

Fixing T0SZ allows A7 cores to boot and T1SZ is also fixed for completeness.

Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 5e497046f0)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:10 +02:00
Jonathan Austin
c1b378c34e KVM: ARM: Fix calculation of virtual CPU ID
KVM does not have a notion of multiple clusters for CPUs, just a linear
array of CPUs. When using a system with cores in more than one cluster, the
current method for calculating the virtual MPIDR will leak the (physical)
cluster information into the virtual MPIDR. One effect of this is that
Linux under KVM fails to boot multiple CPUs that aren't in the 0th cluster.

This patch does away with exposing the real MPIDR fields in favour of simply
using the virtual CPU number (but preserving the U bit, as before).

Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 1158fca401)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:09 +02:00
Andre Richter
708854902d virt/kvm/iommu.c: Add leading zeros to device's BDF notation in debug messages
When KVM (de)assigns PCI(e) devices to VMs, a debug message is printed
including the BDF notation of the respective device. Currently, the BDF
notation does not have the commonly used leading zeros. This produces
messages like "assign device 0:1:8.0", which look strange at first sight.

The patch fixes this by exchanging the printk(KERN_DEBUG ...) with dev_info()
and also inserts "kvm" into the debug message, so that it is obvious where
the message comes from. Also reduces LoC.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Andre Richter <andre.o.richter@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit 29242cb5c6)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:09 +02:00
Gleb Natapov
89c774beb2 Fix NULL dereference in gfn_to_hva_prot()
gfn_to_memslot() can return NULL or invalid slot. We need to check slot
validity before accessing it.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
(cherry picked from commit a2ac07fe29)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:09 +02:00
Anup Patel
df50e5da2c ARM/ARM64: KVM: Implement KVM_ARM_PREFERRED_TARGET ioctl
For implementing CPU=host, we need a mechanism for querying
preferred VCPU target type on underlying Host.

This patch implements KVM_ARM_PREFERRED_TARGET vm ioctl which
returns struct kvm_vcpu_init instance containing information
about preferred VCPU target type and target specific features
available for it.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 42c4e0c77a)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:08 +02:00
Anup Patel
9778336fe1 ARM64: KVM: Implement kvm_vcpu_preferred_target() function
This patch implements kvm_vcpu_preferred_target() function for
KVM ARM64 which will help us implement KVM_ARM_PREFERRED_TARGET
ioctl for user space.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 473bdc0e65)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:08 +02:00
Anup Patel
f166457e08 ARM: KVM: Implement kvm_vcpu_preferred_target() function
This patch implements kvm_vcpu_preferred_target() function for
KVM ARM which will help us implement KVM_ARM_PREFERRED_TARGET ioctl
for user space.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 4a6fee805d)
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02 17:18:08 +02:00