The kvmarm mailing list is moderated for non-subscriber, but that
was never advertised. Fix this with the hope that people will
eventually subscribe before posting, saving me the hassle of
letting their post through eventually.
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 1a219e08ec)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I9c914e529e88646342c6090b7821056b3a8f21f4
Commit 21b6f32f94 ("KVM: arm64: guest debug, define API headers") added
the arm64 KVM_GUESTDBG_USE_HW flag for the KVM_SET_GUEST_DEBUG ioctl and
commit 834bf88726 ("KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG")
documented and implemented the flag functionality. Since its introduction,
at no point was the flag known by any name other than KVM_GUESTDBG_USE_HW
for the arm64 architecture, so refer to it as such in the documentation.
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210407144857.199746-2-alexandru.elisei@arm.com
(cherry picked from commit feb5dc3de0)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I268c7954f61487ed88605cd02024aa90e2cd15b2
Currently, there is no mechanism to keep time sync between guest and host
in arm/arm64 virtualization environment. Time in guest will drift compared
with host after boot up as they may both use third party time sources
to correct their time respectively. The time deviation will be in order
of milliseconds. But in some scenarios,like in cloud environment, we ask
for higher time precision.
kvm ptp clock, which chooses the host clock source as a reference
clock to sync time between guest and host, has been adopted by x86
which takes the time sync order from milliseconds to nanoseconds.
This patch enables kvm ptp clock for arm/arm64 and improves clock sync precision
significantly.
Test result comparisons between with kvm ptp clock and without it in arm/arm64
are as follows. This test derived from the result of command 'chronyc
sources'. we should take more care of the last sample column which shows
the offset between the local clock and the source at the last measurement.
no kvm ptp in guest:
MS Name/IP address Stratum Poll Reach LastRx Last sample
========================================================================
^* dns1.synet.edu.cn 2 6 377 13 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 21 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 29 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 37 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 45 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 53 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 61 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 4 -130us[ +796us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 12 -130us[ +796us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 20 -130us[ +796us] +/- 21ms
in host:
MS Name/IP address Stratum Poll Reach LastRx Last sample
========================================================================
^* 120.25.115.20 2 7 377 72 -470us[ -603us] +/- 18ms
^* 120.25.115.20 2 7 377 92 -470us[ -603us] +/- 18ms
^* 120.25.115.20 2 7 377 112 -470us[ -603us] +/- 18ms
^* 120.25.115.20 2 7 377 2 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 22 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 43 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 63 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 83 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 103 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 123 +872ns[-6808ns] +/- 17ms
The dns1.synet.edu.cn is the network reference clock for guest and
120.25.115.20 is the network reference clock for host. we can't get the
clock error between guest and host directly, but a roughly estimated value
will be in order of hundreds of us to ms.
with kvm ptp in guest:
chrony has been disabled in host to remove the disturb by network clock.
MS Name/IP address Stratum Poll Reach LastRx Last sample
========================================================================
* PHC0 0 3 377 8 -7ns[ +1ns] +/- 3ns
* PHC0 0 3 377 8 +1ns[ +16ns] +/- 3ns
* PHC0 0 3 377 6 -4ns[ -0ns] +/- 6ns
* PHC0 0 3 377 6 -8ns[ -12ns] +/- 5ns
* PHC0 0 3 377 5 +2ns[ +4ns] +/- 4ns
* PHC0 0 3 377 13 +2ns[ +4ns] +/- 4ns
* PHC0 0 3 377 12 -4ns[ -6ns] +/- 4ns
* PHC0 0 3 377 11 -8ns[ -11ns] +/- 6ns
* PHC0 0 3 377 10 -14ns[ -20ns] +/- 4ns
* PHC0 0 3 377 8 +4ns[ +5ns] +/- 4ns
The PHC0 is the ptp clock which choose the host clock as its source
clock. So we can see that the clock difference between host and guest
is in order of ns.
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-8-jianyong.wu@arm.com
(cherry picked from commit 300bb1fe76)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I34c8be42218ea909e8d0623836cd93442d1257da
Implement the hypervisor side of the KVM PTP interface.
The service offers wall time and cycle count from host to guest.
The caller must specify whether they want the host's view of
either the virtual or physical counter.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com
(cherry picked from commit 3bf725699b)
[willdeacon@: Fixed UAPI #define and documentation conflicts]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: Ib2071bbf4c57d5408f7ad7ab27bf97018b7fe535
System time snapshots are not conveying information about the current
clocksource which was used, but callers like the PTP KVM guest
implementation have the requirement to evaluate the clocksource type to
select the appropriate mechanism.
Introduce a clocksource id field in struct clocksource which is by default
set to CSID_GENERIC (0). Clocksource implementations can set that field to
a value which allows to identify the clocksource.
Store the clocksource id of the current clocksource in the
system_time_snapshot so callers can evaluate which clocksource was used to
take the snapshot and act accordingly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-5-jianyong.wu@arm.com
(cherry picked from commit b2c67cbe9f)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I4660b3c4973ead7593fc957aad123ddddc10052a
We needn't retrieve the memory slot again in user_mem_abort() because
the corresponding memory slot has been passed from the caller. This
would save some CPU cycles. For example, the time used to write 1GB
memory, which is backed by 2MB hugetlb pages and write-protected, is
dropped by 6.8% from 928ms to 864ms.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-4-gshan@redhat.com
(cherry picked from commit 10ba2d17d2)
[willdeacon@: Drop superfluous arguments to __gfn_to_pfn_memslot() and
mark_page_dirty_in_slot()]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: Ieddde429a392401e60ff6d171d35d51fde84ed56
This FROMLIST: commit conflicts with other patches, so drop it for now
and replace it with the UPSTREAM: version in a subsequent commit.
This reverts commit ad5f52dce6.
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I86b1ff75fb6784d983d5b29203f62aa13ae3ee58
Commit 23bde34771 ("KVM: arm64: vgic-v3: Drop the
reporting of GICR_TYPER.Last for userspace") temporarily fixed
a bug identified when attempting to access the GICR_TYPER
register before the redistributor region setting, but dropped
the support of the LAST bit.
Emulating the GICR_TYPER.Last bit still makes sense for
architecture compliance though. This patch restores its support
(if the redistributor region was set) while keeping the code safe.
We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
computes whether a redistributor is the highest one of a series
of redistributor contributor pages.
With this new implementation we do not need to have a uaccess
read accessor anymore.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-9-eric.auger@redhat.com
(cherry picked from commit 28e9d4bce3)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: Iec787a002501beb2103f6170a18b034e943ff250
vgic_uaccess() takes a struct vgic_io_device argument, converts it
to a struct kvm_io_device and passes it to the read/write accessor
functions, which convert it back to a struct vgic_io_device.
Avoid the indirection by passing the struct vgic_io_device argument
directly to vgic_uaccess_{read,write}.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-7-eric.auger@redhat.com
(cherry picked from commit da38530976)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: Ie26c66abeb1f09c9f7fadf4a66cda5dc3436a818
vgic_v3_insert_redist_region() may succeed while
vgic_register_all_redist_iodevs fails. For example this happens
while adding a redistributor region overlapping a dist region. The
failure only is detected on vgic_register_all_redist_iodevs when
vgic_v3_check_base() gets called in vgic_register_redist_iodev().
In such a case, remove the newly added redistributor region and free
it.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-4-eric.auger@redhat.com
(cherry picked from commit 8542a8f95a)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I8d28dd40946c714dd3df902defc267c6c1b9267f
KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return
-EEXIST in case the base address of the redist is already set.
We currently return -EINVAL.
However we need to return -EINVAL in case a legacy REDIST address
is attempted to be set while REDIST_REGIONS were set. This case
is discriminated by looking at the count field.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-2-eric.auger@redhat.com
(cherry picked from commit d9b201e99c)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I227e7a9ecca105a796d71ea0a7de25f987956bba
To aid with debugging, add details of the source of a panic from nVHE
hyp. This is done by having nVHE hyp exit to nvhe_hyp_panic_handler()
rather than directly to panic(). The handler will then add the extra
details for debugging before panicking the kernel.
If the panic was due to a BUG(), look up the metadata to log the file
and line, if available, otherwise log an address that can be looked up
in vmlinux. The hyp offset is also logged to allow other hyp VAs to be
converted, similar to how the kernel offset is logged during a panic.
__hyp_panic_string is now inlined since it no longer needs to be
referenced as a symbol and the message is free to diverge between VHE
and nVHE.
The following is an example of the logs generated by a BUG in nVHE hyp.
[ 46.754840] kvm [307]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/switch.c:242!
[ 46.755357] kvm [307]: Hyp Offset: 0xfffea6c58e1e0000
[ 46.755824] Kernel panic - not syncing: HYP panic:
[ 46.755824] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.755824] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.755824] VCPU:0000d93a880d0000
[ 46.756960] CPU: 3 PID: 307 Comm: kvm-vcpu-0 Not tainted 5.12.0-rc3-00005-gc572b99cf65b-dirty #133
[ 46.757459] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 46.758366] Call trace:
[ 46.758601] dump_backtrace+0x0/0x1b0
[ 46.758856] show_stack+0x18/0x70
[ 46.759057] dump_stack+0xd0/0x12c
[ 46.759236] panic+0x16c/0x334
[ 46.759426] arm64_kernel_unmapped_at_el0+0x0/0x30
[ 46.759661] kvm_arch_vcpu_ioctl_run+0x134/0x750
[ 46.759936] kvm_vcpu_ioctl+0x2f0/0x970
[ 46.760156] __arm64_sys_ioctl+0xa8/0xec
[ 46.760379] el0_svc_common.constprop.0+0x60/0x120
[ 46.760627] do_el0_svc+0x24/0x90
[ 46.760766] el0_svc+0x2c/0x54
[ 46.760915] el0_sync_handler+0x1a4/0x1b0
[ 46.761146] el0_sync+0x170/0x180
[ 46.761889] SMP: stopping secondary CPUs
[ 46.762786] Kernel Offset: 0x3e1cd2820000 from 0xffff800010000000
[ 46.763142] PHYS_OFFSET: 0xffffa9f680000000
[ 46.763359] CPU features: 0x00240022,61806008
[ 46.763651] Memory Limit: none
[ 46.813867] ---[ end Kernel panic - not syncing: HYP panic:
[ 46.813867] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.813867] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.813867] VCPU:0000d93a880d0000 ]---
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-6-ascull@google.com
(cherry picked from commit aec0fae62e)
[willdeacon@: Resolved __hyp_pa() conflicts in psci-relay.c; aligned BUG() usage
with upstream]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: Ie1a500ab526de32abb3cb502319e3b88115f8038
hyp_panic() reports the address of the panic by using ELR_EL2, but this
isn't a useful address when hyp_panic() is called directly. Replace such
direct calls with BUG() and BUG_ON() which use BRK to trigger an
exception that then goes to hyp_panic() with the correct address. Also
remove the hyp_panic() declaration from the header file to avoid
accidental misuse.
Signed-off-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-5-ascull@google.com
(cherry picked from commit f79e616f27)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I1ba5edd4ddcdefd002d744ecd203d9db2b882bf6
Compilation fails when KVM is selected and ARM64_SVE isn't.
The root cause is that sve_cond_update_zcr_vq is not defined when
ARM64_SVE is not selected. Fix it by adding an empty definition
when CONFIG_ARM64_SVE=n.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
[maz: simplified commit message, fleshed out dummy #define]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1617183879-48748-1-git-send-email-tanxiaofei@huawei.com
(cherry picked from commit a9f8696d4b)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I55b8027a6025262e3921561d37d73d2199625be4
Although the SMCCC specification provides some limited functionality for
describing the presence of hypervisor and firmware services, this is
generally applicable only to functions designated as "Arm Architecture
Service Functions" and no portable discovery mechanism is provided for
standard hypervisor services, despite having a designated range of
function identifiers reserved by the specification.
In an attempt to avoid the need for additional firmware changes every
time a new function is added, introduce a UID to identify the service
provider as being compatible with KVM. Once this has been established,
additional services can be discovered via a feature bitmap.
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
[maz: move code to its own file, plug it into PSCI]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-2-jianyong.wu@arm.com
(cherry picked from commit 6e085e0ac9)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I820ef716a2316a928d7cc8e5dda5befa543432d5
When reseeding the CRNG periodically, arch_get_random_seed_long() is
called to obtain entropy from an architecture specific source if one
is implemented. In most cases, these are special instructions, but in
some cases, such as on ARM, we may want to back this using firmware
calls, which are considerably more expensive.
Another call to arch_get_random_seed_long() exists in the CRNG driver,
in add_interrupt_randomness(), which collects entropy by capturing
inter-interrupt timing and relying on interrupt jitter to provide
random bits. This is done by keeping a per-CPU state, and mixing in
the IRQ number, the cycle counter and the return address every time an
interrupt is taken, and mixing this per-CPU state into the entropy pool
every 64 invocations, or at least once per second. The entropy that is
gathered this way is credited as 1 bit of entropy. Every time this
happens, arch_get_random_seed_long() is invoked, and the result is
mixed in as well, and also credited with 1 bit of entropy.
This means that arch_get_random_seed_long() is called at least once
per second on every CPU, which seems excessive, and doesn't really
scale, especially in a virtualization scenario where CPUs may be
oversubscribed: in cases where arch_get_random_seed_long() is backed
by an instruction that actually goes back to a shared hardware entropy
source (such as RNDRRS on ARM), we will end up hitting it hundreds of
times per second.
So let's drop the call to arch_get_random_seed_long() from
add_interrupt_randomness(), and instead, rely on crng_reseed() to call
the arch hook to get random seed material from the platform.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20201105152944.16953-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 390596c995)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I03ef9372cf464a8b5de0e09e93cad1baa059ae40
The ARM architected TRNG firmware interface, described in ARM spec
DEN0098, defines an ARM SMCCC based interface to a true random number
generator, provided by firmware.
This can be discovered via the SMCCC >=v1.1 interface, and provides
up to 192 bits of entropy per call.
Hook this SMC call into arm64's arch_get_random_*() implementation,
coming to the rescue when the CPU does not implement the ARM v8.5 RNG
system registers.
For the detection, we piggy back on the PSCI/SMCCC discovery (which gives
us the conduit to use (hvc/smc)), then try to call the
ARM_SMCCC_TRNG_VERSION function, which returns -1 if this interface is
not implemented.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 38db987316)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I0c0cedbd7053cd322e119cb00ec5c05684458b7e
The ARM DEN0098 document describe an SMCCC based firmware service to
deliver hardware generated random numbers. Its existence is advertised
according to the SMCCC v1.1 specification.
Add a (dummy) call to probe functions implemented in each architecture
(ARM and arm64), to determine the existence of this interface.
For now this return false, but this will be overwritten by each
architecture's support patch.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit a37e31fc97)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I3e91678706efdaa992c2d52a9bcbf1bb994e93e4
Before GICv4.1, we don't have direct access to the VLPI state. So
we simply let it fail early when encountering any VLPI in saving.
But now we don't have to return -EACCES directly if on GICv4.1. Let’s
change the hard code and give a chance to save the VLPI state (and
preserve the UAPI).
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-7-lushenming@huawei.com
(cherry picked from commit 8082d50f48)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I5f878c088b42772774d5189d32bf74814e932521
When setting the forwarding path of a VLPI (switch to the HW mode),
we can also transfer the pending state from irq->pending_latch to
VPT (especially in migration, the pending states of VLPIs are restored
into kvm’s vgic first). And we currently send "INT+VSYNC" to trigger
a VLPI to pending.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-6-lushenming@huawei.com
(cherry picked from commit 12df742921)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I7eb7e5db5d6e251c0f65f00c814a887ef44e260f
After pausing all vCPUs and devices capable of interrupting, in order
to save the states of all interrupts, besides flushing the states in
kvm’s vgic, we also try to flush the states of VLPIs in the virtual
pending tables into guest RAM, but we need to have GICv4.1 and safely
unmap the vPEs first.
As for the saving of VSGIs, which needs the vPEs to be mapped and might
conflict with the saving of VLPIs, but since we will map the vPEs back
at the end of save_pending_tables and both savings require the kvm->lock
to be held (thus only happen serially), it will work fine.
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-5-lushenming@huawei.com
(cherry picked from commit f66b7b151e)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: Ia375c8400861ef93cded852c05e4fe169445fe13
GICv4.1 gives a way to get the VLPI state, which needs to map the
vPE first, and after the state read, we may remap the vPE back while
the VPT is not empty. So we can't assume that the VPT is empty at
the first map. Besides, the optimization of PTZ is probably limited
since the HW should be fairly efficient to parse the empty VPT. Let's
drop the setting of PTZ altogether.
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-3-lushenming@huawei.com
(cherry picked from commit c21bc068cd)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 190594147
Change-Id: I599b10b7b196fba5515d835b340721b23990387c
Changes in 5.10.47
module: limit enabling module.sig_enforce
Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue."
Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell."
drm: add a locked version of drm_is_current_master
drm/nouveau: wait for moving fence after pinning v2
drm/radeon: wait for moving fence after pinning
drm/amdgpu: wait for moving fence after pinning
ARM: 9081/1: fix gcc-10 thumb2-kernel regression
mmc: meson-gx: use memcpy_to/fromio for dram-access-quirk
MIPS: generic: Update node names to avoid unit addresses
arm64: Ignore any DMA offsets in the max_zone_phys() calculation
arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required
spi: spi-nxp-fspi: move the register operation after the clock enable
Revert "PCI: PM: Do not read power state in pci_enable_device_flags()"
drm/vc4: hdmi: Move the HSM clock enable to runtime_pm
drm/vc4: hdmi: Make sure the controller is powered in detect
x86/entry: Fix noinstr fail in __do_fast_syscall_32()
x86/xen: Fix noinstr fail in exc_xen_unknown_trap()
locking/lockdep: Improve noinstr vs errors
perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context
perf/x86/intel/lbr: Zero the xstate buffer on allocation
dmaengine: zynqmp_dma: Fix PM reference leak in zynqmp_dma_alloc_chan_resourc()
dmaengine: stm32-mdma: fix PM reference leak in stm32_mdma_alloc_chan_resourc()
dmaengine: xilinx: dpdma: Add missing dependencies to Kconfig
dmaengine: xilinx: dpdma: Limit descriptor IDs to 16 bits
mac80211: remove warning in ieee80211_get_sband()
mac80211_hwsim: drop pending frames on stop
cfg80211: call cfg80211_leave_ocb when switching away from OCB
dmaengine: rcar-dmac: Fix PM reference leak in rcar_dmac_probe()
dmaengine: mediatek: free the proper desc in desc_free handler
dmaengine: mediatek: do not issue a new desc if one is still current
dmaengine: mediatek: use GFP_NOWAIT instead of GFP_ATOMIC in prep_dma
net: ipv4: Remove unneed BUG() function
mac80211: drop multicast fragments
net: ethtool: clear heap allocations for ethtool function
inet: annotate data race in inet_send_prepare() and inet_dgram_connect()
ping: Check return value of function 'ping_queue_rcv_skb'
net: annotate data race in sock_error()
inet: annotate date races around sk->sk_txhash
net/packet: annotate data race in packet_sendmsg()
net: phy: dp83867: perform soft reset and retain established link
riscv32: Use medany C model for modules
net: caif: fix memory leak in ldisc_open
net/packet: annotate accesses to po->bind
net/packet: annotate accesses to po->ifindex
r8152: Avoid memcpy() over-reading of ETH_SS_STATS
sh_eth: Avoid memcpy() over-reading of ETH_SS_STATS
r8169: Avoid memcpy() over-reading of ETH_SS_STATS
KVM: selftests: Fix kvm_check_cap() assertion
net: qed: Fix memcpy() overflow of qed_dcbx_params()
mac80211: reset profile_periodicity/ema_ap
mac80211: handle various extensible elements correctly
recordmcount: Correct st_shndx handling
PCI: Add AMD RS690 quirk to enable 64-bit DMA
net: ll_temac: Add memory-barriers for TX BD access
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
perf/x86: Track pmu in per-CPU cpu_hw_events
pinctrl: stm32: fix the reported number of GPIO lines per bank
i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access
gpiolib: cdev: zero padding during conversion to gpioline_info_changed
scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART)
nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
s390/stack: fix possible register corruption with stack switch helper
KVM: do not allow mapping valid but non-reference-counted pages
i2c: robotfuzz-osif: fix control-request directions
ceph: must hold snap_rwsem when filling inode for async create
kthread_worker: split code for canceling the delayed work timer
kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()
x86/fpu: Make init_fpstate correct with optimized XSAVE
mm: add VM_WARN_ON_ONCE_PAGE() macro
mm/rmap: remove unneeded semicolon in page_not_mapped()
mm/rmap: use page_not_mapped in try_to_unmap()
mm, thp: use head page in __migration_entry_wait()
mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
mm/thp: make is_huge_zero_pmd() safe and quicker
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
mm/thp: fix vma_address() if virtual address below file offset
mm/thp: fix page_address_in_vma() on file THP tails
mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
mm: page_vma_mapped_walk(): use page for pvmw->page
mm: page_vma_mapped_walk(): settle PageHuge on entry
mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
mm: page_vma_mapped_walk(): crossing page table boundary
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
mm, futex: fix shared futex pgoff on shmem huge page
KVM: SVM: Call SEV Guest Decommission if ASID binding fails
swiotlb: manipulate orig_addr when tlb_addr has offset
netfs: fix test for whether we can skip read when writing beyond EOF
Revert "drm: add a locked version of drm_is_current_master"
certs: Add EFI_CERT_X509_GUID support for dbx entries
certs: Move load_system_certificate_list to a common function
certs: Add ability to preload revocation certs
integrity: Load mokx variables into the blacklist keyring
Linux 5.10.47
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I48fc1e94475e27da558c2cbedcd253a3d06e535e
commit 5f89468e2f upstream.
in case of driver wants to sync part of ranges with offset,
swiotlb_tbl_sync_single() copies from orig_addr base to tlb_addr with
offset and ends up with data mismatch.
It was removed from
"swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single",
but said logic has to be added back in.
From Linus's email:
"That commit which the removed the offset calculation entirely, because the old
(unsigned long)tlb_addr & (IO_TLB_SIZE - 1)
was wrong, but instead of removing it, I think it should have just
fixed it to be
(tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
instead. That way the slot offset always matches the slot index calculation."
(Unfortunatly that broke NVMe).
The use-case that drivers are hitting is as follow:
1. Get dma_addr_t from dma_map_single()
dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);
|<---------------vsize------------->|
+-----------------------------------+
| | original buffer
+-----------------------------------+
vaddr
swiotlb_align_offset
|<----->|<---------------vsize------------->|
+-------+-----------------------------------+
| | | swiotlb buffer
+-------+-----------------------------------+
tlb_addr
2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)
dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);
Error case.
Copy data to original buffer but it is from base addr (instead of
base addr + offset) in original buffer:
swiotlb_align_offset
|<----->|<- offset ->|<- size ->|
+-------+-----------------------------------+
| | |##########| | swiotlb buffer
+-------+-----------------------------------+
tlb_addr
|<- size ->|
+-----------------------------------+
|##########| | original buffer
+-----------------------------------+
vaddr
The fix is to copy the data to the original buffer and take into
account the offset, like so:
swiotlb_align_offset
|<----->|<- offset ->|<- size ->|
+-------+-----------------------------------+
| | |##########| | swiotlb buffer
+-------+-----------------------------------+
tlb_addr
|<- offset ->|<- size ->|
+-----------------------------------+
| |##########| | original buffer
+-----------------------------------+
vaddr
[One fix which was Linus's that made more sense to as it created a
symmetry would break NVMe. The reason for that is the:
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
would come up with the proper offset, but it would lose the
alignment (which this patch contains).]
Fixes: 16fc3cef33 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee <bumyong.lee@samsung.com>
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Reported-by: Horia Geantă <horia.geanta@nxp.com>
Tested-by: Horia Geantă <horia.geanta@nxp.com>
CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 934002cd66 upstream.
Send SEV_CMD_DECOMMISSION command to PSP firmware if ASID binding
fails. If a failure happens after a successful LAUNCH_START command,
a decommission command should be executed. Otherwise, guest context
will be unfreed inside the AMD SP. After the firmware will not have
memory to allocate more SEV guest context, LAUNCH_START command will
begin to fail with SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid, but it is
not called if a failure happens before guest activation succeeds. If
sev_bind_asid fails, decommission is never called. PSP firmware has a
limit for the number of guests. If sev_asid_binding fails many times,
PSP firmware will not have resources to create another guest context.
Cc: stable@vger.kernel.org
Fixes: 59414c9892 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Alper Gun <alpergun@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210610174604.2554090-1-alpergun@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>