Ganapatrao reported that the kvm_pgtable->mmu pointer is more or
less hardcoded to the main S2 mmu structure, while the nested
code needs it to point to other instances (as we have one instance
per nested context).
Rework the initialisation of the kvm_pgtable structure so that
this assumtion doesn't hold true anymore. This requires some
minor changes to the order in which things are initialised
(the mmu->arch pointer being the critical one).
Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211129200150.351436-5-maz@kernel.org
(cherry picked from commit 9d8604b285)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: If942bf4a01a975a60c37ad8483eafc52185234d5
Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-14-qperret@google.com
(cherry picked from commit b8cc6eb5bd)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I0b2839046914a0fd9a6d0077383d53dc84a8ac3d
Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.
Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-13-qperret@google.com
(cherry picked from commit 376a240f03)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I07aa6194099325c955f48ab608c4b601664419e5
__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.
Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-12-qperret@google.com
(cherry picked from commit 1ee32109fd)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9f5877b6ea7b9bf735a90492fbca16f485e73110
By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.
Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-11-qperret@google.com
(cherry picked from commit e82edcc75c)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I3ee4486ab08faa24015ac608979e3596b27806a2
In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and bit more difficult
to take the wrong lock (or even take them in the wrong order).
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-10-qperret@google.com
(cherry picked from commit 61d99e33e7)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I413e2d7f514db507a1aca9a21f04115e97f72c00
Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.
This state will be used for permission checking during page transitions
in later patches.
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-9-qperret@google.com
(cherry picked from commit 3d467f7b8c)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I2e803a6ae83694be178a3d3eaf75fe918c881659
In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times on the host. In order to do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-8-qperret@google.com
(cherry picked from commit a83e2191b7)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I58f39d139c84acb545676c99144547cd080cdda3
The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode changes widely
depending on when it is being called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com
(cherry picked from commit 3f868e142c)
[willdeacon@: Fix conflict in kvm_arch_vcpu_run_map_fp() in same way as
upstream merge commit 2d761dbf7f ("Merge branch kvm-arm64/fpsimd-tracking
into kvmarm-master/next")]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Id3ba033a51550e5aada862d6af06773d0be833fb
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.
Wire up this callback for the hypervisor stage-1 page-table.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-5-qperret@google.com
(cherry picked from commit 34ec7cbf1e)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I481b1f722f4dddda72e35481aca725a937a52090
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix-up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-4-qperret@google.com
(cherry picked from commit d6b4bd3f48)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I6c48e4e5a0ae969e1d1ac6be0ce601faf138cfd3
To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-3-qperret@google.com
(cherry picked from commit 2ea2ff91e8)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ic4ce9177a4381a01e1f2c3b00e9f376e14015718
In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmmemap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.
In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-2-qperret@google.com
(cherry picked from commit 1fac3cfb9c)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9602cb4db9cef6f52e8beefa52e53cf71f2e0cd1
Running the KVM selftests results in these messages being dumped
in the kernel console:
[ 188.051073] kvm [469]: VGIC redist and dist frames overlap
[ 188.056820] kvm [469]: VGIC redist and dist frames overlap
[ 188.076199] kvm [469]: VGIC redist and dist frames overlap
Being amle to trigger this from userspace is definitely not on,
so demote these warnings to kvm_debug().
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org
(cherry picked from commit 440523b92b)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I0c6c94795a86705925dc1c9519bb0dec831b641c
When handling an error at the point where we try and register
all the redistributors, we unregister all the previously
registered frames by counting down from the failing index.
However, the way the code is written relies on that index
being a signed value. Which won't be true once we switch to
an xarray-based vcpu set.
Since this code is pretty awkward the first place, and that the
failure mode is hard to spot, rewrite this loop to iterate
over the vcpus upwards rather than downwards.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211216104526.1482124-1-maz@kernel.org
(cherry picked from commit c95b1d7ca7)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9df3874b4a6542e5d20125d436108d8c1235b8e5
The kvm_host_owns_hyp_mappings() function should return true if and only
if the host kernel is responsible for creating the hypervisor stage-1
mappings. That is only possible in standard non-VHE mode, or during boot
in protected nVHE mode. But either way, none of this makes sense in VHE,
so make sure to catch this case as well, hence making the function
return sensible values in any context (VHE or not).
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-7-qperret@google.com
(cherry picked from commit 64a1fbda59)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Iec9d5f5f6f1258b76725df9b93064a9ddef1e670
Now that GICv2 is disabled in nVHE protected mode there should be no
other reason for the host to use create_hyp_io_mappings() or
kvm_phys_addr_ioremap(). Add sanity checks to make sure that assumption
remains true looking forward.
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-6-qperret@google.com
(cherry picked from commit bff01cb6b1)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I371533976ce9ffdbf6b0eff986680d34d3153b86
The __io_map_base variable is used at EL2 to track the end of the
hypervisor's "private" VA range in nVHE protected mode. However it
doesn't need to be used outside of mm.c, so let's make it static to keep
all the hyp VA allocation logic in one place.
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-5-qperret@google.com
(cherry picked from commit 473a3efbaf)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I0aac3451fdeddbc193d127ed38c3c998636d11b9
The hyp memory pool struct is sized to fit exactly the needs of the
hypervisor stage-1 page-table allocator, so it is important it is not
used for anything else. As it is currently used only from setup.c,
reduce its visibility by marking it static.
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-4-qperret@google.com
(cherry picked from commit 53a563b01f)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I5079221a3a5125ba85b837996aa64f098636d4cc
GICv2 requires having device mappings in guests and the hypervisor,
which is incompatible with the current pKVM EL2 page ownership model
which only covers memory. While it would be desirable to support pKVM
with GICv2, this will require a lot more work, so let's make the
current assumption clear until then.
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
(cherry picked from commit a770ee80e6)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I0c507b698e7cefc389e1a49ed6b15cf59d9daaa7
The EL2 page allocator in protected mode maintains a per-pool max order
value to optimize allocations when the memory region it covers is small.
However, the max order value is currently under-estimated whenever the
number of pages in the region is a power of two. Fix the estimation.
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-2-qperret@google.com
(cherry picked from commit 34b43a8849)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ibb149a33cad785c777032a4d129004f619d88653
workaround_flags is a leftover from our earlier Spectre-v4 workaround
implementation, and now serves no purpose.
Get rid of the field and the corresponding asm-offset definition.
Fixes: 29e8910a56 ("KVM: arm64: Simplify handling of ARCH_WORKAROUND_2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 142ff9bddb)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ie059d34e32f35718abc13b321fbf9f762a5585c1
kvm_is_transparent_hugepage() was removed in commit 205d76ff06 ("KVM:
Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") but its
declaration in include/linux/kvm_host.h persisted. Drop it.
Fixes: 205d76ff06 (""KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211018151407.2107363-1-vkuznets@redhat.com
(cherry picked from commit f0e6e6fa41)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9078ab62be40bc843ca2959f929ed22c1b8888e2
kvm/hyp/reserved_mem.c contains host code executing at EL1 and is not
linked into the hypervisor object. Move the file into kvm/pkvm.c and
rework the headers so that the definitions shared between the host and
the hypervisor live in asm/kvm_pkvm.h.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-4-will@kernel.org
(cherry picked from commit 9429f4b041)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ic53c6ef5262e473e61bfdd44204b6a6725035827
In order to avoid exposing hypervisor (EL2) data structures directly to
the host, generate hyp_constants.h to provide constants such as structure
sizes to the host without dragging in the definitions themselves.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-3-will@kernel.org
(cherry picked from commit ed4ed15d57)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I24957ea3ef1da8863a60dcf53c146b3a78f56fa5
The only usage of kvm_io_gic_ops is to make a comparison with its
address and to pass its address to kvm_iodevice_init() which takes a
pointer to const kvm_io_device_ops as input. Make it const to allow the
compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
(cherry picked from commit 636dcd0204)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I057a166181bea5855dd19be14971ac086e02ec12
With the transition to kvm_arch_vcpu_run_pid_change() to handle
the "run once" activities, it becomes obvious that has_run_once
is now an exact shadow of vcpu->pid.
Replace vcpu->arch.has_run_once with a new vcpu_has_run_once()
helper that directly checks for vcpu->pid, and get rid of the
now unused field.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit cc5705fb1b)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Iaecd0c5440ae929775fd43b7e9cfe71168b45911
The kvm_arch_vcpu_run_pid_change() helper gets called on each PID
change. The kvm_vcpu_first_run_init() helper gets run on the...
first run(!) of a vcpu.
As it turns out, the first run of a vcpu also triggers a PID change
event (vcpu->pid is initially NULL).
Use this property to merge these two helpers and get rid of another
arm64-specific oddity.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit b5aa368abf)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ie65247a0f1fb3bef49c2cdc1d6226836071554f0
Restructure kvm_vcpu_first_run_init() to set the has_run_once
flag after having completed all the "run once" activities.
This includes moving the flip of the userspace irqchip static key
to a point where nothing can fail.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 1408e73d21)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I034562031b0ad89815d2623da1fff8930b964694
Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything
to the table. Move it next to kvm_vcpu_first_run_init(), which will
be convenient for what is next to come.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 052f064d42)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I78e24d8bbfa44a4ebd96f6e1f1441079a627476a
We currently map the SVE state to HYP on detection of a PID change.
Although this matches what we do for FPSIMD, this is pretty pointless
for SVE, as the buffer is per-vcpu and has nothing to do with the
thread that is being run.
Move the mapping of the SVE state to finalize-time, which is where
we allocate the state memory, and thus the most logical place to
do this.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bff01a61af)
[willdeacon@: Fixed context conflict due to removal of EL2 thread_info mapping]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I672f411b50a827a45d30ac5fb154c7f1a5102d7d
The bit of documentation that talks about TIF_FOREIGN_FPSTATE
does not mention the ungodly tricks that KVM plays with this flag.
Try and document this for the posterity.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 31aa126de8)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Iec0b06e35ad286d6bcea15745f2a1b160ff967cc
Now that we can track an equivalent of TIF_FOREIGN_FPSTATE, drop
the mapping of current's thread_info at EL2.
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bee14bca73)
[willdeacon@: Don't drop inclusion of asm/thread_info.h from vhe/switch.c]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I8d113a0f7551302a03446f9cfac1248b0a975184
We currently have to maintain a mapping the thread_info structure
at EL2 in order to be able to check the TIF_FOREIGN_FPSTATE flag.
In order to eventually get rid of this, start with a vcpu flag that
shadows the thread flag on each entry into the hypervisor.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit af9a0e21d8)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I3a59991de7eca3a08fc3de9ddb11213d889165b5
Now that we don't have any users left for __sve_save_state, remove
it altogether. Should we ever need to save the SVE state from the
hypervisor again, we can always re-introduce it.
Suggested-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit e66425fc9b)
[willdeacon@: Resolved conflict due to different __sve_save_state code]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ie6a95dfad3e510361730713fa92a61fcf9f22a7e
The SVE host tracking in KVM is pretty involved. It relies on a
set of flags tracking the ownership of the SVE register, as well
as that of the EL0 access.
It is also pretty scary: __hyp_sve_save_host() computes
a thread_struct pointer and obtains a sve_state which gets directly
accessed without further ado, even on nVHE. How can this even work?
The answer to that is that it doesn't, and that this is mostly dead
code. Closer examination shows that on executing a syscall, userspace
loses its SVE state entirely. This is part of the ABI. Another
thing to notice is that although the kernel provides helpers such as
kernel_neon_begin()/end(), they only deal with the FP/NEON state,
and not SVE.
Given that you can only execute a guest as the result of a syscall,
and that the kernel cannot use SVE by itself, it becomes pretty
obvious that there is never any host SVE state to save, and that
this code is only there to increase confusion.
Get rid of the TIF_SVE tracking and host save infrastructure altogether.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 8383741ab2)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I8f26c83393bac40056ce849a1082b7516130cb0a
The vcpu arch flags are in an interesting, semi random order.
As I have made the mistake of reusing a flag once, let's rework
this in an order that I find a bit less confusing.
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 892fd259cb)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I79f0f8de29bb111d95a923a744055d69e0dbad60
They are moved to common-modules/virtual-device which is a better place
for project-specific configs.
Bug: 233192153
Test: N/A
Signed-off-by: Jiyong Park <jiyong@google.com>
Change-Id: Ia1bb78861977d7dd39479fe7bb0ed9e568064fb0
The patchset in [1] exported some definitions to binder_internal.h in
order to make the debugfs entries such as 'stats' and 'transaction_log'
available in a binderfs instance. However, the DEFINE_SHOW_ATTRIBUTE
macro expands into a static function/variable pair, which in turn get
redefined each time a source file includes this internal header.
This problem was made evident after a report from the kernel test robot
<lkp@intel.com> where several W=1 build warnings are seen in downstream
kernels. See the following example:
include/../drivers/android/binder_internal.h:111:23: warning: 'binder_stats_fops' defined but not used [-Wunused-const-variable=]
111 | DEFINE_SHOW_ATTRIBUTE(binder_stats);
| ^~~~~~~~~~~~
include/linux/seq_file.h:174:37: note: in definition of macro 'DEFINE_SHOW_ATTRIBUTE'
174 | static const struct file_operations __name ## _fops = { \
| ^~~~~~
This patch fixes the above issues by moving back the definitions into
binder.c and instead creates an array of the debugfs entries which is
more convenient to share with binderfs and iterate through.
[1] https://lore.kernel.org/all/20190903161655.107408-1-hridya@google.com/
Fixes: 0e13e452da ("binder: Add stats, state and transactions files")
Fixes: 03e2e07e38 ("binder: Make transaction_log available in binderfs")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220701182041.2134313-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 240404657
(cherry picked from commit b7e241bbff)
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Change-Id: I7e9ca1ab1f3a5a4a272e50f24d404c17cad55d32
Changes in 5.15.59
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Revert "ocfs2: mount shared volume without ha stack"
ntfs: fix use-after-free in ntfs_ucsncmp()
fs: sendfile handles O_NONBLOCK of out_fd
secretmem: fix unhandled fault in truncate
mm: fix page leak with multiple threads mapping the same page
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
asm-generic: remove a broken and needless ifdef conditional
s390/archrandom: prevent CPACF trng invocations in interrupt context
nouveau/svm: Fix to migrate all requested pages
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
watch_queue: Fix missing rcu annotation
watch_queue: Fix missing locking in add_watch_to_object()
tcp: Fix data-races around sysctl_tcp_dsack.
tcp: Fix a data-race around sysctl_tcp_app_win.
tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
tcp: Fix a data-race around sysctl_tcp_frto.
tcp: Fix a data-race around sysctl_tcp_nometrics_save.
tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: do not setup vlan for loopback VSI
scsi: ufs: host: Hold reference returned by of_parse_phandle()
Revert "tcp: change pingpong threshold to 3"
octeontx2-pf: Fix UDP/TCP src and dst port tc filters
tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.
tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
scsi: core: Fix warning in scsi_alloc_sgtables()
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
net: ping6: Fix memleak in ipv6_renew_options().
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net/tls: Remove the context from the list in tls_device_down
igmp: Fix data-races around sysctl_igmp_qrv.
net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
tcp: Fix a data-race around sysctl_tcp_autocorking.
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
Documentation: fix sctp_wmem in ip-sysctl.rst
macsec: fix NULL deref in macsec_add_rxsa
macsec: fix error message in macsec_add_rxsa and _txsa
macsec: limit replay window size with XPN
macsec: always read MACSEC_SA_ATTR_PN as a u64
net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()
net: mld: fix reference count leak in mld_{query | report}_work()
tcp: Fix data-races around sk_pacing_rate.
net: Fix data-races around sysctl_[rw]mem(_offset)?.
tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.
tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
tcp: Fix data-races around sysctl_tcp_reflect_tos.
ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.
i40e: Fix interface init with MSI interrupts (no MSI-X)
sctp: fix sleep in atomic context bug in timer handlers
octeontx2-pf: cn10k: Fix egress ratelimit configuration
netfilter: nf_queue: do not allow packet truncation below transport header offset
virtio-net: fix the race between refill work and close
perf symbol: Correct address for bss symbols
sfc: disable softirqs for ptp TX
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
ARM: crypto: comment out gcc warning that breaks clang builds
mm/hmm: fault non-owner device private entries
page_alloc: fix invalid watermark check on a negative value
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
EDAC/ghes: Set the DIMM label unconditionally
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Linux 5.15.59
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4f2002d38aea467e150a912f50d456c41b23de89
The following symbols no longer exist after reverting the out-of-tree
pKVM patches inherited from android13-5.15:
- pkvm_iommu_finalize
- pkvm_iommu_resume
- pkvm_iommu_s2mpu_register
- pkvm_iommu_suspend
- pkvm_iommu_sysmmu_sync_register
Additionally, mem_encrypt_active does not make sense outside of a pKVM
guest (as it will always return 0).
Remove references to these symbols from the symbol lists so that we can
get presubmit working again before re-introducing the symbols in a newer
version of the pKVM series.
Bug: 233587962
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I1413e12e5e0f078fc7c7dfdbcac74c0078dadafd
On initialising the MMIO guard infrastructure, register the
earlycon mapping if present.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 233587962
Change-Id: I379387253d08e2414fa386a3360a45391da7d90d
Signed-off-by: Will Deacon <willdeacon@google.com>
In order to transfer the early mapping state into KVM's MMIO
guard infrastructure, provide a small helper that will retrieve
the associated PTE.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 233587962
Change-Id: Iefc1c57d5e9476b718a8a68f60e562a57b09fb6a
Signed-off-by: Will Deacon <willdeacon@google.com>
Should a guest desire to enroll into the MMIO guard, allow it to
do so with a command-line option.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 233587962
Change-Id: Ia9a77f693531740500739693c52b4959abacafd4
[willdeacon@: Add hypercall IDs]
Signed-off-by: Will Deacon <willdeacon@google.com>
Implement the previously defined ioremap/iounmap hooks for arm64,
calling into KVM's MMIO guard if available.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 233587962
Change-Id: I86a78f8941fb60078fb873a34c5eb32830a00259
[willdeacon@: Add hypercall IDs and slab_is_available() check]
Signed-off-by: Will Deacon <willdeacon@google.com>
Add a pair of hooks (ioremap_phys_range_hook/iounmap_phys_range_hook)
that can be implemented by an architecture. Contrary to the existing
arch_sync_kernel_mappings(), this one tracks things at the physical
address level.
This is specially useful in these virtualised environments where
the guest has to tell the host whether (and how) it intends to use
a MMIO device.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 233587962
Change-Id: I970c2e632cb2b01060d5e66e4194fa9248188f43
Signed-off-by: Will Deacon <willdeacon@google.com>