The GICv3 ITS driver allocates memory for its tables using alloc_pages()
and performs explicit cache maintenance if necessary. On systems such
as those running pKVM, where the memory encryption API is implemented,
memory shared with the ITS must first be transitioned to the "decrypted"
state, as it would be if allocated via the DMA API.
Allow pKVM guests to interact with an ITS emulation by ensuring that the
shared pages are decrypted at the point of allocation and encrypted
again upon free().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211208155916.681-1-will@kernel.org
Bug: 209580772
Change-Id: I89820c65769a07306fd3e067d7d33c938d156820
Signed-off-by: Quentin Perret <qperret@google.com>
Now that GICv2 is disabled in nVHE protected mode there should be no
other reason for the host to use create_hyp_io_mappings() or
kvm_phys_addr_ioremap(). Add sanity checks to make sure that assumption
remains true going forward.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-6-qperret@google.com
Bug: 209580772
Change-Id: I371533976ce9ffdbf6b0eff986680d34d3153b86
The __io_map_base variable is used at EL2 to track the end of the
hypervisor's "private" VA range in nVHE protected mode. However it
doesn't need to be used outside of mm.c, so let's make it static to keep
all the hyp VA allocation logic in one place.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-5-qperret@google.com
Bug: 209580772
Change-Id: I0aac3451fdeddbc193d127ed38c3c998636d11b9
The hyp memory pool struct is sized to fit exactly the needs of the
hypervisor stage-1 page-table allocator, so it is important it is not
used for anything else. As it is currently used only from setup.c,
reduce its visibility by marking it static.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-4-qperret@google.com
Bug: 209580772
Change-Id: I5079221a3a5125ba85b837996aa64f098636d4cc
GICv2 requires having device mappings in guests and the hypervisor,
which is incompatible with the current pKVM EL2 page ownership model
which only covers memory. While it would be desirable to support pKVM
with GICv2, this will require a lot more work, so let's make the
current assumption clear until then.
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
Bug: 209580772
Change-Id: I0c507b698e7cefc389e1a49ed6b15cf59d9daaa7
The EL2 page allocator in protected mode maintains a per-pool max order
value to optimize allocations when the memory region it covers is small.
However, the max order value is currently under-estimated whenever the
number of pages in the region is a power of two. Fix the estimation.
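
The off-by-one described above can be illustrated with a minimal
user-space sketch. It assumes that the pool's max order is an exclusive
upper bound on the orders the pool can serve, and that it is derived by
rounding a page count up to an order. order_for_pages(), max_order_old()
and max_order_fixed() are hypothetical names used for illustration only,
not the hypervisor's actual code.

```c
/* Smallest order such that (1 << order) pages cover nr_pages. */
static unsigned int order_for_pages(unsigned long nr_pages)
{
	unsigned int order = 0;

	while ((1UL << order) < nr_pages)
		order++;
	return order;
}

/*
 * Assumed old behaviour: for a power-of-two region the result equals
 * log2(nr_pages), so an exclusive bound forbids the one order that
 * would cover the whole region.
 */
static unsigned int max_order_old(unsigned long nr_pages)
{
	return order_for_pages(nr_pages);
}

/*
 * Assumed fix: round up from nr_pages + 1 so that a power-of-two
 * region gets a bound one higher, while other sizes are unchanged.
 */
static unsigned int max_order_fixed(unsigned long nr_pages)
{
	return order_for_pages(nr_pages + 1);
}
```

With an 8-page region the old estimate is 3 (orders 0 to 2 usable) and
the fixed one is 4 (orders 0 to 3 usable); a 5-page region yields 3 in
both cases.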
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-2-qperret@google.com
Bug: 209580772
Change-Id: Ibb149a33cad785c777032a4d129004f619d88653
Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-15-qperret@google.com
Bug: 209599700
Change-Id: Id79362978000d72b866152d0d83c887e4caeb973
Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.
Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.
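
As a rough illustration of the transition described above, the sketch
below models each page as a pair of states, one seen by the owner and
one by the borrower. The enum, struct page_view and unshare_range() are
hypothetical names for this sketch and do not reflect the actual
do_unshare() implementation. It assumes all pages are checked before
any state is changed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical page states, named for this sketch only. */
enum sketch_page_state {
	SKETCH_OWNED,
	SKETCH_SHARED_OWNED,
	SKETCH_SHARED_BORROWED,
	SKETCH_NOPAGE,
};

struct page_view {
	enum sketch_page_state owner_state;
	enum sketch_page_state borrower_state;
};

/* Tear down a shared range: check every page first, then commit. */
static int unshare_range(struct page_view *pages, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (pages[i].owner_state != SKETCH_SHARED_OWNED ||
		    pages[i].borrower_state != SKETCH_SHARED_BORROWED)
			return -EPERM;
	}

	for (i = 0; i < nr_pages; i++) {
		/* The owner regains exclusive ownership ... */
		pages[i].owner_state = SKETCH_OWNED;
		/* ... and the borrower loses access entirely. */
		pages[i].borrower_state = SKETCH_NOPAGE;
	}

	return 0;
}
```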
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-14-qperret@google.com
Bug: 209599700
Change-Id: I717d87c9aa2d1f1b159d7dc3bca439a2869967e5
__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.
Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-13-qperret@google.com
Bug: 209599700
Change-Id: I8d44fc9ca79ac7ea5f8ca289b3cca08a4879b3cd
By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.
Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.
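
The per-page transition can be sketched as a small state machine. This
is only an illustration under stated assumptions: the enum, struct
page_view and share_range() are hypothetical names, the states are
modelled loosely on the owner/borrower description above, and the
sketch assumes every page is validated before any state is updated.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical page states, named for this sketch only. */
enum sketch_page_state {
	SKETCH_OWNED,
	SKETCH_SHARED_OWNED,
	SKETCH_SHARED_BORROWED,
	SKETCH_NOPAGE,
};

struct page_view {
	enum sketch_page_state owner_state;	/* as seen by the owner */
	enum sketch_page_state borrower_state;	/* as seen by the borrower */
};

/* Share a range: reject it unless every page is exclusively owned. */
static int share_range(struct page_view *pages, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (pages[i].owner_state != SKETCH_OWNED ||
		    pages[i].borrower_state != SKETCH_NOPAGE)
			return -EPERM;
	}

	for (i = 0; i < nr_pages; i++) {
		pages[i].owner_state = SKETCH_SHARED_OWNED;
		pages[i].borrower_state = SKETCH_SHARED_BORROWED;
	}

	return 0;
}
```

Because the check requires exclusive ownership, sharing an already
shared page fails in this model.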
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-12-qperret@google.com
Bug: 209599700
Change-Id: I7edb1b53014ffb4a5aa7a6ee54fd99d8091b57cd
In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and a bit more difficult
to take the wrong lock (or even take them in the wrong order).
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-11-qperret@google.com
Bug: 209599700
Change-Id: If6a1baf1dc099894c3445d6b6fec4dd3a46164a9
Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.
This state will be used for permission checking during page transitions
in later patches.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-10-qperret@google.com
Bug: 209599700
Change-Id: I7f31f675d39c5b33168eb652ca35822fba2ec0ff
In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times to the host. To do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.
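
A minimal sketch of this kind of host-side bookkeeping, assuming a
hypercall is only needed when a PFN is shared for the first time. For
brevity it uses a flat array where the patch uses a red-black tree.
share_pfn(), unshare_pfn() and MAX_SHARED are hypothetical names, and
the release path is included only to show how the count would be used.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SHARED 64	/* hypothetical capacity for the sketch */

struct shared_pfn {
	uint64_t pfn;
	unsigned int refcount;
};

static struct shared_pfn shared[MAX_SHARED];
static size_t nr_shared;

static struct shared_pfn *find_shared(uint64_t pfn)
{
	size_t i;

	for (i = 0; i < nr_shared; i++)
		if (shared[i].pfn == pfn)
			return &shared[i];
	return NULL;
}

/* Returns 1 if the caller must issue the share hypercall, 0 if the
 * PFN was already shared, -1 if the table is full. */
static int share_pfn(uint64_t pfn)
{
	struct shared_pfn *p = find_shared(pfn);

	if (p) {
		p->refcount++;
		return 0;
	}
	if (nr_shared == MAX_SHARED)
		return -1;
	shared[nr_shared].pfn = pfn;
	shared[nr_shared].refcount = 1;
	nr_shared++;
	return 1;
}

/* Returns 1 if the last reference was dropped, 0 if references
 * remain, -1 if the PFN was not shared. */
static int unshare_pfn(uint64_t pfn)
{
	struct shared_pfn *p = find_shared(pfn);

	if (!p)
		return -1;
	if (--p->refcount > 0)
		return 0;
	*p = shared[--nr_shared];	/* fill the hole with the last entry */
	return 1;
}
```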
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-9-qperret@google.com
Bug: 209599700
Change-Id: I11dc907d139ba314247fb42e8702a6e80c55c054
The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode changes widely
depending on when it is being called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-8-qperret@google.com/
Bug: 209599700
Change-Id: I17b9c2542e21f7c4cef0ee1e358b71a4f01c6647
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.
Wire up this callback for the hypervisor stage-1 page-table.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-6-qperret@google.com
Bug: 209599700
Change-Id: Ieaf1f60698e1ebafc60424e879ccfd6ec192dbb5
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix-up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-5-qperret@google.com
Bug: 209599700
Change-Id: Ib31ace99838f397d7a2e48bfd43c6f4eaf730878
To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-4-qperret@google.com
Bug: 209599700
Change-Id: If45f4a5c62e70db5c6ee60192fff5ca4b945aa31
In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmemmap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.
In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-3-qperret@google.com
Bug: 209599700
Change-Id: I051ceebbe2c564ff88726a451f83af646f0d2cf0
The kvm_host_owns_hyp_mappings() function should return true if and only
if the host kernel is responsible for creating the hypervisor stage-1
mappings. That is only possible in standard non-VHE mode, or during boot
in protected nVHE mode. But either way, none of this makes sense in VHE,
so make sure to catch this case as well, hence making the function
return sensible values in any context (VHE or not).
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-2-qperret@google.com
Bug: 209599700
Change-Id: Iec9d5f5f6f1258b76725df9b93064a9ddef1e670
virtio_max_dma_size() returns the maximum DMA mapping size of the virtio
device by querying dma_max_mapping_size() for the device when the DMA
API is in use for the vring. Unfortunately, the device passed is
initialised by register_virtio_device() and does not inherit the DMA
configuration from its parent, resulting in SWIOTLB errors when bouncing
is enabled and the default 256K mapping limit (IO_TLB_SEGSIZE) is not
respected:
| virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 294912 bytes), total 1024 (slots), used 725 (slots)
Follow the pattern used elsewhere in the virtio_ring code when calling
into the DMA layer and pass the parent device to dma_max_mapping_size()
instead.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211201112018.25276-1-will@kernel.org
Bug: 209580772
Change-Id: I3389270b4df2b0e0d3813ff8be61bdb594c1b0bd
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_is_transparent_hugepage() was removed in commit 205d76ff06 ("KVM:
Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") but its
declaration in include/linux/kvm_host.h persisted. Drop it.
Fixes: 205d76ff06 ("KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211018151407.2107363-1-vkuznets@redhat.com
(cherry picked from commit f0e6e6fa41
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I9078ab62be40bc843ca2959f929ed22c1b8888e2
kvm/hyp/reserved_mem.c contains host code executing at EL1 and is not
linked into the hypervisor object. Move the file into kvm/pkvm.c and
rework the headers so that the definitions shared between the host and
the hypervisor live in asm/kvm_pkvm.h.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-4-will@kernel.org
(cherry picked from commit 9429f4b041
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ic53c6ef5262e473e61bfdd44204b6a6725035827
In order to avoid exposing hypervisor (EL2) data structures directly to
the host, generate hyp_constants.h to provide constants such as structure
sizes to the host without dragging in the definitions themselves.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-3-will@kernel.org
(cherry picked from commit ed4ed15d57
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I24957ea3ef1da8863a60dcf53c146b3a78f56fa5
asm/mmu.h refers to cpus_have_const_cap() in the definition of
arm64_kernel_unmapped_at_el0() so include asm/cpufeature.h directly
rather than force all users of the header to do it themselves.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-2-will@kernel.org
(cherry picked from commit 7e04f05984
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iaa42070f8f41255406b1031e5a59f58c06f47f5d
The only usage of kvm_io_gic_ops is to make a comparison with its
address and to pass its address to kvm_iodevice_init() which takes a
pointer to const kvm_io_device_ops as input. Make it const to allow the
compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
(cherry picked from commit 636dcd0204
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I057a166181bea5855dd19be14971ac086e02ec12
When running a KVM guest hosted on an ARMv8.7 machine, the host
kernel complains that it doesn't know about the architected number
of events.
Fix it by adding the PMUver code corresponding to PMUv3 for ARMv8.7.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211126115533.217903-1-maz@kernel.org
(cherry picked from commit 00e228b315
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I705efed6bcdd2000a57901bd04ba080a36527ad4
With the transition to kvm_arch_vcpu_run_pid_change() to handle
the "run once" activities, it becomes obvious that has_run_once
is now an exact shadow of vcpu->pid.
Replace vcpu->arch.has_run_once with a new vcpu_has_run_once()
helper that directly checks for vcpu->pid, and get rid of the
now unused field.
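
The idea can be sketched in plain C as below. struct vcpu_sketch and
struct pid_sketch are hypothetical stand-ins for the kernel structures,
and the real helper may differ (for instance in how it reads the
pointer); the sketch only shows that a non-NULL pid doubles as the
"has run once" indicator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pid and vcpu structures. */
struct pid_sketch {
	int nr;
};

struct vcpu_sketch {
	struct pid_sketch *pid;	/* NULL until the vcpu first runs */
};

/* A vcpu has run at least once iff a pid has been recorded for it. */
static bool vcpu_has_run_once(const struct vcpu_sketch *vcpu)
{
	return vcpu->pid != NULL;
}
```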
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit cc5705fb1b
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iaecd0c5440ae929775fd43b7e9cfe71168b45911
The kvm_arch_vcpu_run_pid_change() helper gets called on each PID
change. The kvm_vcpu_first_run_init() helper gets run on the...
first run(!) of a vcpu.
As it turns out, the first run of a vcpu also triggers a PID change
event (vcpu->pid is initially NULL).
Use this property to merge these two helpers and get rid of another
arm64-specific oddity.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit b5aa368abf
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ie65247a0f1fb3bef49c2cdc1d6226836071554f0
Restructure kvm_vcpu_first_run_init() to set the has_run_once
flag after having completed all the "run once" activities.
This includes moving the flip of the userspace irqchip static key
to a point where nothing can fail.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 1408e73d21
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I034562031b0ad89815d2623da1fff8930b964694
Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything
to the table. Move it next to kvm_vcpu_first_run_init(), which will
be convenient for what is next to come.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 052f064d42
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I78e24d8bbfa44a4ebd96f6e1f1441079a627476a
We currently map the SVE state to HYP on detection of a PID change.
Although this matches what we do for FPSIMD, this is pretty pointless
for SVE, as the buffer is per-vcpu and has nothing to do with the
thread that is being run.
Move the mapping of the SVE state to finalize-time, which is where
we allocate the state memory, and thus the most logical place to
do this.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bff01a61af
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
[willdeacon@: Fixed context conflict due to removal of EL2 thread_info mapping]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I672f411b50a827a45d30ac5fb154c7f1a5102d7d
The bit of documentation that talks about TIF_FOREIGN_FPSTATE
does not mention the ungodly tricks that KVM plays with this flag.
Try and document this for posterity.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 31aa126de8
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iec0b06e35ad286d6bcea15745f2a1b160ff967cc
Now that we can track an equivalent of TIF_FOREIGN_FPSTATE, drop
the mapping of current's thread_info at EL2.
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit bee14bca73
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8d113a0f7551302a03446f9cfac1248b0a975184
We currently have to maintain a mapping of the thread_info structure
at EL2 in order to be able to check the TIF_FOREIGN_FPSTATE flag.
In order to eventually get rid of this, start with a vcpu flag that
shadows the thread flag on each entry into the hypervisor.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit af9a0e21d8
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I3a59991de7eca3a08fc3de9ddb11213d889165b5
Now that we don't have any users left for __sve_save_state, remove
it altogether. Should we ever need to save the SVE state from the
hypervisor again, we can always re-introduce it.
Suggested-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit e66425fc9b
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
[willdeacon@: Resolved conflict due to different __sve_save_state code]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ie6a95dfad3e510361730713fa92a61fcf9f22a7e
The SVE host tracking in KVM is pretty involved. It relies on a
set of flags tracking the ownership of the SVE register, as well
as that of the EL0 access.
It is also pretty scary: __hyp_sve_save_host() computes
a thread_struct pointer and obtains a sve_state which gets directly
accessed without further ado, even on nVHE. How can this even work?
The answer to that is that it doesn't, and that this is mostly dead
code. Closer examination shows that on executing a syscall, userspace
loses its SVE state entirely. This is part of the ABI. Another
thing to notice is that although the kernel provides helpers such as
kernel_neon_begin()/end(), they only deal with the FP/NEON state,
and not SVE.
Given that you can only execute a guest as the result of a syscall,
and that the kernel cannot use SVE by itself, it becomes pretty
obvious that there is never any host SVE state to save, and that
this code is only there to increase confusion.
Get rid of the TIF_SVE tracking and host save infrastructure altogether.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 8383741ab2
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8f26c83393bac40056ce849a1082b7516130cb0a
The vcpu arch flags are in an interesting, semi random order.
As I have made the mistake of reusing a flag once, let's rework
this in an order that I find a bit less confusing.
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 892fd259cb
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I79f0f8de29bb111d95a923a744055d69e0dbad60
Changes in 5.15.7
ALSA: usb-audio: Restrict rates for the shared clocks
ALSA: usb-audio: Rename early_playback_start flag with lowlatency_playback
ALSA: usb-audio: Disable low-latency playback for free-wheel mode
ALSA: usb-audio: Disable low-latency mode for implicit feedback sync
ALSA: usb-audio: Check available frames for the next packet size
ALSA: usb-audio: Add spinlock to stop_urbs()
ALSA: usb-audio: Improved lowlatency playback support
ALSA: usb-audio: Avoid killing in-flight URBs during draining
ALSA: usb-audio: Fix packet size calculation regression
ALSA: usb-audio: Less restriction for low-latency playback mode
ALSA: usb-audio: Switch back to non-latency mode at a later point
ALSA: usb-audio: Don't start stream for capture at prepare
gfs2: release iopen glock early in evict
gfs2: Fix length of holes reported at end-of-file
powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
mac80211: do not access the IV when it was stripped
mac80211: fix throughput LED trigger
x86/hyperv: Move required MSRs check to initial platform probing
net/smc: Transfer remaining wait queue entries during fallback
atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait
net: return correct error code
pinctrl: qcom: fix unmet dependencies on GPIOLIB for GPIOLIB_IRQCHIP
platform/x86: dell-wmi-descriptor: disable by default
platform/x86: thinkpad_acpi: Add support for dual fan control
platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
s390/setup: avoid using memblock_enforce_memory_limit
btrfs: silence lockdep when reading chunk tree during mount
btrfs: check-integrity: fix a warning on write caching disabled disk
thermal: core: Reset previous low and high trip during thermal zone init
scsi: iscsi: Unblock session then wake up error handler
net: usb: r8152: Add MAC passthrough support for more Lenovo Docks
drm/amd/pm: Remove artificial freq level on Navi1x
drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
drm/amd/amdgpu: fix potential memleak
ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
ata: libahci: Adjust behavior when StorageD3Enable _DSD is set
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
ipv6: check return value of ipv6_skip_exthdr
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
perf sort: Fix the 'weight' sort key behavior
perf sort: Fix the 'ins_lat' sort key behavior
perf sort: Fix the 'p_stage_cyc' sort key behavior
perf inject: Fix ARM SPE handling
perf hist: Fix memory leak of a perf_hpp_fmt
perf report: Fix memory leaks around perf_tip()
tracing: Don't use out-of-sync va_list in event printing
net/smc: Avoid warning of possible recursive locking
ACPI: Add stubs for wakeup handler functions
net/tls: Fix authentication failure in CCM mode
vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
kprobes: Limit max data_size of the kretprobe instances
ALSA: hda/cs8409: Set PMSG_ON earlier inside cs8409 driver
rt2x00: do not mark device gone on EPROTO errors during start
ipmi: Move remove_work to dedicated workqueue
cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
iwlwifi: mvm: retry init flow if failed
dma-buf: system_heap: Use 'for_each_sgtable_sg' in pages free flow
s390/pci: move pseudo-MMIO to prevent MIO overlap
fget: check that the fd still exists after getting a ref to it
sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGO
scsi: ufs: ufs-pci: Add support for Intel ADL
ipv6: fix memory leak in fib6_rule_suppress
drm/amd/display: Allow DSC on supported MST branch devices
drm/i915/dp: Perform 30ms delay after source OUI write
KVM: fix avic_set_running for preemptable kernels
KVM: Disallow user memslot with size that exceeds "unsigned long"
KVM: x86/mmu: Fix TLB flush range when handling disconnected pt
KVM: Ensure local memslot copies operate on up-to-date arch-specific data
KVM: x86: ignore APICv if LAPIC is not enabled
KVM: nVMX: Emulate guest TLB flush on nested VM-Enter with new vpid12
KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST
KVM: nVMX: Abide to KVM_REQ_TLB_FLUSH_GUEST request on nested vmentry/vmexit
KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled
KVM: x86: Use a stable condition around all VT-d PI paths
KVM: MMU: shadow nested paging does not have PKU
KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to 1
KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
KVM: x86: check PIR even for vCPUs with disabled APICv
tracing/histograms: String compares should not care about signed values
net: dsa: mv88e6xxx: Fix application of erratum 4.8 for 88E6393X
net: dsa: mv88e6xxx: Drop unnecessary check in mv88e6393x_serdes_erratum_4_6()
net: dsa: mv88e6xxx: Save power by disabling SerDes trasmitter and receiver
net: dsa: mv88e6xxx: Add fix for erratum 5.2 of 88E6393X family
net: dsa: mv88e6xxx: Fix inband AN for 2500base-x on 88E6393X family
net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed
wireguard: selftests: increase default dmesg log size
wireguard: allowedips: add missing __rcu annotation to satisfy sparse
wireguard: selftests: actually test for routing loops
wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLIST
wireguard: device: reset peer src endpoint when netns exits
wireguard: receive: use ring buffer for incoming handshakes
wireguard: receive: drop handshakes if queue lock is contended
wireguard: ratelimiter: use kvcalloc() instead of kvzalloc()
i2c: stm32f7: flush TX FIFO upon transfer errors
i2c: stm32f7: recover the bus on access timeout
i2c: stm32f7: stop dma transfer in case of NACK
i2c: cbus-gpio: set atomic transfer callback
natsemi: xtensa: fix section mismatch warnings
tcp: fix page frag corruption on page fault
net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
net: mpls: Fix notifications when deleting a device
siphash: use _unaligned version by default
arm64: ftrace: add missing BTIs
iwlwifi: fix warnings produced by kernel debug options
net/mlx5e: IPsec: Fix Software parser inner l3 type setting in case of encapsulation
net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
selftests: net: Correct case name
net: dsa: b53: Add SPI ID table
mt76: mt7915: fix NULL pointer dereference in mt7915_get_phy_mode
ASoC: tegra: Fix wrong value type in ADMAIF
ASoC: tegra: Fix wrong value type in I2S
ASoC: tegra: Fix wrong value type in DMIC
ASoC: tegra: Fix wrong value type in DSPK
ASoC: tegra: Fix kcontrol put callback in ADMAIF
ASoC: tegra: Fix kcontrol put callback in I2S
ASoC: tegra: Fix kcontrol put callback in DMIC
ASoC: tegra: Fix kcontrol put callback in DSPK
ASoC: tegra: Fix kcontrol put callback in AHUB
rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()
rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()
ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec
net: stmmac: Avoid DMA_CHAN_CONTROL write if no Split Header support
net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if no IRQ is available
net: marvell: mvpp2: Fix the computation of shared CPUs
dpaa2-eth: destroy workqueue at the end of remove function
octeontx2-af: Fix a memleak bug in rvu_mbox_init()
net: annotate data-races on txq->xmit_lock_owner
ipv4: convert fib_num_tclassid_users to atomic_t
net/smc: fix wrong list_del in smc_lgr_cleanup_early
net/rds: correct socket tunable error in rds_tcp_tune()
net/smc: Keep smc_close_final rc during active close
drm/msm/a6xx: Allocate enough space for GMU registers
drm/msm: Do hw_init() before capturing GPU state
drm/vc4: kms: Wait for the commit before increasing our clock rate
drm/vc4: kms: Fix return code check
drm/vc4: kms: Add missing drm_crtc_commit_put
drm/vc4: kms: Clear the HVS FIFO commit pointer once done
drm/vc4: kms: Don't duplicate pending commit
drm/vc4: kms: Fix previous HVS commit wait
atlantic: Increase delay for fw transactions
atlatnic: enable Nbase-t speeds with base-t
atlantic: Fix to display FW bundle version instead of FW mac version.
atlantic: Add missing DIDs and fix 115c.
Remove Half duplex mode speed capabilities.
atlantic: Fix statistics logic for production hardware
atlantic: Remove warn trace message.
KVM: x86/mmu: Skip tlb flush if it has been done in zap_gfn_range()
KVM: x86/mmu: Pass parameter flush as false in kvm_tdp_mmu_zap_collapsible_sptes()
drm/msm/devfreq: Fix OPP refcnt leak
drm/msm: Fix mmap to include VM_IO and VM_DONTDUMP
drm/msm: Fix wait_fence submitqueue leak
drm/msm: Restore error return on invalid fence
ASoC: rk817: Add module alias for rk817-codec
iwlwifi: Fix memory leaks in error handling path
KVM: X86: Fix when shadow_root_level=5 && guest root_level<4
KVM: SEV: initialize regions_list of a mirror VM
net/mlx5e: Fix missing IPsec statistics on uplink representor
net/mlx5: Move MODIFY_RQT command to ignore list in internal error state
net/mlx5: E-switch, Respect BW share of the new group
net/mlx5: E-Switch, fix single FDB creation on BlueField
net/mlx5: E-Switch, Check group pointer before reading bw_share value
KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
KVM: VMX: Set failure code in prepare_vmcs02()
mctp: Don't let RTM_DELROUTE delete local routes
Revert "drm/i915: Implement Wa_1508744258"
io-wq: don't retry task_work creation failure on fatal conditions
x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
x86/entry: Use the correct fence macro after swapgs in kernel CR3
x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
preempt/dynamic: Fix setup_preempt_mode() return value
sched/uclamp: Fix rq->uclamp_max not set on first enqueue
KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails
KVM: x86/mmu: Rename slot_handle_leaf to slot_handle_level_4k
KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path
net/mlx5e: Rename lro_timeout to packet_merge_timeout
net/mlx5e: Rename TIR lro functions to TIR packet merge functions
net/mlx5e: Sync TIR params updates against concurrent create/modify
serial: 8250_bcm7271: UART errors after resuming from S2
parisc: Fix KBUILD_IMAGE for self-extracting kernel
parisc: Fix "make install" on newer debian releases
parisc: Mark cr16 CPU clocksource unstable on all SMP machines
vgacon: Propagate console boot parameters before calling `vc_resize'
xhci: Fix commad ring abort, write all 64 bits to CRCR register.
USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
usb: cdns3: gadget: fix new urb never complete if ep cancel previous requests
usb: cdnsp: Fix a NULL pointer dereference in cdnsp_endpoint_init()
x86/tsc: Add a timer to make sure TSC_adjust is always checked
x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
x86/64/mm: Map all kernel memory into trampoline_pgd
tty: serial: msm_serial: Deactivate RX DMA for polling support
serial: pl011: Add ACPI SBSA UART match id
serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30
serial: core: fix transmit-buffer reset and memleak
serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
serial: 8250_pci: rewrite pericom_do_set_divisor()
serial: 8250: Fix RTS modem control while in rs485 mode
serial: liteuart: Fix NULL pointer dereference in ->remove()
serial: liteuart: fix use-after-free and memleak on unbind
serial: liteuart: fix minor-number leak on probe errors
ipmi: msghandler: Make symbol 'remove_work_wq' static
Linux 5.15.7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9300a10911f6205d2fb76f18255b017d34d68d1d
commit f85e04503f upstream.
Commit f45709df77 ("serial: 8250: Don't touch RTS modem control while
in rs485 mode") sought to prevent user space from interfering with rs485
communication by ignoring a TIOCMSET ioctl() which changes RTS polarity.
It did so in serial8250_do_set_mctrl(), which turns out to be too deep
in the call stack: When a uart_port is opened, RTS polarity is set by
the rs485-aware function uart_port_dtr_rts(). It calls down to
serial8250_do_set_mctrl() and that particular RTS polarity change should
*not* be ignored.
The user-visible result is that on 8250_omap ports which use rs485 with
inverse polarity (RTS bit in MCR register is 1 to receive, 0 to send),
a newly opened port initially sets up RTS for sending instead of
receiving. That's because omap_8250_startup() sets the cached value
up->mcr to 0 and omap_8250_restore_regs() subsequently writes it to the
MCR register. Due to the commit, serial8250_do_set_mctrl() preserves
that incorrect register value:
do_sys_openat2
  do_filp_open
    path_openat
      vfs_open
        do_dentry_open
          chrdev_open
            tty_open
              uart_open
                tty_port_open
                  uart_port_activate
                    uart_startup
                      uart_port_startup
                        serial8250_startup
                          omap_8250_startup # up->mcr = 0
                      uart_change_speed
                        serial8250_set_termios
                          omap_8250_set_termios
                            omap_8250_restore_regs
                              serial8250_out_MCR # up->mcr written
                  tty_port_block_til_ready
                    uart_dtr_rts
                      uart_port_dtr_rts
                        serial8250_set_mctrl
                          omap8250_set_mctrl
                            serial8250_do_set_mctrl # mcr[1] = 1 ignored
Fix by intercepting RTS changes from user space in uart_tiocmset()
instead.
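The idea can be illustrated with a small user-space model. This is only a
sketch, not the kernel code: struct model_port, model_update_mctrl(),
model_tiocmset() and the MODEL_* constants are hypothetical stand-ins for
the uart_port state, the internal modem-control update path, the
uart_tiocmset() entry point and the TIOCM_RTS / SER_RS485_ENABLED bits.
The filter sits only on the user-space entry point, so an internal caller
such as uart_port_dtr_rts() can still change RTS.

```c
#include <assert.h>

/* Hypothetical model constants standing in for TIOCM_DTR, TIOCM_RTS
 * and SER_RS485_ENABLED; the values are illustrative only. */
#define MODEL_TIOCM_DTR     0x002u
#define MODEL_TIOCM_RTS     0x004u
#define MODEL_RS485_ENABLED 0x001u

/* Hypothetical, heavily reduced stand-in for struct uart_port. */
struct model_port {
	unsigned int rs485_flags;	/* rs485 configuration bits */
	unsigned int mctrl;		/* current modem control lines */
};

/* Internal path: applies every requested change, including RTS.
 * Models what in-kernel callers such as the open path rely on. */
static void model_update_mctrl(struct model_port *p,
			       unsigned int set, unsigned int clear)
{
	assert(p);
	p->mctrl = (p->mctrl & ~clear) | set;
}

/* User-space path: while rs485 is enabled, drop RTS from both masks
 * before handing the request to the internal path. */
static void model_tiocmset(struct model_port *p,
			   unsigned int set, unsigned int clear)
{
	assert(p);
	if (p->rs485_flags & MODEL_RS485_ENABLED) {
		set &= ~MODEL_TIOCM_RTS;
		clear &= ~MODEL_TIOCM_RTS;
	}
	model_update_mctrl(p, set, clear);
}
```

In this model, a request arriving through model_tiocmset() can still
toggle DTR on an rs485 port, but leaves RTS exactly as the internal path
last set it.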
Link: https://lore.kernel.org/linux-serial/20211027111644.1996921-1-baocheng.su@siemens.com/
Fixes: f45709df77 ("serial: 8250: Don't touch RTS modem control while in rs485 mode")
Cc: Chao Zeng <chao.zeng@siemens.com>
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Su Bao Cheng <baocheng.su@siemens.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Su Bao Cheng <baocheng.su@siemens.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/21170e622a1aaf842a50b32146008b5374b3dd1d.1637596432.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00de977f9e upstream.
Commit 761ed4a945 ("tty: serial_core: convert uart_close to use
tty_port_close") converted serial core to use tty_port_close() but
failed to notice that the transmit buffer still needs to be freed on
final close.
Not freeing the transmit buffer means that the buffer is no longer
cleared on next open so that any ioctl() waiting for the buffer to drain
might wait indefinitely (e.g. on termios changes) or that stale data can
end up being transmitted in case tx is restarted.
Furthermore, the buffer of any port that has been opened would leak on
driver unbind.
Note that the port lock is held when clearing the buffer pointer due to
the ldisc race worked around by commit a5ba1d95e4 ("uart: fix race
between uart_put_char() and uart_shutdown()").
Also note that the tty-port shutdown() callback is not called for
console ports so it is not strictly necessary to free the buffer page
after releasing the lock (cf. d72402145a ("tty/serial: do not free
trasnmit buffer page under port lock")).
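The locking pattern described above can be sketched in user space as
follows. This is a model under stated assumptions, not the serial core
code: struct model_state, model_lock(), model_unlock(), model_open() and
model_shutdown() are hypothetical names, the "lock" is a plain flag used
only to show where the critical section lies, and calloc()/free() stand in
for the page allocation. The buffer pointer is detached while the lock is
held and the memory is released only afterwards; the ring indices are
reset only when a fresh buffer is allocated, which is why a buffer that is
never freed is never cleared on the next open.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_XMIT_SIZE 4096u	/* illustrative buffer size */

/* Hypothetical, reduced stand-in for the per-port state. */
struct model_state {
	int lock_held;			/* models the port lock */
	unsigned char *xmit_buf;	/* transmit buffer, NULL if freed */
	unsigned int head, tail;	/* ring indices */
};

static void model_lock(struct model_state *s)
{
	assert(!s->lock_held);
	s->lock_held = 1;
}

static void model_unlock(struct model_state *s)
{
	assert(s->lock_held);
	s->lock_held = 0;
}

/* Open: allocate and clear the buffer only if none is attached. */
static int model_open(struct model_state *s)
{
	if (!s->xmit_buf) {
		unsigned char *page = calloc(1, MODEL_XMIT_SIZE);

		if (!page)
			return -1;
		model_lock(s);
		s->xmit_buf = page;
		s->head = s->tail = 0;
		model_unlock(s);
	}
	return 0;
}

/* Final close: detach the pointer under the lock, free afterwards. */
static void model_shutdown(struct model_state *s)
{
	unsigned char *buf;

	model_lock(s);
	buf = s->xmit_buf;
	s->xmit_buf = NULL;
	model_unlock(s);

	free(buf);	/* free(NULL) is a no-op */
}
```

With model_shutdown() in place, data queued before the final close is
gone by the time the port is opened again, and no buffer stays attached
to a closed port.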
Link: https://lore.kernel.org/r/319321886d97c456203d5c6a576a5480d07c3478.1635781688.git.baruch@tkos.co.il
Fixes: 761ed4a945 ("tty: serial_core: convert uart_close to use tty_port_close")
Cc: stable@vger.kernel.org # 4.9
Cc: Rob Herring <robh@kernel.org>
Reported-by: Baruch Siach <baruch@tkos.co.il>
Tested-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211108085431.12637-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>