When vendor hooks are added to a file that previously didn't have any
vendor hooks, we end up indirectly including linux/tracepoint.h. This
causes some data types that used to be opaque (forward declared) to the
code to become visible to the code.
Modversions correctly catches this change in visibility, but we don't
really care about the data types made visible when linux/tracepoint.h is
included. So, hide this from modversions in the central vendor_hooks.h file
instead of having to fix this on a case by case basis.
This change itself will cause a one time CRC breakage/churn because it's
fixing the existing vendor hook headers, but should reduce unnecessary CRC
churns in the future.
To avoid future pointless CRC churn, vendor hook header files that include
vendor_hooks.h should not include linux/tracepoint.h directly.
Bug: 227513263
Bug: 226140073
Signed-off-by: Saravana Kannan <saravanak@google.com>
Change-Id: Ia88e6af11dd94fe475c464eb30a6e5e1e24c938b
It needs addtional struct page **pages params to judge whether
it's possible to migrate pages out of CMA.
Bug: 227475444
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I9a8aa57ff91228baf0fc970b8499464c07872c09
Introduce $debugfs/page_pinner/buffer_size to change
buffer_size on demand. The change of buffer_size will
reset the buffer.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I505cdc2ee29aa0c6ed4e2dc2c0b6fcff77c388e4
We shouldn't waste memory for vendors who don't use
page_pinner so remove the page_pinner static buffer.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I46ae2fb5000c4eb59253159032182ca106b39eb9
From the experience, longterm_pinner is not worth maintaining
considering how much it churns MM. Just drop the feature and
we are good with alloc_contig_failed.
The visible effect from this patch is
1. drop $debugfs/page_pinner/longterm_pinner
2. drop put_user_page expoerted API
3. rename alloc_contig_failed to buffer
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I68cc11db448260987a9e26b99647ecb55f571616
Currently, output format is a little hard to parse how long the
page has been pinned since user need to figure out the timeline
from migration failure detection to put event. Sometimes, the log
buffer would be overflowed so we lost the migration failure event
timeline, even. This patch stores the page pinning time in kernel
side and keep the information whenever page was released. Thus,
user could understand the output easier and never lose the information.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I396f0c12438e0ff8a3497253b750a7e5bb342f57
Currently, page_pinner code are too ugly since we missed refactoring
last time due to GKI deadline. Let's make it better this time before
GKI is freezing.
What this patch is cleaning are __reset_page_pinner which is used
for freeing page as well as putting pages depending on free parameter.
It makes code too ugly for readability PoV and hard to make further
changes so split it with each put and free functions.
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I610ffc629eea5e996b7d55340b5589a3f49574d7
For various reasons based on the allocator behaviour and typical
use-cases at the time, when the max32_alloc_size optimisation was
introduced it seemed reasonable to couple the reset of the tracked
size to the update of cached32_node upon freeing a relevant IOVA.
However, since subsequent optimisations focused on helping genuine
32-bit devices make best use of even more limited address spaces, it
is now a lot more likely for cached32_node to be anywhere in a "full"
32-bit address space, and as such more likely for space to become
available from IOVAs below that node being freed.
At this point, the short-cut in __cached_rbnode_delete_update() really
doesn't hold up any more, and we need to fix the logic to reliably
provide the expected behaviour. We still want cached32_node to only move
upwards, but we should reset the allocation size if *any* 32-bit space
has become available.
Reported-by: Yunfei Wang <yf.wang@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Bug: 223712131
(cherry picked from commit 5b61343b50https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Change-Id: I5026411dd022c6ddea5c0e4da6e69c7b14162c3f
(cherry picked from commit ec48b1892e)
When dealing with a guest with SVE enabled, we don't populate
the shadow SVE state, nor pin the SVE state at S1 EL2.
Fix both issues in one go.
Bug: 227292021
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <mzyngier@google.com>
Change-Id: I88dc7e9c84e5970ec2466a0aa98ad4e3c94711a0
It isn't always obvious what vcpu (host or shadow) should be used
in which context, nor whether the provided vcpu is valid or not.
To make this less error prone, provide a helper that will always
return a vcpu in the HYP address space as well as the corresponding
shadow state if we're in protected mode. If the host-provided vcpu
doesn't match the loaded vcpu, NULL is returned for both pointers.
In non-protected mode, no state is provided, of course, but the vcpu
is converted to its HYP pointer.
Bug: 227292021
Bug: 227768863
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <mzyngier@google.com>
Change-Id: Idcfc4e042ff05d97ae52f7991666935c1c570f10
We have namespaces, so use them for all vfs-exported namespaces so that
filesystems can use them, but not anything else.
Some in-kernel drivers that do direct filesystem accesses (because they
serve up files) are also allowed access to these symbols to keep 'make
allmodconfig' builds working properly, but it is not needed for Android
kernel images.
Bug: 157965270
Bug: 210074446
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaf6140baf3a18a516ab2d5c3966235c42f3f70de
The MODULE_IMPORT_NS() macro does not allow defined strings to work
properly with it, so add a layer of indirection to allow this to happen.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
(cherry picked from commit ca321ec743)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibd64ba139912ea10e81ac22490831129b23a31e1
Add code to head.S's el2_setup to detect MPAM and disable any EL2 traps.
This register resets to an unknown value, setting it to the default
parititons/pmg before we enable the MMU is the best thing to do.
Kexec/kdump will depend on this if the previous kernel left the CPU
configured with a restrictive configuration.
If linux is booted at the highest implemented exception level el2_setup
will clear the enable bit, disabling MPAM.
Signed-off-by: James Morse <james.morse@arm.com>
Bug: 221768437
(cherry picked from commit fa0ff38f06b397d8a92d88eb8083c2c5a20ac87f
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v5.16)
Change-Id: I2758f7f7b236d09a207e13d1165efb6887e8611a
Signed-off-by: Valentin Schneider <Valentin.Schneider@arm.com>
[bm: amended commit msg, dropped config option and switched to named labels]
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
This is a partial cherry-pick of commit:
7fe77616f156 ("arm64: cpufeature: discover CPU support for MPAM")
from git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
Bug: 221768437
Change-Id: I77101abb07f9b73dbc7cc2a53ac44fbf772f0b1d
Signed-off-by: Valentin Schneider <Valentin.Schneider@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
virtio pci config structures may in future have non-standard bar
values in the bar field. We should anticipate this by skipping any
structures containing such a reserved value.
The bar value should never change: check for harmful modified values
we re-read it from the config space in vp_modern_map_capability().
Also clean up an existing check to consistently use PCI_STD_NUM_BARS.
Signed-off-by: Keir Fraser <keirf@google.com>
Link: https://lore.kernel.org/r/20220323140727.3499235-1-keirf@google.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 3f63a1d7f6)
Bug: 222232623
[keirf@: Pass virtio_pci_device to map_capability. Move everything
into virtio_pci_modern.c]
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: Idbba48154a051cf173b9cb0bd40c77fcf02902a4
This reverts commit 9e35276a53. Issue
were reported for the drivers that are using affinity managed IRQ
where manually toggling IRQ status is not expected. And we forget to
enable the interrupts in the restore path as well.
In the future, we will rework on the interrupt hardening.
Fixes: 9e35276a53 ("virtio_pci: harden MSI-X interrupts")
Reported-by: Marc Zyngier <maz@kernel.org>
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220323031524.6555-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit eb4cecb453)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I05264d9e61d558522a8a20cf87399aa3578b3a6e
The synchronous wakeup interface is available only for the
interruptible wakeup. Add it for normal wakeup and use this
synchronous wakeup interface to wakeup the userspace daemon.
Scheduler can make use of this hint to find a better CPU for
the waker task.
With this change the performance numbers for compress, decompress
and copy use-cases on /sdcard path has improved by ~30%.
Use-case details:
1. copy 10000 files of each 4k size into /sdcard path
2. use any File explorer application that has compress/decompress
support
3. start compress/decompress and capture the time.
-------------------------------------------------
| Default | wakeup support | Improvement/Diff |
-------------------------------------------------
| 13.8 sec | 9.9 sec | 3.9 sec (28.26%) |
-------------------------------------------------
Co-developed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com>
Bug: 216261533
Link: https://lore.kernel.org/lkml/1638780405-38026-1-git-send-email-quic_pragalla@quicinc.com/
Change-Id: I9ac89064e34b1e0605064bf4d2d3a310679cb605
Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
We no longer need to map the host's .rodata and .bss sections in the
pkvm hypervisor, so let's remove those mappings. This will avoid
creating dependencies at EL2 on host-controlled data-structures.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I0fcb0e1b34d3c7c0c226b3fd30cdec0e8d7bfb44
The pkvm hypervisor may need to read the kvm_vgic_global_state variable
at EL2. Make sure to explicitely map it in the its stage-1 page-table
rather than relying on mapping all of .rodata.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I72d1eba78fb6b7593d236539cd81269480856fdf
In pKVM mode, we can't trust the host not to mess with the hypervisor
per-cpu offsets, so let's move the array containing them to the nVHE
code.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I9ef4175ce9cf00d6ff1c0e358551a565358f2408
The host KVM PMU code can currently index kvm_arm_hyp_percpu_base[]
through this_cpu_ptr_hyp_sym(), but will not actually dereference that
pointer when protected KVM is enabled. In preparation for making
kvm_arm_hyp_percpu_base[] unaccessible to the host, let's make sure the
indexing in hyp per-cpu pages is also done after the static key check to
avoid spurious accesses to EL2-private data from EL1.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I3f4e3f7ee789c31a1ae1f67e07edf8fb34f520b9
Vendor may have need to track rt util.
Bug: 201261299
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I2f4e5142c6bc8574ee3558042e1fb0dae13b702d
Add two new symbols to aarch64 kernel ABI:
* pkvm_iommu_sysmmu_sync_register
* pkvm_iommu_finalize
The former allows vendor modules to register a SYSMMU_SYNC device with
the hypervisor, and the latter tells the hypervisor to stop acception
new device registrations.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I6c6948d94cb6494f07d52b4e2b7e91db40e2fcd6
Add new hypercall that the host can use to inform the hypervisor that
all hypervisor-controlled IOMMUs have been registered and no new
registrations should be allowed. This will typically be called at the
end of kernel module initialization phase.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I8c175310d5b262a67947443c5a0154056a8ebf3e
The IOMMU DABT handler currently checks if the device is considered
powered by hyp before resolving the request. If the power tracking does
not reflect reality, the IOMMU may trigger issues in the host but the
incorrect state prevents it from diagnosing the issue.
Drop the powered check from the generic IOMMU code. The host accessing
the device's SFR means that it assumes it is powered, and individual
drivers can choose to reject that DABT request.
Bug: 224891559
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I1c132c4030a61a90be4675867c9658e3bc696118
SysMMU_SYNC devices expose an interface to start a sync counter and
poll its SFR until the device signals that all memory transactions in
flight at the start have drained. This gives the hypervisor a reliable
indicator that S2MPU invalidation has fully completed and all new
transactions will use the new MPTs.
Add a new pKVM IOMMU driver that the host can use to register
SysMMU_SYNCs. Each device is expected to be a supplier to exactly one
S2MPU (parent), but multiple SYNCs can supply a single S2MPU.
To keep things simple, the SYNCs do not implement suspend/resume and are
assumed to follow the power transitions of their parent.
Following an invalidation, the S2MPU driver iterates over its children
and waits for each SYNC to signal that its transactions have drained.
The algorithm currently waits on each SYNC in turn. If latency proves to
be an issue, this could be optimized to initiate a SYNC on all powered
devices before starting to poll.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I45b832fd11d76b65987935c8548e2a214ee2fa2a
In preparation for adding new IOMMU devices that act as suppliers to
others, add the notion of a parent IOMMU device. Such device must be
registered after its parent and the driver of the parent device must
validate the addition.
The relation has no generic implications, it is up to drivers to make
use of it.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I4ee3675e5529bb73ad4546fa32380f237f054177
In preparation for needing to validate more aspects of a device that is
about to be registered, change the callback to accept the to-be-added
'struct pkvm_iommu' rather than individual inputs.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I3fb911e4280c220ddd779cf6a5fc9c302a5617f7
Private EL2 mappings currently cannot be removed. Move the creation of
IOMMU device mappings at the end of the registration function so that
other errors do not result in unnecessary mappings.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I3139e9af3345f157295eb72441a7cf3cc055116d
Memory for IOMMU device entries gets allocated from a pool donated by
the host. It is possible for pkvm_iommu_register() to allocate the
memory and then fail, in which case the memory remains unused but not
freed.
Refactor the code such that the host lock covers the entire section
where the memory is allocated. This way we can return the memory back to
the linear allocator if an error is returned.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I8c1650ba3e545741144d793de506e93c4066896f
Currently __pkvm_iommu_pm_notify always changes the value of
dev->powered following a suspend/resume attempt. This could potentially
be abused to force the hypervisor to stop issuing updates to an S2MPU
and preserving an old/invalid state.
Modify to only update the power state if suspend/resume was successful.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I285fc822e9fc926c49b9b5e69446790e1edccafb
Passing FOLL_FORCE when pinning guest memory pages was intended to allow
the VMM to map guest memory as PROT_NONE without prohibiting access from
the guest. As it turns out, crosvm doesn't implement this, and since
the host kernel will inject a signal into the VMM on a bad access
irrespective of the stage-1 permissions, we can drop the FOLL_FORCE flag
altogether.
Bug: 226564150
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: If21091b6adf3dbe4155c5c840753c912d283b159
This reverts commit e853c3b172.
This capability is unused, so remove it to avoid UAPI divergence from
upstream.
Bug: 226564150
[willdeacon@: Also removed additional instance in arch/arm64/kvm/arm.c]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ib3e929a5fc81dc5c9c1ff8512d48f63bdda5c404
This reverts commit 7f19cf521f.
These notifications are unused by crosvm and are no longer required now
that the host takes care of injecting a SEGV on an illegal memory access
from userspace.
Bug: 226564150
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I22c3e49b4aa5f023961c8849b79e2e0a21ebf0c1
Vendor may have the need to implement their own util tracking.
Bug: 201260585
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I973902e6ff82a85ecd029ac5a78692d629df1ebe
The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.
To prevent this, inject the abort back in the host with S1PTW set in the
ESR, hence allowing the host to differentiate this abort from normal
userspace faults and inject a SIGSEGV cleanly.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 215520143
Change-Id: I9636e71e2fe3eb49d2d7cddaab7774cd672cfcae
In order to simplify the injection of exceptions in the host in pkvm
context, let's factor out of enter_exception64() the code calculating
the exception offset from VBAR_EL1 and the cpsr.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 215520143
Change-Id: I97b2431a79fdec87c95c2d1f691bd3a11635c29b
Add a helper allowing to check when the pkvm static key is enabled to
ease the introduction of pkvm hooks in other parts of the code.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 215520143
Change-Id: Iae065b09bb33d42d73a408365c803727269d0de0