The time spent in io_schedule() and also the interrupt latency are
significant when submitting direct I/O to a UFS device. Hence this patch
that implements polling support. User space software can enable polling by
passing the RWF_HIPRI flag to the preadv2() system call or the
IORING_SETUP_IOPOLL flag to the io_uring interface.
Although the block layer supports to partition the tag space for
interrupt-based completions (HCTX_TYPE_DEFAULT) purposes and polling
(HCTX_TYPE_POLL), the choice has been made to use the same hardware queue
for both hctx types because partitioning the tag space would negatively
affect performance.
On my test setup this patch increases IOPS from 2736 to 22000 (8x) for the
following test:
for hipri in 0 1; do
fio --ioengine=io_uring --iodepth=1 --rw=randread \
--runtime=60 --time_based=1 --direct=1 --name=qd1 \
--filename=/dev/block/sda --ioscheduler=none --gtod_reduce=1 \
--norandommap --hipri=$hipri
done
Link: https://lore.kernel.org/r/20211203231950.193369-18-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit eaab9b5730 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Bug: 199284641
Change-Id: I9da9982800dee4fa0181ac88ea19e8b826f290ca
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Remove the clock scaling lock from ufshcd_queuecommand() since it is a
performance bottleneck. Instead check the SCSI device budget bitmaps in the
code that waits for ongoing ufshcd_queuecommand() calls. A bit is set in
sdev->budget_map just before scsi_queue_rq() is called and a bit is cleared
from that bitmap if scsi_queue_rq() does not submit the request or after
the request has finished. See also the blk_mq_{get,put}_dispatch_budget()
calls in the block layer.
There is no risk for a livelock since the block layer delays queue reruns
if queueing a request fails because the SCSI host has been blocked.
Link: https://lore.kernel.org/r/20211203231950.193369-17-bvanassche@acm.org
Cc: Asutosh Das (asd) <asutoshd@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8d077ede48 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
[ bvanassche: changed sbitmap_weight(&sdev->budget_map) into atomic_read(&sdev->device_busy); ]
Bug: 204438323
Change-Id: I1330e09de30b74cff15cab05c9ae86ec413204cb
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Fix the following kernel crash:
Unable to handle kernel paging request at virtual address ffffffc91e735000
Call trace:
__queue_work+0x26c/0x624
queue_work_on+0x6c/0xf0
ufshcd_hold+0x12c/0x210
__ufshcd_wl_suspend+0xc0/0x400
ufshcd_wl_shutdown+0xb8/0xcc
device_shutdown+0x184/0x224
kernel_restart+0x4c/0x124
__arm64_sys_reboot+0x194/0x264
el0_svc_common+0xc8/0x1d4
do_el0_svc+0x30/0x8c
el0_svc+0x20/0x30
el0_sync_handler+0x84/0xe4
el0_sync+0x1bc/0x1c0
Fix this crash by ungating the clock before destroying the work queue on
which clock gating work is queued.
Link: https://lore.kernel.org/r/20211203231950.193369-15-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3489c34bd0 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I5b2ef262356fe1ed3581e7173fd3cc69208177fc
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The only functional change in this patch is that scsi_done() is now called
after ufshcd_release() and ufshcd_clk_scaling_update_busy() instead of
before.
The next patch in this series will introduce a call to
ufshcd_release_scsi_cmd() in the abort handler.
Link: https://lore.kernel.org/r/20211203231950.193369-13-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6f8dafdee6 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Ie9e3ef49aa10d3dc9ce43625893809b232d87d5f
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The following deadlock has been observed on a test setup:
- All tags allocated
- The SCSI error handler calls ufshcd_eh_host_reset_handler()
- ufshcd_eh_host_reset_handler() queues work that calls
ufshcd_err_handler()
- ufshcd_err_handler() locks up as follows:
Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt
Call trace:
__switch_to+0x298/0x5d8
__schedule+0x6cc/0xa94
schedule+0x12c/0x298
blk_mq_get_tag+0x210/0x480
__blk_mq_alloc_request+0x1c8/0x284
blk_get_request+0x74/0x134
ufshcd_exec_dev_cmd+0x68/0x640
ufshcd_verify_dev_init+0x68/0x35c
ufshcd_probe_hba+0x12c/0x1cb8
ufshcd_host_reset_and_restore+0x88/0x254
ufshcd_reset_and_restore+0xd0/0x354
ufshcd_err_handler+0x408/0xc58
process_one_work+0x24c/0x66c
worker_thread+0x3e8/0xa4c
kthread+0x150/0x1b4
ret_from_fork+0x10/0x30
Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved
request.
Link: https://lore.kernel.org/r/20211203231950.193369-10-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 945c3cca05 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Ib8027a51cc4b7bec7ddd69719f0f7f4a6e8dfb3a
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Revert commit d56a3389b8 ("FROMLIST: scsi: ufs: Fix a deadlock in the error
handler") in preparation of switching to the FROMGIT solution.
Bug: 204438323
Change-Id: Ia9ae90af8eb99acf323d04504fa990163a4162cc
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Use hba->outstanding_reqs instead of ufshcd_any_tag_in_use(). This patch
prepares for removal of the blk_mq_start_request() call from
ufshcd_wait_for_dev_cmd(). blk_mq_tagset_busy_iter() only iterates over
started requests.
Link: https://lore.kernel.org/r/20211203231950.193369-8-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bd0b353831 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Ib43ac46cfb8094d0727af060c90a709d5430ecac
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Before commit fc0c209c14 ("clk: Allow parents to be specified without
string names") child clks couldn't find their parent until the parent
clk was added to a list in __clk_core_init(). After that commit, child
clks can reference their parent clks directly via a clk_hw pointer, or
they can lookup that clk_hw pointer via DT if the parent clk is
registered with an OF clk provider.
The common clk framework treats hw->core being non-NULL as "the clk is
registered" per the logic within clk_core_fill_parent_index():
parent = entry->hw->core;
/*
* We have a direct reference but it isn't registered yet?
* Orphan it and let clk_reparent() update the orphan status
* when the parent is registered.
*/
if (!parent)
Therefore we need to be extra careful to not set hw->core until the clk
is fully registered with the clk framework. Otherwise we can get into a
situation where a child finds a parent clk and we move the child clk off
the orphan list when the parent isn't actually registered, wrecking our
enable accounting and breaking critical clks.
Consider the following scenario:
CPU0 CPU1
---- ----
struct clk_hw clkBad;
struct clk_hw clkA;
clkA.init.parent_hws = { &clkBad };
clk_hw_register(&clkA) clk_hw_register(&clkBad)
... __clk_register()
hw->core = core
...
__clk_register()
__clk_core_init()
clk_prepare_lock()
__clk_init_parent()
clk_core_get_parent_by_index()
clk_core_fill_parent_index()
if (entry->hw) {
parent = entry->hw->core;
At this point, 'parent' points to clkBad even though clkBad hasn't been
fully registered yet. Ouch! A similar problem can happen if a clk
controller registers orphan clks that are referenced in the DT node of
another clk controller.
Let's fix all this by only setting the hw->core pointer underneath the
clk prepare lock in __clk_core_init(). This way we know that
clk_core_fill_parent_index() can't see hw->core be non-NULL until the
clk is fully registered.
Fixes: fc0c209c14 ("clk: Allow parents to be specified without string names")
Signed-off-by: Mike Tipton <quic_mdtipton@quicinc.com>
Link: https://lore.kernel.org/r/20211109043438.4639-1-quic_mdtipton@quicinc.com
[sboyd@kernel.org: Reword commit text, update comment]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Bug: 208605820
(cherry picked from commit 54baf56eaahttps://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git clk-next)
Change-Id: Iee7ea8a1ba3a95a4985c2e689bcc4484c33153f1
Signed-off-by: Mike Tipton <quic_mdtipton@quicinc.com>
Long ago there wasn't a FOLL_LONGTERM flag so this DAX check was done by
post-processing the VMA list.
These days it is trivial to just check each VMA to see if it is DAX before
processing it inside __get_user_pages() and return failure if a DAX VMA is
encountered with FOLL_LONGTERM.
Removing the allocation of the VMA list is a significant speed up for many
call sites.
Add an IS_ENABLED to vma_is_fsdax so that code generation is unchanged
when DAX is compiled out.
Remove the dummy version of __gup_longterm_locked() as !CONFIG_CMA already
makes memalloc_nocma_save(), check_and_migrate_cma_pages(), and
memalloc_nocma_restore() into a NOP.
Bug: 209719897
Link: https://lkml.kernel.org/r/0-v1-5551df3ed12e+b8-gup_dax_speedup_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 52650c8b46)
Change-Id: I8be099dc7b617916254c2650ff8a55a6b926a32e
(cherry picked from commit 78ea29e570)
The comment for kvm_reset_vcpu() refers to the sysreg table as
being the table above, probably because of the code extracted at
commit f4672752c3 ("arm64: KVM: virtual CPU reset").
Fix the comment to remove the potentially confusing reference.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211208193257.667613-2-tabba@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Id1a3c02a5522990a53d92055f7bac826c3491f12
The GICv3 ITS driver allocates memory for its tables using alloc_pages()
and performs explicit cache maintenance if necessary. On systems such
as those running pKVM, where the memory encryption API is implemented,
memory shared with the ITS must first be transitioned to the "decrypted"
state, as it would be if allocated via the DMA API.
Allow pKVM guests to interact with an ITS emulation by ensuring that the
shared pages are decrypted at the point of allocation and encrypted
again upon free().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211208155916.681-1-will@kernel.org
Bug: 209580772
Change-Id: I89820c65769a07306fd3e067d7d33c938d156820
Signed-off-by: Quentin Perret <qperret@google.com>
Now that GICv2 is disabled in nVHE protected mode there should be no
other reason for the host to use create_hyp_io_mappings() or
kvm_phys_addr_ioremap(). Add sanity checks to make sure that assumption
remains true looking forward.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-6-qperret@google.com
Bug: 209580772
Change-Id: I371533976ce9ffdbf6b0eff986680d34d3153b86
The __io_map_base variable is used at EL2 to track the end of the
hypervisor's "private" VA range in nVHE protected mode. However it
doesn't need to be used outside of mm.c, so let's make it static to keep
all the hyp VA allocation logic in one place.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-5-qperret@google.com
Bug: 209580772
Change-Id: I0aac3451fdeddbc193d127ed38c3c998636d11b9
The hyp memory pool struct is sized to fit exactly the needs of the
hypervisor stage-1 page-table allocator, so it is important it is not
used for anything else. As it is currently used only from setup.c,
reduce its visibility by marking it static.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-4-qperret@google.com
Bug: 209580772
Change-Id: I5079221a3a5125ba85b837996aa64f098636d4cc
GICv2 requires having device mappings in guests and the hypervisor,
which is incompatible with the current pKVM EL2 page ownership model
which only covers memory. While it would be desirable to support pKVM
with GICv2, this will require a lot more work, so let's make the
current assumption clear until then.
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
Bug: 209580772
Change-Id: I0c507b698e7cefc389e1a49ed6b15cf59d9daaa7
The EL2 page allocator in protected mode maintains a per-pool max order
value to optimize allocations when the memory region it covers is small.
However, the max order value is currently under-estimated whenever the
number of pages in the region is a power of two. Fix the estimation.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-2-qperret@google.com
Bug: 209580772
Change-Id: Ibb149a33cad785c777032a4d129004f619d88653
Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-15-qperret@google.com
Bug: 209599700
Change-Id: Id79362978000d72b866152d0d83c887e4caeb973
Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.
Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-14-qperret@google.com
Bug: 209599700
Change-Id: I717d87c9aa2d1f1b159d7dc3bca439a2869967e5
__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.
Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.
[ qperret: BACKPORT becuse of conflict caused by the MMIO handler
introduced with the S2MPU support ]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-13-qperret@google.com
Bug: 209599700
Change-Id: I8d44fc9ca79ac7ea5f8ca289b3cca08a4879b3cd
By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.
Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-12-qperret@google.com
Bug: 209599700
Change-Id: I7edb1b53014ffb4a5aa7a6ee54fd99d8091b57cd
In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and bit more difficult
to take the wrong lock (or even take them in the wrong order).
[ qperret: BACKPORT caused by trivial conflict with S2MPU code ]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-11-qperret@google.com
Bug: 209599700
Change-Id: If6a1baf1dc099894c3445d6b6fec4dd3a46164a9
Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.
This state will be used for permission checking during page transitions
in later patches.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-10-qperret@google.com
Bug: 209599700
Change-Id: I7f31f675d39c5b33168eb652ca35822fba2ec0ff
In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times on the host. In order to do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-9-qperret@google.com
Bug: 209599700
Change-Id: I11dc907d139ba314247fb42e8702a6e80c55c054
The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode changes widely
depending on when it is being called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-8-qperret@google.com/
Bug: 209599700
Change-Id: I17b9c2542e21f7c4cef0ee1e358b71a4f01c6647
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.
Wire up this callback for the hypervisor stage-1 page-table.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-6-qperret@google.com
Bug: 209599700
Change-Id: Ieaf1f60698e1ebafc60424e879ccfd6ec192dbb5
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix-up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.
[ qperret: BACKPORT because of conflict with S2MPU init ]
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-5-qperret@google.com
Bug: 209599700
Change-Id: Ib31ace99838f397d7a2e48bfd43c6f4eaf730878
To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-4-qperret@google.com
Bug: 209599700
Change-Id: If45f4a5c62e70db5c6ee60192fff5ca4b945aa31
In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmmemap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.
In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-3-qperret@google.com
Bug: 209599700
Change-Id: I051ceebbe2c564ff88726a451f83af646f0d2cf0
The kvm_host_owns_hyp_mappings() function should return true if and only
if the host kernel is responsible for creating the hypervisor stage-1
mappings. That is only possible in standard non-VHE mode, or during boot
in protected nVHE mode. But either way, non of this makes sense in VHE,
so make sure to catch this case as well, hence making the function
return sensible values in any context (VHE or not).
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-2-qperret@google.com
Bug: 209599700
Change-Id: Iec9d5f5f6f1258b76725df9b93064a9ddef1e670
virtio_max_dma_size() returns the maximum DMA mapping size of the virtio
device by querying dma_max_mapping_size() for the device when the DMA
API is in use for the vring. Unfortunately, the device passed is
initialised by register_virtio_device() and does not inherit the DMA
configuration from its parent, resulting in SWIOTLB errors when bouncing
is enabled and the default 256K mapping limit (IO_TLB_SEGSIZE) is not
respected:
| virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 294912 bytes), total 1024 (slots), used 725 (slots)
Follow the pattern used elsewhere in the virtio_ring code when calling
into the DMA layer and pass the parent device to dma_max_mapping_size()
instead.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211201112018.25276-1-will@kernel.org
Bug: 209580772
Change-Id: I3389270b4df2b0e0d3813ff8be61bdb594c1b0bd
Signed-off-by: Quentin Perret <qperret@google.com>
kvm_is_transparent_hugepage() was removed in commit 205d76ff06 ("KVM:
Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") but its
declaration in include/linux/kvm_host.h persisted. Drop it.
Fixes: 205d76ff06 (""KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211018151407.2107363-1-vkuznets@redhat.com
(cherry picked from commit f0e6e6fa41
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I9078ab62be40bc843ca2959f929ed22c1b8888e2
kvm/hyp/reserved_mem.c contains host code executing at EL1 and is not
linked into the hypervisor object. Move the file into kvm/pkvm.c and
rework the headers so that the definitions shared between the host and
the hypervisor live in asm/kvm_pkvm.h.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-4-will@kernel.org
(cherry picked from commit 9429f4b041
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ic53c6ef5262e473e61bfdd44204b6a6725035827
In order to avoid exposing hypervisor (EL2) data structures directly to
the host, generate hyp_constants.h to provide constants such as structure
sizes to the host without dragging in the definitions themselves.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-3-will@kernel.org
(cherry picked from commit ed4ed15d57
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I24957ea3ef1da8863a60dcf53c146b3a78f56fa5
asm/mmu.h refers to cpus_have_const_cap() in the definition of
arm64_kernel_unmapped_at_el0() so include asm/cpufeature.h directly
rather than force all users of the header to do it themselves.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-2-will@kernel.org
(cherry picked from commit 7e04f05984
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iaa42070f8f41255406b1031e5a59f58c06f47f5d
The only usage of kvm_io_gic_ops is to make a comparison with its
address and to pass its address to kvm_iodevice_init() which takes a
pointer to const kvm_io_device_ops as input. Make it const to allow the
compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
(cherry picked from commit 636dcd0204
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I057a166181bea5855dd19be14971ac086e02ec12
When running a KVM guest hosted on an ARMv8.7 machine, the host
kernel complains that it doesn't know about the architected number
of events.
Fix it by adding the PMUver code corresponding to PMUv3 for ARMv8.7.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211126115533.217903-1-maz@kernel.org
(cherry picked from commit 00e228b315
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I705efed6bcdd2000a57901bd04ba080a36527ad4
With the transition to kvm_arch_vcpu_run_pid_change() to handle
the "run once" activities, it becomes obvious that has_run_once
is now an exact shadow of vcpu->pid.
Replace vcpu->arch.has_run_once with a new vcpu_has_run_once()
helper that directly checks for vcpu->pid, and get rid of the
now unused field.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit cc5705fb1b
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iaecd0c5440ae929775fd43b7e9cfe71168b45911