Commit Graph

985355 Commits

Author SHA1 Message Date
Bart Van Assche
cfa96fa21d FROMGIT: scsi: ufs: Implement polling support
The time spent in io_schedule() and also the interrupt latency are
significant when submitting direct I/O to a UFS device. Hence this patch
that implements polling support. User space software can enable polling by
passing the RWF_HIPRI flag to the preadv2() system call or the
IORING_SETUP_IOPOLL flag to the io_uring interface.

Although the block layer supports to partition the tag space for
interrupt-based completions (HCTX_TYPE_DEFAULT) purposes and polling
(HCTX_TYPE_POLL), the choice has been made to use the same hardware queue
for both hctx types because partitioning the tag space would negatively
affect performance.

On my test setup this patch increases IOPS from 2736 to 22000 (8x) for the
following test:

for hipri in 0 1; do
    fio --ioengine=io_uring --iodepth=1 --rw=randread \
    --runtime=60 --time_based=1 --direct=1 --name=qd1 \
    --filename=/dev/block/sda --ioscheduler=none --gtod_reduce=1 \
    --norandommap --hipri=$hipri
done

Link: https://lore.kernel.org/r/20211203231950.193369-18-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit eaab9b5730 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Bug: 199284641
Change-Id: I9da9982800dee4fa0181ac88ea19e8b826f290ca
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:40 -08:00
Bart Van Assche
198b86493c BACKPORT: FROMGIT: scsi: ufs: Optimize the command queueing code
Remove the clock scaling lock from ufshcd_queuecommand() since it is a
performance bottleneck. Instead check the SCSI device budget bitmaps in the
code that waits for ongoing ufshcd_queuecommand() calls. A bit is set in
sdev->budget_map just before scsi_queue_rq() is called and a bit is cleared
from that bitmap if scsi_queue_rq() does not submit the request or after
the request has finished. See also the blk_mq_{get,put}_dispatch_budget()
calls in the block layer.

There is no risk for a livelock since the block layer delays queue reruns
if queueing a request fails because the SCSI host has been blocked.

Link: https://lore.kernel.org/r/20211203231950.193369-17-bvanassche@acm.org
Cc: Asutosh Das (asd) <asutoshd@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8d077ede48 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
[ bvanassche: changed sbitmap_weight(&sdev->budget_map) into atomic_read(&sdev->device_busy); ]
Bug: 204438323
Change-Id: I1330e09de30b74cff15cab05c9ae86ec413204cb
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:40 -08:00
Bart Van Assche
b533233833 FROMGIT: scsi: ufs: Stop using the clock scaling lock in the error handler
Instead of locking and unlocking the clock scaling lock, surround the
command queueing code with an RCU reader lock and call synchronize_rcu().
This patch prepares for removal of the clock scaling lock.

Link: https://lore.kernel.org/r/20211203231950.193369-16-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5675c381ea git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I7cfeecbffcaad1cb61a096717b27816c385dae14
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:40 -08:00
Bart Van Assche
330a1d16f2 FROMGIT: scsi: ufs: Fix a kernel crash during shutdown
Fix the following kernel crash:

Unable to handle kernel paging request at virtual address ffffffc91e735000
Call trace:
 __queue_work+0x26c/0x624
 queue_work_on+0x6c/0xf0
 ufshcd_hold+0x12c/0x210
 __ufshcd_wl_suspend+0xc0/0x400
 ufshcd_wl_shutdown+0xb8/0xcc
 device_shutdown+0x184/0x224
 kernel_restart+0x4c/0x124
 __arm64_sys_reboot+0x194/0x264
 el0_svc_common+0xc8/0x1d4
 do_el0_svc+0x30/0x8c
 el0_svc+0x20/0x30
 el0_sync_handler+0x84/0xe4
 el0_sync+0x1bc/0x1c0

Fix this crash by ungating the clock before destroying the work queue on
which clock gating work is queued.

Link: https://lore.kernel.org/r/20211203231950.193369-15-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3489c34bd0 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I5b2ef262356fe1ed3581e7173fd3cc69208177fc
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:40 -08:00
Bart Van Assche
f23e2c8fa8 FROMGIT: scsi: ufs: Improve SCSI abort handling further
Release resources when aborting a command. Make sure that aborted commands
are completed once by clearing the corresponding tag bit from
hba->outstanding_reqs. This patch is an improved version of commit
3ff1f6b6ba ("scsi: ufs: core: Improve SCSI abort handling").

Link: https://lore.kernel.org/r/20211203231950.193369-14-bvanassche@acm.org
Fixes: 7a3e97b0dc ("[SCSI] ufshcd: UFS Host controller driver")
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1fbaa02dfd git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Ifdf7f016c0d1986fe905f13be8abbeb54af4bce5
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:40 -08:00
Bart Van Assche
c36f34dff6 FROMGIT: scsi: ufs: Introduce ufshcd_release_scsi_cmd()
The only functional change in this patch is that scsi_done() is now called
after ufshcd_release() and ufshcd_clk_scaling_update_busy() instead of
before.

The next patch in this series will introduce a call to
ufshcd_release_scsi_cmd() in the abort handler.

Link: https://lore.kernel.org/r/20211203231950.193369-13-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6f8dafdee6 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Ie9e3ef49aa10d3dc9ce43625893809b232d87d5f
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:40 -08:00
Bart Van Assche
83ecae51ea FROMGIT: scsi: ufs: Remove the 'update_scaling' local variable
This patch does not change any functionality but makes the next patch in
this series easier to read.

Link: https://lore.kernel.org/r/20211203231950.193369-12-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3eb9dcc027 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I5a420ba06517e65aa2cbabf08c2fc78de2490def
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:40 -08:00
Bart Van Assche
72a1395f6c FROMGIT: scsi: ufs: Remove hba->cmd_queue
The previous patch removed all code that uses hba->cmd_queue. Hence also
remove hba->cmd_queue itself.

Link: https://lore.kernel.org/r/20211203231950.193369-11-bvanassche@acm.org
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 511a083b8b git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I444a5343f779620aa82359b0dd709c5be880e6f0
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:39 -08:00
Bart Van Assche
6c8460404d FROMGIT: scsi: ufs: Fix a deadlock in the error handler
The following deadlock has been observed on a test setup:

 - All tags allocated

 - The SCSI error handler calls ufshcd_eh_host_reset_handler()

 - ufshcd_eh_host_reset_handler() queues work that calls
   ufshcd_err_handler()

 - ufshcd_err_handler() locks up as follows:

Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt
Call trace:
 __switch_to+0x298/0x5d8
 __schedule+0x6cc/0xa94
 schedule+0x12c/0x298
 blk_mq_get_tag+0x210/0x480
 __blk_mq_alloc_request+0x1c8/0x284
 blk_get_request+0x74/0x134
 ufshcd_exec_dev_cmd+0x68/0x640
 ufshcd_verify_dev_init+0x68/0x35c
 ufshcd_probe_hba+0x12c/0x1cb8
 ufshcd_host_reset_and_restore+0x88/0x254
 ufshcd_reset_and_restore+0xd0/0x354
 ufshcd_err_handler+0x408/0xc58
 process_one_work+0x24c/0x66c
 worker_thread+0x3e8/0xa4c
 kthread+0x150/0x1b4
 ret_from_fork+0x10/0x30

Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved
request.

Link: https://lore.kernel.org/r/20211203231950.193369-10-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 945c3cca05 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Ib8027a51cc4b7bec7ddd69719f0f7f4a6e8dfb3a
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:39 -08:00
Bart Van Assche
d0516fa2a9 ANDROID: Revert "FROMLIST: scsi: ufs: Fix a deadlock in the error handler"
Revert commit d56a3389b8 ("FROMLIST: scsi: ufs: Fix a deadlock in the error
handler") in preparation of switching to the FROMGIT solution.

Bug: 204438323
Change-Id: Ia9ae90af8eb99acf323d04504fa990163a4162cc
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:39 -08:00
Bart Van Assche
9d0179eda1 FROMGIT: scsi: ufs: Remove ufshcd_any_tag_in_use()
Use hba->outstanding_reqs instead of ufshcd_any_tag_in_use(). This patch
prepares for removal of the blk_mq_start_request() call from
ufshcd_wait_for_dev_cmd(). blk_mq_tagset_busy_iter() only iterates over
started requests.

Link: https://lore.kernel.org/r/20211203231950.193369-8-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bd0b353831 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Ib43ac46cfb8094d0727af060c90a709d5430ecac
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:39 -08:00
Bart Van Assche
f531b624a6 FROMGIT: scsi: ufs: Fix race conditions related to driver data
The driver data pointer must be set before any callbacks are registered
that use that pointer. Hence move the initialization of that pointer from
after the ufshcd_init() call to inside ufshcd_init().

Link: https://lore.kernel.org/r/20211203231950.193369-7-bvanassche@acm.org
Fixes: 3b1d05807a ("[SCSI] ufs: Segregate PCI Specific Code")
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 21ad0e4908 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: Id924038c13cab1e203bb650cc3939ebd5acf56fe
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:39 -08:00
Bart Van Assche
968af1dd93 FROMGIT: scsi: ufs: Remove the sdev_rpmb member
Since the sdev_rpmb member of struct ufs_hba is only used inside
ufshcd_scsi_add_wlus(), convert it into a local variable.

Link: https://lore.kernel.org/r/20211203231950.193369-5-bvanassche@acm.org
Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 59830c095c git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I32b1e9b7e4c517113a2836eba544a94c67223579
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:39 -08:00
Bart Van Assche
475c91d6c8 FROMGIT: scsi: ufs: Remove is_rpmb_wlun()
Commit edc0596cc0 ("scsi: ufs: core: Stop clearing UNIT ATTENTIONS")
removed all callers of is_rpmb_wlun(). Hence also remove the function
itself.

Link: https://lore.kernel.org/r/20211203231950.193369-4-bvanassche@acm.org
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d656dc9b0b git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I1c4cca0645db743c9c3af0acf0f8cca83681fae1
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:39 -08:00
Bart Van Assche
cfcf226fda FROMGIT: scsi: ufs: Rename a function argument
The new name makes it clear what the meaning of the function argument is.

Link: https://lore.kernel.org/r/20211203231950.193369-3-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Keoseong Park <keosung.park@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Acked-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b427609e11 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 204438323
Change-Id: I84338bfad20a24cb6e844507116e72fe5116df24
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2021-12-10 09:15:38 -08:00
Nick Desaulniers
88389e813f ANDROID: clang: update to 14.0.0
Bug: 202986547
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: I6c32124aa77b256f5b8760e8054607ae8a46197f
2021-12-10 09:26:44 +00:00
Mike Tipton
12a745bf83 FROMGIT: clk: Don't parent clks until the parent is fully registered
Before commit fc0c209c14 ("clk: Allow parents to be specified without
string names") child clks couldn't find their parent until the parent
clk was added to a list in __clk_core_init(). After that commit, child
clks can reference their parent clks directly via a clk_hw pointer, or
they can lookup that clk_hw pointer via DT if the parent clk is
registered with an OF clk provider.

The common clk framework treats hw->core being non-NULL as "the clk is
registered" per the logic within clk_core_fill_parent_index():

	parent = entry->hw->core;
	/*
	 * We have a direct reference but it isn't registered yet?
	 * Orphan it and let clk_reparent() update the orphan status
	 * when the parent is registered.
	 */
	if (!parent)

Therefore we need to be extra careful to not set hw->core until the clk
is fully registered with the clk framework. Otherwise we can get into a
situation where a child finds a parent clk and we move the child clk off
the orphan list when the parent isn't actually registered, wrecking our
enable accounting and breaking critical clks.

Consider the following scenario:

  CPU0                                     CPU1
  ----                                     ----
  struct clk_hw clkBad;
  struct clk_hw clkA;

  clkA.init.parent_hws = { &clkBad };

  clk_hw_register(&clkA)                   clk_hw_register(&clkBad)
   ...                                      __clk_register()
					     hw->core = core
					     ...
   __clk_register()
    __clk_core_init()
     clk_prepare_lock()
     __clk_init_parent()
      clk_core_get_parent_by_index()
       clk_core_fill_parent_index()
        if (entry->hw) {
	 parent = entry->hw->core;

At this point, 'parent' points to clkBad even though clkBad hasn't been
fully registered yet. Ouch! A similar problem can happen if a clk
controller registers orphan clks that are referenced in the DT node of
another clk controller.

Let's fix all this by only setting the hw->core pointer underneath the
clk prepare lock in __clk_core_init(). This way we know that
clk_core_fill_parent_index() can't see hw->core be non-NULL until the
clk is fully registered.

Fixes: fc0c209c14 ("clk: Allow parents to be specified without string names")
Signed-off-by: Mike Tipton <quic_mdtipton@quicinc.com>
Link: https://lore.kernel.org/r/20211109043438.4639-1-quic_mdtipton@quicinc.com
[sboyd@kernel.org: Reword commit text, update comment]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>

Bug: 208605820
(cherry picked from commit 54baf56eaa
 https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git clk-next)
Change-Id: Iee7ea8a1ba3a95a4985c2e689bcc4484c33153f1
Signed-off-by: Mike Tipton <quic_mdtipton@quicinc.com>
2021-12-10 00:25:12 +00:00
Jason Gunthorpe
f1f505f3b4 UPSTREAM: mm/gup: remove the vma allocation from gup_longterm_locked()
Long ago there wasn't a FOLL_LONGTERM flag so this DAX check was done by
post-processing the VMA list.

These days it is trivial to just check each VMA to see if it is DAX before
processing it inside __get_user_pages() and return failure if a DAX VMA is
encountered with FOLL_LONGTERM.

Removing the allocation of the VMA list is a significant speed up for many
call sites.

Add an IS_ENABLED to vma_is_fsdax so that code generation is unchanged
when DAX is compiled out.

Remove the dummy version of __gup_longterm_locked() as !CONFIG_CMA already
makes memalloc_nocma_save(), check_and_migrate_cma_pages(), and
memalloc_nocma_restore() into a NOP.

Bug: 209719897
Link: https://lkml.kernel.org/r/0-v1-5551df3ed12e+b8-gup_dax_speedup_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 52650c8b46)
Change-Id: I8be099dc7b617916254c2650ff8a55a6b926a32e
(cherry picked from commit 78ea29e570)
2021-12-09 19:43:20 +00:00
Fuad Tabba
e8a81778fe FROMLIST: KVM: arm64: Use defined value for SCTLR_ELx_EE
Replace the hardcoded value with the existing definition.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211208192810.657360-1-tabba@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I80b23293060ff773bbb1ff8da5d36bfc3b517936
2021-12-09 09:51:08 +00:00
Fuad Tabba
ac233f3893 FROMLIST: KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on()
The barrier is there for power_off rather than power_state.
Probably typo in commit 358b28f09f ("arm/arm64: KVM: Allow
a VCPU to fully reset itself").

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211208193257.667613-3-tabba@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: I055d206de6a01f0ea97fb624e3647472b76f0620
2021-12-09 09:51:01 +00:00
Fuad Tabba
5f3ca8858f FROMLIST: KVM: arm64: Fix comment for kvm_reset_vcpu()
The comment for kvm_reset_vcpu() refers to the sysreg table as
being the table above, probably because of the code extracted at
commit f4672752c3 ("arm64: KVM: virtual CPU reset").

Fix the comment to remove the potentially confusing reference.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211208193257.667613-2-tabba@google.com
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 209580772
Change-Id: Id1a3c02a5522990a53d92055f7bac826c3491f12
2021-12-09 09:50:54 +00:00
Will Deacon
c330de946f FROMLIST: irqchip/gic-v3-its: Mark some in-memory data structures as 'decrypted'
The GICv3 ITS driver allocates memory for its tables using alloc_pages()
and performs explicit cache maintenance if necessary. On systems such
as those running pKVM, where the memory encryption API is implemented,
memory shared with the ITS must first be transitioned to the "decrypted"
state, as it would be if allocated via the DMA API.

Allow pKVM guests to interact with an ITS emulation by ensuring that the
shared pages are decrypted at the point of allocation and encrypted
again upon free().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211208155916.681-1-will@kernel.org
Bug: 209580772
Change-Id: I89820c65769a07306fd3e067d7d33c938d156820
Signed-off-by: Quentin Perret <qperret@google.com>
2021-12-09 09:50:20 +00:00
Quentin Perret
428452cb60 FROMLIST: KVM: arm64: pkvm: Stub io map functions
Now that GICv2 is disabled in nVHE protected mode there should be no
other reason for the host to use create_hyp_io_mappings() or
kvm_phys_addr_ioremap(). Add sanity checks to make sure that assumption
remains true looking forward.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-6-qperret@google.com
Bug: 209580772
Change-Id: I371533976ce9ffdbf6b0eff986680d34d3153b86
2021-12-09 09:50:20 +00:00
Quentin Perret
f5c76009e2 FROMLIST: KVM: arm64: Make __io_map_base static
The __io_map_base variable is used at EL2 to track the end of the
hypervisor's "private" VA range in nVHE protected mode. However it
doesn't need to be used outside of mm.c, so let's make it static to keep
all the hyp VA allocation logic in one place.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-5-qperret@google.com
Bug: 209580772
Change-Id: I0aac3451fdeddbc193d127ed38c3c998636d11b9
2021-12-09 09:50:20 +00:00
Quentin Perret
2e433f3894 FROMLIST: KVM: arm64: Make the hyp memory pool static
The hyp memory pool struct is sized to fit exactly the needs of the
hypervisor stage-1 page-table allocator, so it is important it is not
used for anything else. As it is currently used only from setup.c,
reduce its visibility by marking it static.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-4-qperret@google.com
Bug: 209580772
Change-Id: I5079221a3a5125ba85b837996aa64f098636d4cc
2021-12-09 09:50:20 +00:00
Quentin Perret
695573928a FROMLIST: KVM: arm64: pkvm: Disable GICv2 support
GICv2 requires having device mappings in guests and the hypervisor,
which is incompatible with the current pKVM EL2 page ownership model
which only covers memory. While it would be desirable to support pKVM
with GICv2, this will require a lot more work, so let's make the
current assumption clear until then.

Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
Bug: 209580772
Change-Id: I0c507b698e7cefc389e1a49ed6b15cf59d9daaa7
2021-12-09 09:50:20 +00:00
Quentin Perret
a2fffdffb7 FROMLIST: KVM: arm64: pkvm: Fix hyp_pool max order
The EL2 page allocator in protected mode maintains a per-pool max order
value to optimize allocations when the memory region it covers is small.
However, the max order value is currently under-estimated whenever the
number of pages in the region is a power of two. Fix the estimation.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211208152300.2478542-2-qperret@google.com
Bug: 209580772
Change-Id: Ibb149a33cad785c777032a4d129004f619d88653
2021-12-09 09:50:19 +00:00
Quentin Perret
bcf3fd91be FROMLIST: KVM: arm64: pkvm: Unshare guest structs during teardown
Make use of the newly introduced unshare hypercall during guest teardown
to unmap guest-related data structures from the hyp stage-1.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-16-qperret@google.com/
Bug: 209599700
Change-Id: Ife3e9c83ddd69b46490cee8f36a0770747950d69
2021-12-09 09:50:19 +00:00
Will Deacon
fee11d0f41 FROMLIST: KVM: arm64: Expose unshare hypercall to the host
Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-15-qperret@google.com
Bug: 209599700
Change-Id: Id79362978000d72b866152d0d83c887e4caeb973
2021-12-09 09:50:19 +00:00
Will Deacon
0a4821ecc2 FROMLIST: KVM: arm64: Implement do_unshare() helper for unsharing memory
Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.

Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-14-qperret@google.com
Bug: 209599700
Change-Id: I717d87c9aa2d1f1b159d7dc3bca439a2869967e5
2021-12-09 09:50:19 +00:00
Will Deacon
50e7557b36 BACKPORT: FROMLIST: KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()
__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.

Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.

[ qperret: BACKPORT becuse of conflict caused by the MMIO handler
  introduced with the S2MPU support ]

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-13-qperret@google.com
Bug: 209599700
Change-Id: I8d44fc9ca79ac7ea5f8ca289b3cca08a4879b3cd
2021-12-09 09:50:19 +00:00
Will Deacon
455e17002b FROMLIST: KVM: arm64: Implement do_share() helper for sharing memory
By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.

Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-12-qperret@google.com
Bug: 209599700
Change-Id: I7edb1b53014ffb4a5aa7a6ee54fd99d8091b57cd
2021-12-09 09:50:19 +00:00
Will Deacon
fb29cc8de3 BACKPORT: FROMLIST: KVM: arm64: Introduce wrappers for host and hyp spin lock accessors
In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and bit more difficult
to take the wrong lock (or even take them in the wrong order).

[ qperret: BACKPORT caused by trivial conflict with S2MPU code ]

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-11-qperret@google.com
Bug: 209599700
Change-Id: If6a1baf1dc099894c3445d6b6fec4dd3a46164a9
2021-12-09 09:50:19 +00:00
Will Deacon
f80bdbb276 FROMLIST: KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages
Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.

This state will be used for permission checking during page transitions
in later patches.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-10-qperret@google.com
Bug: 209599700
Change-Id: I7f31f675d39c5b33168eb652ca35822fba2ec0ff
2021-12-09 09:50:19 +00:00
Quentin Perret
33fa24cc3b FROMLIST: KVM: arm64: pkvm: Refcount the pages shared with EL2
In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times on the host. In order to do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-9-qperret@google.com
Bug: 209599700
Change-Id: I11dc907d139ba314247fb42e8702a6e80c55c054
2021-12-09 09:50:18 +00:00
Quentin Perret
21fb63c709 FROMLIST: KVM: arm64: Introduce kvm_share_hyp()
The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode changes widely
depending on when it is being called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-8-qperret@google.com/
Bug: 209599700
Change-Id: I17b9c2542e21f7c4cef0ee1e358b71a4f01c6647
2021-12-09 09:50:18 +00:00
Will Deacon
446ab9f9b4 FROMLIST: KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
stage-1 mappings at EL2.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-7-qperret@google.com
Bug: 209599700
Change-Id: I8cf45752704850162bb02b89cc04449679febe72
2021-12-09 09:50:18 +00:00
Will Deacon
3239508319 FROMLIST: KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.

Wire up this callback for the hypervisor stage-1 page-table.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-6-qperret@google.com
Bug: 209599700
Change-Id: Ieaf1f60698e1ebafc60424e879ccfd6ec192dbb5
2021-12-09 09:50:18 +00:00
Quentin Perret
edffd3888c BACKPORT: FROMLIST: KVM: arm64: Fixup hyp stage-1 refcount
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix-up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.

[ qperret: BACKPORT because of conflict with S2MPU init ]

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-5-qperret@google.com
Bug: 209599700
Change-Id: Ib31ace99838f397d7a2e48bfd43c6f4eaf730878
2021-12-09 09:50:18 +00:00
Quentin Perret
e96c599591 FROMLIST: KVM: arm64: Refcount hyp stage-1 pgtable pages
To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-4-qperret@google.com
Bug: 209599700
Change-Id: If45f4a5c62e70db5c6ee60192fff5ca4b945aa31
2021-12-09 09:50:18 +00:00
Quentin Perret
b66c10e133 FROMLIST: KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator
In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmmemap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.

In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-3-qperret@google.com
Bug: 209599700
Change-Id: I051ceebbe2c564ff88726a451f83af646f0d2cf0
2021-12-09 09:50:18 +00:00
Quentin Perret
c765c9635a FROMLIST: KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings()
The kvm_host_owns_hyp_mappings() function should return true if and only
if the host kernel is responsible for creating the hypervisor stage-1
mappings. That is only possible in standard non-VHE mode, or during boot
in protected nVHE mode. But either way, non of this makes sense in VHE,
so make sure to catch this case as well, hence making the function
return sensible values in any context (VHE or not).

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20211201170411.1561936-2-qperret@google.com
Bug: 209599700
Change-Id: Iec9d5f5f6f1258b76725df9b93064a9ddef1e670
2021-12-09 09:50:18 +00:00
Will Deacon
4a0f27b32e FROMLIST: virtio_ring: Fix querying of maximum DMA mapping size for virtio device
virtio_max_dma_size() returns the maximum DMA mapping size of the virtio
device by querying dma_max_mapping_size() for the device when the DMA
API is in use for the vring. Unfortunately, the device passed is
initialised by register_virtio_device() and does not inherit the DMA
configuration from its parent, resulting in SWIOTLB errors when bouncing
is enabled and the default 256K mapping limit (IO_TLB_SEGSIZE) is not
respected:

  | virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 294912 bytes), total 1024 (slots), used 725 (slots)

Follow the pattern used elsewhere in the virtio_ring code when calling
into the DMA layer and pass the parent device to dma_max_mapping_size()
instead.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211201112018.25276-1-will@kernel.org
Bug: 209580772
Change-Id: I3389270b4df2b0e0d3813ff8be61bdb594c1b0bd
Signed-off-by: Quentin Perret <qperret@google.com>
2021-12-09 09:50:17 +00:00
Vitaly Kuznetsov
ad10bedb3f FROMGIT: KVM: Drop stale kvm_is_transparent_hugepage() declaration
kvm_is_transparent_hugepage() was removed in commit 205d76ff06 ("KVM:
Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") but its
declaration in include/linux/kvm_host.h persisted. Drop it.

Fixes: 205d76ff06 (""KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211018151407.2107363-1-vkuznets@redhat.com
(cherry picked from commit f0e6e6fa41
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I9078ab62be40bc843ca2959f929ed22c1b8888e2
2021-12-09 09:43:53 +00:00
Will Deacon
63f358ca1b FROMGIT: KVM: arm64: Move host EL1 code out of hyp/ directory
kvm/hyp/reserved_mem.c contains host code executing at EL1 and is not
linked into the hypervisor object. Move the file into kvm/pkvm.c and
rework the headers so that the definitions shared between the host and
the hypervisor live in asm/kvm_pkvm.h.

Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-4-will@kernel.org
(cherry picked from commit 9429f4b041
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ic53c6ef5262e473e61bfdd44204b6a6725035827
2021-12-09 09:43:53 +00:00
Will Deacon
95bfeeb6b5 FROMGIT: KVM: arm64: Generate hyp_constants.h for the host
In order to avoid exposing hypervisor (EL2) data structures directly to
the host, generate hyp_constants.h to provide constants such as structure
sizes to the host without dragging in the definitions themselves.

Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-3-will@kernel.org
(cherry picked from commit ed4ed15d57
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I24957ea3ef1da8863a60dcf53c146b3a78f56fa5
2021-12-09 09:43:53 +00:00
Will Deacon
8a3b33dc2e FROMGIT: arm64: Add missing include of asm/cpufeature.h to asm/mmu.h
asm/mmu.h refers to cpus_have_const_cap() in the definition of
arm64_kernel_unmapped_at_el0() so include asm/cpufeature.h directly
rather than force all users of the header to do it themselves.

Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211202171048.26924-2-will@kernel.org
(cherry picked from commit 7e04f05984
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iaa42070f8f41255406b1031e5a59f58c06f47f5d
2021-12-09 09:43:53 +00:00
Rikard Falkeborn
21ceef2920 FROMGIT: KVM: arm64: Constify kvm_io_gic_ops
The only usage of kvm_io_gic_ops is to make a comparison with its
address and to pass its address to kvm_iodevice_init() which takes a
pointer to const kvm_io_device_ops as input. Make it const to allow the
compiler to put it in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
(cherry picked from commit 636dcd0204
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I057a166181bea5855dd19be14971ac086e02ec12
2021-12-09 09:43:53 +00:00
Marc Zyngier
d907216e5c FROMGIT: KVM: arm64: Add minimal handling for the ARMv8.7 PMU
When running a KVM guest hosted on an ARMv8.7 machine, the host
kernel complains that it doesn't know about the architected number
of events.

Fix it by adding the PMUver code corresponding to PMUv3 for ARMv8.7.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211126115533.217903-1-maz@kernel.org
(cherry picked from commit 00e228b315
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I705efed6bcdd2000a57901bd04ba080a36527ad4
2021-12-09 09:43:53 +00:00
Marc Zyngier
f74a77dd09 FROMGIT: KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pid
With the transition to kvm_arch_vcpu_run_pid_change() to handle
the "run once" activities, it becomes obvious that has_run_once
is now an exact shadow of vcpu->pid.

Replace vcpu->arch.has_run_once with a new vcpu_has_run_once()
helper that directly checks for vcpu->pid, and get rid of the
now unused field.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit cc5705fb1b
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209777660
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iaecd0c5440ae929775fd43b7e9cfe71168b45911
2021-12-09 09:43:52 +00:00