On the guest teardown path, pKVM will zero the pages used to back the
guest shadow data structures before returning them to the host as they
may contain secrets (e.g. in the vCPU registers). However, the zeroing
is done using a cacheable alias, and CMOs are missing, hence giving the
host a potential opportunity to read the original content of the shadow
structs from memory.
Fix this by issuing CMOs after zeroing the pages.
[ qperret@: moved the CMOs to __unmap_donated_memory() to cover all
callers, including the __pkvm_init_vm() error path ]
Bug: 259551298
Change-Id: Id696d47d16e4c3fd870cb70b792eeb7f2282fc78
Signed-off-by: Quentin Perret <qperret@google.com>
If a malicious/compromised host issues a PSCI SYSTEM_RESET call in the
presence of guest-owned pages then the contents of those pages may be
susceptible to cold-reboot attacks.
Use the PSCI MEM_PROTECT call to ensure that volatile memory is wiped by
the firmware if a SYSTEM_RESET occurs while unpoisoned guest pages exist
in the system. Since this call does not offer protection for a "warm"
reset initiated by SYSTEM_RESET2, detect this case in the PSCI relay and
repaint the call to a standard SYSTEM_RESET instead.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254821051
Change-Id: I5c3dd93bc83ebcd0b6cea2ec734f6e3a77f0064e
Signed-off-by: Will Deacon <willdeacon@google.com>
When donating pages to the guest, we only check the first IPA in the
range against the pvmfw loading range. Although this is fine for the
page-at-a-time faulting path, it doesn't fit with the rest of the mem
protection logic, which deals with the possibility of an arbitrarily
sized contiguous address range.
Rework the logic so that we check the whole IPA range during guest
donation and trigger the pvmfw loading path if any of the pages
intersect with the pvmfw region.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: I6fef9f1898e65a95cab7f6a0ffa8aa422a8d5a91
Signed-off-by: Will Deacon <willdeacon@google.com>
When poisoning the pvmfw pages during system reset at EL2, ensure that we
use a writable fixmap mapping rather than the persistent read-only mapping
of the region.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: I4c8be092d3c822695afd7d03d0d64163664a9f64
Signed-off-by: Will Deacon <willdeacon@google.com>
pkvm_clear_pvmfw_pages() is used to poison the pvmfw pages during reset,
so rename it to pkvm_poison_pvmfw_pages() instead.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: Ie5b9c90f0707fa81d9099425cff35383bfb0d009
Signed-off-by: Will Deacon <willdeacon@google.com>
hyp_zero_page() is used for poisoning memory, so rename it to
hyp_poison_page() to avoid confusing with the concept of a "zero page"
and make it available outside of mem_protect.c as it will be used to
poison the pvmfw memory in a subsequent patch.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254819795
Change-Id: Ia4aec46437db3ffe466ae09bd180392fa06c0b46
Signed-off-by: Will Deacon <willdeacon@google.com>
hyp_fixmap_map() never returns NULL, so remove the redundant checks for
it and simplify the error handling in the callers.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 254819795
Change-Id: Ie73a97cc3d9bded3750abe6e243003827393ee5e
This essentially reverts commit e41b135550
"virtio_balloon: disable VIOMMU support".
Although the virtio_balloon driver does not translate through a
VIOMMU (or bounce buffer) the pages that it sends to the device,
it *does* need to perform these translations on the virtio rings
themselves.
This fixes virtio_balloon initialisation inside a PKVM/ARM64
protected virtual machine.
Bug: 240239989
Change-Id: I2a84eec870fd638223b231e5c4d1c27216dc40a2
Signed-off-by: Keir Fraser <keirf@google.com>
This specifies that the driver is running on a PKVM hypervisor
and must use the memrelinquish service to cooperatively release
memory. If this service is unavailable, virtio_balloon cannot be
used.
Bug: 240239989
Change-Id: I8800c4435d8fae9df6f1ab108cc61c8f93020773
Signed-off-by: Keir Fraser <keirf@google.com>
Used to determine whether memrelinquish services have been
initialised.
Bug: 240239989
Change-Id: I81dd23d8122ea54924d52b3fdc1fc4a8cdb28ea5
Signed-off-by: Keir Fraser <keirf@google.com>
Move the code for aborting all SCSI commands and TMFs into a new function.
This patch makes the ufshcd_err_handler() easier to read. Except for adding
more logging, this patch does not change any functionality.
Change-Id: I94e1e80883236834eefd809dd42d6751359de03f
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221031183433.2443554-1-bvanassche@acm.org
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit b817e6ffba git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The following deadlock has been observed on multiple test setups:
* ufshcd_wl_suspend() is waiting for blk_execute_rq(START STOP UNIT) to
complete while ufshcd_wl_suspend() holds host_sem.
* The SCSI error handler is activated, changes the host state to
SHOST_RECOVERY, ufshcd_eh_host_reset_handler() and ufshcd_err_handler()
are called and the latter function tries to obtain host_sem.
This is a deadlock because blk_execute_rq() can't execute SCSI commands
while the host is in the SHOST_RECOVERY state and because the error handler
cannot make progress because host_sem is held by another thread.
Fix this deadlock as follows:
* Fail attempts to suspend the system while the SCSI error handler is in
progress by setting the SCMD_FAIL_IF_RECOVERING flag for START STOP UNIT
commands.
* If the system is suspending and a START STOP UNIT command times out,
handle the SCSI command timeout from inside the context of the SCSI
timeout handler instead of activating the SCSI error handler.
The runtime power management code is not affected by this deadlock since
hba->host_sem is not touched by the runtime power management functions in
the UFS driver.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Change-Id: Ie34fc057e57f58a3c3ff24858c76c9b8f2b8270c
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-11-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 7029e2151a git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Open-code scsi_execute() because a later patch will modify scmd->flags and
because scsi_execute() does not support setting scmd->flags. No
functionality is changed.
Change-Id: I427ec2f5f46aed3a0bd2c31d7f0c92ab58c6b4a2
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-10-bvanassche@acm.org
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 6a354a7e74 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
[ bvanassche: backported from kernel v6.1 to kernel v5.15 ]
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Add a new boolean variable that tracks whether the system is suspending,
suspended or resuming. This information will be used in a later commit to
fix a deadlock between the SCSI error handler and the suspend code.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Change-Id: I8a1e6595ff3808cfab1789f9d2ead0cbd405b279
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-9-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 1a547cbc6f git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Instead of only retrying the START STOP UNIT command if a unit attention is
reported, repeat it if any SCSI error is reported by the device or if the
command timed out.
Change-Id: I231b641942239cbcfba31058137a41e0277380a6
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-8-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 579a4e9dbd git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Although the host lock had to be held by ufshcd_clk_scaling_start_busy()
callers when that function was introduced, that is no longer the case
today. Hence remove the comment that claims that callers of this function
must hold the host lock.
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Change-Id: Ia5534a076cb22112975052f5470ad8b11489b6f3
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-5-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 1626c7bba1 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Commit 6600593cbd ("block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE")
made it impossible for .eh_timed_out() implementations to call
scsi_done() without causing a crash.
Restore support for SCSI timeout handlers to call scsi_done() as follows:
* Change all .eh_timed_out() handlers as follows:
- Change the return type into enum scsi_timeout_action.
- Change BLK_EH_RESET_TIMER into SCSI_EH_RESET_TIMER.
- Change BLK_EH_DONE into SCSI_EH_NOT_HANDLED.
* In scsi_timeout(), convert the SCSI_EH_* values into BLK_EH_* values.
Reviewed-by: Lee Duncan <lduncan@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Change-Id: I92d261e37dc396b43596015b70a8894ad4933eab
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-3-bvanassche@acm.org
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit dee7121e8c)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
If there is a race between scsi_done() and scsi_timeout() and if
scsi_timeout() loses the race, scsi_timeout() should not reset the request
timer. Hence change the return value for this case from BLK_EH_RESET_TIMER
into BLK_EH_DONE.
Although the block layer holds a reference on a request (req->ref) while
calling a timeout handler, restarting the timer (blk_add_timer()) while a
request is being completed is racy.
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 15f73f5b3e ("blk-mq: move failure injection out of blk_mq_complete_request")
Change-Id: Ia669e7c74e68a8a2fb716ea70f01b6ea75fff2df
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 978b7922d3)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Userspace may want to manually control when the data should go into
WriteBooster buffer. The control happens via "wb_on" node, but presently,
there is no simple way to check if WriteBooster is supported and
enabled.
Expose the Write Booster and Clock Scaling capabilities to be able to
determine if the Write Booster is available and if its manual control is
blocked by Clock Scaling mechanism.
Link: https://lore.kernel.org/r/20220829081845.v8.1.Ibf9efc9be50783eeee55befa2270b7d38552354c@changeid
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Change-Id: I4b8f30fc9291deaccd6a4d8913d78bd0a7b344dc
Signed-off-by: Daniil Lunev <dlunev@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 2286ade07d)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
There is the following quirk to bypass "WB Flush" in Write Booster.
- UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
If this quirk is not set, there is no knob that can control "WB Flush".
There are three flags that control Write Booster Feature:
1. WB ON/OFF
2. WB Hibern Flush ON/OFF (implicitly)
3. WB Flush ON/OFF (explicit)
The sysfs attribute that controls the WB was implemented. (1)
In the case of "Hibern Flush", it is always good to turn on. Control may
not be required. (2)
Finally, "Flush" may be necessary because the Auto-Hibern8 is not supported
in a specific environment. So the sysfs attribute that controls this is
necessary. (3)
Link: https://lore.kernel.org/r/20220804075354epcms2p8c21c894b4e28840c5fc651875b7f435f@epcms2p8
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Change-Id: Ie05798286a28c30dc30f7e9e6a1d77b963263043
Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 6c4148ce7c)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Mediatek UFS does not want to toggle write booster during clock scaling.
Permit host driver to disable wb toggling during clock scaling.
Introduce a flag UFSHCD_CAP_WB_WITH_CLK_SCALING to decouple WB and clock
scaling. UFSHCD_CAP_WB_WITH_CLK_SCALING is only valid when
UFSHCD_CAP_CLK_SCALING is set. Just like UFSHCD_CAP_HIBERN8_WITH_CLK_GATING
is valid only when UFSHCD_CAP_CLK_GATING set.
Set UFSHCD_CAP_WB_WITH_CLK_SCALING for qcom to compatible legacy design at
the same time.
Link: https://lore.kernel.org/r/20220804025422.18803-1-peter.wang@mediatek.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Change-Id: I55eb7d48b8662be9a9e17977c4ef9cf4d402c241
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 87bd05016a)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
CLOCK_MONOTONIC is not advanced when the system is in suspend. This becomes
problematic when debugging issues related to suspend-resume: the timestamps
printed by ufshcd_print_trs can not be correlated with dmesg entries, which
are timestamped with local_clock().
Change the used clock to local_clock() for the informational timestamp
variables and adds mirroring *_local_clock instances for variables used in
subsequent derevations (to not change the semantics of those derevations).
Link: https://lore.kernel.org/r/20220804065019.v5.1.I699244ea7efbd326a34a6dfd9b5a31e78400cf68@changeid
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Avri Altman <avri.altman@wdc.com>
Change-Id: If97e2a159690adb247156f264879a706eefabb84
Signed-off-by: Daniil Lunev <dlunev@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 0f85e74756)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Provide clk-scaling feature in MediaTek UFS platforms.
MediaTek platform supports clk-scaling by switching parent clock mux of
UFSHCI main clocks: ufs_sel.
The driver needs to prevent changing the rate of ufs_sel because its parent
PLL clock may be shared between multiple IPs. In order to achieve this
goal, the maximum and minimum clock rates of ufs_sel defined in dts should
match the rate of "ufs_sel_max_src" and "ufs_sel_min_src" respectively.
Link: https://lore.kernel.org/r/20220802235437.4547-6-stanley.chu@mediatek.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Change-Id: Ie37f45fc3738369f9a401ef33f46d995cbe400a1
Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit b7dbc686f6)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Measurements for one particular UFS controller + UFS device show a 25%
higher read bandwidth if the maximum data buffer size is increased from 512
KiB to 1 MiB. Hence increase the maximum size of the data buffer associated
with a single request from SCSI_DEFAULT_MAX_SECTORS (1024) * 512 bytes =
512 KiB to 1 MiB.
Notes:
- The maximum data buffer size supported by the UFSHCI specification
is 65535 * 256 KiB or about 16 GiB.
- The maximum data buffer size for READ(10) commands is 65535 logical
blocks. To transfer more than 65535 * 4096 bytes = 255 MiB with a single
SCSI command, the READ(16) command is required. Support for READ(16) is
optional in the UFS 3.1 and UFS 4.0 standards.
Link: https://lore.kernel.org/r/20220726225232.1362251-1-bvanassche@acm.org
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Bean Huo <beanhuo@micron.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Tested-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Avri Altman <avri.altman@wdc.com>
Change-Id: I8c9db803e7f55d911e86d4fa0272d38d13ee05d5
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit 86a44f045b)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Reduce the START STOP UNIT command timeout to one second since on Android
devices a kernel panic is triggered if an attempt to suspend the system
takes more than 20 seconds. One second should be enough for the START STOP
UNIT command since this command completes in less than a millisecond for
the UFS devices I have access to.
Change-Id: Ibe4960f00bfa4ae6871b563c2f91d37aa4cb6150
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-7-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit dcd5b7637c)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
If a device management command completion happens after
wait_for_completion_timeout() times out and before ufshcd_clear_cmds() is
called, then the completion code may crash on the complete() call in
__ufshcd_transfer_req_compl().
Fix the following crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Call trace:
complete+0x64/0x178
__ufshcd_transfer_req_compl+0x30c/0x9c0
ufshcd_poll+0xf0/0x208
ufshcd_sl_intr+0xb8/0xf0
ufshcd_intr+0x168/0x2f4
__handle_irq_event_percpu+0xa0/0x30c
handle_irq_event+0x84/0x178
handle_fasteoi_irq+0x150/0x2e8
__handle_domain_irq+0x114/0x1e4
gic_handle_irq.31846+0x58/0x300
el1_irq+0xe4/0x1c0
efi_header_end+0x110/0x680
__irq_exit_rcu+0x108/0x124
__handle_domain_irq+0x118/0x1e4
gic_handle_irq.31846+0x58/0x300
el1_irq+0xe4/0x1c0
cpuidle_enter_state+0x3ac/0x8c4
do_idle+0x2fc/0x55c
cpu_startup_entry+0x84/0x90
kernel_init+0x0/0x310
start_kernel+0x0/0x608
start_kernel+0x4ec/0x608
Link: https://lore.kernel.org/r/20220720170228.1598842-1-bvanassche@acm.org
Fixes: 5a0b0cb9be ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Bean Huo <beanhuo@micron.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Change-Id: Id6f73be1eb751784d28e1acf6270134f6080dbd3
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bug: 258234315
(cherry picked from commit f5c2976e0c)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Remove a statement from the MediaTek driver that is not present in the
upstream code.
Bug: 258234315
Fixes: 239044beef ("Merge 5.15.63 into android14-5.15")
Change-Id: I6379d6a5f10cd6d00c94314995c55c72282e1275
Signed-off-by: Bart Van Assche <bvanassche@google.com>
This will allow a client program to avoid redundant actions on ashmem
buffers which it has already seen.
Bug: 244233389
Change-Id: Ica57a8842ff163eae5f9eca8141b439091ec0940
Signed-off-by: Mark Fasheh <mfasheh@google.com>