The following deadlock has been observed on multiple test setups:
* ufshcd_wl_suspend() is waiting for blk_execute_rq(START STOP UNIT) to
complete while ufshcd_wl_suspend() holds host_sem.
* The SCSI error handler is activated, changes the host state to
SHOST_RECOVERY, ufshcd_eh_host_reset_handler() and ufshcd_err_handler()
are called and the latter function tries to obtain host_sem.
This is a deadlock because blk_execute_rq() can't execute SCSI commands
while the host is in the SHOST_RECOVERY state and because the error handler
cannot make progress because host_sem is held by another thread.
Fix this deadlock as follows:
* Fail attempts to suspend the system while the SCSI error handler is in
progress by setting the SCMD_FAIL_IF_RECOVERING flag for START STOP UNIT
commands.
* If the system is suspending and a START STOP UNIT command times out,
handle the SCSI command timeout from inside the context of the SCSI
timeout handler instead of activating the SCSI error handler.
The runtime power management code is not affected by this deadlock since
hba->host_sem is not touched by the runtime power management functions in
the UFS driver.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-11-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reduce the START STOP UNIT command timeout to one second since on Android
devices a kernel panic is triggered if an attempt to suspend the system
takes more than 20 seconds. One second should be enough for the START STOP
UNIT command since this command completes in less than a millisecond for
the UFS devices I have access to.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-7-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During I/O and simultaneous cat of /sys/kernel/debug/lpfc/fnX/rx_monitor, a
hard lockup similar to the call trace below may occur.
The spin_lock_bh in lpfc_rx_monitor_report is not protecting from timer
interrupts as expected, so change the strength of the spin lock to _irq.
Kernel panic - not syncing: Hard LOCKUP
CPU: 3 PID: 110402 Comm: cat Kdump: loaded
exception RIP: native_queued_spin_lock_slowpath+91
[IRQ stack]
native_queued_spin_lock_slowpath at ffffffffb814e30b
_raw_spin_lock at ffffffffb89a667a
lpfc_rx_monitor_record at ffffffffc0a73a36 [lpfc]
lpfc_cmf_timer at ffffffffc0abbc67 [lpfc]
__hrtimer_run_queues at ffffffffb8184250
hrtimer_interrupt at ffffffffb8184ab0
smp_apic_timer_interrupt at ffffffffb8a026ba
apic_timer_interrupt at ffffffffb8a01c4f
[End of IRQ stack]
apic_timer_interrupt at ffffffffb8a01c4f
lpfc_rx_monitor_report at ffffffffc0a73c80 [lpfc]
lpfc_rx_monitor_read at ffffffffc0addde1 [lpfc]
full_proxy_read at ffffffffb83e7fc3
vfs_read at ffffffffb833fe71
ksys_read at ffffffffb83402af
do_syscall_64 at ffffffffb800430b
entry_SYSCALL_64_after_hwframe at ffffffffb8a000ad
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20221017164323.14536-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The request associated with a SCSI command coming from the block layer has
a unique tag, so use that when possible for getting a slot.
Unfortunately we don't support reserved commands in the SCSI midlayer yet.
As such, SMP tasks - as an example - will not have a request associated, so
in the interim continue to manage those tags for that type of sas_task
internally.
We reserve an arbitrary 4 tags for these internal tags. Indeed, we already
decrement MVS_RSVD_SLOTS by 2 for the shost can_queue when flag
MVF_FLAG_SOC is set. This change was made in commit 20b09c2992 ("[SCSI]
mvsas: add support for 94xx; layout change; bug fixes"), but what those 2
slots are used for is not obvious.
Also make the tag management functions static, where possible.
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1666091763-11023-8-git-send-email-john.garry@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In commit 5a141315ed ("scsi: pm80xx: Increase the number of outstanding
I/O supported to 1024") the pm8001_ha->tags allocation was moved into
pm8001_init_ccb_tag(). This changed the execution order of allocation.
pm8001_tag_init() used to be called after the pm8001_ha->tags allocation
and now it is called before the allocation.
Before:
pm8001_pci_probe()
`--> pm8001_pci_alloc()
`--> pm8001_alloc()
`--> pm8001_ha->tags = kzalloc(...)
`--> pm8001_tag_init(pm8001_ha); // OK: tags are allocated
After:
pm8001_pci_probe()
`--> pm8001_pci_alloc()
| `--> pm8001_alloc()
| `--> pm8001_tag_init(pm8001_ha); // NOK: tags are not allocated
|
`--> pm8001_init_ccb_tag()
`--> pm8001_ha->tags = kzalloc(...) // today it is bitmap_zalloc()
Since pm8001_ha->tags_num is zero when pm8001_tag_init() is called it does
nothing. Tags memory is allocated with bitmap_zalloc() so there is no need
to manually clear each bit with pm8001_tag_free().
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1666091763-11023-5-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of using scsi_host_template members to track the SCSI proc
directory entries, track these entries in a list. This changes the time
needed for looking up the proc dir pointer from O(1) into O(n). This is
considered acceptable since the number of SCSI host adapter types per host
is usually small (less than ten).
This change has been tested by attaching two USB storage devices to a qemu
host:
$ grep -aH . /proc/scsi/usb-storage/*
/proc/scsi/usb-storage/7: Host scsi7: usb-storage
/proc/scsi/usb-storage/7: Vendor: QEMU
/proc/scsi/usb-storage/7: Product: QEMU USB HARDDRIVE
/proc/scsi/usb-storage/7:Serial Number: 1-0000:00:02.1:00.0-6
/proc/scsi/usb-storage/7: Protocol: Transparent SCSI
/proc/scsi/usb-storage/7: Transport: Bulk
/proc/scsi/usb-storage/7: Quirks: SANE_SENSE
/proc/scsi/usb-storage/8: Host scsi8: usb-storage
/proc/scsi/usb-storage/8: Vendor: QEMU
/proc/scsi/usb-storage/8: Product: QEMU USB HARDDRIVE
/proc/scsi/usb-storage/8:Serial Number: 1-0000:00:02.1:00.0-7
/proc/scsi/usb-storage/8: Protocol: Transparent SCSI
/proc/scsi/usb-storage/8: Transport: Bulk
/proc/scsi/usb-storage/8: Quirks: SANE_SENSE
This commit prepares for constifying most SCSI host templates.
Reviewed-by: John Garry <john.garry@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221015002418.30955-5-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In commit c6b9ef5779 ("[SCSI] pm80xx: NCQ error handling changes") the
driver had support added to handle NCQ errors but much of what is done in
this handling is duplicated from the libata EH.
In that named commit we handle in 2x main steps:
a. Issue read log ext10 to examine and clear the errors
b. Issue SATA_ABORT all command
Indeed, in libata EH, we do similar to above:
a. ata_do_eh() -> ata_eh_autopsy() -> ata_eh_link_autopsy() ->
ata_eh_analyze_ncq_error() -> ata_eh_read_log_10h()
b. ata_do_eh() -> ata_eh_recover() which will issue a device soft reset
or hard reset
Since there is so much duplication, use sas_ata_device_link_abort() which
will abort all pending IOs and kick of ATA EH which will do the steps,
above.
However we will not follow the advisory to send the SATA_ABORT all command
after the autopsy in read log ext10. Indeed, in libsas EH, we already send
a per-task SATA_ABORT command, and this is prior to the ATA EH kicking in
and issuing the read log ext10 in the recovery process. I judge that this
is ok as the SATA_ABORT command does not actually send any protocol on the
link to abort I/O on the other side, so would not change any state on the
disk (for the read log ext10 command).
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-7-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When an NCQ error occurs, the controller will abnormally complete the I/Os
that are newly delivered to disk, and bit8 in CQ dw3 will be set which
indicates that the SATA disk is in error state. The current processing flow
is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will set
FIS stat to ATA_ERR. After analyzing the I/O by ata_eh_analyze_tf(),
err_mask will set to AC_ERR_HSM. If media error occurs for four times
within 10 minutes and the chip rejects new I/Os for four times, NCQ will be
disabled due to excessive errors, which is undesirable.
Therefore, use sas_task_abort() to handle abnormally completed I/Os when
SATA disk is in error state, as these abnormally completed I/Os are already
processed by sas_ata_device_link_abort() and qc->flag are set to
ATA_QCFLAG_FAILED. If sas_task_abort() is used, qc->err_mask will not be
modified in EH. Unlike the current process flow, it will not increase the
count of ECAT_TOUT_HSM and not turn off NCQ. Like other I/Os on the disk
that do not have an error but do not return after the NCQ error, they are
retried after the EH.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-5-git-send-email-john.garry@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When CQ header dw3 SATA_DISK_ERR is set it means this SATA disk is in error
state and the current IPTT is invalid. An invalid IPTT does not correspond
to any slot.
In this scenario, new I/Os that delivered to disk will be rejected by the
controller and all I/Os remaining in the disk should be aborted, which we
add here with the sas_ata_device_link_abort() call.
In hisi_sas_abort_task() we don't want to issue a soft reset as it may
cause info to be lost in the target disk for the ATA EH autopsy. In this
case, just release resources - the disk won't return other I/Os normally
after NCQ Error, so this is safe.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-4-git-send-email-john.garry@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Similar to how AHCI handles NCQ errors in ahci_error_intr() ->
ata_port_abort() -> ata_do_link_abort(), add an NCQ error handler for LLDDs
to call to initiate a link abort.
This will mark all outstanding QCs as failed and kick-off EH.
Note:
A "force reset" argument is added for drivers which require the ATA error
handling to always reset the device.
A driver may require this feature for when SATA device per-SCSI cmnd
resources are only released during reset for ATA EH. As such, we need an
option to force reset to be done, regardless of what any EH autopsy
decides.
The SATA device FIS fields are set to indicate a device error from
ata_eh_analyze_tf().
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-2-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>