commit ecda7c0280 upstream.
Add support for one pre-commit callback which is run right before the
metadata are committed.
This allows the thin provisioning target to run a callback before the
metadata are committed and is required by the next commit.
Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b3fd1f53a upstream.
dm-clone maintains an on-disk bitmap which records which regions are
valid in the destination device, i.e., which regions have already been
hydrated, or have been written to directly, via user I/O.
Setting a bit in the on-disk bitmap meas the corresponding region is
valid in the destination device and we redirect all I/O regarding it to
the destination device.
Suppose the destination device has a volatile write-back cache and the
following sequence of events occur:
1. A region gets hydrated, either through the background hydration or
because it was written to directly, via user I/O.
2. The commit timeout expires and we commit the metadata, marking that
region as valid in the destination device.
3. The system crashes and the destination device's cache has not been
flushed, meaning the region's data are lost.
The next time we read that region we read it from the destination
device, since the metadata have been successfully committed, but the
data are lost due to the crash, so we read garbage instead of the old
data.
This has several implications:
1. In case of background hydration or of writes with size smaller than
the region size (which means we first copy the whole region and then
issue the smaller write), we corrupt data that the user never
touched.
2. In case of writes with size equal to the device's logical block size,
we fail to provide atomic sector writes. When the system recovers the
user will read garbage from the sector instead of the old data or the
new data.
3. In case of writes without the FUA flag set, after the system
recovers, the written sectors will contain garbage instead of a
random mix of sectors containing either old data or new data, thus we
fail again to provide atomic sector writes.
4. Even when the user flushes the dm-clone device, because we first
commit the metadata and then pass down the flush, the same risk for
corruption exists (if the system crashes after the metadata have been
committed but before the flush is passed down).
The only case which is unaffected is that of writes with size equal to
the region size and with the FUA flag set. But, because FUA writes
trigger metadata commits, this case can trigger the corruption
indirectly.
To solve this and avoid the potential data corruption we flush the
destination device **before** committing the metadata.
This ensures that any freshly hydrated regions, for which we commit the
metadata, are properly written to non-volatile storage and won't be lost
in case of a crash.
Fixes: 7431b7835f ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8fdbfe8d16 upstream.
Split the metadata commit in two parts:
1. dm_clone_metadata_pre_commit(): Prepare the current transaction for
committing. After this is called, all subsequent metadata updates,
done through either dm_clone_set_region_hydrated() or
dm_clone_cond_set_range(), will be part of the next transaction.
2. dm_clone_metadata_commit(): Actually commit the current transaction
to disk and start a new transaction.
This is required by the following commit. It allows dm-clone to flush
the destination device after step (1) to ensure that all freshly
hydrated regions, for which we are updating the metadata, are properly
written to non-volatile storage and won't be lost in case of a crash.
Fixes: 7431b7835f ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6a505f3f9 upstream.
Extend struct dirty_map with a second bitmap which tracks the exact
regions that were hydrated during the current metadata transaction.
Moreover, fix __flush_dmap() to only commit the metadata of the regions
that were hydrated during the current transaction.
This is required by the following commits to fix a data corruption bug.
Fixes: 7431b7835f ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 474e559567 upstream.
We got the following warnings from thin_check during thin-pool setup:
$ thin_check /dev/vdb
examining superblock
examining devices tree
missing devices: [1, 84]
too few entries in btree_node: 41, expected at least 42 (block 138, max_entries = 126)
examining mapping tree
The phenomenon is the number of entries in one node of details_info tree is
less than (max_entries / 3). And it can be easily reproduced by the following
procedures:
$ new a thin pool
$ presume the max entries of details_info tree is 126
$ new 127 thin devices (e.g. 1~127) to make the root node being full
and then split
$ remove the first 43 (e.g. 1~43) thin devices to make the children
reblance repeatedly
$ stop the thin pool
$ thin_check
The root cause is that the B-tree removal procedure in __rebalance2()
doesn't guarantee the invariance: the minimal number of entries in
non-root node should be >= (max_entries / 3).
Simply fix the problem by increasing the rebalance threshold to
make sure the number of entries in each child will be greater
than or equal to (max_entries / 3 + 1), so no matter which
child is used for removal, the number will still be valid.
Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbaf971c9c upstream.
Removes the branching for edge-case where no SCSI device handler
exists. The __map_bio_fast() method was far too limited, by only
selecting a new pathgroup or path IFF there was a path failure, fix this
be eliminating it in favor of __map_bio(). __map_bio()'s extra SCSI
device handler specific MPATHF_PG_INIT_REQUIRED test is not in the fast
path anyway.
This change restores full path selector functionality for bio-based
configurations that don't haave a SCSI device handler. But it should be
noted that the path selectors do have an impact on performance for
certain networks that are extremely fast (and don't require frequent
switching).
Fixes: 8d47e65948 ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks")
Cc: stable@vger.kernel.org
Reported-by: Drew Hastings <dhastings@crucialwebhost.com>
Suggested-by: Martin Wilck <mwilck@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43cb86799f upstream.
With commit 222ec1618c ("drm: Add aspect ratio parsing in DRM
layer") the drm core started honoring the picture_aspect_ratio field
when comparing two drm_display_modes. Prior to that it was ignored.
When the CVBS encoder driver was initially submitted there was no aspect
ratio check.
Switch from drm_mode_equal() to drm_mode_match() without
DRM_MODE_MATCH_ASPECT_RATIO to fix "kmscube" and X.org output using the
CVBS connector. When (for example) kmscube sets the output mode when
using the CVBS connector it passes HDMI_PICTURE_ASPECT_NONE, making the
drm_mode_equal() fail as it include the aspect ratio.
Prior to this patch kmscube reported:
failed to set mode: Invalid argument
The CVBS mode checking in the sun4i (drivers/gpu/drm/sun4i/sun4i_tv.c
sun4i_tv_mode_to_drm_mode) and ZTE (drivers/gpu/drm/zte/zx_tvenc.c
tvenc_mode_{pal,ntsc}) drivers don't set the "picture_aspect_ratio" at
all. The Meson VPU driver does not rely on the aspect ratio for the CVBS
output so we can safely decouple it from the hdmi_picture_aspect
setting.
Cc: <stable@vger.kernel.org>
Fixes: 222ec1618c ("drm: Add aspect ratio parsing in DRM layer")
Fixes: bbbe775ec5 ("drm: Add support for Amlogic Meson Graphic Controller")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
[narmstrong: squashed with drm: meson: venc: cvbs: deduplicate the meson_cvbs_mode lookup code]
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191208171832.1064772-3-martin.blumenstingl@googlemail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d567fb8819 upstream.
Since irq_bypass_register_producer() is called after request_irq(), we
should do tear-down in reverse order: irq_bypass_unregister_producer()
then free_irq().
Specifically free_irq() may release resources required by the
irqbypass del_producer() callback. Notably an example provided by
Marc Zyngier on arm64 with GICv4 that he indicates has the potential
to wedge the hardware:
free_irq(irq)
__free_irq(irq)
irq_domain_deactivate_irq(irq)
its_irq_domain_deactivate()
[unmap the VLPI from the ITS]
kvm_arch_irq_bypass_del_producer(cons, prod)
kvm_vgic_v4_unset_forwarding(kvm, irq, ...)
its_unmap_vlpi(irq)
[Unmap the VLPI from the ITS (again), remap the original LPI]
Signed-off-by: Jiang Yi <giangyi@amazon.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 6d7425f109 ("vfio: Register/unregister irq_bypass_producer")
Link: https://lore.kernel.org/kvm/20191127164910.15888-1-giangyi@amazon.com
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[aw: commit log]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86a7964be7 upstream.
There is a race between a system call processing thread
and the demultiplex thread when mid->resp_buf becomes NULL
and later is being accessed to get credits. It happens when
the 1st thread wakes up before a mid callback is called in
the 2nd one but the mid state has already been set to
MID_RESPONSE_RECEIVED. This causes NULL pointer dereference
in mid callback.
Fix this by saving credits from the response before we
update the mid state and then use this value in the mid
callback rather then accessing a response buffer.
Cc: Stable <stable@vger.kernel.org>
Fixes: ee258d7915 ("CIFS: Move credit processing to mid callbacks for SMB3")
Tested-by: Frank Sorenson <sorenson@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b71843fa7 upstream.
When an OPEN command is cancelled we mark a mid as
cancelled and let the demultiplex thread process it
by closing an open handle. The problem is there is
a race between a system call thread and the demultiplex
thread and there may be a situation when the mid has
been already processed before it is set as cancelled.
Fix this by processing cancelled requests when mids
are being destroyed which means that there is only
one thread referencing a particular mid. Also set
mids as cancelled unconditionally on their state.
Cc: Stable <stable@vger.kernel.org>
Tested-by: Frank Sorenson <sorenson@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9150c3adbf upstream.
If Close command is interrupted before sending a request
to the server the client ends up leaking an open file
handle. This wastes server resources and can potentially
block applications that try to remove the file or any
directory containing this file.
Fix this by putting the close command into a worker queue,
so another thread retries it later.
Cc: Stable <stable@vger.kernel.org>
Tested-by: Frank Sorenson <sorenson@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44805b0e62 upstream.
Currently the client translates O_SYNC and O_DIRECT flags
into corresponding SMB create options when openning a file.
The problem is that on reconnect when the file is being
re-opened the client doesn't set those flags and it causes
a server to reject re-open requests because create options
don't match. The latter means that any subsequent system
call against that open file fail until a share is re-mounted.
Fix this by properly setting SMB create options when
re-openning files after reconnects.
Fixes: 1013e760d1: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags")
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37941ea17d upstream.
While it's not friendly to fail user processes that issue more iovs
than we support, at least we should return the correct error code so the
user process gets a chance to retry with smaller number of iovs.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c21ce58eab upstream.
It's not necessary to queue invalidated memory registration to work queue, as
all we need to do is to unmap the SG and make it usable again. This can save
CPU cycles in normal data paths as memory registration errors are rare and
normally only happens during reconnection.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3dadc19b7 upstream.
Attempting to transmit rx_done messages after the GLINK instance is
being torn down will cause use after free and memory leaks. So cancel
the intent_work and free up the pending intents.
With this there are no concurrent accessors of the channel left during
qcom_glink_native_remove() and there is therefor no need to hold the
spinlock during this operation - which would prohibit the use of
cancel_work_sync() in the release function. So remove this.
Fixes: 1d2ea36eea ("rpmsg: glink: Add rx done command")
Cc: stable@vger.kernel.org
Acked-by: Chris Lew <clew@codeaurora.org>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7e714988e upstream.
The device release function is set before registering with rpmsg. If
rpmsg registration fails, the framework will call device_put(), which
invokes the release function. The channel create logic does not need to
free rpdev if rpmsg_register_device() fails and release is called.
Fixes: b4f8e52b89 ("rpmsg: Introduce Qualcomm RPM glink driver")
Cc: stable@vger.kernel.org
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Chris Lew <clew@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36de10c478 upstream.
Virtual and translated addresses retrieved by the xtensa TLB sanity
checker must be consistent, i.e. correspond to the same state of the
checked TLB entry. KASAN shadow memory is mapped dynamically using
auto-refill TLB entries and thus may change TLB state between the
virtual and translated address retrieval, resulting in false TLB
insanity report.
Move read_xtlb_translation close to read_xtlb_virtual to make sure that
read values are consistent.
Cc: stable@vger.kernel.org
Fixes: a99e07ee5e ("xtensa: check TLB sanity on return to userspace")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe5e7ba11f upstream.
Commit 9287c6452d fixed a situation in which gfs2 could use a glock
after it had been freed. To do that, it temporarily added a new glock
reference by calling gfs2_glock_hold in function gfs2_add_revoke.
However, if the bd element was removed by gfs2_trans_remove_revoke, it
failed to drop the additional reference.
This patch adds logic to gfs2_trans_remove_revoke to properly drop the
additional glock reference.
Fixes: 9287c6452d ("gfs2: Fix occasional glock use-after-free")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f53056c430 upstream.
In gfs2_page_mkwrite's gfs2_allocate_page_backing helper, try to
allocate as many blocks at once as we need. Pass in the size of the
requested allocation.
Fixes: 35af80aef9 ("gfs2: don't use buffer_heads in gfs2_allocate_page_backing")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e64681b487 upstream.
KASAN shadow map doesn't need to be accessible through the linear kernel
mapping, allocate its pages with MEMBLOCK_ALLOC_ANYWHERE so that high
memory can be used. This frees up to ~100MB of low memory on xtensa
configurations with KASAN and high memory.
Cc: stable@vger.kernel.org # v5.1+
Fixes: f240ec09bb ("memblock: replace memblock_alloc_base(ANYWHERE) with memblock_phys_alloc")
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc90bc6842 upstream.
This partially reverts commit e3a5d8e386.
Commit e3a5d8e386 ("check bi_size overflow before merge") adds a bio_full
check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
when the last bi_io_vec has been reached. Instead, what we want here is only
the bi_size overflow check.
Fixes: e3a5d8e386 ("block: check bi_size overflow before merge")
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6a3aea935 upstream.
QOS requests for DEFAULT_VALUE are supposed to be ignored but this is
not the case for FREQ_QOS_MAX. Adding one request for MAX_DEFAULT_VALUE
and one for a real value will cause freq_qos_read_value to unexpectedly
return MAX_DEFAULT_VALUE (-1).
This happens because freq_qos max value is aggregated with PM_QOS_MIN
but FREQ_QOS_MAX_DEFAULT_VALUE is (-1) so it's smaller than other
values.
Fix this by redefining FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX.
Looking at current users for freq_qos it seems that none of them create
requests for FREQ_QOS_MAX_DEFAULT_VALUE.
Fixes: 77751a466e ("PM: QoS: Introduce frequency QoS")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c7e53e1c9 upstream.
The R-Car Gen2/3 manual - available at:
https://www.renesas.com/eu/en/products/microcontrollers-microprocessors/rz/rzg/rzg1m.html#documents
"RZ/G Series User's Manual: Hardware" section
strictly enforces the MACCTLR inizialization value - 39.3.1 - "Initial
Setting of PCI Express":
"Be sure to write the initial value (= H'80FF 0000) to MACCTLR before
enabling PCIETCTLR.CFINIT".
To avoid unexpected behavior and to match the SW initialization sequence
guidelines, this patch programs the MACCTLR with the correct value.
Note that the MACCTLR.SPCHG bit in the MACCTLR register description
reports that "Only writing 1 is valid and writing 0 is invalid" but this
"invalid" has to be interpreted as a write-ignore aka "ignored", not
"prohibited".
Reported-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Fixes: c25da47788 ("PCI: rcar: Add Renesas R-Car PCIe driver")
Fixes: be20bbcb0a ("PCI: rcar: Add the initialization of PCIe link in resume_noirq()")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: <stable@vger.kernel.org> # v5.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73884a7082 upstream.
As per PCIe r5.0, sec 7.8.5.2, fixed bus numbers of a bridge must be zero
when no function that uses EA is located behind it. Hence, if EA supplies
bus numbers of zero, assign bus numbers normally. A secondary bus can
never have a bus number of zero, so setting a bridge's Secondary Bus Number
to zero makes downstream devices unreachable.
[bhelgaas: retain bool return value so "zero is invalid" logic is local]
Fixes: 2dbce59011 ("PCI: Assign bus numbers present in EA capability for bridges")
Link: https://lore.kernel.org/r/1572850664-9861-1-git-send-email-sundeep.lkml@gmail.com
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e045fa29e8 upstream.
When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
Control register for each vector and saves it in desc->masked. Each
register is 32 bits and bit 0 is the actual Mask bit.
When we restored these registers during resume, we previously set the Mask
bit if *any* bit in desc->masked was set instead of when the Mask bit
itself was set:
pci_restore_state
pci_restore_msi_state
__pci_restore_msix_state
for_each_pci_msi_entry
msix_mask_irq(entry, entry->masked) <-- entire u32 word
__pci_msix_desc_mask_irq(desc, flag)
mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
if (flag) <-- testing entire u32, not just bit 0
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT
writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)
This means that after resume, MSI-X vectors were masked when they shouldn't
be, which leads to timeouts like this:
nvme nvme0: I/O 978 QID 3 timeout, completion polled
On resume, set the Mask bit only when the saved Mask bit from suspend was
set.
This should remove the need for 19ea025e1d ("nvme: Add quirk for Kingston
NVME SSD running FW E8FK11.T").
[bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887
Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com
Fixes: f2440d9acb ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2c33ccacb upstream.
pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
configuration registers, but previously it only did that for devices whose
drivers implemented the new power management ops.
Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
devices, creating a hibernation image, thawing devices, writing the image,
and powering off. The fact that thawing did not return devices with legacy
power management to D0 caused errors, e.g., in this path:
pci_pm_thaw_noirq
if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
return pci_legacy_resume_early(dev) # ... legacy PM skips the rest
pci_set_power_state(pci_dev, PCI_D0)
pci_restore_state(pci_dev)
pci_pm_thaw
if (pci_has_legacy_pm_support(pci_dev))
pci_legacy_resume
drv->resume
mlx4_resume
...
pci_enable_msix_range
...
if (dev->current_state != PCI_D0) # <---
return -EINVAL;
which caused these warnings:
mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
PM: Device a6d1:00:02.0 failed to thaw: error -95
Return devices to D0 and restore config registers for all devices, not just
those whose drivers support new power management.
[bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(),
update comment, add stable tag, commit log]
Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>