- Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
remapping errors when MSI-X remapping is disabled (Nirmal Patel)
- Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
remapping was enabled (Nirmal Patel)
* remotes/lorenzo/pci/vmd:
PCI: vmd: Revert 2565e5b69c ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
PCI: vmd: Assign VMD IRQ domain before enumeration
- Drop unnecessary "retval" variable, since it's never read (Colin Ian
King)
* remotes/lorenzo/pci/versatile:
PCI: versatile: Remove redundant variable retval
- Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)
- Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)
* remotes/lorenzo/pci/mediatek:
PCI: mediatek-gen3: Assert resets to ensure expected init state
PCI: mediatek: Fix refcount leak in mtk_pcie_subsys_powerup()
- Add a "big-endian" DT property to indicate that the PEX_LUT and PF
register blocks are implemented in big-endian (Hou Zhiqiang)
- Add EP mode compatible strings for ls1028a (Xiaowei Bao)
- Define DT properties for AER/PME interrupts (Li Yang)
* remotes/lorenzo/pci/layerscape:
dt-bindings: pci: layerscape-pci: define AER/PME interrupts
dt-bindings: pci: layerscape-pci: Add EP mode compatible strings for ls1028a
dt-bindings: pci: layerscape-pci: Update the description of SCFG property
dt-bindings: pci: layerscape-pci: Add a optional property big-endian
- Return error instead of success if DMA mapping of MSI area fails (Jiantao
Zhang)
- Drop tegra194 MSI register save/restore, which is unnecessary since the
DWC core does it (Jisheng Zhang)
- Factor out qcom enable/disable resources code (Dmitry Baryshkov)
- Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property because
it's not fully compatible with rockchip (Peter Geis)
- Reset rockchip-dwc controller at probe (Peter Geis)
- Add rockchip-dwc INTx support (Peter Geis)
* remotes/lorenzo/pci/dwc:
PCI: rockchip-dwc: Add legacy interrupt support
PCI: rockchip-dwc: Reset core at driver probe
dt-bindings: PCI: Remove fallback from Rockchip DesignWare binding
PCI: qcom-ep: Move enable/disable resources code to common functions
PCI: tegra194: Remove unnecessary MSI enable reg save and restore
PCI: dwc: Fix setting error return on MSI DMA mapping failure
- Fix bitmap size when searching for free outbound region (Dan Carpenter)
- Do device-specific setup to allow PTM Responder to be enabled (Christian
Gmeiner)
- Don't advertise FLR in Device Capabilities register because the
controller incorrectly resets Margining Lane Status and Margining Lane
Control on FLR (Parshuram Thombare)
* remotes/lorenzo/pci/cadence:
PCI: cadence: Clear FLR in device capabilities register
PCI: cadence: Allow PTM Responder to be enabled
PCI: cadence: Fix find_first_zero_bit() limit
- Clip only host bridge windows for E820 regions and log what is clipped
(Bjorn Helgaas)
- Add kernel cmdline options to use/ignore E820 reserved regions (Hans de
Goede)
- Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga Slip, Acer
Spin 5, Clevo Barebone systems where clipping leaves no usable address
space for touchpads, Thunderbolt devices, etc (Hans de Goede)
- Disable E820 reserved region clipping by default starting in 2023 (Hans
de Goede)
* pci/resource:
x86/PCI: Disable E820 reserved region clipping starting in 2023
x86/PCI: Disable E820 reserved region clipping via quirks
x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
x86/PCI: Clip only host bridge windows for E820 regions
x86: Log resource clipping for E820 regions
x86/PCI: Eliminate remove_e820_regions() common subexpressions
- Define pci_restore_standard_config() only for CONFIG_PM_SLEEP, since it's
not used otherwise (Krzysztof Kozlowski)
- Power up all devices during runtime resume (Rafael J. Wysocki)
- Resume subordinate bus in bus type callbacks (Rafael J. Wysocki)
- Drop pci_dev runtime_d3cold flag since no uses remain (Rafael J. Wysocki)
- Move power-up to D0 code to pci_power_up() and rename
pci_raw_set_power_state() to pci_set_low_power_state() (Rafael J.
Wysocki)
- Set current_state to D3cold if the device is not accessible (Rafael J.
Wysocki)
- Do not call pci_update_current_state() from pci_power_up() (Rafael J.
Wysocki)
- Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)
- Split part of pci_power_up() off to pci_set_full_power_state() (Rafael J.
Wysocki)
- Do not restore BARs if device is not in D0 (Rafael J. Wysocki)
* pci/pm:
PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq()
PCI/PM: Rearrange pci_set_power_state()
PCI/PM: Clean up pci_set_low_power_state()
PCI/PM: Do not restore BARs if device is not in D0
PCI/PM: Split pci_power_up()
PCI/PM: Write 0 to PMCSR in pci_power_up() in all cases
PCI/PM: Do not call pci_update_current_state() from pci_power_up()
PCI/PM: Unfold pci_platform_power_transition() in pci_power_up()
PCI/PM: Set current_state to D3cold if the device is not accessible
PCI/PM: Relocate pci_set_low_power_state()
PCI/PM: Split pci_raw_set_power_state()
PCI/PM: Rearrange pci_update_current_state()
PCI/PM: Drop the runtime_d3cold device flag
PCI/PM: Resume subordinate bus in bus type callbacks
PCI/PM: Power up all devices during runtime resume
PCI/PM: Define pci_restore_standard_config() only for CONFIG_PM_SLEEP
- Update pci_p2pdma_whitelist[] checking so we accept Skylake-E Root Ports
even if they're not at devfn 00.0 (Shlomo Pongratz)
* pci/p2pdma:
PCI/P2PDMA: Whitelist Intel Skylake-E Root Ports at any devfn
- Allow D3 only if Root Port can signal and wake from D3 so we don't miss
hotplug events on AMD Yellow Carp (Mario Limonciello)
- Clean up hotplug include files to enable future powerpc cleanup
(Christophe Leroy)
* pci/hotplug:
PCI: hotplug: Clean up include files
PCI/ACPI: Allow D3 only if Root Port can signal and wake from D3
- Clear AER "multiple errors" bits to avoid race that left them set forever
(Kuppuswamy Sathyanarayanan)
* pci/error:
PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
- Quirk Intel DG2 ASPM L1 acceptable latency to be unlimited. The device
advertises that it can only tolerate 1us, which means L1 is rarely if
ever used (Mika Westerberg)
* pci/aspm:
PCI/ASPM: Make Intel DG2 L1 acceptable latency unlimited
Changes to the schema:
- Fixed the ordering of clock-names/reset-names according to
the dtsi files.
- Mark vdda-supply as required only for apq/ipq8064 (as it was marked
as generally required in the txt file).
Changes to examples:
- Inline clock and reset numbers rather than including dt-bindings
files because of conflicts between the headers
- Split ranges and reg properties to follow current practice
- Change -gpio to -gpios
- Update IRQ flags to LEVEL_HIGH rater than NONE
- Removed extra "snps,dw-pcie" compatibility.
Note: while it was not clearly described in text schema, the majority of
Qualcomm platforms follow the snps,dw-pcie schema and use two
compatibility strings in the DT files: platform-specific one and a
fallback to the generic snps,dw-pcie one. However the platform itself is
not compatible with the snps,dw-pcie interface, so we are going to
remove it.
Link: https://lore.kernel.org/r/20220506152107.1527552-2-dmitry.baryshkov@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Some firmware includes unusable space (host bridge registers, hidden PCI
device BARs, etc) in PCI host bridge _CRS. As far as we know, there's
nothing in the ACPI, UEFI, or PCI Firmware spec that requires the OS to
remove E820 reserved regions from _CRS, so this seems like a firmware
defect.
As a workaround, 4dc2287c18 ("x86: avoid E820 regions when allocating
address space") has clipped out the unusable space in the past. This is
required for machines like the following:
- Dell Precision T3500 (the original motivator for 4dc2287c18); see
https://bugzilla.kernel.org/show_bug.cgi?id=16228
- Asus C523NA (Coral) Chromebook; see
https://lore.kernel.org/all/4e9fca2f-0af1-3684-6c97-4c35befd5019@redhat.com/
- Lenovo ThinkPad X1 Gen 2; see:
https://bugzilla.redhat.com/show_bug.cgi?id=2029207
But other firmware supplies E820 reserved regions that cover entire _CRS
windows, and clipping throws away the entire window, leaving none for
hot-added or uninitialized devices. This clipping breaks a whole range of
Lenovo IdeaPads, Yogas, Yoga Slims, and notebooks, as well as Acer Spin 5
and Clevo X170KM-G Barebone machines.
E820 reserved entries that cover a memory-mapped PCI host bridge, including
its registers and memory/IO windows, are probably *not* a firmware defect.
Per ACPI v5.4, sec 15.2, the E820 memory map may include:
Address ranges defined for baseboard memory-mapped I/O devices, such as
APICs, are returned as reserved.
Disable the E820 clipping by default for all post-2022 machines. We
already have quirks to disable clipping for pre-2023 machines, and we'll
likely need quirks to *enable* clipping for post-2022 machines that
incorrectly include unusable space in _CRS, including Chromebooks and
Lenovo ThinkPads.
Here's the rationale for doing this. If we do nothing, and continue
clipping by default:
- Future systems like the Lenovo IdeaPads, Yogas, etc, Acer Spin, and
Clevo Barebones will require new quirks to disable clipping.
- The problem here is E820 entries that cover entire _CRS windows that
should not be clipped out.
- I think these E820 entries are legal per spec, and it would be hard to
get BIOS vendors to change them.
- We will discover new systems that need clipping disabled piecemeal as
they are released.
- Future systems like Lenovo X1 Carbon and the Chromebooks (probably
anything using coreboot) will just work, even though their _CRS is
incorrect, so we will not notice new ones that rely on the clipping.
- BIOS updates will not require new quirks unless they change the DMI
model string.
If we add the date check in this commit that disables clipping, e.g., "no
clipping when date >= 2023":
- Future systems like Lenovo *IIL*, Acer Spin, and Clevo Barebones will
just work without new quirks.
- Future systems like Lenovo X1 Carbon and the Chromebooks will require
new quirks to *enable* clipping.
- The problem here is that _CRS contains regions that are not usable by
PCI devices, and we rely on the E820 kludge to clip them out.
- I think this use of E820 is clearly a firmware bug, so we have a
fighting chance of getting it changed eventually.
- BIOS updates after the cutoff date *will* require quirks, but only for
systems like Lenovo X1 Carbon and Chromebooks that we already think
have broken firmware.
It seems to me like it's better to add quirks for firmware that we think is
broken than for firmware that seems unusual but correct.
[bhelgaas: comment and commit log]
Link: https://lore.kernel.org/linux-pci/20220518220754.GA7911@bhelgaas/
Link: https://lore.kernel.org/r/20220519152150.6135-4-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
To avoid unusable space that some firmware includes in PCI host bridge
_CRS, Linux currently excludes E820 reserved regions from _CRS windows; see
4dc2287c18 ("x86: avoid E820 regions when allocating address space").
However, some systems supply E820 reserved regions that cover the entire
memory window from _CRS, so clipping them out leaves no space for hot-added
or uninitialized PCI devices.
For example, from a Lenovo IdeaPad 3 15IIL 81WE:
BIOS-e820: [mem 0x4bc50000-0xcfffffff] reserved
pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
pci 0000:00:15.0: BAR 0: [mem 0x00000000-0x00000fff 64bit]
pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit]
Add quirks to disable the E820 clipping for machines known to do this.
A single DMI_PRODUCT_VERSION "IIL" quirk matches all the below:
Lenovo IdeaPad 3 14IIL05
Lenovo IdeaPad 3 15IIL05
Lenovo IdeaPad 3 17IIL05
Lenovo IdeaPad 5 14IIL05
Lenovo IdeaPad 5 15IIL05
Lenovo IdeaPad Slim 7 14IIL05
Lenovo IdeaPad Slim 7 15IIL05
Lenovo IdeaPad S145-15IIL
Lenovo IdeaPad S340-14IIL
Lenovo IdeaPad S340-15IIL
Lenovo IdeaPad C340-15IIL
Lenovo BS145-15IIL
Lenovo V14-IIL
Lenovo V15-IIL
Lenovo V17-IIL
Lenovo Yoga C940-14IIL
Lenovo Yoga S740-14IIL
Lenovo Yoga Slim 7 14IIL05
Lenovo Yoga Slim 7 15IIL05
in addition to the following that don't actually need it because they have
no E820 reserved regions that overlap _CRS windows:
Lenovo IdeaPad Flex 5 14IIL05
Lenovo IdeaPad Flex 5 15IIL05
Lenovo ThinkBook 14-IIL
Lenovo ThinkBook 15-IIL
Lenovo Yoga S940-14IIL
Other quirks match these:
Acer Spin 5 (SP513-54N)
Clevo X170KM-G Barebone
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459 Lenovo Yoga C940-14IIL
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214259 Clevo X170KM Barebone
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Lenovo IdeaPad 3 15IIL05
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1871793 Lenovo IdeaPad 5 14IIL05
Link: https://bugs.launchpad.net/bugs/1878279 Lenovo IdeaPad 5 14IIL05
Link: https://bugs.launchpad.net/bugs/1880172 Lenovo IdeaPad 3 14IIL05
Link: https://bugs.launchpad.net/bugs/1884232 Acer Spin SP513-54N
Link: https://bugs.launchpad.net/bugs/1921649 Lenovo IdeaPad S145
Link: https://bugs.launchpad.net/bugs/1931715 Lenovo IdeaPad S145
Link: https://bugs.launchpad.net/bugs/1932069 Lenovo BS145-15IIL
Link: https://lore.kernel.org/r/20220519152150.6135-3-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
Some firmware supplies PCI host bridge _CRS that includes address space
unusable by PCI devices, e.g., space occupied by host bridge registers or
used by hidden PCI devices.
To avoid this unusable space, Linux currently excludes E820 reserved
regions from _CRS windows; see 4dc2287c18 ("x86: avoid E820 regions when
allocating address space").
However, this use of E820 reserved regions to clip things out of _CRS is
not supported by ACPI, UEFI, or PCI Firmware specs, and some systems have
E820 reserved regions that cover the entire memory window from _CRS.
4dc2287c18 clips the entire window, leaving no space for hot-added or
uninitialized PCI devices.
For example, from a Lenovo IdeaPad 3 15IIL 81WE:
BIOS-e820: [mem 0x4bc50000-0xcfffffff] reserved
pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
pci 0000:00:15.0: BAR 0: [mem 0x00000000-0x00000fff 64bit]
pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit]
Future patches will add quirks to enable/disable E820 clipping
automatically.
Add a "pci=no_e820" kernel command line option to disable clipping with
E820 reserved regions. Also add a matching "pci=use_e820" option to enable
clipping with E820 reserved regions if that has been disabled by default by
further patches in this patch-set.
Both options taint the kernel because they are intended for debugging and
workaround purposes until a quirk can set them automatically.
[bhelgaas: commit log, add printk]
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Lenovo IdeaPad 3
Link: https://lore.kernel.org/r/20220519152150.6135-2-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
Clear the MSI bit in ISTATUS_LOCAL register after reading it, but
before reading and handling individual MSI bits from the ISTATUS_MSI
register. This avoids a potential race where new MSI bits may be set
on the ISTATUS_MSI register after it was read and be missed when the
MSI bit in the ISTATUS_LOCAL register is cleared.
ISTATUS_LOCAL is a read/write/clear register; the register's bits
are set when the corresponding interrupt source is activated. Each
source is independent and thus multiple sources may be active
simultaneously. The processor can monitor and clear status
bits. If one or more ISTATUS_LOCAL interrupt sources are active,
the RootPort issues an interrupt towards the processor (on
the AXI domain). Bit 28 of this register reports an MSI has been
received by the RootPort.
ISTATUS_MSI is a read/write/clear register. Bits 31-0 are asserted
when an MSI with message number 31-0 is received by the RootPort.
The processor must monitor and clear these bits.
Effectively, Bit 28 of ISTATUS_LOCAL informs the processor that
an MSI has arrived at the RootPort and ISTATUS_MSI informs the
processor which MSI (in the range 0 - 31) needs handling.
Reported by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/
Link: https://lore.kernel.org/r/20220517141622.145581-1-daire.mcnamara@microchip.com
Fixes: 6f15a9c9f9 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver")
Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
When a Root Port or Root Complex Event Collector receives an error Message
e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status
register and logs the Requester ID in the Error Source Identification
register. If it receives a second ERR_COR Message before software clears
PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the
Requester ID is lost.
In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared:
- hardware receives ERR_COR message
- hardware sets PCI_ERR_ROOT_COR_RCV
- aer_irq() entered
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_COR_RCV
- hardware receives second ERR_COR message
- hardware sets PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status)
- PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set
- aer_irq() entered again
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set
- PCI_ERR_ROOT_MULTI_COR_RCV is still set
The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and
PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV.
Fix the problem by queueing an AER event and clearing the Root Error Status
bits when any of these bits are set:
PCI_ERR_ROOT_COR_RCV
PCI_ERR_ROOT_UNCOR_RCV
PCI_ERR_ROOT_MULTI_COR_RCV
PCI_ERR_ROOT_MULTI_UNCOR_RCV
See the bugzilla link for details from Eric about how to reproduce this
problem.
[bhelgaas: commit log, move repro details to bugzilla]
Fixes: e167bfcaa4 ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992
Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com
Reported-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Clear FLR (Function Level Reset) from device capabilities
registers for all physical functions.
During FLR, the Margining Lane Status and Margining Lane Control
registers should not be reset, as per PCIe specification.
However, the controller incorrectly resets these registers upon FLR.
This causes PCISIG compliance FLR test to fail. Hence preventing
all functions from advertising FLR support if flag quirk_disable_flr
is set.
Link: https://lore.kernel.org/r/1635165075-89864-1-git-send-email-pthombar@cadence.com
Signed-off-by: Parshuram Thombare <pthombar@cadence.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
During the boot process all the PCI devices are assigned default PCI-MSI
IRQ domain including VMD endpoint devices. If interrupt-remapping is
enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ
domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ
domain for its child devices in order to support MSI-X remapping
capabilities.
Now when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips VMD-MSI IRQ domain assignment process to its
child devices. Thus the devices behind VMD get default PCI-MSI IRQ
domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and
configures child devices.
As a result host OS fails to boot and DMAR errors were observed when
interrupt remapping was enabled on Intel Icelake CPUs. For instance:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
To fix this issue, dev_msi_info struct in dev struct maintains correct
value of IRQ domain. VMD will use this information to assign proper IRQ
domain to its child devices when it doesn't create a separate IRQ domain.
Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.com
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
The sysfs sriov_numvfs_store() path acquires the device lock before the
config space access lock:
sriov_numvfs_store
device_lock # A (1) acquire device lock
sriov_configure
vfio_pci_sriov_configure # (for example)
vfio_pci_core_sriov_configure
pci_disable_sriov
sriov_disable
pci_cfg_access_lock
pci_wait_cfg # B (4) wait for dev->block_cfg_access == 0
Previously, pci_dev_lock() acquired the config space access lock before the
device lock:
pci_dev_lock
pci_cfg_access_lock
dev->block_cfg_access = 1 # B (2) set dev->block_cfg_access = 1
device_lock # A (3) wait for device lock
Any path that uses pci_dev_lock(), e.g., pci_reset_function(), may
deadlock with sriov_numvfs_store() if the operations occur in the sequence
(1) (2) (3) (4).
Avoid the deadlock by reversing the order in pci_dev_lock() so it acquires
the device lock before the config space access lock, the same as the
sriov_numvfs_store() path.
[bhelgaas: combined and adapted commit log from Jay Zhou's independent
subsequent posting:
https://lore.kernel.org/r/20220404062539.1710-1-jianjay.zhou@huawei.com]
Link: https://lore.kernel.org/linux-pci/1583489997-17156-1-git-send-email-yangyicong@hisilicon.com/
Also-posted-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The controller may have been left out of reset by the bootloader,
in which case, before the powerup sequence, the controller will be
found preconfigured with values that were set before booting the
kernel: this produces a controller failure, with the result of
a failure during the mtk_pcie_startup_port() sequence as the PCIe
link never gets up.
To ensure that we get a clean start in an expected state, assert
both the PHY and MAC resets before executing the controller
power-up sequence.
Link: https://lore.kernel.org/r/20220404144858.92390-1-angelogioacchino.delregno@collabora.com
Fixes: d3bf75b579 ("PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
According to the PCIe standard the PERST# signal (reset-gpio in
fsl,imx* compatible dts) should be kept asserted for at least 100 usec
before the PCIe refclock is stable, should be kept asserted for at
least 100 msec after the power rails are stable and the host should wait
at least 100 msec after it is de-asserted before accessing the
configuration space of any attached device.
From PCIe CEM r2.0, sec 2.6.2
T-PVPERL: Power stable to PERST# inactive - 100 msec
T-PERST-CLK: REFCLK stable before PERST# inactive - 100 usec.
From PCIe r5.0, sec 6.6.1
With a Downstream Port that does not support Link speeds greater than
5.0 GT/s, software must wait a minimum of 100 ms before sending a
Configuration Request to the device immediately below that Port.
Failure to do so could prevent PCIe devices to be working correctly,
and this was experienced with real devices.
Move reset assert to imx6_pcie_assert_core_reset(), this way we ensure
that PERST# is asserted before enabling any clock, move de-assert to the
end of imx6_pcie_deassert_core_reset() after the clock is enabled and
deemed stable and add a new delay of 100 msec just afterward.
Link: https://lore.kernel.org/all/20220211152550.286821-1-francesco.dolcini@toradex.com
Link: https://lore.kernel.org/r/20220404081509.94356-1-francesco.dolcini@toradex.com
Fixes: bb38919ec5 ("PCI: imx6: Add support for i.MX6 PCIe controller")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
Calling pci_set_power_state() to put the given device into D0 in
pci_pm_thaw_noirq() may cause it to restore the device's BARs, which is
redundant before calling pci_restore_state(), so replace it with a direct
pci_power_up() call followed by pci_update_current_state() if it returns a
nonzero value, in analogy with pci_pm_default_resume_early().
Avoid code duplication by introducing a wrapper function to contain the
repeating pattern and calling it in both places.
Link: https://lore.kernel.org/r/3639079.MHq7AAxBmi@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The part of pci_set_power_state() related to transitions into
low-power states is unnecessary convoluted, so clearly divide it
into the D3cold special case and the general case covering all of
the other states.
Also fix a potential issue with calling pci_bus_set_current_state()
to set the current state of all devices on the subordinate bus to
D3cold without checking if the power state of the parent bridge has
really changed to D3cold.
Link: https://lore.kernel.org/r/2139440.Mh6RI2rZIc@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
One of the two callers of pci_power_up() invokes
pci_update_current_state() and pci_restore_state() right after calling
it, in which case running the part of it happening after the mandatory
transition delays is redundant, so move that part out of it into a new
function called pci_set_full_power_state() that will be invoked from
pci_set_power_state() which is the other caller of pci_power_up().
Link: https://lore.kernel.org/r/1942150.usQuhbGJ8B@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>