Commit Graph

1156 Commits

Author SHA1 Message Date
Keith Busch
cfcbfe5cb1 PCI: Don't fallback to bus reset after failed slot reset
If a bus has hotplug slots that implement the slot's reset_slot callback,
it is not safe to do the non-slot-specific bus reset, so don't fall back to
it. If a slot reset does fail, the subsequent bus reset will attempt a
second link reset on top of the previous one and fail to handle the
hotplug events.
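A minimal userspace model of the reset policy described above may help. The struct and function names (model_bus, model_reset_bridge(), and so on) are hypothetical and do not match the kernel's internals. The model shows only one thing: once the slot reset path applies, its failure is returned to the caller rather than followed by a second, non-slot-specific bus reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge's secondary bus; not the kernel's struct pci_bus. */
struct model_bus {
	bool has_reset_slot_cb;  /* a hotplug slot on this bus implements reset_slot */
	int slot_reset_ret;      /* value the slot reset will return */
	int bus_resets;          /* number of plain Secondary Bus Resets issued */
};

static int model_slot_reset(struct model_bus *bus)
{
	return bus->slot_reset_ret;
}

static int model_bus_reset(struct model_bus *bus)
{
	bus->bus_resets++;
	return 0;
}

/*
 * If the slot reset path applies, its result is final: a failure is
 * returned as-is instead of falling back to a non-slot-specific bus reset.
 */
static int model_reset_bridge(struct model_bus *bus)
{
	assert(bus != NULL);
	if (bus->has_reset_slot_cb)
		return model_slot_reset(bus);
	return model_bus_reset(bus);
}
```

In this sketch, a failed slot reset on a bus with a reset_slot callback returns the slot's error and leaves the bus reset counter at zero.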

Fixes: 8238cb69c0 ("PCI: Make reset_subordinate hotplug safe")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260421150644.3543733-1-kbusch@meta.com
2026-04-27 17:26:59 -05:00
Bjorn Helgaas
4224e91fea Merge branch 'pci/misc'
- Warn only once about invalid ACS kernel parameter format (Richard Cheng)

- Suppress FW_BUG warning when writing sysfs 'numa_node' with the current
  value (Li RongQing)

- Drop redundant 'depends on PCI' from Kconfig (Julian Braha)

* pci/misc:
  PCI: Clean up dead code in Kconfig
  PCI/sysfs: Suppress FW_BUG warning when NUMA node already matches
  PCI: Use pr_warn_once() for ACS parameter parse failure
  PCI: of: Reduce severity of missing of_root error message
2026-04-13 12:50:54 -05:00
Bjorn Helgaas
a09007a782 Merge branch 'pci/vga'
- Return vga_get_uninterruptible() errors to userspace in the
  /dev/vga_arbiter path so the user can tell whether VGA routing was
  updated (Simon Richter)

- Make pci_set_vga_state() fail if bridge doesn't support VGA routing,
  i.e., PCI_BRIDGE_CTL_VGA is not writable, and return errors up to
  vga_get() callers (Simon Richter)

* pci/vga:
  PCI/VGA: Fail pci_set_vga_state() if VGA decoding not supported
  PCI/VGA: Pass errors from pci_set_vga_state() up
  PCI/VGA: Pass vga_get_uninterruptible() errors to userspace
2026-04-13 12:50:06 -05:00
Bjorn Helgaas
12b56ec723 Merge branch 'pci/reset'
- Update slot handling so all ARI functions are treated as being in the
  same slot.  They're all reset by Secondary Bus Reset, but previously
  drivers of ARI functions that appeared to be on a non-zero device weren't
  notified and fatal hardware errors could result (Keith Busch)

- Make sysfs reset_subordinate hotplug safe to avoid spurious hotplug
  events (Keith Busch)

- Consolidate bus iteration across the _lock(), _unlock(), and _trylock()
  functions for pci_bus and pci_slot (Ilpo Järvinen)

- Hide Secondary Bus Reset ('bus') from sysfs reset_method if masked by
  CXL because it has no effect (Vidya Sagar)
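The following illustrates why ARI functions could appear to sit on a non-zero device. With ARI, the 8-bit devfn is a single function number, but the classic PCI_SLOT()/PCI_FUNC() split still reads bits 7:3 as a device number. The sketch reuses that bit layout. model_in_slot() is a hypothetical helper. Treating every ARI function as slot 0 is an assumption made for illustration, not necessarily how the kernel records the slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() helpers. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/*
 * Hypothetical helper: does the function at @devfn belong to slot @slot_nr?
 * Without ARI, bits 7:3 of devfn are the device (slot) number.  With ARI,
 * devfn is one 8-bit function number, so all functions share one slot
 * (modeled here as slot 0).
 */
static bool model_in_slot(unsigned int devfn, unsigned int slot_nr, bool ari)
{
	assert(devfn <= 0xff);
	if (ari)
		return slot_nr == 0;
	return PCI_SLOT(devfn) == slot_nr;
}
```

For example, ARI function 9 (devfn 0x09) decodes as device 1, function 1 under the classic split. That is how a driver for it could previously be missed when only slot 0 was notified.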

* pci/reset:
  PCI/CXL: Hide SBR from reset_methods if masked by CXL
  PCI: Consolidate pci_bus/slot_lock/unlock/trylock()
  PCI: Make reset_subordinate hotplug safe
  PCI: Allow all bus devices to use the same slot
  PCI: Rename __pci_bus_reset() and __pci_slot_reset()
2026-04-13 12:50:05 -05:00
Bjorn Helgaas
31e39c9ae1 Merge branch 'pci/atomics'
- Don't enable AtomicOps on RCiEPs since none of them need AtomicOps and
  we can't tell whether the Root Complex supports them (Gerd Bayer)

- Enable AtomicOps only if we know the Root Port supports them (Gerd Bayer)

* pci/atomics:
  PCI: Update PCIe spec references for AtomicOps
  PCI: Enable AtomicOps only if Root Port supports them
  PCI: Do not enable AtomicOps by RCiEPs
2026-04-13 12:50:02 -05:00
Gerd Bayer
8e69214402 PCI: Update PCIe spec references for AtomicOps
Point to the relevant sections in r7.0, the most recent release of the
PCIe spec. The text has mostly just moved around without any semantic
change.

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260330-fix_pciatops-v7-3-f601818417e8@linux.ibm.com
2026-04-03 16:39:31 -05:00
Gerd Bayer
1ae8c4ce15 PCI: Enable AtomicOps only if Root Port supports them
When inspecting the config space of a ConnectX physical function in an
s390 system after it was initialized by the mlx5_core device driver, we
found the function to be enabled to request AtomicOps despite the Root Port
lacking support for completing them:

  00:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
          Subsystem: Mellanox Technologies Device 0002
          DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                   AtomicOpsCtl: ReqEn+

On s390 and many virtualized guests, the Endpoint is visible but the Root
Port is not.  In this case, pci_enable_atomic_ops_to_root() previously
enabled AtomicOps in the Endpoint even though it can't tell whether the
Root Port supports them as a completer.

Change pci_enable_atomic_ops_to_root() to fail if there's no Root Port or
the Root Port doesn't support AtomicOps.
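The check order described above can be sketched as follows. This is a simplified model with hypothetical types (model_dev, model_enable_atomic_ops()). It omits the checks on intermediate Switch Ports (AtomicOp routing and egress blocking), which a real implementation must also make. The register bit values are those defined in the kernel's include/uapi/linux/pci_regs.h. An RCiEP has no Root Port above it, so it also fails the first check, consistent with the RCiEP change below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCAP2_ATOMIC_COMP32  0x00000080
#define PCI_EXP_DEVCAP2_ATOMIC_COMP64  0x00000100
#define PCI_EXP_DEVCAP2_ATOMIC_COMP128 0x00000200
#define PCI_EXP_DEVCTL2_ATOMIC_REQ     0x0040

/* Hypothetical device model; not the kernel's struct pci_dev. */
struct model_dev {
	struct model_dev *root_port; /* NULL if no Root Port is visible */
	uint32_t devcap2;            /* Device Capabilities 2 */
	uint16_t devctl2;            /* Device Control 2 */
};

/*
 * Set AtomicOp Requester Enable only if a Root Port is visible and it
 * advertises completer support for every operand size in @cap_mask.
 */
static int model_enable_atomic_ops(struct model_dev *dev, uint32_t cap_mask)
{
	assert(dev != NULL);
	if (!dev->root_port)
		return -EINVAL;  /* can't tell what the completer supports */
	if ((dev->root_port->devcap2 & cap_mask) != cap_mask)
		return -EINVAL;  /* Root Port can't complete these AtomicOps */
	dev->devctl2 |= PCI_EXP_DEVCTL2_ATOMIC_REQ;
	return 0;
}
```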

Fixes: 430a23689d ("PCI: Add pci_enable_atomic_ops_to_root()")
Reported-by: Alexander Schmidt <alexs@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
[bhelgaas: commit log, check RP first to simplify flow]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260330-fix_pciatops-v7-2-f601818417e8@linux.ibm.com
2026-04-03 16:39:31 -05:00
Gerd Bayer
03ec922f00 PCI: Do not enable AtomicOps by RCiEPs
Since Root Complex Integrated Endpoints (RCiEPs) attach to a bus that has
no bridge device describing the Root Port, the capability to complete
AtomicOps requests cannot be determined with PCIe methods.

Change the default of pci_enable_atomic_ops_to_root() so it does not
enable AtomicOps requests on RCiEPs.

As far as we know, there are no RCiEPs that need AtomicOps (see Link
below).

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260330-fix_pciatops-v7-1-f601818417e8@linux.ibm.com
2026-04-03 16:39:21 -05:00
Simon Richter
5b6471fc72 PCI/VGA: Fail pci_set_vga_state() if VGA decoding not supported
PCI bridges are allowed to refuse to activate VGA decoding by simply
ignoring attempts to set the bit that enables it, so after setting the
bit, read it back to verify that it took effect.

One example of such a bridge is the root bridge in IBM PowerNV, but this is
also useful for GPU passthrough into virtual machines, where it is
difficult to set up routing for legacy I/O through the IOMMU.
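The write-then-verify pattern can be modeled in userspace as below. model_bridge and model_enable_vga() are hypothetical. The -EIO return value is an illustrative choice, not what the kernel returns. PCI_BRIDGE_CTL_VGA has the VGA Enable bit value from pci_regs.h. A bridge that ignores writes to the bit is modeled by leaving that bit out of writable_mask.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_BRIDGE_CTL_VGA 0x08 /* VGA Enable bit in Bridge Control */

/* Hypothetical bridge model: bits outside @writable_mask ignore writes. */
struct model_bridge {
	uint16_t bridge_ctl;
	uint16_t writable_mask;
};

/* Simulated config write: only writable bits take the new value. */
static void model_write_ctl(struct model_bridge *br, uint16_t val)
{
	br->bridge_ctl = (uint16_t)((br->bridge_ctl & ~br->writable_mask) |
				    (val & br->writable_mask));
}

/* Set VGA Enable, then read it back; fail if the bridge dropped the write. */
static int model_enable_vga(struct model_bridge *br)
{
	assert(br != NULL);
	model_write_ctl(br, br->bridge_ctl | PCI_BRIDGE_CTL_VGA);
	if (!(br->bridge_ctl & PCI_BRIDGE_CTL_VGA))
		return -EIO;
	return 0;
}
```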

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
[bhelgaas: subject, add comment about VGA Enable writability]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260307173538.763188-5-Simon.Richter@hogyros.de
2026-03-30 10:44:50 -05:00
Vidya Sagar
702c1d56c7 PCI/CXL: Hide SBR from reset_methods if masked by CXL
Per CXL r3.1, sec 8.1.5.2, the Secondary Bus Reset (SBR) bit in the Bridge
Control register of a CXL port has no effect unless the "Unmask SBR" bit in
the Port Control Extensions Register is set.

After b1956e2d07 ("PCI/CXL: Fail bus reset if upstream CXL Port has SBR
masked"), Linux checks the "Unmask SBR" bit in pci_reset_bus_function().
But when probe==true, it previously returned 0, incorrectly indicating that
SBR is a viable reset method for the device.

As a result, "bus" is listed in the device's "reset_method" attribute even
though the hardware is incapable of performing it. If a user writes "bus"
to "reset_method" or triggers a reset that falls back to SBR, the operation
fails with "write error: Inappropriate ioctl for device".

If the link is operating in CXL mode (pcie_is_cxl()), return -ENOTTY
immediately unless "Unmask SBR" is set, regardless of the probe argument.
This ensures that "bus" is not advertised in "reset_method" when the
hardware prevents it, improving clarity for users and aligning the sysfs
capability report with actual hardware behavior.
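The decision described above is sketched below. The field names are hypothetical and stand in for pcie_is_cxl() and the Port Control Extensions register access. Returning -ENOTTY on the probe path as well is what keeps "bus" out of the advertised reset methods.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port state; the field names are illustrative only. */
struct model_port {
	bool link_is_cxl;   /* stands in for pcie_is_cxl() */
	bool unmask_sbr;    /* "Unmask SBR" in Port Control Extensions */
};

/*
 * A CXL link with SBR masked reports -ENOTTY whether or not we are only
 * probing, so "bus" is never listed as a usable reset method.
 */
static int model_bus_reset_method(const struct model_port *port, bool probe)
{
	assert(port != NULL);
	if (port->link_is_cxl && !port->unmask_sbr)
		return -ENOTTY;
	if (probe)
		return 0;
	/* A real implementation would perform the Secondary Bus Reset here. */
	return 0;
}
```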

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[bhelgaas: commit log, use pcie_is_cxl()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260225133801.30231-1-vidyas@nvidia.com
2026-03-17 14:53:03 -05:00
Richard Cheng
0026bb20d1 PCI: Use pr_warn_once() for ACS parameter parse failure
When the ACS command line parameter cannot be parsed, the kernel skips
applying the requested ACS override. This indicates an invalid boot
parameter and should not be logged at informational level.

Use pr_warn_once() so the message is surfaced as a warning while still
avoiding repeated log spam during device enumeration.
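The effect of the *_once() printk variants is roughly that of a per-call-site static flag. The userspace sketch below is an analogue only. The function name and message text are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical analogue of a pr_warn_once() call site: the static flag
 * persists across calls, so only the first failure is reported.
 * Returns true if the message was actually printed.
 */
static bool model_warn_acs_parse_failure(const char *param)
{
	static bool warned;

	assert(param != NULL);
	if (warned)
		return false;
	warned = true;
	fprintf(stderr, "PCI: invalid ACS parameter: %s\n", param);
	return true;
}
```

Enumeration may hit the same bad parameter for every device. With this pattern, only the first hit produces a warning.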

Signed-off-by: Richard Cheng <icheng@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Tushar Dave <tdave@nvidia.com>
Link: https://patch.msgid.link/20260312115441.8168-1-icheng@nvidia.com
2026-03-12 16:51:07 -05:00
Ilpo Järvinen
7b193af58b PCI: Consolidate pci_bus/slot_lock/unlock/trylock()
pci_bus/slot_lock/unlock/trylock() largely duplicate the bus iteration loop
with variation only due to slot filter handling. The only differences in
the loops are where the struct pci_bus is found (directly in the argument
vs. in slot->bus) and whether the slot filter is applied. Those
differences are simple to handle using function parameters.

Consolidate the bus iteration loop to one place by creating
__pci_bus_{lock,unlock,trylock}() and call them from the non-underscore
locking functions.
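The shape of the consolidation can be sketched with simplified, hypothetical structures. One loop takes an optional slot filter. The bus and slot wrappers differ only in where they find the bus and whether they pass the filter. The real functions also recurse into subordinate buses and cover the unlock and trylock variants, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified structures; not the kernel's pci_bus/pci_slot. */
struct model_dev {
	int slot_nr;   /* slot this device belongs to */
	int locked;
};

struct model_bus {
	struct model_dev *devs;
	size_t ndevs;
};

struct model_slot {
	struct model_bus *bus;
	int nr;
};

/* Shared loop: @slot == NULL means "every device on @bus". */
static size_t model_bus_lock_common(struct model_bus *bus,
				    const struct model_slot *slot)
{
	size_t n = 0;

	assert(bus != NULL);
	for (size_t i = 0; i < bus->ndevs; i++) {
		struct model_dev *dev = &bus->devs[i];

		if (slot && dev->slot_nr != slot->nr)
			continue;       /* slot filter */
		dev->locked = 1;
		n++;
	}
	return n;
}

/* Bus variant: the bus comes straight from the argument, no filter. */
static size_t model_bus_lock(struct model_bus *bus)
{
	return model_bus_lock_common(bus, NULL);
}

/* Slot variant: the bus comes from slot->bus and the filter is applied. */
static size_t model_slot_lock(struct model_slot *slot)
{
	return model_bus_lock_common(slot->bus, slot);
}
```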

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260304122139.1479-1-ilpo.jarvinen@linux.intel.com
2026-03-09 15:40:28 -05:00
Keith Busch
8238cb69c0 PCI: Make reset_subordinate hotplug safe
Use the slot reset method when resetting the bridge if the bus contains
hotplug slots. This fixes spurious hotplug events that are triggered by
the Secondary Bus Reset, which bypasses the slot's disabling of event
detection.

Resetting a bridge's subordinate bus can be done like this:

  # echo 1 > /sys/bus/pci/devices/0000:50:01.0/reset_subordinate

Before this patch, the kernel may log a message like:

  pcieport 0000:50:01.0: pciehp: Slot(40): Link Down

With this change, the pciehp driver ignores the link event during the
reset, so it may log this message instead:

  pcieport 0000:50:01.0: pciehp: Slot(40): Link Down/Up ignored

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260217160836.2709885-4-kbusch@meta.com
2026-03-09 15:40:28 -05:00
Keith Busch
84226677e0 PCI: Rename __pci_bus_reset() and __pci_slot_reset()
Make the code a little easier to navigate with more descriptive function
names. The two renamed functions here "try" to do a reset, so make that
clear in the name to distinguish them from other similarly named functions:

  __pci_reset_bus()    -> pci_try_reset_bus()
  __pci_reset_slot()   -> pci_try_reset_slot()

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260217160836.2709885-2-kbusch@meta.com
2026-03-09 15:27:45 -05:00
Shuai Xue
a8aeea1bf3 PCI/AER: Clear only error bits in PCIe Device Status
Currently, pcie_clear_device_status() clears the entire PCIe Device Status
register (PCI_EXP_DEVSTA) by writing back the value read from the register,
which affects not only the error status bits but also other writable bits.

According to PCIe r7.0, sec 7.5.3.5, this register contains:

  - RW1C error status bits (CED, NFED, FED, URD at bits 0-3): These are the
    four error status bits that need to be cleared.

  - Read-only bits (AUXPD at bit 4, TRPND at bit 5): Writing to these has
    no effect.

  - Emergency Power Reduction Detected (bit 6): A RW1C non-error bit
    introduced in PCIe r5.0 (2019). This is currently the only writable
    non-error bit in the Device Status register. Unconditionally clearing
    this bit can interfere with other software components that rely on this
    power management indication.

  - Reserved bits (RsvdZ): These bits are required to be written as zero.
    Writing 1s to them (as the current implementation may do) violates the
    specification.

To prevent unintended side effects, modify pcie_clear_device_status() to
only write 1s to the four error status bits (CED, NFED, FED, URD), leaving
the Emergency Power Reduction Detected bit and reserved bits unaffected.
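A small simulation of the register semantics makes the difference concrete. The CED/NFED/FED/URD/TRPND values match the kernel's pci_regs.h. MODEL_DEVSTA_EPRD is a hypothetical name for bit 6 as described above. Writing back the value read clears Emergency Power Reduction Detected along with the errors. Writing only the four error bits leaves it set.

```c
#include <assert.h>
#include <stdint.h>

/* Device Status bits; values as in the kernel's pci_regs.h. */
#define PCI_EXP_DEVSTA_CED   0x0001 /* Correctable Error Detected (RW1C) */
#define PCI_EXP_DEVSTA_NFED  0x0002 /* Non-Fatal Error Detected (RW1C) */
#define PCI_EXP_DEVSTA_FED   0x0004 /* Fatal Error Detected (RW1C) */
#define PCI_EXP_DEVSTA_URD   0x0008 /* Unsupported Request Detected (RW1C) */
#define PCI_EXP_DEVSTA_TRPND 0x0020 /* Transactions Pending (RO) */
#define MODEL_DEVSTA_EPRD    0x0040 /* hypothetical name: Emergency Power Reduction Detected (RW1C) */

#define MODEL_ERR_BITS  (PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
			 PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD)
#define MODEL_RW1C_MASK (MODEL_ERR_BITS | MODEL_DEVSTA_EPRD)

static_assert(MODEL_RW1C_MASK == 0x004f, "RW1C bits are 0-3 and 6");

/* Simulated hardware write: 1s written to RW1C bits clear them; RO bits ignore writes. */
static uint16_t model_devsta_write(uint16_t reg, uint16_t val)
{
	return (uint16_t)(reg & ~(val & MODEL_RW1C_MASK));
}

/* Old behavior: write back whatever was read, clearing every RW1C bit that was set. */
static uint16_t clear_status_old(uint16_t reg)
{
	return model_devsta_write(reg, reg);
}

/* New behavior: write 1s only to the four error status bits. */
static uint16_t clear_status_new(uint16_t reg)
{
	return model_devsta_write(reg, MODEL_ERR_BITS);
}
```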

Fixes: ec752f5d54 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260211124624.49656-1-xueshuai@linux.alibaba.com
2026-02-23 09:00:33 -06:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".
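The typed-return idea can be shown with a userspace analogue. malloc_obj(), malloc_objs() and malloc_array_checked() are hypothetical names, not kernel or libc APIs. The sketch relies on the GCC/Clang __typeof__ extension so that TYPE may be either a type name or an expression such as *ptr.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Refuse COUNT * SIZE products that would overflow, like kmalloc_array(). */
static void *malloc_array_checked(size_t count, size_t size)
{
	if (size != 0 && count > SIZE_MAX / size)
		return NULL;
	return malloc(count * size);
}

/* Single object: result type is TYPE *, not void *. */
#define malloc_obj(TYPE) \
	((__typeof__(TYPE) *)malloc(sizeof(TYPE)))

/* Array of COUNT objects, with overflow checking. */
#define malloc_objs(TYPE, COUNT) \
	((__typeof__(TYPE) *)malloc_array_checked((COUNT), sizeof(TYPE)))

struct point {
	int x, y;
};
```

Because the macro result is TYPE *, assigning malloc_obj(struct point) to an unrelated pointer type draws a compiler diagnostic instead of passing silently through void *.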

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
1c2b4a4c2b pci-v7.0-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmmKJO4UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzotA/+OGSOPOs9hWd+OwNF5Dm2WA81yG/3
 K3Jx5uMuPoSjduMbPhVcib02Mr6YDJTa6WlYNVa76ADs2G6HxcVMFHutlYudSVcl
 umSF48FnyeH1LTba88dRoVj4DB47Cue+BfhYY2L0ZtxmjQq/NRuDFAaGBh54uNeF
 Gcdgr52QlM01n1X6yKvl7vE9gPdcPH80L256ssHAm6oSOHI1SPc6gqEKUUD02f8G
 FtzfTUAq/cWYjlY3VoS5GKtdHxFYuXqC5WfbURhJ11o/nVJY9k1Zx8n4eI1tmAtN
 7q692xjWSQJZlzepOBBEyjFUpIiy80tZ43z2ptRRBeI/n/qMmGPAov/g4MzegBWG
 IAEHTAp/xx1Wra1ynr7RNvYVcPpXm2TEim8gIGah9DkHbNgbu7ing+OO7DnQuyfD
 2h4hGD2622o6uikqkwzVd4mYuIcFu7SA6yROZhFn83BRnz0QOQienDrDlvOB8XCV
 EodLAOMc2KClvOmmriFMy11PH7MFFoXexV6KS83VfDJHi4+XzBsy0w6TXTohcA9s
 JTPIkSWqf/u6SrdLjXlFGyyJ2/KCgRiXFIBhhtYBMhDuuO7nG+mcSVzMa1PT0s6C
 PF+QoT7sJof/5VMJ4o3BgPrPkD3CQICrlt8XIt5I8ngsy6RZRQ5rt+pUix7Shcn8
 DgcunuINYfQtkfw=
 =LIjp
 -----END PGP SIGNATURE-----

Merge tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Don't try to enable Extended Tags on VFs since that bit is Reserved
     and causes misleading log messages (Håkon Bugge)

   - Initialize Endpoint Read Completion Boundary to match Root Port,
     regardless of ACPI _HPX (Håkon Bugge)

   - Apply _HPX PCIe Setting Record only to AER configuration, and only
     when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
     Tag and Relaxed Ordering settings (Håkon Bugge)

  Resource management:

   - Move CardBus code to setup-cardbus.c and only build it when
     CONFIG_CARDBUS is set (Ilpo Järvinen)

   - Fix bridge window alignment with optional resources, where the
     additional alignment requirement was previously lost (Ilpo
     Järvinen)

   - Stop over-estimating bridge window sizes since windows are now
     assigned without any gaps between them (Ilpo Järvinen)

   - Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
     for nested bridges and endpoints (Ilpo Järvinen)

   - Add pbus_mem_size_optional() to handle sizes of optional resources
     (SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)

   - Don't claim disabled bridge windows to avoid spurious claim
     failures (Ilpo Järvinen)

  Driver binding:

   - Fix device reference leak in pcie_port_remove_service() (Uwe
     Kleine-König)

   - Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
     portdrv.c (Uwe Kleine-König)

   - Convert portdrv to use pcie_port_bus_type.probe() and .remove()
     callbacks so .probe() and .remove() can eventually be removed from
     struct device_driver (Uwe Kleine-König)

  Error handling:

   - Clear stale errors on reporting agents upon probe so they don't
     look like recent errors (Lukas Wunner)

   - Add generic RAS tracepoint for hotplug events (Shuai Xue)

   - Add RAS tracepoint for link speed changes (Shuai Xue)

  Power management:

   - Avoid redundant delay on transition from D3hot to D3cold if the
     device was already in D3hot (Brian Norris)

   - Prevent runtime suspend until devices are fully initialized to
     avoid saving incompletely configured device state (Brian Norris)

  Power control:

   - Add power_on/off callbacks with generic signature to pwrseq,
     tc9563, and slot drivers so they can be used by pwrctrl core
     (Manivannan Sadhasivam)

   - Add PCIe M.2 connector support to the slot pwrctrl driver
     (Manivannan Sadhasivam)

   - Switch to pwrctrl interfaces to create, destroy, and power on/off
     devices, calling them from host controller drivers instead of the
     PCI core (Manivannan Sadhasivam)

   - Drop qcom .assert_perst() callbacks since this is now done by the
     controller driver instead of the pwrctrl driver (Manivannan
     Sadhasivam)

  Virtualization:

   - Remove an incorrect unlock in pci_slot_trylock() error handling
     (Jinhui Guo)

   - Lock the bridge device for slot reset (Keith Busch)

   - Enable ACS after IOMMU configuration on OF platforms so ACS is
     enabled on all devices; previously the first device enumerated
     (typically a Root Port) didn't have ACS enabled (Manivannan
     Sadhasivam)

   - Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
     work around hardware erratum; previously ACS SV was only
     temporarily disabled, which worked for enumeration but not after
     reset (Manivannan Sadhasivam)

  Peer-to-peer DMA:

   - Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
     when removing the PCI device (Hou Tao)

   - Remove incorrect p2pmem_alloc_mmap() warning about page refcount
     (Hou Tao)

  Endpoint framework:

   - Add configfs sub-groups synchronously to avoid NULL pointer
     dereference when racing with removal (Liu Song)

   - Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
     functions (Manikanta Maddireddy)

  ASPEED PCIe controller driver:

   - Add ASPEED Root Complex DT binding and driver (Jacky Chou)

  Freescale i.MX6 PCIe controller driver:

   - Add DT binding and driver support for an optional external refclock
     in addition to the refclock from the internal PLL (Richard Zhu)

   - Fix CLKREQ# control so host asserts it during enumeration and
     Endpoints can use it afterwards to exit the L1.2 link state
     (Richard Zhu)

  NVIDIA Tegra PCIe controller driver:

   - Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
     down MSI domains to be built as modules (Aaron Kling)

   - Allow pci-tegra to be built as a module (Aaron Kling)

  NVIDIA Tegra194 PCIe controller driver:

   - Relax Kconfig so tegra194 can be built for platforms beyond
     Tegra194 (Vidya Sagar)

  Qualcomm PCIe controller driver:

   - Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)

   - Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
     IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
     schema (Krzysztof Kozlowski)

   - Add DT binding and driver support for SA8255p Endpoint being
     configured by firmware (Mrinmay Sarkar)

   - Parse PERST# from all PCIe bridge nodes for future platforms that
     will have PERST# in Switch Downstream Ports as well as in Root
     Ports (Manivannan Sadhasivam)

  Renesas RZ/G3S PCIe controller driver:

   - Use pci_generic_config_write() since the writability provided by
     the custom wrapper is unnecessary (Claudiu Beznea)

  SOPHGO PCIe controller driver:

   - Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
     Amaoto)

  Synopsys DesignWare PCIe controller driver:

   - Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
     pointer to the preceding Capability, to allow removal of
     Capabilities that are advertised but not fully implemented (Qiang
     Yu)

   - Remove MSI and MSI-X Capabilities in platforms that can't support
     them, so the PCI core automatically falls back to INTx (Qiang Yu)

   - Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
     for drivers that support this (Shawn Lin)

   - Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
     link is not up to avoid an unnecessary timeout (Manivannan
     Sadhasivam)

   - Revert dw-rockchip, qcom, and DWC core changes that used link-up
     IRQs to trigger enumeration instead of waiting for link to be up
     because the PCI core doesn't allocate bus number space for
     hierarchies that might be attached (Niklas Cassel)

   - Make endpoint iATU entry for MSI permanent instead of programming
     it dynamically, which is slow and racy with respect to other
     concurrent traffic, e.g., eDMA (Koichiro Den)

   - Use iMSI-RX MSI target address when possible to fix endpoints using
     32-bit MSI (Shawn Lin)

   - Allow DWC host controller driver probe to continue if device is not
     found or found but inactive; only fail when there's an error with
     the link (Manivannan Sadhasivam)

   - For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
     are not accessible after PME_Turn_Off, simply wait 10ms instead of
     polling for L2/L3 Ready (Richard Zhu)

   - Use multiple iATU entries to map large bridge windows and DMA
     ranges when necessary instead of failing (Samuel Holland)

   - Add EPC dynamic_inbound_mapping feature bit for Endpoint
     Controllers that can update BAR inbound address translation without
     requiring EPF driver to clear/reset the BAR first, and advertise it
     for DWC-based Endpoints (Koichiro Den)

   - Add EPC subrange_mapping feature bit for Endpoint Controllers that
     can map multiple independent inbound regions in a single BAR,
     implement subrange mapping, advertise it for DWC-based Endpoints,
     and add Endpoint selftests for it (Koichiro Den)

   - Make resizable BARs work for Endpoint multi-PF configurations;
     previously it only worked for PF 0 (Aksh Garg)

   - Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
     and Address Match Mode (Aksh Garg)

   - Set up iATU when ECAM is enabled; previously IO and MEM outbound
     windows weren't programmed, and ECAM-related iATU entries weren't
     restored after suspend/resume, so config accesses failed (Krishna
     Chaitanya Chundru)

  Miscellaneous:

   - Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
     work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"
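The pci/controller/dwc merge describes three dw_pcie_wait_for_link() return codes: -ENODEV for no device, -EIO for a device that is present but inactive, and -ETIMEDOUT for other failures. Together with the probe change summarized above, they suggest a policy like the one sketched below. model_host_init() is a hypothetical name, and this is not the actual dw_pcie_host_init() code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical probe policy: "no device" and "device present but
 * inactive" are not fatal, so probe continues; any other link failure,
 * such as -ETIMEDOUT, is returned and aborts probe.
 */
static int model_host_init(int wait_for_link_ret)
{
	switch (wait_for_link_ret) {
	case 0:
	case -ENODEV:
	case -EIO:
		return 0;
	default:
		return wait_for_link_ret;
	}
}
```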

* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
  PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
  PCI: Disable ACS SV for IDT 0x8090 switch
  PCI: Disable ACS SV for IDT 0x80b5 switch
  PCI: Cache ACS Capabilities register
  PCI: Enable ACS after configuring IOMMU for OF platforms
  PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
  PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
  PCI: Use device_lock_assert() to verify device lock is held
  PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
  PCI: Fix pci_slot_lock () device locking
  PCI: Fix pci_slot_trylock() error handling
  PCI: Mark Nvidia GB10 to avoid bus reset
  PCI: Mark ASM1164 SATA controller to avoid bus reset
  PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
  PCI/PME: Replace RMW of Root Status register with direct write
  PCI/AER: Clear stale errors on reporting agents upon probe
  PCI: Don't claim disabled bridge windows
  PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
  PCI: dwc: Fix missing iATU setup when ECAM is enabled
  PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
  ...
2026-02-11 17:20:38 -08:00
Bjorn Helgaas
93c398be49 Merge branch 'pci/controller/dwc'
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
  pointer to the preceding Capability (Qiang Yu)

- Add dw_pcie_remove_capability() and dw_pcie_remove_ext_capability() to
  remove Capabilities that are advertised but not fully implemented (Qiang
  Yu)

- Remove MSI and MSI-X Capabilities for DWC controllers in platforms that
  can't support them, so we automatically fall back to INTx (Qiang Yu)

- Remove MSI-X and DPC Capabilities for Qualcomm platforms that advertise
  but don't support them (Qiang Yu)

- Remove duplicate dw_pcie_ep_hide_ext_capability() function and replace
  with dw_pcie_remove_ext_capability() (Qiang Yu)

- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status for
  drivers that support this (Shawn Lin)

- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link
  is not up to avoid an unnecessary timeout (Manivannan Sadhasivam)

- Revert dw-rockchip, qcom, and DWC core changes that used link-up IRQs to
  trigger enumeration instead of waiting for link to be up because the PCI
  core doesn't allocate bus number space for hierarchies that might be
  attached (Niklas Cassel)

- Make endpoint iATU entry for MSI permanent instead of programming it
  dynamically, which is slow and racy with respect to other concurrent
  traffic, e.g., eDMA (Koichiro Den)

- Use iMSI-RX MSI target address when possible to fix endpoints using
  32-bit MSI (Shawn Lin)

- Make dw_pcie_ltssm_status_string() available and use it for logging
  errors in dw_pcie_wait_for_link() (Manivannan Sadhasivam)

- Return -ENODEV when dw_pcie_wait_for_link() finds no devices, -EIO for
  device present but inactive, -ETIMEDOUT for other failures, so callers
  can handle these cases differently (Manivannan Sadhasivam)

- Allow DWC host controller driver probe to continue if device is not found
  or found but inactive; only fail when there's an error with the link
  (Manivannan Sadhasivam)

- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers are
  not accessible after PME_Turn_Off, simply wait 10ms instead of polling
  for L2/L3 Ready (Richard Zhu)

- Use multiple iATU entries to map large bridge windows and DMA ranges when
  necessary instead of failing (Samuel Holland)

- Rename struct dw_pcie_rp.has_msi_ctrl to .use_imsi_rx for clarity (Qiang
  Yu)

- Add EPC dynamic_inbound_mapping feature bit for Endpoint Controllers that
  can update BAR inbound address translation without requiring EPF driver
  to clear/reset the BAR first, and advertise it for DWC-based Endpoints
  (Koichiro Den)

- Add EPC subrange_mapping feature bit for Endpoint Controllers that can
  map multiple independent inbound regions in a single BAR, implement
  subrange mapping, advertise it for DWC-based Endpoints, and add Endpoint
  selftests for it (Koichiro Den)

- Allow overriding default BAR sizes for pci-epf-test (Niklas Cassel)

- Make resizable BARs work for Endpoint multi-PF configurations; previously
  it only worked for PF 0 (Aksh Garg)

- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings, and
  Address Match Mode (Aksh Garg)

- Fix issues with outbound iATU index assignment that caused iATU index to
  be out of bounds (Niklas Cassel)

- Clean up iATU index tracking to be consistent (Niklas Cassel)

- Set up iATU when ECAM is enabled; previously IO and MEM outbound windows
  weren't programmed, and ECAM-related iATU entries weren't restored after
  suspend/resume, so config accesses failed (Krishna Chaitanya Chundru)
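Splitting a large range across several iATU windows might look like the following. This is a hypothetical sketch, not the DWC driver's code. It assumes a fixed maximum window size and ignores the alignment constraints and per-controller limits a real driver must respect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outbound window descriptor. */
struct model_atu_win {
	uint64_t cpu_addr;
	uint64_t pci_addr;
	uint64_t size;
};

/*
 * Split [cpu_addr, cpu_addr + size) into consecutive windows of at most
 * @max_win bytes each, keeping the CPU-to-PCI offset constant.  Returns
 * the number of windows used, or 0 if more than @nr_free would be needed.
 */
static size_t model_map_range(uint64_t cpu_addr, uint64_t pci_addr, uint64_t size,
			      uint64_t max_win, struct model_atu_win *out,
			      size_t nr_free)
{
	size_t n = 0;

	assert(max_win > 0 && out != NULL);
	while (size > 0) {
		uint64_t chunk = size < max_win ? size : max_win;

		if (n == nr_free)
			return 0;       /* not enough free iATU entries */
		out[n].cpu_addr = cpu_addr;
		out[n].pci_addr = pci_addr;
		out[n].size = chunk;
		n++;
		cpu_addr += chunk;
		pci_addr += chunk;
		size -= chunk;
	}
	return n;
}
```

With a 4 GiB per-window limit, for example, a 12 GiB bridge window needs three entries instead of failing outright.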

* pci/controller/dwc:
  PCI: dwc: Fix missing iATU setup when ECAM is enabled
  PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
  PCI: dwc: Fix msg_atu_index assignment
  PCI: dwc: ep: Add comment explaining controller level PTM access in multi PF setup
  PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support
  PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations
  PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes
  selftests: pci_endpoint: Add BAR subrange mapping test case
  misc: pci_endpoint_test: Add BAR subrange mapping test case
  PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support
  Documentation: PCI: endpoint: Clarify pci_epc_set_bar() usage
  PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU
  PCI: dwc: Advertise dynamic inbound mapping support
  PCI: endpoint: Add BAR subrange mapping support
  PCI: endpoint: Add dynamic_inbound_mapping EPC feature
  PCI: dwc: Rename dw_pcie_rp::has_msi_ctrl to dw_pcie_rp::use_imsi_rx for clarity
  PCI: dwc: Fix grammar and formatting for comment in dw_pcie_remove_ext_capability()
  PCI: dwc: Use multiple iATU windows for mapping large bridge windows and DMA ranges
  PCI: dwc: Remove duplicate dw_pcie_ep_hide_ext_capability() function
  PCI: dwc: Skip waiting for L2/L3 Ready if dw_pcie_rp::skip_l23_wait is true
  PCI: dwc: Fail dw_pcie_host_init() if dw_pcie_wait_for_link() returns -ETIMEDOUT
  PCI: dwc: Rework the error print of dw_pcie_wait_for_link()
  PCI: dwc: Rename and move ltssm_status_string() to pcie-designware.c
  PCI: dwc: Return -EIO from dw_pcie_wait_for_link() if device is not active
  PCI: dwc: Return -ENODEV from dw_pcie_wait_for_link() if device is not found
  PCI: dwc: Use cfg0_base as iMSI-RX target address to support 32-bit MSI devices
  PCI: dwc: ep: Cache MSI outbound iATU mapping
  Revert "PCI: dwc: Don't wait for link up if driver can detect Link Up event"
  Revert "PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt"
  Revert "PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported"
  Revert "PCI: qcom: Don't wait for link if we can detect Link Up"
  Revert "PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ"
  Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up"
  PCI: dwc: Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up
  PCI: dw-rockchip: Change get_ltssm() to provide L1 Substates info
  PCI: dwc: Add L1 Substates context to ltssm_status of debugfs
  PCI: qcom: Remove DPC Extended Capability
  PCI: qcom: Remove MSI-X Capability for Root Ports
  PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller
  PCI: dwc: Add new APIs to remove standard and extended Capability
  PCI: Add preceding capability position support in PCI_FIND_NEXT_*_CAP macros
2026-02-06 17:09:34 -06:00
Bjorn Helgaas
2095b9dd2e Merge branch 'pci/virtualization'
- Mark ASM1164 SATA controller to avoid bus reset since it fails to train
  the Link after reset (Alex Williamson)

- Mark Nvidia GB10 Root Ports to avoid bus reset since they may fail to
  retrain the link after reset (Johnny-CC Chang)

- Add lockdep and other lock assertions (Ilpo Järvinen)

- Add ACS quirk for Qualcomm Hamoa & Glymur, which provides ACS-like
  features but doesn't advertise an ACS Capability (Krishna Chaitanya
  Chundru)

- Add ACS quirk for Pericom PI7C9X2G404 switches, which fail under load
  when P2P Redirect Request is enabled (Nicolas Cavallari)

- Remove an incorrect unlock in pci_slot_trylock() error handling (Jinhui
  Guo)

- Lock the bridge device for slot reset (Keith Busch)

- Enable ACS after IOMMU configuration on OF platforms so ACS is enabled on
  all devices; previously the first device enumerated (typically a Root
  Port) was omitted (Manivannan Sadhasivam)

- Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to work
  around hardware erratum; previously ACS SV was temporarily disabled,
  which worked for enumeration but not after reset (Manivannan Sadhasivam)

* pci/virtualization:
  PCI: Disable ACS SV for IDT 0x8090 switch
  PCI: Disable ACS SV for IDT 0x80b5 switch
  PCI: Cache ACS Capabilities register
  PCI: Enable ACS after configuring IOMMU for OF platforms
  PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
  PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
  PCI: Use device_lock_assert() to verify device lock is held
  PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
  PCI: Fix pci_slot_lock() device locking
  PCI: Fix pci_slot_trylock() error handling
  PCI: Mark Nvidia GB10 to avoid bus reset
  PCI: Mark ASM1164 SATA controller to avoid bus reset
2026-02-06 17:09:26 -06:00
Bjorn Helgaas
401b356520 Merge branch 'pci/trace'
- Add generic RAS tracepoint for hotplug events (Shuai Xue)

- Add RAS tracepoint for link speed changes (Shuai Xue)

* pci/trace:
  Documentation: tracing: Add PCI tracepoint documentation
  PCI: trace: Add RAS tracepoint to monitor link speed changes
  PCI: trace: Add generic RAS tracepoint for hotplug event
2026-02-06 17:09:26 -06:00
Bjorn Helgaas
73b4779864 Merge branch 'pci/resource'
- Build zero-sized resources when a BAR is larger than 4G but
  pci_bus_addr_t or resource_size_t can't represent 64-bit addresses (Ilpo
  Järvinen)

- Fix bridge window alignment with optional resources, where we previously
  lost the additional alignment requirement (Ilpo Järvinen)

- Stop over-estimating bridge window size since we now assign them without
  any gaps between them (Ilpo Järvinen)

- Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening for
  nested bridges and endpoints (Ilpo Järvinen)

- Remove old_size limit from bridge window sizing (Ilpo Järvinen)

- Push realloc check into pbus_size_mem() to simplify callers (Ilpo
  Järvinen)

- Pass bridge window resource to pbus_size_mem() to avoid looking it up
  again (Ilpo Järvinen)

- Use res_to_dev_res() instead of open-coding the same search (Ilpo
  Järvinen)

- Add pci_resource_is_bridge_win() helper (Ilpo Järvinen)

- Add more logging of resource assignment (Ilpo Järvinen)

- Add pbus_mem_size_optional() to handle sizes of optional resources
  (SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)

- Move CardBus code to setup-cardbus.c and only build it when
  CONFIG_CARDBUS is set (Ilpo Järvinen)

- Use scnprintf() instead of sprintf() (Ilpo Järvinen)

- Add pbus_validate_busn() for Bus Number validation (Ilpo Järvinen)

- Don't claim disabled bridge windows to avoid spurious claim failures
  (Ilpo Järvinen)

* pci/resource:
  PCI: Don't claim disabled bridge windows
  PCI: Move CardBus bridge scanning to setup-cardbus.c
  PCI: Add pbus_validate_busn() for Bus Number validation
  PCI: Add dword #defines for Bus Number + Secondary Latency Timer
  PCI: Use scnprintf() instead of sprintf()
  PCI: Handle CardBus-specific params in setup-cardbus.c
  PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS
  PCI: Add 'pci' prefix to struct pci_dev_resource handling functions
  PCI: Use resource_assigned() in setup-bus.c algorithm
  resource: Mark res given to resource_assigned() as const
  PCI: Add pbus_mem_size_optional() to handle optional sizes
  PCI: Check invalid align earlier in pbus_size_mem()
  PCI: Log reset and restore of resources
  PCI: Add pci_resource_is_bridge_win()
  PCI: Fetch dev_res to local var in __assign_resources_sorted()
  PCI: Use res_to_dev_res() in reassign_resources_sorted()
  PCI: Pass bridge window resource to pbus_size_mem()
  PCI: Push realloc check into pbus_size_mem()
  PCI: Remove old_size limit from bridge window sizing
  resource: Increase MAX_IORES_LEVEL to 8
  PCI: Stop over-estimating bridge window size
  PCI: Rewrite bridge window head alignment function
  PCI: Fix bridge window alignment with optional resources
  PCI: Use resource_set_range() that correctly sets ->end
2026-02-06 17:09:25 -06:00
Bjorn Helgaas
85fdfc522a Merge branch 'pci/pm'
- Avoid redundant delay on transition from D3hot to D3cold if the device
  was already in D3hot (Brian Norris)

- Prevent runtime suspend until devices are fully initialized to avoid
  saving incompletely configured device state (Brian Norris)

* pci/pm:
  PCI/PM: Prevent runtime suspend until devices are fully initialized
  PCI/PM: Avoid redundant delays on D3hot->D3cold
2026-02-06 17:09:18 -06:00
Manivannan Sadhasivam
b26d7fb4a5 PCI: Disable ACS SV for IDT 0x80b5 switch
Some IDT switches incorrectly flag an ACS Source Validation error on
completions for config read requests before they have captured the bus
number from a previous config write, even though PCIe r7.0, sec 6.12.1.1,
says that completions are never affected by ACS Source Validation.

The previous workaround, aa667c6408 ("PCI: Workaround IDT switch ACS
Source Validation erratum"), temporarily disabled ACS SV during
enumeration. This was effective but didn't cover the time after switch
reset, when it may lose the captured bus number.

Avoid the issue by preventing use of ACS SV altogether for these switches:
call pci_disable_broken_acs_cap() from pci_acs_init() and remove the
previous workaround in pci_bus_read_dev_vendor_id().

Removal of ACS SV for these switches means they no longer enforce
everything in REQ_ACS_FLAGS, so downstream devices are not isolated from
each other and the iommu_group may include more devices.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[bhelgaas: commit log, retain specific erratum details]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://patch.msgid.link/20260102-pci_acs-v3-3-72280b94d288@oss.qualcomm.com
2026-02-06 16:54:12 -06:00
Manivannan Sadhasivam
8f05a5f674 PCI: Cache ACS Capabilities register
The ACS Capability register is read-only. Cache it to allow quirks to
override it and to avoid re-reading it.
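As an illustration of caching a read-only register so that a quirk can override it, here is a minimal userspace sketch. It is not the kernel implementation: struct sketch_dev and the sketch_* functions are hypothetical, and only the PCI_ACS_SV and PCI_ACS_RR bit values are taken from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* ACS Capability bit values as in include/uapi/linux/pci_regs.h */
#define PCI_ACS_SV 0x0001 /* Source Validation */
#define PCI_ACS_RR 0x0004 /* P2P Request Redirect */

/* Hypothetical device model, not struct pci_dev */
struct sketch_dev {
	uint16_t hw_acs_cap;       /* value the hardware register returns */
	unsigned int config_reads; /* number of simulated config reads */
	uint16_t acs_capabilities; /* cached copy, filled in once */
};

/* Simulated config read of the read-only ACS Capability register */
static uint16_t sketch_read_acs_cap(struct sketch_dev *dev)
{
	dev->config_reads++;
	return dev->hw_acs_cap;
}

/* Read the register once at enumeration and keep the result */
static void sketch_acs_init(struct sketch_dev *dev)
{
	dev->acs_capabilities = sketch_read_acs_cap(dev);
}

/* A hypothetical quirk edits only the cached value; hardware is untouched */
static void sketch_quirk_hide_sv(struct sketch_dev *dev)
{
	dev->acs_capabilities &= (uint16_t)~PCI_ACS_SV;
}

/* Later users consult the cache instead of re-reading config space */
static int sketch_acs_has(const struct sketch_dev *dev, uint16_t flag)
{
	return (dev->acs_capabilities & flag) != 0;
}
```

The cache and the hardware disagree on purpose here: the quirk changes what the rest of the code believes about the device, and config space is read only once.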

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://patch.msgid.link/20260102-pci_acs-v3-2-72280b94d288@oss.qualcomm.com
2026-02-06 16:54:12 -06:00
Manivannan Sadhasivam
c41e2fb67e PCI: Enable ACS after configuring IOMMU for OF platforms
Platform, ACPI, or IOMMU drivers call pci_request_acs(), which sets
'pci_acs_enable' to request that ACS be enabled for any devices enumerated
in the future.

OF platforms called pci_enable_acs() for the first device before
of_iommu_configure() called pci_request_acs(), so ACS was never enabled for
that device (typically a Root Port).

Call pci_enable_acs() later, from pci_dma_configure(), after
of_dma_configure() has had a chance to call pci_request_acs().

Here's the call path, showing the move of pci_enable_acs() from
pci_acs_init() to pci_dma_configure(), where it always happens after
pci_request_acs():

    pci_device_add
      pci_init_capabilities
        pci_acs_init
 -        pci_enable_acs
 -          if (pci_acs_enable)                <-- previous test
 -            ...
      device_add
        bus_notify(BUS_NOTIFY_ADD_DEVICE)
          iommu_bus_notifier
            iommu_probe_device
              iommu_init_device
                dev->bus->dma_configure
                  pci_dma_configure            # pci_bus_type.dma_configure
                    of_dma_configure
                      of_iommu_configure
                        pci_request_acs
                          pci_acs_enable = 1   <-- set
 +                  pci_enable_acs
 +                    if (pci_acs_enable)      <-- new test
 +                      ...
        bus_probe_device
          device_initial_probe
            ...
              really_probe
                dev->bus->dma_configure
                  pci_dma_configure            # pci_bus_type.dma_configure
                    ...
                      pci_enable_acs

Note that we will now call pci_enable_acs() twice for every device, first
from the iommu_probe_device() path and again from the really_probe() path.
Presumably that's not an issue since we also call dev->bus->dma_configure()
twice.
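The ordering problem, and why a second call is harmless, can be modeled in a few lines of userspace C. This is a hypothetical sketch: sketch_request_acs() and sketch_enable_acs() only stand in for pci_request_acs() and pci_enable_acs(), and the two "add device" functions compress the call path above into the steps that matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Global request flag, analogous to 'pci_acs_enable' in the text */
static bool sketch_acs_requested;

/* Hypothetical device model */
struct sketch_dev {
	bool acs_enabled;
};

/* Stand-in for pci_request_acs(): only records the request */
static void sketch_request_acs(void)
{
	sketch_acs_requested = true;
}

/* Stand-in for pci_enable_acs(): acts only if a request was already
 * made; calling it twice just sets the same state again */
static void sketch_enable_acs(struct sketch_dev *dev)
{
	if (sketch_acs_requested)
		dev->acs_enabled = true;
}

/* Old order: enable at capability init, before the IOMMU requests ACS */
static void sketch_add_device_old(struct sketch_dev *dev)
{
	sketch_enable_acs(dev);  /* pci_acs_init() time */
	sketch_request_acs();    /* of_iommu_configure() time */
}

/* New order: enable from the dma_configure step, after the request */
static void sketch_add_device_new(struct sketch_dev *dev)
{
	sketch_request_acs();    /* of_iommu_configure() */
	sketch_enable_acs(dev);  /* iommu_probe_device() path */
	sketch_enable_acs(dev);  /* really_probe() path, second call */
}
```

With the old order, only the first device added misses ACS, because every later device sees the flag already set.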

On ACPI platforms, pci_request_acs() is called during ACPI initialization
itself, independent of the IOMMU framework.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://patch.msgid.link/20260102-pci_acs-v3-1-72280b94d288@oss.qualcomm.com
2026-02-06 16:54:11 -06:00
Ilpo Järvinen
f06e0ad226 PCI: Use device_lock_assert() to verify device lock is held
Multiple function comments say the function should be called with
device_lock held. Check that by calling device_lock_assert().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260116125742.1890-3-ilpo.jarvinen@linux.intel.com
2026-02-06 16:53:57 -06:00
Ilpo Järvinen
183c291caa PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
The function comment for pci_bus_max_d3cold_delay() states that pci_bus_sem
must be held while calling the function, which can be checked automatically.
Add lockdep_assert_held(pci_bus_sem) to confirm that pci_bus_sem is held.

Also mark the comment line with a "Context:" prefix.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260116125742.1890-2-ilpo.jarvinen@linux.intel.com
2026-02-06 16:53:35 -06:00
Keith Busch
1f5e57c622 PCI: Fix pci_slot_lock() device locking
Like pci_bus_lock(), pci_slot_lock() needs to lock the bridge device to
prevent warnings like:

  pcieport 0000:e2:05.0: unlocked secondary bus reset via: pciehp_reset_slot+0x55/0xa0

Take and release the lock of the bridge that provides the slot in the
lock/trylock and unlock routines.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260130165953.751063-3-kbusch@meta.com
2026-02-06 16:53:27 -06:00
Jinhui Guo
9368d1ee62 PCI: Fix pci_slot_trylock() error handling
Commit a4e772898f ("PCI: Add missing bridge lock to pci_bus_lock()")
delegates the bridge device's pci_dev_trylock() to pci_bus_trylock() in
pci_slot_trylock(), but it forgets to remove the corresponding
pci_dev_unlock() when pci_bus_trylock() fails.

Before a4e772898f, the code did:

  if (!pci_dev_trylock(dev)) /* <- lock bridge device */
    goto unlock;
  if (dev->subordinate) {
    if (!pci_bus_trylock(dev->subordinate)) {
      pci_dev_unlock(dev);   /* <- unlock bridge device */
      goto unlock;
    }
  }

After a4e772898f the bridge-device lock is no longer taken, but the
pci_dev_unlock(dev) on the failure path was left in place, leading to the
bug.

This yields one of two errors:

  1. A warning that the lock is being unlocked when no one holds it.
  2. An incorrect unlock of a lock that belongs to another thread.

Fix it by removing the now-redundant pci_dev_unlock(dev) on the failure
path.
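The rule behind the fix, release only what you acquired yourself, can be sketched with a hypothetical non-blocking lock that counts holders. The sketch_* names are invented for illustration; sketch_bus_trylock() plays the role the text assigns to pci_bus_trylock() after a4e772898f, taking the bridge lock itself and dropping it again on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-blocking lock; 'held' is 1 when owned, 0 when free */
struct sketch_lock {
	int held;
};

static bool sketch_trylock(struct sketch_lock *l)
{
	if (l->held)
		return false;
	l->held = 1;
	return true;
}

/* An unbalanced unlock drives 'held' negative, the analogue of
 * unlocking a lock that nobody holds */
static void sketch_unlock(struct sketch_lock *l)
{
	l->held--;
}

/* Takes the bridge lock itself and drops it if the child lock is busy */
static bool sketch_bus_trylock(struct sketch_lock *bridge,
			       struct sketch_lock *child)
{
	if (!sketch_trylock(bridge))
		return false;
	if (!sketch_trylock(child)) {
		sketch_unlock(bridge);
		return false;
	}
	return true;
}

/* Fixed caller: nothing was acquired here, so nothing is released here.
 * The buggy version called sketch_unlock(bridge) before returning false. */
static bool sketch_slot_trylock(struct sketch_lock *bridge,
				struct sketch_lock *child)
{
	if (!sketch_bus_trylock(bridge, child))
		return false;
	return true;
}
```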

[Same patch later posted by Keith at
https://patch.msgid.link/20260116184150.3013258-1-kbusch@meta.com]

Fixes: a4e772898f ("PCI: Add missing bridge lock to pci_bus_lock()")
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251212145528.2555-1-guojinhui.liam@bytedance.com
2026-02-06 16:53:27 -06:00
Lukas Wunner
699722468a PCI/PME: Replace RMW of Root Status register with direct write
As of PCIe r7.0, the Root Status register contains a single writeable bit
(PME Status, type RW1C) and otherwise just read-only bits and RsvdZ bits
(which software must write as zero, PCIe r7.0 sec 7.4).

Thus, when clearing the PME Status bit, there's no need to perform a
read-modify-write of the register.  Instead, the bit can be written
directly.
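A minimal model of why a plain write is enough: with RW1C semantics, writing 1 to PME Status clears it, and writes to the read-only bits have no effect. The register model below is hypothetical; only the two bit values come from include/uapi/linux/pci_regs.h. In the kernel, the direct write would presumably be a single pcie_capability_write_dword() of PCI_EXP_RTSTA_PME to PCI_EXP_RTSTA, but this log does not show the call.

```c
#include <assert.h>
#include <stdint.h>

/* Root Status bits as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_RTSTA_PME     0x00010000 /* PME Status, RW1C */
#define PCI_EXP_RTSTA_PENDING 0x00020000 /* PME Pending, RO */

/* Hypothetical register model: a 1 written to PME Status clears it;
 * RO bits (Requester ID, PME Pending) ignore writes */
static void sketch_rtsta_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~(val & PCI_EXP_RTSTA_PME);
}

/* Clear PME Status with one write; no prior read is needed */
static void sketch_clear_pme_status(uint32_t *reg)
{
	sketch_rtsta_write(reg, PCI_EXP_RTSTA_PME);
}
```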

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/39f87c99f6c44be3c0371c79e454e6fde7be0d4d.1761497583.git.lukas@wunner.de
2026-02-06 16:40:33 -06:00
Sergey Shtylyov
f7245901de PCI: Check parent for NULL in of_pci_bus_release_domain_nr()
of_pci_bus_find_domain_nr() allows its parent parameter to be NULL, but
of_pci_bus_release_domain_nr(), which undoes its effect, doesn't; that
means it's going to blow up while calling of_get_pci_domain_nr() if the
parent parameter indeed happens to be NULL.  Add the missing NULL check.
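The invariant the fix restores, that the release path accepts every input the allocation path accepts, can be sketched as follows. Everything here is hypothetical: struct sketch_node and the toy counter stand in for the device node and the ID allocator, and the sketch assumes a NULL parent always means a dynamically allocated domain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parent node carrying a firmware-assigned domain number */
struct sketch_node {
	int fixed_domain;
};

static int sketch_next_dynamic = 100; /* toy stand-in for an ID allocator */
static int sketch_last_released = -1; /* last dynamic ID given back */

/* Allocation side: a NULL parent means "allocate a dynamic domain" */
static int sketch_find_domain_nr(const struct sketch_node *parent)
{
	if (!parent)
		return sketch_next_dynamic++;
	return parent->fixed_domain;
}

/* Release side: must accept the same NULL parent instead of
 * dereferencing it; assumed here to mean the ID was dynamic */
static void sketch_release_domain_nr(const struct sketch_node *parent,
				     int domain_nr)
{
	if (!parent) {
		sketch_last_released = domain_nr;
		return;
	}
	/* firmware-assigned domains are not returned to the allocator */
}
```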

Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.

Fixes: c14f7ccc9f ("PCI: Assign PCI domain IDs by ida_alloc()")
Signed-off-by: Sergey Shtylyov <s.shtylyov@auroraos.dev>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260127203944.28588-1-s.shtylyov@auroraos.dev
2026-01-29 12:29:20 -06:00
Ilpo Järvinen
08b3af830a PCI: Handle CardBus-specific params in setup-cardbus.c
Move CardBus window sizing parameters to setup-cardbus.c, which contains
all the other CardBus code.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251219174036.16738-19-ilpo.jarvinen@linux.intel.com
2026-01-27 16:36:52 -06:00
Brian Norris
51c0996dad PCI/PM: Prevent runtime suspend until devices are fully initialized
Previously, it was possible for a PCI device to be runtime-suspended before
it was fully initialized. When that happened, the suspend process could
save invalid device state, for example, before BAR assignment. Restoring
the invalid state during resume may leave the device non-functional.

Prevent runtime suspend for PCI devices until they are fully initialized by
deferring pm_runtime_enable().

More details on how exactly this may occur:

  1. PCI device is created by pci_scan_slot() or similar

  2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and
     prevents runtime suspend via pm_runtime_forbid()

  3. pci_device_add() adds the underlying 'struct device' via device_add(),
     which means user space can allow runtime suspend, e.g.,

       echo auto > /sys/bus/pci/devices/.../power/control

  4. PCI device receives BAR configuration
     (pci_assign_unassigned_bus_resources(), etc.)

  5. pci_bus_add_device() applies final fixups, saves device state, and
     tries to attach a driver

The device may potentially be suspended between #3 and #5, so this is racy
with user space (udev or similar).
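The race in steps 1 to 5 can be modeled with a few flags. This is a hypothetical userspace sketch, not the runtime PM core: rpm_enabled stands for pm_runtime_enable() having run, and user_allowed for user space writing "auto" to power/control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved in the race above */
struct sketch_dev {
	bool rpm_enabled;   /* pm_runtime_enable() has been called */
	bool user_allowed;  /* user space wrote "auto" to power/control */
	bool bars_assigned; /* step 4 has completed */
	bool saved_valid;   /* state saved at suspend was fully configured */
	bool suspended;
};

/* Suspend succeeds only if runtime PM is enabled and allowed; the saved
 * state is only as good as the configuration at that moment */
static bool sketch_try_runtime_suspend(struct sketch_dev *dev)
{
	if (!dev->rpm_enabled || !dev->user_allowed)
		return false;
	dev->saved_valid = dev->bars_assigned;
	dev->suspended = true;
	return true;
}
```

Deferring the enable until step 5 closes the window between steps 3 and 5 regardless of when user space acts.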

Many PCI devices are enumerated at subsys_initcall time and so will not
race with user space, but devices created later by hotplug or modular
pwrctrl or host controller drivers are susceptible to this race.

More runtime PM details at the first Link: below.

Link: https://lore.kernel.org/all/0e35a4e1-894a-47c1-9528-fc5ffbafd9e2@samsung.com/
Signed-off-by: Brian Norris <briannorris@chromium.org>
[bhelgaas: update comments per https://lore.kernel.org/r/CAJZ5v0iBNOmMtqfqEbrYyuK2u+2J2+zZ-iQd1FvyCPjdvU2TJg@mail.gmail.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260122094815.v5.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid
2026-01-22 13:02:55 -06:00
Nicolin Chen
f5b16b8021 PCI: Suspend iommu function prior to resetting a device
PCIe permits a device to ignore ATS invalidation TLPs while processing a
reset. This creates a problem visible to the OS where an ATS invalidation
command will time out: e.g. an SVA domain will have no coordination with a
reset event and can racily issue ATS invalidations to a resetting device.

The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends that SW disable
and block ATS before initiating a Function Level Reset. It also mentions
that other reset methods could have the same vulnerability.

The IOMMU subsystem provides pci_dev_reset_iommu_prepare/done() callback
helpers for this purpose. Use them in all the existing reset functions.

This will attach the device to its iommu_group->blocking_domain during the
device reset, so as to allow the IOMMU driver to:
 - invoke pci_disable_ats() and pci_enable_ats(), if necessary
 - wait for all ATS invalidations to complete
 - stop issuing new ATS invalidations
 - fence any incoming ATS queries
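The prepare/reset/done bracket can be sketched as below. The model is hypothetical: the sketch_* names are invented, it captures only "no new invalidations reach the device while it is blocked", and it treats "wait for all ATS invalidations to complete" as an instant drain; the real helpers and IOMMU driver callbacks do considerably more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device's ATS state around a reset */
struct sketch_dev {
	bool in_blocking_domain;      /* attached to the blocking domain */
	int pending_invalidations;    /* ATS invalidations still in flight */
	bool inval_sent_during_reset; /* recorded by the reset method */
};

static void sketch_reset_prepare(struct sketch_dev *dev)
{
	dev->in_blocking_domain = true; /* stop issuing new invalidations */
	dev->pending_invalidations = 0; /* model: wait for in-flight ones */
}

static void sketch_reset_done(struct sketch_dev *dev)
{
	dev->in_blocking_domain = false;
}

/* Returns false if the invalidation is not sent to the device */
static bool sketch_issue_ats_invalidation(struct sketch_dev *dev)
{
	if (dev->in_blocking_domain)
		return false;
	dev->pending_invalidations++;
	return true;
}

/* Example reset method; a concurrent unmap tries to invalidate mid-reset */
static void sketch_flr(struct sketch_dev *dev)
{
	dev->inval_sent_during_reset = sketch_issue_ats_invalidation(dev);
}

/* Every reset method is bracketed by prepare/done */
static void sketch_reset(struct sketch_dev *dev,
			 void (*method)(struct sketch_dev *))
{
	sketch_reset_prepare(dev);
	method(dev);
	sketch_reset_done(dev);
}
```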

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10 10:26:44 +01:00
Brian Norris
4d98208450 PCI/PM: Avoid redundant delays on D3hot->D3cold
When transitioning to D3cold, __pci_set_power_state() first transitions to
D3hot. If the device was already in D3hot, this adds excess work:

  (a) read/modify/write PMCSR; and
  (b) excess delay (pci_dev_d3_sleep()).

For (b), we already performed the necessary delay on the previous D3hot
entry; the extra delay was especially noticeable when evaluating runtime PM
transition latency.

Check whether we're already in the target state before continuing.

Note that __pci_set_power_state() already does this same check for other
state transitions, but D3cold is special because __pci_set_power_state()
converts it to D3hot for the purposes of PMCSR.
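A minimal model of the early return, assuming a hypothetical sketch_dev that counts PMCSR writes and D3 delays. D3cold is mapped to D3hot for the PMCSR step, as described above; the later removal of main power is not modeled.

```c
#include <assert.h>

/* Hypothetical power state enum for this sketch */
enum sketch_state { SK_D0, SK_D1, SK_D2, SK_D3HOT, SK_D3COLD };

struct sketch_dev {
	enum sketch_state current_state;
	int pmcsr_writes; /* read/modify/write cycles of PMCSR */
	int d3_delays;    /* pci_dev_d3_sleep()-style delays */
};

/* Models only the PMCSR leg of a low-power transition */
static void sketch_set_low_power_state(struct sketch_dev *dev,
				       enum sketch_state state)
{
	enum sketch_state pmcsr_state =
		(state == SK_D3COLD) ? SK_D3HOT : state;

	/* The fix: skip PMCSR and the delay if already in that state */
	if (dev->current_state == pmcsr_state)
		return;

	dev->pmcsr_writes++;
	if (pmcsr_state == SK_D3HOT)
		dev->d3_delays++;
	dev->current_state = pmcsr_state;
}
```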

This seems to be an oversight in commit 0aacdc9574 ("PCI/PM: Clean up
pci_set_low_power_state()").

Fixes: 0aacdc9574 ("PCI/PM: Clean up pci_set_low_power_state()")
Signed-off-by: Brian Norris <briannorris@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
[bhelgaas: reverse test to match other "dev->current_state == state" cases]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251003154008.1.I7a21c240b30062c66471329567a96dceb6274358@changeid
2026-01-05 17:47:15 -06:00
Shuai Xue
d4318c1a79 PCI: trace: Add RAS tracepoint to monitor link speed changes
PCIe link speed degradation directly impacts system performance and often
indicates hardware issues such as faulty devices, physical layer problems,
or configuration errors.

To this end, add a RAS tracepoint to monitor link speed changes, enabling
proactive health checks and diagnostic analysis.

The following output is generated when a device is hotplugged:

  $ echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
  $ cat /sys/kernel/debug/tracing/trace_pipe
     irq/51-pciehp-88      [001] .....   381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, status:DLLLA

Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Suggested-by: Matthew W Carlis <mattc@purestorage.com>
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20251210132907.58799-3-xueshuai@linux.alibaba.com
2025-12-23 16:06:00 -06:00
Qiang Yu
a2582e05e3 PCI: Add preceding capability position support in PCI_FIND_NEXT_*_CAP macros
Add support for finding the preceding capability position in the PCI
capability list by extending the capability finding macros with an
additional parameter. This functionality is essential for modifying the PCI
capability list, as it provides the necessary information to update the
"next" pointer of the predecessor capability when removing entries.

Modify two macros to accept a new 'prev_ptr' parameter:
- PCI_FIND_NEXT_CAP - Now accepts 'prev_ptr' parameter for standard
  capabilities
- PCI_FIND_NEXT_EXT_CAP - Now accepts 'prev_ptr' parameter for extended
  capabilities

When a capability is found, these macros:
- Store the position of the preceding capability in *prev_ptr
  (if prev_ptr != NULL)
- Maintain all existing functionality when prev_ptr is NULL

Update current callers to accommodate this API change by passing NULL as
the 'prev_ptr' argument when they do not need the preceding capability
position.

No functional changes to driver behavior result from this commit as it
maintains the existing capability finding functionality while adding the
infrastructure for future capability removal operations.
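To show what the extra output makes possible, here is a userspace sketch that walks a standard capability list in a 256-byte config space image and unlinks an entry by rewriting its predecessor's "next" pointer. It is not the kernel macro: PCI_FIND_NEXT_CAP() reads config space through accessor functions, whereas this sketch indexes an array, and reporting 0 as the predecessor of the first capability is this sketch's own convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34 /* offset of the first capability pointer */

/* Returns the offset of capability 'id', or 0 if absent. If prev is
 * non-NULL it receives the preceding capability's offset, or 0 when the
 * match is the first entry (a convention assumed by this sketch). */
static uint8_t sketch_find_cap(const uint8_t cfg[256], uint8_t id,
			       uint8_t *prev)
{
	uint8_t before = 0;
	uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	int ttl = 48; /* guard against malformed, looping lists */

	while (pos >= 0x40 && ttl--) {
		if (cfg[pos] == id) {
			if (prev)
				*prev = before;
			return pos;
		}
		before = pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}

/* Unlink capability 'id' by pointing its predecessor past it */
static int sketch_remove_cap(uint8_t cfg[256], uint8_t id)
{
	uint8_t prev;
	uint8_t pos = sketch_find_cap(cfg, id, &prev);

	if (!pos)
		return -1;
	if (prev)
		cfg[prev + 1] = cfg[pos + 1];
	else
		cfg[PCI_CAPABILITY_LIST] = cfg[pos + 1];
	return 0;
}
```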

Signed-off-by: Qiang Yu <qiang.yu@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251109-remove_cap-v1-1-2208f46f4dc2@oss.qualcomm.com
2025-12-18 12:46:16 +05:30
Bjorn Helgaas
13571584e1 Merge branch 'pci/resource'
- Prevent resource tree corruption when BAR resize fails (Ilpo Järvinen)

- Restore BARs to the original size if a BAR resize fails (Ilpo Järvinen)

- Remove BAR release from BAR resize attempts by the xe, i915, and amdgpu
  drivers so the PCI core can restore BARs if the resize fails (Ilpo
  Järvinen)

- Move Resizable BAR code to rebar.c (Ilpo Järvinen)

- Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo Järvinen)

- Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo Järvinen)

* pci/resource:
  PCI: Validate pci_rebar_size_supported() input
  PCI: Convert BAR sizes bitmasks to u64
  drm/amdgpu: Use pci_rebar_get_max_size()
  drm/xe/vram: Use pci_rebar_get_max_size()
  PCI: Add pci_rebar_get_max_size()
  drm/xe/vram: Use PCI rebar helpers in resize_vram_bar()
  drm/i915/gt: Use pci_rebar_size_supported()
  PCI: Add pci_rebar_size_supported() helper
  PCI: Improve Resizable BAR functions kernel doc
  PCI: Move pci_rebar_size_to_bytes() and export it
  PCI: Move pci_rebar_bytes_to_size() and clean it up
  PCI: Move Resizable BAR code to rebar.c
  PCI: Prevent restoring assigned resources
  drm/amdgpu: Remove driver side BAR release before resize
  drm/i915: Remove driver side BAR release before resize
  drm/xe: Remove driver side BAR release before resize
  PCI: Add kerneldoc for pci_resize_resource()
  PCI: Fix restoring BARs on BAR resize rollback path
  PCI: Free saved list without holding pci_bus_sem
  PCI: Try BAR resize even when no window was released
  PCI: Change pci_dev variable from 'bridge' to 'dev'
  PCI/IOV: Adjust ->barsz[] when changing BAR size
  PCI: Prevent resource tree corruption when BAR resize fails
2025-12-03 14:18:32 -06:00
Bjorn Helgaas
5c5b8751e5 Merge branch 'pci/err'
- For drivers using PCI legacy suspend, save config state at suspend so
  that state (not any earlier state from enumeration, probe, or error
  recovery) will be restored when resuming (Lukas Wunner)

- For devices with no driver or a driver that lacks PM, save config state
  at hibernate so that state (not any earlier state from enumeration,
  probe, or error recovery) will be restored when resuming (Lukas Wunner)

- Save device config space on device addition, before driver binding, so
  error recovery works more reliably (Lukas Wunner)

- Drop pci_save_state() from several drivers that no longer need it since
  the PCI core always does it and pci_restore_state() no longer invalidates
  the saved state (Lukas Wunner)

- Document use of pci_save_state() by drivers to capture the state they
  want restored during error recovery (Lukas Wunner)

* pci/err:
  Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  treewide: Drop pci_save_state() after pci_restore_state()
  PCI/ERR: Ensure error recoverability at all times
  PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
  PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths
2025-12-03 14:18:31 -06:00
Lukas Wunner
a2f1e22390 PCI/ERR: Ensure error recoverability at all times
When the PCI core gained power management support in 2002, it introduced
pci_save_state() and pci_restore_state() helpers to restore Config Space
after a D3hot or D3cold transition, which implies a Soft or Fundamental
Reset (PCIe r7.0 sec 5.8):

  https://git.kernel.org/tglx/history/c/a5287abe398b

In 2006, EEH and AER were introduced to recover from errors by performing
a reset.  Because errors can occur at any time, drivers began calling
pci_save_state() on probe to ensure recoverability.

In 2009, recoverability was foiled by commit c82f63e411 ("PCI: check
saved state before restore"):  It amended pci_restore_state() to bail out
if the "state_saved" flag has been cleared.  The flag is cleared by
pci_restore_state() itself, hence a saved state is now allowed to be
restored only once and is then invalidated.  That doesn't seem to make
sense because the saved state should be good enough to be reused.

Soon after, drivers began to work around this behavior by calling
pci_save_state() immediately after pci_restore_state(), see e.g. commit
b94f2d775a ("igb: call pci_save_state after pci_restore_state").
Hilariously, two drivers even set the "state_saved" flag to true before
invoking pci_restore_state(), see ipr_reset_restore_cfg_space() and
e1000_io_slot_reset().

Despite these workarounds, recoverability at all times is not guaranteed:
E.g. when a PCIe port goes through a runtime suspend and resume cycle,
the "state_saved" flag is cleared by:

  pci_pm_runtime_resume()
    pci_pm_default_resume_early()
      pci_restore_state()

... and hence on a subsequent AER event, the port's Config Space cannot be
restored.  Riana reports a recovery failure of a GPU-integrated PCIe
switch and has root-caused it to the behavior of pci_restore_state().
Another workaround would be necessary, namely calling pci_save_state() in
pcie_port_device_runtime_resume().

The motivation of commit c82f63e411 was to prevent restoring state if
pci_save_state() hasn't been called before.  But that can be achieved by
saving state already on device addition, after Config Space has been
initialized.  A desirable side effect is that devices become recoverable
even if no driver gets bound.  This renders the commit unnecessary, so
revert it.
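The resulting semantics, save once on device addition and restore as often as needed, can be modeled as follows. The struct and sketch_* functions are hypothetical, and the 16-dword header is a simplification of what pci_save_state() actually captures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device model with a simplified config header */
struct sketch_dev {
	unsigned int config[16]; /* live config header (16 dwords) */
	unsigned int saved[16];  /* snapshot taken by sketch_save_state() */
};

static void sketch_save_state(struct sketch_dev *dev)
{
	memcpy(dev->saved, dev->config, sizeof(dev->saved));
}

/* Restoring no longer invalidates the snapshot, so it can be reused,
 * e.g. after a runtime resume and again during AER recovery */
static void sketch_restore_state(struct sketch_dev *dev)
{
	memcpy(dev->config, dev->saved, sizeof(dev->config));
}

/* Saving on device addition means a snapshot always exists, even if
 * no driver ever binds */
static void sketch_device_add(struct sketch_dev *dev)
{
	sketch_save_state(dev);
}
```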

Reported-by: Riana Tauro <riana.tauro@intel.com> # off-list
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/9e34ce61c5404e99ffdd29205122c6fb334b38aa.1763483367.git.lukas@wunner.de
2025-11-24 16:58:33 -06:00
Ilpo Järvinen
9f71938cd7 PCI: Move Resizable BAR code to rebar.c
For lack of a better place, the Resizable BAR code has been placed in
pci.c and setup-res.c, neither of which otherwise uses it.  Upcoming
changes will add more Resizable BAR functions, increasing the code size.

As pci.c is already huge, move the Resizable BAR code, along with the BAR
resize code from setup-res.c, to rebar.c.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20251113180053.27944-2-ilpo.jarvinen@linux.intel.com
2025-11-14 12:34:21 -06:00
Ilpo Järvinen
4687b3315a PCI/IOV: Adjust ->barsz[] when changing BAR size
pci_rebar_set_size() adjusts BAR size for both normal and IOV BARs. The
struct pci_sriov keeps a cached copy of the BAR size in ->barsz[], which is
adjusted not by pci_rebar_set_size() but by pci_iov_resource_set_size().
pci_iov_resource_set_size() is also called from
pci_resize_resource_set_size().

The current arrangement becomes problematic once the BAR resize algorithm
starts to roll back changes properly on failure. The normal resource
fitting algorithm easily rolls back the resource size using struct
pci_dev_resource, but also calling pci_resize_resource_set_size() or
pci_iov_resource_set_size() to roll back the BAR size would be an extra
burden. Combining the ->barsz[] update with pci_rebar_set_size() instead
rolls it back naturally when the old BAR size is restored on a different
layer of the BAR resize operation.

Thus, rework pci_rebar_set_size() to also update ->barsz[].
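A hypothetical sketch of why folding the cache update into the setter makes rollback automatic. The 1 MB << n size encoding follows the Resizable BAR Capability; struct sketch_bar and its fields are invented stand-ins for the control register and ->barsz[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Resizable BAR size index n encodes a BAR of 1 MB << n */
static uint64_t sketch_rebar_size_to_bytes(int size)
{
	return 1ULL << (size + 20);
}

/* Hypothetical BAR model: the control register's size field plus, for a
 * VF BAR, a cached size standing in for ->barsz[] */
struct sketch_bar {
	int ctrl_size;
	bool is_iov;
	uint64_t barsz;
};

/* Reworked setter: the cache is updated in the same place as the
 * register, so calling it again with the old size also restores it */
static void sketch_rebar_set_size(struct sketch_bar *bar, int size)
{
	bar->ctrl_size = size;
	if (bar->is_iov)
		bar->barsz = sketch_rebar_size_to_bytes(size);
}
```

A failed resize then only needs to call the same setter with the previous size index; no separate IOV bookkeeping step is involved.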

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU
Link: https://patch.msgid.link/20251113162628.5946-3-ilpo.jarvinen@linux.intel.com
2025-11-14 12:32:47 -06:00
Dan Williams
bcce8c74f1 PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
The ability to emulate a host bridge is useful not only for hardware PCI
controllers like CONFIG_VMD, or virtual PCI controllers like
CONFIG_PCI_HYPERV, but also for test and development scenarios like
CONFIG_SAMPLES_DEVSEC [1].

One stumbling block for defining CONFIG_SAMPLES_DEVSEC, a sample
implementation of a platform TSM for PCI Device Security, is the need to
accommodate PCI_DOMAINS_GENERIC architectures alongside x86 [2].

In support of supplementing the existing CONFIG_PCI_BRIDGE_EMUL
infrastructure for host bridges:

* Introduce pci_bus_find_emul_domain_nr() as a common way to find a free
  PCI domain number whether that is to reuse the existing dynamic
  allocation code in the !ACPI case, or to assign an unused domain above
  the last ACPI segment.

* Convert pci-hyperv to the new allocator so that the PCI core can
  unconditionally assume that bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET
  is the dynamically allocated case.

A follow-on patch can also convert vmd to the new scheme. Currently vmd is
limited to CONFIG_PCI_DOMAINS_GENERIC=n (x86) so, unlike pci-hyperv, it
does not immediately conflict with this new pci_bus_find_emul_domain_nr()
mechanism.
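One way such a search might look, as a hypothetical sketch: a fixed-size usage table stands in for the kernel's ID allocator, and the "start above the last ACPI segment, or at 0 without ACPI" rule is paraphrased from the text above rather than taken from pci_bus_find_emul_domain_nr() itself.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_DOMAINS 64

/* Hypothetical allocator state: which domain numbers are in use */
static bool sketch_domain_used[SKETCH_MAX_DOMAINS];

/* Find a free domain for an emulated host bridge. A negative
 * 'last_acpi_segment' means no ACPI, so the search starts at 0;
 * otherwise it starts just above the last firmware-described segment.
 * Returns -1 when nothing is free. */
static int sketch_find_emul_domain_nr(int last_acpi_segment)
{
	int start = last_acpi_segment < 0 ? 0 : last_acpi_segment + 1;

	for (int nr = start; nr < SKETCH_MAX_DOMAINS; nr++) {
		if (!sketch_domain_used[nr]) {
			sketch_domain_used[nr] = true;
			return nr;
		}
	}
	return -1;
}
```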

Link: http://lore.kernel.org/174107249038.1288555.12362100502109498455.stgit@dwillia2-xfh.jf.intel.com [1]
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Closes: http://lore.kernel.org/20250311144601.145736-3-suzuki.poulose@arm.com [2]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Link: https://patch.msgid.link/20251024224622.1470555-2-dan.j.williams@intel.com
2025-10-28 12:36:34 -05:00
Linus Torvalds
2f2c725493 pci-v6.18-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmjgOAkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxzlA//QxoUF4p1cN7+rPwuzCPNi2ZmKNyU
 T7mLfUciV/t8nPLPFdtxdttHB3F+BsA/E9WYFiUUGBzvdYafnoZ/Qnio1WdMIIYz
 0eVrTpnMUMBXrUwGFnnIER3b4GCJb2WR3RPfaBrbqQRHoAlDmv/ijh7rIKhgWIeR
 NsCmPiFnsxPjgVusn2jXWLheUHEbZh2dVTk9lceQXFRdrUELC9wH7zigAA6GviGO
 ssPC1pKfg5DrtuuM6k9JCcEYibQIlynxZ8sbT6YfQ2bs1uSEd2pEcr7AORb4l2yQ
 rcirHwGTpvZ/QvzKpDY8FcuzPFRP7QPd+34zMEQ2OW04y1k61iKE/4EE2Z9w/OoW
 esFQXbevy9P5JHu6DBcaJ2uwvnLiVesry+9CmkKCc6Dxyjbcbgeta1LR5dhn1Rv0
 dMtRnkd/pxzIF5cRnu+WlOFV2aAw2gKL9pGuimH5TO4xL2qCZKak0hh8PAjUN2c/
 12GAlrwAyBK1FeY2ZflTN7Vr8o2O0I6I6NeaF3sCW1VO2e6E9/bAIhrduUO4lhGq
 BHTVRBefFRtbFVaxTlUAj+lSCyqES3Wzm8y/uLQvT6M3opunTziSDff1aWbm1Y2t
 aASl1IByuKsGID8VrT5khHeBKSWtnd/v7LLUjCeq+g6eKdfN2arInPvw5X1NpVMj
 tzzBYqwHgBoA4u8=
 =BUw/
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Add PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() macros that
     take config space accessor functions.

     Implement pci_find_capability(), pci_find_ext_capability(), and
     dwc, dwc endpoint, and cadence capability search interfaces with
     them (Hans Zhang)

   - Leave parent unit address 0 in 'interrupt-map' so that when we
     build devicetree nodes to describe PCI functions that contain
     multiple peripherals, we can build this property even when
     interrupt controllers lack 'reg' properties (Lorenzo Pieralisi)

   - Add a Xeon 6 quirk to disable Extended Tags and limit Max Read
     Request Size to 128B to avoid a performance issue (Ilpo Järvinen)

   - Add sysfs 'serial_number' file to expose the Device Serial Number
     (Matthew Wood)

   - Fix pci_acpi_preserve_config() memory leak (Nirmoy Das)

  Resource management:

   - Align m68k pcibios_enable_device() with other arches (Ilpo
     Järvinen)

   - Remove sparc pcibios_enable_device() implementations that don't do
     anything beyond what pci_enable_resources() does (Ilpo Järvinen)

   - Remove mips pcibios_enable_resources() and use
     pci_enable_resources() instead (Ilpo Järvinen)

   - Clean up bridge window sizing and assignment (Ilpo Järvinen),
     including:

       - Leave non-claimed bridge windows disabled

       - Enable bridges even if a window wasn't assigned because not all
         windows are required by downstream devices

       - Preserve bridge window type when releasing the resource, since
         the type is needed for reassignment

       - Consolidate selection of bridge windows into two new
         interfaces, pbus_select_window() and
         pbus_select_window_for_type(), so this is done consistently

       - Compute bridge window start and end earlier to avoid logging
         stale information

  MSI:

   - Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol
     Vives)

  Error handling:

   - Align AER with EEH by allowing drivers to request a Bus Reset on
     Non-Fatal Errors (in addition to the reset on Fatal Errors that we
     already do) (Lukas Wunner)

   - If error recovery fails, emit FAILED_RECOVERY uevents for the
     devices, not for the bridge leading to them.

     This makes them correspond to BEGIN_RECOVERY uevents (Lukas Wunner)

   - Align AER with EEH by calling err_handler.error_detected()
     callbacks to notify drivers if error recovery fails (Lukas Wunner)

   - Align AER with EEH by restoring device error_state to
     pci_channel_io_normal before the err_handler.slot_reset() callback.

     Previously this was done later, just before the err_handler.resume()
     callback (Lukas Wunner)

   - Emit a BEGIN_RECOVERY uevent when driver's
     err_handler.error_detected() requests a reset, as well as when it
     says recovery is complete or can be done without a reset (Niklas
     Schnelle)

   - Align s390 with AER and EEH by emitting uevents during error
     recovery (Niklas Schnelle)

   - Align EEH with AER and s390 by emitting BEGIN_RECOVERY,
     SUCCESSFUL_RECOVERY, or FAILED_RECOVERY uevents depending on the
     result of err_handler.error_detected() (Niklas Schnelle)

   - Fix a NULL pointer dereference in aer_ratelimit() when ACPI GHES
     error information identifies a device without an AER Capability
     (Breno Leitao)

   - Update error decoding and TLP Log printing for new errors in
     current PCIe base spec (Lukas Wunner)

   - Update error recovery documentation to match the current code
     and use consistent nomenclature (Lukas Wunner)
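The uevent alignment described in this section amounts to mapping a recovery result onto one of three ERROR_EVENT values. The sketch below is a simplified model: enum ers_result and ers_uevent() are hypothetical stand-ins for enum pci_ers_result and the kernel's uevent helper, and the exact set of results mapped to BEGIN_RECOVERY is an assumption drawn from the wording above, not copied from the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for enum pci_ers_result (hypothetical names). */
enum ers_result {
	ERS_RESULT_CAN_RECOVER,	/* driver can recover without a reset */
	ERS_RESULT_NEED_RESET,	/* driver requests a reset */
	ERS_RESULT_RECOVERED,	/* device is operational again */
	ERS_RESULT_DISCONNECT,	/* recovery failed */
};

/* Map a recovery result to the ERROR_EVENT value carried in the uevent. */
static const char *ers_uevent(enum ers_result r)
{
	switch (r) {
	case ERS_RESULT_CAN_RECOVER:
	case ERS_RESULT_NEED_RESET:
		return "BEGIN_RECOVERY";
	case ERS_RESULT_RECOVERED:
		return "SUCCESSFUL_RECOVERY";
	case ERS_RESULT_DISCONNECT:
		return "FAILED_RECOVERY";
	}
	return NULL;
}
```

Keeping the mapping in one switch is what lets AER, EEH and s390 emit the same events for the same outcome.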

  ASPM:

   - Enable all ClockPM and ASPM states for devicetree platforms, since
     there's typically no firmware that enables ASPM

     This is a risky change that may uncover hardware or configuration
     defects at boot time rather than later, when users enable ASPM via
     sysfs. Booting with "pcie_aspm=off" prevents this enabling
     (Manivannan Sadhasivam)

   - Remove the qcom code that enabled ASPM (Manivannan Sadhasivam)

  Power management:

   - If a device has already been disconnected, e.g., by a hotplug
     removal, don't bother trying to resume it to D0 when detaching the
     driver.

     This avoids annoying "Unable to change power state from D3cold to
     D0" messages (Mario Limonciello)

   - Ensure devices are powered up before config reads for
     'max_link_width', 'current_link_speed', 'current_link_width',
     'secondary_bus_number', and 'subordinate_bus_number' sysfs files.

     This prevents using invalid data (~0) in drivers or lspci and,
     depending on how the PCIe controller reports errors, may avoid
     error interrupts or crashes (Brian Norris)
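The sysfs fix follows a "resume, read, release" pattern: a config read from a device in D3cold typically returns all ones (~0), so the device is powered up for the duration of the access. The model below is a hypothetical sketch; struct model_dev, model_pm_get() and model_pm_put() stand in for a struct pci_dev and runtime PM calls such as pm_runtime_get_sync(), and suspending as soon as the count drops to zero is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device that may be runtime-suspended. */
struct model_dev {
	bool powered;		/* true when in D0 */
	int pm_refs;		/* runtime PM usage count */
	uint16_t link_status;	/* value the hardware returns in D0 */
};

/* Config reads from a powered-off device return all ones. */
static uint16_t raw_cfg_read16(const struct model_dev *d)
{
	return d->powered ? d->link_status : (uint16_t)~0;
}

/* Stand-ins for runtime PM get/put. */
static void model_pm_get(struct model_dev *d)
{
	if (d->pm_refs++ == 0)
		d->powered = true;
}

static void model_pm_put(struct model_dev *d)
{
	if (--d->pm_refs == 0)
		d->powered = false;	/* simplification: suspend immediately */
}

/* sysfs-style read that keeps the device powered around the access. */
static uint16_t sysfs_read_link_status(struct model_dev *d)
{
	uint16_t v;

	model_pm_get(d);
	v = raw_cfg_read16(d);
	model_pm_put(d);
	return v;
}
```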

  Virtualization:

   - Add rescan/remove locking when enabling/disabling SR-IOV, which
     avoids list corruption on s390, where disabling SR-IOV also
     generates hotplug events (Niklas Schnelle)

  Peer-to-peer DMA:

   - Free struct p2p_pgmap, not a member within it, in the
     pci_p2pdma_add_resource() error path (Sungho Kim)

  Endpoint framework:

   - Document sysfs interface for BAR assignment of vNTB endpoint
     functions (Jerome Brunet)

   - Fix array underflow in endpoint BAR test case (Dan Carpenter)

   - Skip endpoint IRQ test if the IRQ is out of range to avoid false
     errors (Christian Bruel)

   - Fix endpoint test case for controllers with fixed-size BARs smaller
     than requested by the test (Marek Vasut)

   - Restore inbound translation when disabling doorbell so the endpoint
     doorbell test case can be run more than once (Niklas Cassel)

   - Avoid a NULL pointer dereference when releasing DMA channels in
     endpoint DMA test case (Shin'ichiro Kawasaki)

   - Convert tegra194 interrupt number to MSI vector to fix endpoint
     Kselftest MSI_TEST test case (Niklas Cassel)

   - Reset tegra194 BARs when running in endpoint mode so the BAR tests
     don't overwrite the ATU settings in BAR4 (Niklas Cassel)

   - Handle errors in tegra194 BPMP transactions so we don't mistakenly
     skip future PERST# assertion (Vidya Sagar)

  AMD MDB PCIe controller driver:

   - Update DT binding example to move PERST# into a separate Root Port
     stanza to make multiple Root Ports possible in the future (Sai
     Krishna Musham)

   - Add driver support for PERST# being described in a Root Port
     stanza, falling back to the host bridge if not found there (Sai
     Krishna Musham)

  Freescale i.MX6 PCIe controller driver:

   - Enable the 3.3V Vaux supply if available so devices can request
     wakeup with either Beacon or WAKE# (Richard Zhu)

  MediaTek PCIe Gen3 controller driver:

   - Add optional sys clock ready time setting to avoid sys_clk_rdy
     signal glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno)

   - Add DT binding and driver support for MT6991 and MT8196
     (AngeloGioacchino Del Regno)

  NVIDIA Tegra PCIe controller driver:

   - When asserting PERST#, disable the controller instead of mistakenly
     disabling the PLL twice (Nagarjuna Kristam)

   - Convert struct tegra_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  Qualcomm PCIe controller driver:

   - Select PCI Power Control Slot driver so slot voltage rails can be
     turned on/off if described in Root Port devicetree node (Qiang Yu)

   - Parse only PCI bridge child nodes in devicetree, skipping unrelated
     nodes such as OPP (Operating Performance Points), which caused
     probe failures (Krishna Chaitanya Chundru)

   - Add 8.0 GT/s and 32.0 GT/s equalization settings (Ziyue Zhang)

   - Consolidate Root Port 'phy' and 'reset' properties in struct
     qcom_pcie_port, regardless of whether we got them from the Root
     Port node or the host bridge node (Manivannan Sadhasivam)

   - Fetch and map the ELBI register space in the DWC core rather than
     in each driver individually (Krishna Chaitanya Chundru)

   - Enable ECAM mechanism in DWC core by setting up iATU with 'CFG
     Shift Feature' and use this in the qcom driver (Krishna Chaitanya
     Chundru)

   - Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya
     Chundru)

   - Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm
     Glymur, which is compatible with X1E80100 but doesn't have the
     cnoc_sf_axi clock (Qiang Yu)

  Renesas R-Car PCIe controller driver:

   - Fix a typo that prevented correct PHY initialization (Marek Vasut)

   - Add a missing 1ms delay after PWR reset assertion as required by
     the V4H manual (Marek Vasut)

   - Assure reset has completed before DBI access to avoid SError (Marek
     Vasut)

   - Fix inverted PHY initialization check, which sometimes led to
     timeouts and failure to start the controller (Marek Vasut)

   - Pass the correct IRQ domain to generic_handle_domain_irq() to fix a
     regression when converting to msi_create_parent_irq_domain()
     (Claudiu Beznea)

   - Drop the spinlock protecting the PMSR register; it's no longer
     required since pci_lock already serializes accesses (Marek Vasut)

   - Convert struct rcar_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  SOPHGO PCIe controller driver:

   - Check for existence of struct cdns_pcie.ops before using it to
     allow Cadence drivers that don't need to supply ops (Chen Wang)

   - Add DT binding and driver for the SOPHGO SG2042 PCIe controller
     (Chen Wang)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Update pinctrl documentation of initial states and use in runtime
     suspend/resume (Christian Bruel)

   - Add pinctrl_pm_select_init_state() for use by stm32 driver, which
     needs it during resume (Christian Bruel)

   - Add devicetree bindings and drivers for the STMicroelectronics
     STM32MP25 in host and endpoint modes (Christian Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Add support for x16 in devicetree 'num-lanes' property (Konrad
     Dybcio)

   - Verify that if DT specifies a single IRQ for all eDMA channels, it
     is named 'dma' (Niklas Cassel)

  TI J721E PCIe driver:

   - Add MODULE_DEVICE_TABLE() so driver can be autoloaded (Siddharth
     Vadapalli)

   - Power controller off before configuring the glue layer so the
     controller latches the correct values on power-on (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Use devm_request_irq() so 'ks-pcie-error-irq' is freed when driver
     exits with error (Siddharth Vadapalli)

   - Add Peripheral Virtualization Unit (PVU), which restricts DMA from
     PCIe devices to specific regions of host memory, to the ti,am65
     binding (Jan Kiszka)

  Xilinx NWL PCIe controller driver:

   - Clear bootloader E_ECAM_CONTROL before merging in the new driver
     value to avoid writing invalid values (Jani Nurminen)"
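The Xilinx NWL fix is an instance of the read-modify-write rule: clear a field before OR-ing in a new value, or stale bits left by the bootloader survive. The actual E_ECAM_CONTROL bit layout is not described here, so ECAM_FIELD_SHIFT and ECAM_FIELD_MASK below are hypothetical placeholders.

```c
#include <stdint.h>

/* Hypothetical field position; not the real E_ECAM_CONTROL layout. */
#define ECAM_FIELD_SHIFT	16
#define ECAM_FIELD_MASK		(UINT32_C(0xff) << ECAM_FIELD_SHIFT)

/* Buggy pattern: OR-ing into a register that may still hold stale bits. */
static uint32_t merge_or_only(uint32_t reg, uint32_t val)
{
	return reg | ((val << ECAM_FIELD_SHIFT) & ECAM_FIELD_MASK);
}

/* Fixed pattern: clear the field first, then merge in the new value. */
static uint32_t merge_clear_first(uint32_t reg, uint32_t val)
{
	reg &= ~ECAM_FIELD_MASK;
	return reg | ((val << ECAM_FIELD_SHIFT) & ECAM_FIELD_MASK);
}
```

With a leftover field value of 0xab, merging 0x10 by OR alone produces 0xbb, while clearing first produces the intended 0x10; bits outside the field are preserved either way.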

* tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (141 commits)
  PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
  MAINTAINERS: Add entry for ST STM32MP25 PCIe drivers
  PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25
  dt-bindings: PCI: Add STM32MP25 PCIe Endpoint bindings
  PCI: stm32: Add PCIe host support for STM32MP25
  PCI: xilinx-nwl: Fix ECAM programming
  PCI: j721e: Fix incorrect error message in probe()
  PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit
  dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
  PCI: dwc: Support 16-lane operation
  PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
  PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
  PCI: rcar-host: Convert struct rcar_msi mask_lock into raw spinlock
  PCI: tegra194: Rename 'root_bus' to 'root_port_bus' in tegra_pcie_downstream_dev_to_D0()
  PCI: tegra: Convert struct tegra_msi mask_lock into raw spinlock
  PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  PCI: rcar-gen4: Assure reset occurs before DBI access
  PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
  PCI: Set up bridge resources earlier
  PCI: rcar-host: Drop PMSR spinlock
  ...
2025-10-06 10:41:03 -07:00
Bjorn Helgaas
fef3530379 Merge branch 'pci/capability-search'
- Simplify __pci_find_next_cap_ttl() by replacing magic numbers with
  #defines, extracting fields with FIELD_GET(), etc. (Hans Zhang)

- Convert __pci_find_next_cap_ttl() to a PCI_FIND_NEXT_CAP() macro that
  takes a config space accessor function so we can also use it in cases
  where the usual config accessors aren't available (Hans Zhang)

- Similarly convert pci_find_next_ext_capability() to a
  PCI_FIND_NEXT_EXT_CAP() macro (Hans Zhang)

- Implement dwc, dwc endpoint, and cadence capability search interfaces on
  top of PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP(), replacing the
  previous duplicated code (Hans Zhang)

- Search for capabilities in the cadence core instead of hard-coding their
  offsets, which are subject to change (Hans Zhang)

* pci/capability-search:
  PCI: cadence: Use cdns_pcie_find_*capability() to avoid hardcoding offsets
  PCI: cadence: Implement capability search using PCI core APIs
  PCI: dwc: ep: Implement capability search using PCI core APIs
  PCI: dwc: Implement capability search using PCI core APIs
  PCI: Refactor extended capability search into PCI_FIND_NEXT_EXT_CAP()
  PCI: Refactor capability search into PCI_FIND_NEXT_CAP()
  PCI: Clean up __pci_find_next_cap_ttl() readability
2025-10-03 12:13:14 -05:00
Mario Limonciello
299fad4133 PCI/PM: Skip resuming to D0 if device is disconnected
When a device is surprise-removed (e.g., due to a dock unplug), the PCI
core unconfigures all downstream devices and sets their error state to
pci_channel_io_perm_failure. This marks them as disconnected via
pci_dev_is_disconnected().

During device removal, the runtime PM framework may attempt to resume the
device to D0 via pm_runtime_get_sync(), which calls into pci_power_up().
Since the device is already disconnected, this resume attempt is
unnecessary and results in predictable errors like this, typically when
undocking from a TBT3 or USB4 dock with PCIe tunneling:

  pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible

Avoid powering up disconnected devices by checking their status early in
pci_power_up() and returning -EIO.
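The early bail-out can be sketched as a guard at the top of the power-up path. struct model_pci_dev and model_power_up() are hypothetical stand-ins for struct pci_dev and pci_power_up(); the disconnected flag plays the role of pci_dev_is_disconnected(), and the real function does considerably more work on the success path.

```c
#include <errno.h>
#include <stdbool.h>

enum model_power_state { MODEL_D0, MODEL_D3COLD };

/* Hypothetical model of the fields this check relies on. */
struct model_pci_dev {
	bool disconnected;	/* like pci_dev_is_disconnected() */
	enum model_power_state current_state;
};

static int model_power_up(struct model_pci_dev *dev)
{
	/* A surprise-removed device cannot respond; don't try to wake it. */
	if (dev->disconnected)
		return -EIO;

	/* Real code would program PMCSR and wait for the transition. */
	dev->current_state = MODEL_D0;
	return 0;
}
```

Returning before any config access both avoids the misleading "Unable to change power state" message and leaves the cached power state untouched.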

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
[bhelgaas: add typical message]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://patch.msgid.link/20250909031916.4143121-1-superm1@kernel.org
2025-09-22 09:38:49 -05:00
Kees Cook
00e58ff924 PCI: Test for bit underflow in pcie_set_readrq()
In preparation for the future commit ("bitops: Add __attribute_const__ to generic
ffs()-family implementations"), which allows GCC's value range tracker
to see past ffs(), GCC 8 on ARM thinks that it might be possible that
"ffs(rq) - 8" used here:

	v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);

could wrap below 0, leading to a very large value, which would be out of
range for the FIELD_PREP() usage:

drivers/pci/pci.c: In function 'pcie_set_readrq':
include/linux/compiler_types.h:572:38: error: call to '__compiletime_assert_471' declared with attribute error: FIELD_PREP: value too large for the field
...
drivers/pci/pci.c:5896:6: note: in expansion of macro 'FIELD_PREP'
  v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);
      ^~~~~~~~~~

If the result of the ffs() is bounds checked before being used in
FIELD_PREP(), the value tracker seems happy again. :)
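A userspace sketch of the bounds-checked encoding follows. Max_Read_Request_Size occupies bits 14:12 of Device Control (PCI_EXP_DEVCTL_READRQ, 0x7000), and sizes 128..4096 encode as 0..5, which is why ffs(rq) - 8 appears. encode_readrq() is a hypothetical helper, and the shift-and-mask line stands in for FIELD_PREP(). The explicit firstbit < 8 test is redundant with the earlier range check at runtime, but it lets a value-range tracker see that the subtraction cannot go negative.

```c
#include <errno.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

/* Max_Read_Request_Size field in the PCIe Device Control register. */
#define PCI_EXP_DEVCTL_READRQ	0x7000

/*
 * Encode a read request size in bytes (power of two, 128..4096) into the
 * DEVCTL field value.  Returns 0 on success or -EINVAL.
 */
static int encode_readrq(int rq, uint16_t *field)
{
	int firstbit;

	if (rq < 128 || rq > 4096 || (rq & (rq - 1)))
		return -EINVAL;

	firstbit = ffs(rq);		/* 128 -> 8, 4096 -> 13 */
	if (firstbit < 8)		/* proves "firstbit - 8" >= 0 */
		return -EINVAL;

	/* Userspace stand-in for FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ...). */
	*field = (uint16_t)(((firstbit - 8) << 12) & PCI_EXP_DEVCTL_READRQ);
	return 0;
}
```

For example, 512 bytes encodes as 0x2000 and 4096 bytes as 0x5000, while 64 or 384 bytes are rejected.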

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/linux-pci/CA+G9fYuysVr6qT8bjF6f08WLyCJRG7aXAeSd2F7=zTaHHd7L+Q@mail.gmail.com/
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250905052836.work.425-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-08 14:58:20 -07:00
Hans Zhang
4d909bf1a5 PCI: Refactor extended capability search into PCI_FIND_NEXT_EXT_CAP()
Move the extended Capability search logic into a PCI_FIND_NEXT_EXT_CAP()
macro that accepts a config space accessor function as an argument. This
enables controller drivers to perform Capability discovery using their
early access mechanisms prior to full device initialization while sharing
the Capability search code.

Convert the existing PCI core extended Capability search implementation to
use PCI_FIND_NEXT_EXT_CAP().
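The idea of parameterizing the search by an accessor can be illustrated with a function-pointer variant of the extended Capability walk. The kernel's PCI_FIND_NEXT_EXT_CAP() is a macro; find_next_ext_cap(), cfg_read32_fn and buf_read32() here are hypothetical names. Extended Capabilities start at offset 0x100, each 32-bit header carries the ID in bits 15:0 and the next offset in bits 31:20, and the TTL bounds the walk at one entry per 8 bytes of extended space.

```c
#include <stdint.h>

/* Accessor type: read a config dword at 'where' (hypothetical). */
typedef uint32_t (*cfg_read32_fn)(void *ctx, int where);

#define CFG_SPACE_SIZE		256	/* extended space starts here */
#define CFG_SPACE_EXP_SIZE	4096

/* Return the offset of the next Capability 'cap' after 'start', or 0. */
static int find_next_ext_cap(cfg_read32_fn read32, void *ctx, int start,
			     int cap)
{
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;
	int pos = start ? start : CFG_SPACE_SIZE;
	uint32_t header = read32(ctx, pos);

	/* No extended capabilities, or device not responding. */
	if (header == 0 || header == 0xffffffff)
		return 0;

	while (ttl-- > 0) {
		if ((header & 0xffff) == (uint32_t)cap && pos != start)
			return pos;
		pos = (header >> 20) & 0xffc;	/* next Capability offset */
		if (pos < CFG_SPACE_SIZE)
			break;
		header = read32(ctx, pos);
	}
	return 0;
}

/* Example accessor over a snapshot of 4 KB config space. */
static uint32_t buf_read32(void *ctx, int where)
{
	const uint32_t *cfg = ctx;

	return cfg[where / 4];
}
```

A controller driver would supply its own early-access read routine in place of buf_read32(), which is the point of passing the accessor in.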

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://patch.msgid.link/20250813144529.303548-4-18255117159@163.com
2025-08-14 15:03:45 -05:00
Hans Zhang
3c02084d8f PCI: Refactor capability search into PCI_FIND_NEXT_CAP()
The PCI Capability search functionality is duplicated across the PCI core
and several controller drivers. The core's current implementation requires
fully initialized PCI device and bus structures, which prevents controller
drivers from using it during early initialization phases before these
structures are available.

Move the Capability search logic into a PCI_FIND_NEXT_CAP() macro that
accepts a config space accessor function as an argument. This enables
controller drivers to perform Capability discovery using their early
access mechanisms prior to full device initialization while sharing the
Capability search code.

Convert the existing PCI core Capability search implementation to use
PCI_FIND_NEXT_CAP().  Controller drivers can later use this with their
early access mechanisms while maintaining the existing protection against
infinite loops through preserved TTL checks.
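A minimal sketch of the same approach for conventional Capabilities, assuming a byte-wide accessor. It is a function-pointer variant for illustration, not the PCI_FIND_NEXT_CAP() macro itself; find_next_cap(), cfg_read8_fn and buf_read8() are hypothetical. Each Capability has its ID at byte 0 and the next pointer at byte 1, pointers below 0x40 end the list, and a TTL of 48 stops a malformed, looping list. The Status register check for the presence of a Capabilities List is omitted.

```c
#include <stdint.h>

/* Accessor type: read a config byte at 'where' (hypothetical). */
typedef uint8_t (*cfg_read8_fn)(void *ctx, int where);

#define CAP_LIST_PTR	0x34	/* Capabilities Pointer register */
#define CAP_LIST_NEXT	1	/* "next" byte within a Capability */
#define CAP_TTL		48	/* bound on list length */

/*
 * 'pos' is the offset holding the pointer to follow: CAP_LIST_PTR for the
 * first Capability, or a previous match + CAP_LIST_NEXT to continue.
 */
static int find_next_cap(cfg_read8_fn read8, void *ctx, int pos, int cap)
{
	int ttl = CAP_TTL;

	pos = read8(ctx, pos);
	while (ttl-- > 0) {
		if (pos < 0x40)		/* end of list / invalid pointer */
			break;
		pos &= ~3;		/* pointers are dword aligned */
		uint8_t id = read8(ctx, pos);
		if (id == 0xff)		/* device not responding */
			break;
		if (id == cap)
			return pos;
		pos = read8(ctx, pos + CAP_LIST_NEXT);
	}
	return 0;
}

/* Example accessor over a snapshot of the first 256 bytes. */
static uint8_t buf_read8(void *ctx, int where)
{
	const uint8_t *cfg = ctx;

	return cfg[where];
}
```

If a corrupted "next" pointer loops back to an earlier Capability, the TTL makes the search return 0 after 48 steps instead of spinning forever.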

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://patch.msgid.link/20250813144529.303548-3-18255117159@163.com
2025-08-14 15:03:40 -05:00