From 4d982084507d663df160546c4c48066a8887ed89 Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Fri, 3 Oct 2025 15:40:09 -0700 Subject: [PATCH 1/2] PCI/PM: Avoid redundant delays on D3hot->D3cold When transitioning to D3cold, __pci_set_power_state() first transitions to D3hot. If the device was already in D3hot, this adds excess work: (a) read/modify/write PMCSR; and (b) excess delay (pci_dev_d3_sleep()). For (b), we already performed the necessary delay on the previous D3hot entry; this was extra noticeable when evaluating runtime PM transition latency. Check whether we're already in the target state before continuing. Note that __pci_set_power_state() already does this same check for other state transitions, but D3cold is special because __pci_set_power_state() converts it to D3hot for the purposes of PMCSR. This seems to be an oversight in commit 0aacdc957401 ("PCI/PM: Clean up pci_set_low_power_state()"). Fixes: 0aacdc957401 ("PCI/PM: Clean up pci_set_low_power_state()") Signed-off-by: Brian Norris Signed-off-by: Brian Norris [bhelgaas: reverse test to match other "dev->current_state == state" cases] Signed-off-by: Bjorn Helgaas Link: https://patch.msgid.link/20251003154008.1.I7a21c240b30062c66471329567a96dceb6274358@changeid --- drivers/pci/pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 13dbb405dc31..86ccbd0efb49 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1488,6 +1488,9 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool || (state == PCI_D2 && !dev->d2_support)) return -EIO; + if (dev->current_state == state) + return 0; + pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); if (PCI_POSSIBLE_ERROR(pmcsr)) { pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n", From 51c0996dadaea20d73eb0495aeda9cb0422243e8 Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Thu, 22 Jan 2026 09:48:15 -0800 Subject: [PATCH 2/2] PCI/PM: Prevent runtime suspend until devices are fully initialized Previously, it was possible for a PCI device to be runtime-suspended before it was fully initialized. When that happened, the suspend process could save invalid device state, for example, before BAR assignment. Restoring the invalid state during resume may leave the device non-functional. Prevent runtime suspend for PCI devices until they are fully initialized by deferring pm_runtime_enable(). More details on how exactly this may occur: 1. PCI device is created by pci_scan_slot() or similar 2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and prevents runtime suspend prevented via pm_runtime_forbid() 3. pci_device_add() adds the underlying 'struct device' via device_add(), which means user space can allow runtime suspend, e.g., echo auto > /sys/bus/pci/devices/.../power/control 4. PCI device receives BAR configuration (pci_assign_unassigned_bus_resources(), etc.) 5. pci_bus_add_device() applies final fixups, saves device state, and tries to attach a driver The device may potentially be suspended between #3 and #5, so this is racy with user space (udev or similar). Many PCI devices are enumerated at subsys_initcall time and so will not race with user space, but devices created later by hotplug or modular pwrctrl or host controller drivers are susceptible to this race. More runtime PM details at the first Link: below. Link: https://lore.kernel.org/all/0e35a4e1-894a-47c1-9528-fc5ffbafd9e2@samsung.com/ Signed-off-by: Brian Norris [bhelgaas: update comments per https://lore.kernel.org/r/CAJZ5v0iBNOmMtqfqEbrYyuK2u+2J2+zZ-iQd1FvyCPjdvU2TJg@mail.gmail.com] Signed-off-by: Bjorn Helgaas Tested-by: Marek Szyprowski Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260122094815.v5.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid --- drivers/pci/bus.c | 8 ++++++++ drivers/pci/pci.c | 8 +++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index 4383a36fd6ca..41e5c45e38b5 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -379,6 +380,13 @@ void pci_bus_add_device(struct pci_dev *dev) put_device(&pdev->dev); } + /* + * Enable runtime PM, which potentially allows the device to + * suspend immediately, only after the PCI state has been + * configured completely. + */ + pm_runtime_enable(&dev->dev); + if (!dn || of_device_is_available(dn)) pci_dev_allow_binding(dev); diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 86ccbd0efb49..d5f4b52899ad 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3199,8 +3199,14 @@ void pci_pm_init(struct pci_dev *dev) poweron: pci_pm_power_up_and_verify_state(dev); pm_runtime_forbid(&dev->dev); + + /* + * Runtime PM will be enabled for the device when it has been fully + * configured, but since its parent and suppliers may suspend in + * the meantime, prevent them from doing so by changing the + * device's runtime PM status to "active". + */ pm_runtime_set_active(&dev->dev); - pm_runtime_enable(&dev->dev); } static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)