This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
- Rename pwrseq, tc9563, and slot driver structs, variables, and functions
for consistency (Bjorn Helgaas)
- Add power_on/off callbacks with generic signature to pwrseq, tc9563, and
slot drivers so they can be used by pwrctrl core (Manivannan Sadhasivam)
- Add interfaces to create and destroy pwrctrl devices (Krishna Chaitanya
Chundru)
- Add interfaces to power devices on and off (Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the PCI
core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver (Manivannan
Sadhasivam)
- Create pwrctrl devices for devicetree PCIe M.2 connector nodes
(Manivannan Sadhasivam)
* pci/pwrctrl:
PCI/pwrctrl: Create pwrctrl device if graph port is found
PCI/pwrctrl: Add PCIe M.2 connector support
PCI: Drop the assert_perst() callback
PCI: qcom: Drop the assert_perst() callbacks
PCI/pwrctrl: Switch to pwrctrl create, power on/off, destroy APIs
PCI/pwrctrl: Add APIs to power on/off pwrctrl devices
PCI/pwrctrl: Add APIs to create, destroy pwrctrl devices
PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks
PCI/pwrctrl: pwrseq: Factor out power on/off code to helpers
PCI/pwrctrl: slot: Factor out power on/off code to helpers
PCI/pwrctrl: tc9563: Rename private struct and pointers for consistency
PCI/pwrctrl: tc9563: Add local variables to reduce repetition
PCI/pwrctrl: tc9563: Clean up whitespace
PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
PCI/pwrctrl: slot: Rename private struct and pointers for consistency
PCI/pwrctrl: pwrseq: Rename private struct and pointers for consistency
# Conflicts:
# drivers/pci/bus.c
Previously, it was possible for a PCI device to be runtime-suspended before
it was fully initialized. When that happened, the suspend process could
save invalid device state, for example, before BAR assignment. Restoring
the invalid state during resume may leave the device non-functional.
Prevent runtime suspend for PCI devices until they are fully initialized by
deferring pm_runtime_enable().
More details on how exactly this may occur:
1. PCI device is created by pci_scan_slot() or similar
2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and
prevents runtime suspend prevented via pm_runtime_forbid()
3. pci_device_add() adds the underlying 'struct device' via device_add(),
which means user space can allow runtime suspend, e.g.,
echo auto > /sys/bus/pci/devices/.../power/control
4. PCI device receives BAR configuration
(pci_assign_unassigned_bus_resources(), etc.)
5. pci_bus_add_device() applies final fixups, saves device state, and
tries to attach a driver
The device may potentially be suspended between #3 and #5, so this is racy
with user space (udev or similar).
Many PCI devices are enumerated at subsys_initcall time and so will not
race with user space, but devices created later by hotplug or modular
pwrctrl or host controller drivers are susceptible to this race.
More runtime PM details at the first Link: below.
Link: https://lore.kernel.org/all/0e35a4e1-894a-47c1-9528-fc5ffbafd9e2@samsung.com/
Signed-off-by: Brian Norris <briannorris@chromium.org>
[bhelgaas: update comments per https://lore.kernel.org/r/CAJZ5v0iBNOmMtqfqEbrYyuK2u+2J2+zZ-iQd1FvyCPjdvU2TJg@mail.gmail.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260122094815.v5.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid
Adopt pwrctrl APIs to create, power on/off, and destroy pwrctrl devices.
In qcom_pcie_host_init(), call pci_pwrctrl_create_devices() to create
devices, then pci_pwrctrl_power_on_devices() to power them on, both after
controller resource initialization. Once successful, deassert PERST# for
all devices.
In qcom_pcie_host_deinit(), call pci_pwrctrl_power_off_devices() after
asserting PERST#. Note that pci_pwrctrl_destroy_devices() is not called
here, as deinit is only invoked during system suspend where device
destruction is unnecessary. If the driver becomes removable in future,
pci_pwrctrl_destroy_devices() should be called in the remove() handler.
Remove the old pwrctrl framework code from the PCI core (including
devlinks) as the new APIs are now the sole consumer of pwrctrl
functionality. And also do not power on the pwrctrl drivers during probe()
as this is now handled by the APIs.
Co-developed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260115-pci-pwrctrl-rework-v5-12-9d26da3ce903@oss.qualcomm.com
- Introduce the PCI/TSM core for the coordination of device
authentication, link encryption and establishment (IDE), and later
management of the device security operational states (TDISP). Notify
the new TSM core layer of PCI device arrival and departure.
- Add a low level TSM driver for the link encryption establishment
capabilities of the AMD SEV-TIO architecture.
- Add a library of helpers TSM drivers to use for IDE establishment and
the DOE transport.
- Add skeleton support for 'bind' and 'guest_request' operations in
support of TDISP.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCaTOdAwAKCRDfioYZHlFs
Z/fWAQDS5mwS/8rn0UdH/SijTm/oKVxdiyIQbTstrjk8AySITgEA5ki9w2iKa0WG
x1ACZKlo9gS9emyx4wuJpCBIMtR50Qc=
=B4oG
-----END PGP SIGNATURE-----
Merge tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm
Pull PCIe Link Encryption and Device Authentication from Dan Williams:
"New PCI infrastructure and one architecture implementation for PCIe
link encryption establishment via platform firmware services.
This work is the result of multiple vendors coming to consensus on
some core infrastructure (thanks Alexey, Yilun, and Aneesh!), and
three vendor implementations, although only one is included in this
pull. The PCI core changes have an ack from Bjorn, the crypto/ccp/
changes have an ack from Tom, and the iommu/amd/ changes have an ack
from Joerg.
PCIe link encryption is made possible by the soup of acronyms
mentioned in the shortlog below. Link Integrity and Data Encryption
(IDE) is a protocol for installing keys in the transmitter and
receiver at each end of a link. That protocol is transported over Data
Object Exchange (DOE) mailboxes using PCI configuration requests.
The aspect that makes this a "platform firmware service" is that the
key provisioning and protocol is coordinated through a Trusted
Execution Envrionment (TEE) Security Manager (TSM). That is either
firmware running in a coprocessor (AMD SEV-TIO), or quasi-hypervisor
software (Intel TDX Connect / ARM CCA) running in a protected CPU
mode.
Now, the only reason to ask a TSM to run this protocol and install the
keys rather than have a Linux driver do the same is so that later, a
confidential VM can ask the TSM directly "can you certify this
device?".
That precludes host Linux from provisioning its own keys, because host
Linux is outside the trust domain for the VM. It also turns out that
all architectures, save for one, do not publish a mechanism for an OS
to establish keys in the root port. So "TSM-established link
encryption" is the only cross-architecture path for this capability
for the foreseeable future.
This unblocks the other arch implementations to follow in v6.20/v7.0,
once they clear some other dependencies, and it unblocks the next
phase of work to implement the end-to-end flow of confidential device
assignment. The PCIe specification calls this end-to-end flow Trusted
Execution Environment (TEE) Device Interface Security Protocol
(TDISP).
In the meantime, Linux gets a link encryption facility which has
practical benefits along the same lines as memory encryption. It
authenticates devices via certificates and may protect against
interposer attacks trying to capture clear-text PCIe traffic.
Summary:
- Introduce the PCI/TSM core for the coordination of device
authentication, link encryption and establishment (IDE), and later
management of the device security operational states (TDISP).
Notify the new TSM core layer of PCI device arrival and departure
- Add a low level TSM driver for the link encryption establishment
capabilities of the AMD SEV-TIO architecture
- Add a library of helpers TSM drivers to use for IDE establishment
and the DOE transport
- Add skeleton support for 'bind' and 'guest_request' operations in
support of TDISP"
* tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm: (23 commits)
crypto/ccp: Fix CONFIG_PCI=n build
virt: Fix Kconfig warning when selecting TSM without VIRT_DRIVERS
crypto/ccp: Implement SEV-TIO PCIe IDE (phase1)
iommu/amd: Report SEV-TIO support
psp-sev: Assign numbers to all status codes and add new
ccp: Make snp_reclaim_pages and __sev_do_cmd_locked public
PCI/TSM: Add 'dsm' and 'bound' attributes for dependent functions
PCI/TSM: Add pci_tsm_guest_req() for managing TDIs
PCI/TSM: Add pci_tsm_bind() helper for instantiating TDIs
PCI/IDE: Initialize an ID for all IDE streams
PCI/IDE: Add Address Association Register setup for downstream MMIO
resource: Introduce resource_assigned() for discerning active resources
PCI/TSM: Drop stub for pci_tsm_doe_transfer()
drivers/virt: Drop VIRT_DRIVERS build dependency
PCI/TSM: Report active IDE streams
PCI/IDE: Report available IDE streams
PCI/IDE: Add IDE establishment helpers
PCI: Establish document for PCI host bridge sysfs attributes
PCI: Add PCIe Device 3 Extended Capability enumeration
PCI/TSM: Establish Secure Sessions and Link Encryption
...
When the PCI core gained power management support in 2002, it introduced
pci_save_state() and pci_restore_state() helpers to restore Config Space
after a D3hot or D3cold transition, which implies a Soft or Fundamental
Reset (PCIe r7.0 sec 5.8):
https://git.kernel.org/tglx/history/c/a5287abe398b
In 2006, EEH and AER were introduced to recover from errors by performing
a reset. Because errors can occur at any time, drivers began calling
pci_save_state() on probe to ensure recoverability.
In 2009, recoverability was foiled by commit c82f63e411 ("PCI: check
saved state before restore"): It amended pci_restore_state() to bail out
if the "state_saved" flag has been cleared. The flag is cleared by
pci_restore_state() itself, hence a saved state is now allowed to be
restored only once and is then invalidated. That doesn't seem to make
sense because the saved state should be good enough to be reused.
Soon after, drivers began to work around this behavior by calling
pci_save_state() immediately after pci_restore_state(), see e.g. commit
b94f2d775a ("igb: call pci_save_state after pci_restore_state").
Hilariously, two drivers even set the "saved_state" flag to true before
invoking pci_restore_state(), see ipr_reset_restore_cfg_space() and
e1000_io_slot_reset().
Despite these workarounds, recoverability at all times is not guaranteed:
E.g. when a PCIe port goes through a runtime suspend and resume cycle,
the "saved_state" flag is cleared by:
pci_pm_runtime_resume()
pci_pm_default_resume_early()
pci_restore_state()
... and hence on a subsequent AER event, the port's Config Space cannot be
restored. Riana reports a recovery failure of a GPU-integrated PCIe
switch and has root-caused it to the behavior of pci_restore_state().
Another workaround would be necessary, namely calling pci_save_state() in
pcie_port_device_runtime_resume().
The motivation of commit c82f63e411 was to prevent restoring state if
pci_save_state() hasn't been called before. But that can be achieved by
saving state already on device addition, after Config Space has been
initialized. A desirable side effect is that devices become recoverable
even if no driver gets bound. This renders the commit unnecessary, so
revert it.
Reported-by: Riana Tauro <riana.tauro@intel.com> # off-list
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/9e34ce61c5404e99ffdd29205122c6fb334b38aa.1763483367.git.lukas@wunner.de
PCI/TSM, the PCI core functionality for the PCIe TEE Device Interface
Security Protocol (TDISP), has a need to walk all subordinate functions of
a Device Security Manager (DSM) to setup a device security context. A DSM
is physical function 0 of multi-function or SR-IOV device endpoint, or it
is an upstream switch port.
In error scenarios or when a TEE Security Manager (TSM) device is removed
it needs to unwind all established DSM contexts.
Introduce reverse versions of PCI device iteration helpers to mirror the
setup path and ensure that dependent children are handled before parents.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251031212902.2256310-4-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
- Ensure relaxed tail alignment does not increase min_align when computing
bridge window size, to fix a regression (Ilpo Järvinen)
- Fix bridge window size computation to fix a regression for devices with
undefined PCI class, e.g., Samsung [144d:a5a5] (Ilpo Järvinen)
- Fix error handling during resource resize to fix a regression in amdgpu
(Ilpo Järvinen)
- Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen)
- Remove several sparc pcibios_enable_device() implementations that don't
do anything beyond what pci_enable_resources() does (Ilpo Järvinen)
- Remove mips pcibios_enable_resources() and use pci_enable_resources()
instead (Ilpo Järvinen)
- Refactor and simplify find_bus_resource_of_type() (Ilpo Järvinen)
- Claim bridge windows before setting them up (Ilpo Järvinen)
- Disable non-claimed bridge windows so the kernel's view matches the
hardware configuration (Ilpo Järvinen)
- Use pci_release_resource() instead of release_resource() to reduce code
duplication and increase consistency (Ilpo Järvinen)
- Enable bridges even if bridge window assignment fails (Ilpo Järvinen)
- Preserve bridge window resource type flags when assignment fails because
we may need it later (Ilpo Järvinen)
- Add bridge window selection functions to make the selection consistent
across the several places that do this (Ilpo Järvinen)
- Warn if bridge window cannot be released when resizing BAR (Ilpo
Järvinen)
- Set up bridge resources before enumerating children so we can check
whether child resources are inside bridge windows (Ilpo Järvinen)
* pci/resource:
PCI: Set up bridge resources earlier
PCI: Don't print stale information about resource
PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
PCI: Pass bridge window to pci_bus_release_bridge_resources()
PCI: Add pci_setup_one_bridge_window()
PCI: Refactor remove_dev_resources() to use pbus_select_window()
PCI: Refactor distributing available memory to use loops
PCI: Use pbus_select_window_for_type() during mem window sizing
PCI: Use pbus_select_window() in space available checker
PCI: Rename resource variable from r to res
PCI: Use pbus_select_window_for_type() during IO window sizing
PCI: Use pbus_select_window() during BAR resize
PCI: Warn if bridge window cannot be released when resizing BAR
PCI: Fix finding bridge window in pci_reassign_bridge_resources()
PCI: Add bridge window selection functions
PCI: Add defines for bridge window indexing
PCI: Preserve bridge window resource type flags
PCI: Enable bridge even if bridge window fails to assign
PCI: Use pci_release_resource() instead of release_resource()
PCI: Disable non-claimed bridge window
PCI: Always claim bridge window before its setup
PCI: Refactor find_bus_resource_of_type() logic checks
PCI: Move find_bus_resource_of_type() earlier
MIPS: PCI: Use pci_enable_resources()
sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
PCI: Fix failure detection during resource resize
PCI: Fix pdev_resources_assignable() disparity
PCI: Ensure relaxed tail alignment does not increase min_align
When a bridge window is found unused or fails to assign, the flags of the
associated resource are cleared. Clearing flags is problematic as it also
removes the type information of the resource which is needed later.
Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET
and IORESOURCE_DISABLED to indicate the status of the bridge window. Also,
when initializing resources, make sure all valid bridge windows do get
their type flags set.
Change various places that relied on resource flags being cleared to check
for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window
resource to retain their type flags. Add pdev_resource_assignable() and
pdev_resource_should_fit() helpers to filter out disabled bridge windows
during resource fitting; the latter combines more common checks into the
helper.
When reading the bridge windows from the registers, instead of leaving the
resource flags cleared for bridge windows that are not enabled, always
set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed.
When resource fitting or assignment fails for a bridge window resource, or
the bridge window is not needed, mark the resource with IORESOURCE_UNSET or
IORESOURCE_DISABLED, respectively.
Use dummy zero resource in resource_show() for backwards compatibility as
lspci will otherwise misrepresent disabled bridge windows.
This change fixes an issue which highlights the importance of keeping the
resource type flags intact:
At the end of __assign_resources_sorted(), reset_resource() is called,
previously clearing the flags. Later, pci_prepare_next_assign_round()
attempted to release bridge resources using
pci_bus_release_bridge_resources() that calls into
pci_bridge_release_resources() that assumes type flags are still present.
As type flags were cleared, IORESOURCE_MEM_64 was not set leading to
resources under an incorrect bridge window to be released (idx = 1
instead of idx = 2). While the assignments performed later covered this
problem so that the wrongly released resources got assigned in the end,
it was still causing extra release+assign pairs.
There are other reasons why the resource flags should be retained in
upcoming changes too.
Removing the flag reset for non-bridge window resource is left as future
work, in part because it has a much higher regression potential due to
pci_enable_resources() that will start to work also for those resources
then and due to what endpoint drivers might assume about resources.
Despite the Fixes tag, backporting this (at least any time soon) is highly
discouraged. The issue fixed is borderline cosmetic as the later
assignments normally cover the problem entirely. Also there might be
non-obvious dependencies.
Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-11-ilpo.jarvinen@linux.intel.com
Make sure to drop the reference to the pwrctrl device taken by
of_find_device_by_node() when registering a PCI device.
Fixes: b458ff7e81 ("PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org # v6.13
Link: https://patch.msgid.link/20250721153609.8611-2-johan+linaro@kernel.org
The PCI core has historically not allowed built-in drivers to opt in to
async initial probing: Drivers may set "PROBE_PREFER_ASYNCHRONOUS", but
initial probing always happens synchronously. That's because the PCI core
uses device_attach() instead of device_initial_probe().
Should a driver return -EPROBE_DEFER on initial probe, reprobing later on
does honor the PROBE_PREFER_ASYNCHRONOUS setting. Modular drivers are
also allowed to probe asynchronously, which is inconsistent.
The choice of device_attach() is likely not deliberate: It was introduced
in 2013 with commit 58d9a38f6f ("PCI: Skip attaching driver in
device_add()"), but asynchronous probing was added two years later with
commit 765230b5f0 ("driver-core: add asynchronous probing support for
drivers").
According to the kernel-doc of "enum probe_type", "the end goal is to
switch the kernel to use asynchronous probing by default". To this end,
use device_initial_probe() to allow asynchronous initial probing. The
function returns void, making the return value check unnecessary.
Initial PCI probing often takes on the order of seconds even on laptops,
so this may speed up booting significantly.
A small number of PCI drivers already opt in to asynchronous probing.
Their maintainers (who are all cc'ed) should watch out for issues, now
that asynchronous probing is not just allowed for deferred and modular
probing, but also initial probing:
hl_pci_driver drivers/accel/habanalabs/common/habanalabs_drv.c
cxl_pci_driver drivers/cxl/pci.c
quicki2c_driver drivers/hid/intel-thc-hid/intel-quicki2c/pci-quicki2c.c
quickspi_driver drivers/hid/intel-thc-hid/intel-quickspi/pci-quickspi.c
i801_driver drivers/i2c/busses/i2c-i801.c
mei_me_driver drivers/misc/mei/pci-me.c
mei_vsc_drv drivers/misc/mei/platform-vsc.c
sdhci_driver drivers/mmc/host/sdhci-pci-core.c
nvme_driver drivers/nvme/host/pci.c
ehci_pci_driver drivers/usb/host/ehci-pci.c
hvfb_pci_stub_driver drivers/video/fbdev/hyperv_fb.c
All other driver maintainers may test asynchronous probing by specifying
the command line parameter "driver_async_probe=drv_name1,drv_name2,...",
and on success setting "probe_type = PROBE_PREFER_ASYNCHRONOUS" in the
pci_driver struct.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
[bhelgaas: updated commit log per https://lore.kernel.org/r/aHYUh7WoDlhHckxd@wunner.de]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/53abe6f5ac7c631f95f5d061aa748b192eda0379.1751614426.git.lukas@wunner.de
Since commit 58d9a38f6f ("PCI: Skip attaching driver in device_add()"),
PCI enumeration is split into two steps: In the first step, all devices
are published in sysfs with device_add(). In the second step, drivers are
bound to the devices with device_attach(). To delay driver binding until
the second step, a "bool match_driver" in struct pci_dev is used.
Instead of a bool, use a bit in the "unsigned long priv_flags" to shrink
struct pci_dev a little and prevent use of the bool outside the PCI core
(as has happened with commit cbbc00be2c ("iommu/amd: Prevent binding
other PCI drivers to IOMMU PCI devices")).
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/d22a9e5b81d6bd8dd1837607d6156679b3b1199c.1745572340.git.lukas@wunner.de
Current way of creating pwrctrl devices requires iterating through the
child devicetree nodes of the PCI bridge in pci_pwrctrl_create_devices().
Even though it works, it creates confusion as there is no symmetry between
this and pci_pwrctrl_unregister() function that removes the pwrctrl
devices.
So to make these two functions symmetric, move the creation of pwrctrl
devices to pci_scan_device(). During the scan of each device in a slot,
the devicetree node (if exists) for the PCI device will be checked. If it
has the supplies populated, then the pwrctrl device will be created.
Since the PCI device scan happens so early, there would be no "struct
pci_dev" available for the device. So the host bridge is used as the
parent of all pwrctrl devices.
One nice side effect of this move is that, it is now possible to have
pwrctrl devices for PCI bridges as well (to control the supplies of PCI
slots).
Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-1-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
- Use of_platform_device_create() instead of of_platform_populate() to
create pwrctl platform devices so we can control it based on the child
nodes (Manivannan Sadhasivam)
- Create pwrctrl platform devices only if there's a relevant power supply
property (Manivannan Sadhasivam)
- Add device link from the pwrctl supplier to the PCI dev to ensure pwrctl
drivers are probed before the PCI dev driver; this avoids a race where
pwrctl could change device power state while the PCI driver was active
(Manivannan Sadhasivam)
- Find pwrctl device for removal with of_find_device_by_node() instead of
searching all children of the parent (Manivannan Sadhasivam)
- Rename 'pwrctl' to 'pwrctrl' to use the same 'ctrl' suffix as 'bwctrl'
and other PCI files to reduce confusion (Bjorn Helgaas)
* pci/pwrctl:
PCI/pwrctrl: Rename pwrctrl functions and structures
PCI/pwrctrl: Rename pwrctl files to pwrctrl
PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent
PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers
PCI/pwrctl: Create pwrctl device only if at least one power supply is present
PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices
# Conflicts:
# drivers/pci/bus.c
# drivers/pci/remove.c
- Make pci_stop_dev() and pci_destroy_dev() concurrent safe (Keith Busch)
- Move __pci_walk_bus() mutex up into the caller, which avoids the need for
a parameter to control locking (Keith Busch)
- Simplify __pci_walk_bus() by making it recursive (Keith Busch)
- Unexport pci_walk_bus_locked(), which is only used internally by the PCI
core (Keith Busch)
* pci/locking:
PCI: Unexport pci_walk_bus_locked()
PCI: Convert __pci_walk_bus() to be recursive
PCI: Move __pci_walk_bus() mutex to where we need it
PCI: Make pci_destroy_dev() concurrent safe
PCI: Make pci_stop_dev() concurrent safe
Rename pwrctrl functions and structures from "pwrctl" to "pwrctrl" to match
the similar file renames.
Link: https://lore.kernel.org/r/20241115214428.2061153-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
As per the kernel device driver model, a pwrctl device is the supplier for
the PCI device, but the device link that enforces the supplier-consumer
relationship was previously created by the pwrctl driver. Therefore, the
driver model didn't prevent probing PCI client drivers before probing the
corresponding pwrctl drivers. This may lead to a race condition if the PCI
device was already powered on by the bootloader (before the pwrctl driver).
If the bootloader did not power on the PCI device, this wouldn't create any
problem as the pwrctl driver will be the one powering on the device, so the
PCI client driver always gets probed afterward. But if the device was
already powered on, then the device will be seen by the PCI core and the
PCI client driver may get probed before its pwrctl driver. This creates a
race condition as the pwrctl driver may change the device power state while
the device is being accessed by the client driver.
One such issue was already reported on the Qcom X13s platform with the WLAN
device and fixed with a hack in the WCN pwrseq driver by a9aaf1ff88
("power: sequencing: request the WLAN enable GPIO as-is").
A cleaner way to fix the above mentioned race condition is to ensure that
the pwrctl drivers are always probed before the client drivers.
If the PCI device is associated with a pwrctl platform device with a power
supply, add a device link between the PCI device and the pwrctl device
before device_attach() in pci_bus_add_device().
Note that there is no need to explicitly remove the device link as that
will be taken care of by the driver core when the PCI device gets removed.
Fixes: 4565d2652a ("PCI/pwrctl: Add PCI power control core code")
Fixes: 8fb18619d9 ("PCI/pwrctl: Create platform devices for child OF nodes of the port node")
Link: https://lore.kernel.org/r/20241025-pci-pwrctl-rework-v2-3-568756156cbe@linaro.org
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[bhelgaas: squash fix from
https://lore.kernel.org/r/20241120062459.6371-1-manivannan.sadhasivam@linaro.org
for SPARCv9 issue reported by Jonathan Currier <dullfire@yahoo.com>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: wrap code to 80 columns]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: stable+noautosel@kernel.org # Depends on power supply check
Currently, pwrctl devices are created if the corresponding PCI nodes are
defined in devicetree. But this is not correct, because not all PCI nodes
require pwrctl support. Pwrctl comes into the picture only when the device
requires kernel to manage its power state. This can be determined using the
power supply properties present in the devicetree node of the device.
Add of_pci_supply_present() to check whether the devicetree contains at
least one power supply property for a device. If one is present, create a
pwrctl device for that PCI node.
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Fixes: 8fb18619d9 ("PCI/pwrctl: Create platform devices for child OF nodes of the port node")
Link: https://lore.kernel.org/r/20241025-pci-pwrctl-rework-v2-2-568756156cbe@linaro.org
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[bhelgaas: rename of_pci_is_supply_present() to of_pci_supply_present() for
readability]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: stable+noautosel@kernel.org # Depends on of_platform_device_create() rework
The of_platform_populate() API creates platform devices by descending
through the children of the parent node. But it provides no control over
the child nodes, which makes it difficult to add checks for the child
nodes in the future.
Use of_platform_device_create() and for_each_child_of_node_scoped() to make
it possible to add checks for each node before creating the platform
device.
Link: https://lore.kernel.org/r/20241025-pci-pwrctl-rework-v2-1-568756156cbe@linaro.org
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
There's only one user of pci_walk_bus_locked(), and it's internal to the
PCI core. Unexport it and make it private to drivers/pci/.
Link: https://lore.kernel.org/r/20241022224851.340648-6-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: move decl to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
The original implementation of __pci_walk_bus() chose a non-recursive walk,
presumably as a precaution on stack use. We do recursive bus walking in
other places though. For example:
pci_bus_resettable()
pci_stop_bus_device()
pci_remove_bus_device()
pci_bus_allocate_dev_resources()
So recursive pci bus walking is well tested and safe, and is easier to
follow.
Convert __pci_walk_bus() to be recursive to make it easier to introduce
finer grain locking in the future.
Link: https://lore.kernel.org/r/20241022224851.340648-5-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Simplify __pci_walk_bus() by moving the pci_bus_sem mutex into
pci_walk_bus(), the only place it is needed, and removing the parameter
that told __pci_walk_bus() whether to acquire the mutex.
Link: https://lore.kernel.org/r/20241022224851.340648-4-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Use the atomic ADDED flag to ensure concurrent callers can't attempt to
stop the device multiple times. Callers should currently all be holding the
pci_rescan_remove_lock, so there shouldn't be an existing race. But that
global lock can cause lock dependency issues, so this is preparing to
reduce reliance on that lock by using the existing existing atomic bit ops.
Link: https://lore.kernel.org/r/20241022224851.340648-2-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: squash https://lore.kernel.org/r/20241111180659.3321671-1-kbusch@meta.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The struct pci_bus_resource is only used in bus.c, so move it there.
Link: https://lore.kernel.org/r/20241017141111.44612-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2fe2abf896 ("PCI: augment bus resource table with a list") added
PCI_SUBTRACTIVE_DECODE which is put into the struct pci_bus_resource flags
field but is never read. There seems to never have been users for it.
Remove both PCI_SUBTRACTIVE_DECODE and the flags field from the struct
pci_bus_resource.
Link: https://lore.kernel.org/r/20241017141111.44612-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmaahiEUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwypg/+LSzrx0CyyXruwwkjuoMIzqXoEpxV
SSdJv47E9rnJymQvd0RAeNyc1BPbtRcP1FdEvV/G1ovb8qJSOJgU22PSSiMQsQ0h
2WGBl1ShubQDDLBdy1AggAsRJhIH4P4tWZ4k5Ftz6WZPWA1UcrDqmjN4d02UIYZb
A3YYcBEIm6bvrixxy+xq/Ii7S9A2idikabDLLGXOMSliFHx0ehWDNXyQEBONlrDh
rEHih21rPtOltVEdJl7yF+SIA467HI09NuXfTviHWnJ1hinFoSlEHIhz4j+i+r//
xOj7iDqtk/UAIToVsxtwgOnElNwY6ab/h/t1AmSSxX4FUEV2TiS1YEpUfX7pByt+
dytgvepjQyycC/ZHUtRZFZ6+1M0z+Vgb5c3+jXyPh8pQEPqmXt8+KYVIi/wychmJ
Opo4xniiDoKHSZ4E0bg/wMbe9yVCjTpX0i0S7BbNa/TRjud6vAhXvgx/y092jsdg
h4lU0ywNCgea/rZFHZYomPjncx9xJ+rtOaH+/dVQhCm/wuRHnj7tJGZnl5LfCWVw
+yNOcExQaE+lRvKqp6mQvUva3+4UArAL2tnFC00tGd0emRLIvXrxY2lF1sqp9wCZ
AJu65El4nnpFNU7vJR7x4X31BvcdquFEvfofPxPXbPz09N8hPRhkunKzgd5ftKZS
mcxMfStvIFXiMEM=
=vw2i
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Define PCIE_RESET_CONFIG_DEVICE_WAIT_MS for the generic 100ms
required after reset before config access (Kevin Xie)
- Define PCIE_T_RRS_READY_MS for the generic 100ms required after
reset before config access (probably should be unified with
PCIE_RESET_CONFIG_DEVICE_WAIT_MS) (Damien Le Moal)
Resource management:
- Rename find_resource() to find_resource_space() to be more
descriptive (Ilpo Järvinen)
- Export find_resource_space() for use by PCI core, which needs to
learn whether there is available space for a bridge window (Ilpo
Järvinen)
- Prevent double counting of resources so window size doesn't grow on
each remove/rescan cycle (Ilpo Järvinen)
- Relax bridge window sizing algorithm so a device doesn't break
simply because it was removed and rescanned (Ilpo Järvinen)
- Evaluate the ACPI PRESERVE_BOOT_CONFIG _DSM in
pci_register_host_bridge() (not acpi_pci_root_create()) so we can
unify it with similar DT functionality (Vidya Sagar)
- Extend use of DT "linux,pci-probe-only" property so it works
per-host bridge as well as globally (Vidya Sagar)
- Unify support for ACPI PRESERVE_BOOT_CONFIG _DSM and the DT
"linux,pci-probe-only" property in pci_preserve_config() (Vidya
Sagar)
Driver binding:
- Add devres infrastructure for managed request and map of partial
BAR resources (Philipp Stanner)
- Deprecate pcim_iomap_table() because uses like
"pcim_iomap_table()[0]" have no good way to return errors (Philipp
Stanner)
- Add an always-managed pcim_request_region() for use instead of
pci_request_region() and similar, which are sometimes managed
depending on whether pcim_enable_device() has been called
previously (Philipp Stanner)
- Reimplement pcim_set_mwi() so it doesn't need to keep store MWI
state (Philipp Stanner)
- Add pcim_intx() for use instead of pci_intx(), which is sometimes
managed depending on whether pcim_enable_device() has been called
previously (Philipp Stanner)
- Add managed pcim_iomap_range() to allow mapping of a partial BAR
(Philipp Stanner)
- Fix a devres mapping leak in drm/vboxvideo (Philipp Stanner)
Error handling:
- Add missing bridge locking in device reset path and add a warning
for other possible lock issues (Dan Williams)
- Fix use-after-free on concurrent DPC and hot-removal (Lukas Wunner)
Power management:
- Disable AER and DPC during suspend to avoid spurious wakeups if
they share an interrupt with PME (Kai-Heng Feng)
PCIe native device hotplug:
- Detect if a device was removed or replaced during system sleep so
we don't assume a new device is the one that used to be there
(Lukas Wunner)
Virtualization:
- Add an ACS quirk for Broadcom BCM5760X multi-function NIC; it
prevents transactions between functions even though it doesn't
advertise ACS, so the functions can be attached individually via
VFIO (Ajit Khaparde)
Peer-to-peer DMA:
- Add a "pci=config_acs=" kernel command-line parameter to relax
default ACS settings to enable additional peer-to-peer
configurations. Requires expert knowledge of topology and ACS
operation (Vidya Sagar)
Endpoint framework:
- Remove unused struct pci_epf_group.type_group (Christophe JAILLET)
- Fix error handling in vpci_scan_bus() and epf_ntb_epc_cleanup()
(Dan Carpenter)
- Make struct pci_epc_class constant (Greg Kroah-Hartman)
- Remove unused pci_endpoint_test_bar_{readl,writel} functions
(Jiapeng Chong)
- Rename "BME" to "Bus Master Enable" (Manivannan Sadhasivam)
- Rename struct pci_epc_event_ops.core_init() callback to epc_init()
(Manivannan Sadhasivam)
- Move DMA init to MHI .epc_init() callback for uniformity
(Manivannan Sadhasivam)
- Cancel EPF test delayed work when link goes down (Manivannan
Sadhasivam)
- Add struct pci_epc_event_ops.epc_deinit() callback for cleanup
needed on fundamental reset (Manivannan Sadhasivam)
- Add 64KB alignment to endpoint test to support Rockchip rk3588
(Niklas Cassel)
- Optimize endpoint test by using memcpy() instead of readl() (Niklas
Cassel)
Device tree bindings:
- Add generic "ats-supported" property to advertise that a PCIe Root
Complex supports ATS (Jean-Philippe Brucker)
Amazon Annapurna Labs PCIe controller driver:
- Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
(Aleksandr Mishin)
Axis ARTPEC-6 PCIe controller driver:
- Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
address, not a CPU address (Niklas Cassel)
Freescale i.MX6 PCIe controller driver:
- Convert to agnostic GPIO API (Andy Shevchenko)
Freescale Layerscape PCIe controller driver:
- Make struct mobiveil_rp_ops constant (Christophe JAILLET)
- Use new generic dw_pcie_ep_linkdown() to handle link-down events
(Manivannan Sadhasivam)
HiSilicon Kirin PCIe controller driver:
- Convert to agnostic GPIO API (Andy Shevchenko)
- Use _scoped() iterator for OF children to ensure refcounts are
decremented at loop exit (Javier Carrasco)
Intel VMD host bridge driver:
- Create sysfs "domain" symlink before downstream devices are exposed
to userspace by pci_bus_add_devices() (Jiwei Sun)
Loongson PCIe controller driver:
- Enable MSI when LS7A is used with new CPUs that have integrated
PCIe Root Complex, e.g., Loongson-3C6000, so downstream devices can
use MSI (Huacai Chen)
Microchip AXI PolarFlare PCIe controller driver:
- Move pcie-microchip-host.c to a new PLDA directory (Minda Chen)
- Factor PLDA generic items out to a common
plda,xpressrich3-axi-common.yaml binding (Minda Chen)
- Factor PLDA generic data structures and code out to shared
pcie-plda.h, pcie-plda-host.c (Minda Chen)
- Add PLDA generic interrupt handling with a .request_event_irq()
callback for vendor-specific events (Minda Chen)
- Add PLDA generic host init/deinit and map bus functions for use by
vendor-specific drivers (Minda Chen)
- Rework to use PLDA core (Minda Chen)
Microsoft Hyper-V host bridge driver:
- Return zero, not garbage, when reading PCI_INTERRUPT_PIN (Wei Liu)
NVIDIA Tegra194 PCIe controller driver:
- Remove unused struct tegra_pcie_soc (Dr. David Alan Gilbert)
- Set 64KB inbound ATU alignment restriction (Jon Hunter)
Qualcomm PCIe controller driver:
- Make the MHI reg region mandatory for X1E80100, since all PCIe
controllers have it (Abel Vesa)
- Prevent use of uninitialized data and possible error pointer
dereference (Dan Carpenter)
- Return error, not success, if dev_pm_opp_find_freq_floor() fails
(Dan Carpenter)
- Add Operating Performance Points (OPP) support to scale performance
state based on aggregate link bandwidth to improve SoC power
efficiency (Krishna chaitanya chundru)
- Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays
active even if other drivers don't vote for it (Krishna chaitanya
chundru)
- Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
writing out all the clock names (Manivannan Sadhasivam)
- Add DT binding and driver support for the SA8775P SoC (Mrinmay
Sarkar)
- Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)
- Override the SA8775P NO_SNOOP default to avoid possible memory
corruption (Mrinmay Sarkar)
- Make sure resources are disabled during PERST# assertion, even if
the link is already disabled (Manivannan Sadhasivam)
- Use new generic dw_pcie_ep_linkdown() to handle link-down events
(Manivannan Sadhasivam)
- Add DT and endpoint driver support for the SA8775P SoC (Mrinmay
Sarkar)
- Add Hyper DMA (HDMA) support for the SA8775P SoC and enable it in
the EPF MHI driver (Mrinmay Sarkar)
- Set PCIE_PARF_NO_SNOOP_OVERIDE to override the default NO_SNOOP
attribute on the SA8775P SoC (both Root Complex and Endpoint mode)
to avoid possible memory corruption (Mrinmay Sarkar)
Renesas R-Car PCIe controller driver:
- Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup() to
avoid unnecessary backtrace (Marek Vasut)
- Add DT and driver support for R-Car V4H (R8A779G0) host and
endpoint. This requires separate proprietary firmware (Yoshihiro
Shimoda)
Rockchip PCIe controller driver:
- Assert PERST# for 100ms after power is stable (Damien Le Moal)
- Wait PCIE_T_RRS_READY_MS (100ms) after reset before starting
configuration (Damien Le Moal)
- Use GPIOD_OUT_LOW flag while requesting ep_gpio to fix a firmware
crash on Qcom-based modems with Rockpro64 board (Manivannan
Sadhasivam)
Rockchip DesignWare PCIe controller driver:
- Factor common parts of rockchip-dw-pcie DT binding to be shared by
Root Complex and Endpoint mode (Niklas Cassel)
- Add missing INTx signals to common DT binding (Niklas Cassel)
- Add eDMA items to DT binding for Endpoint controller (Niklas
Cassel)
- Fix initial dw-rockchip PERST# GPIO value to prevent unnecessary
short assert/deassert that causes issues with some WLAN controllers
(Niklas Cassel)
- Refactor dw-rockchip and add support for Endpoint mode (Niklas
Cassel)
- Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
wrapper (Niklas Cassel)
- Add error messages in .probe() error paths to improve user
experience (Uwe Kleine-König)
Samsung Exynos PCIe controller driver:
- Use bulk clock APIs to simplify clock setup (Shradha Todi)
StarFive PCIe controller driver:
- Add DT binding and driver support for the StarFive JH7110
PLDA-based PCIe controller (Minda Chen)
Synopsys DesignWare PCIe controller driver:
- Add generic support for sending PME_Turn_Off when system suspends
(Frank Li)
- Fix incorrect interpretation of iATU slot 0 after PERST#
assert/deassert (Frank Li)
- Use msleep() instead of usleep_range() while waiting for link
(Konrad Dybcio)
- Refactor dw_pcie_edma_find_chip() to enable adding support for
Hyper DMA (HDMA) (Manivannan Sadhasivam)
- Enable drivers to supply the eDMA channel count since some can't
auto detect this (Manivannan Sadhasivam)
- Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
wrapper (Manivannan Sadhasivam)
- Pass the eDMA mapping format directly from drivers instead of
maintaining a capability for it (Manivannan Sadhasivam)
- Add generic dw_pcie_ep_linkdown() to notify EPF drivers about
link-down events and restore non-sticky DWC registers lost on link
down (Manivannan Sadhasivam)
- Add vendor-specific "apb" reg name, interrupt names, INTx names to
generic binding (Niklas Cassel)
- Enforce DWC restriction that 64-bit BARs must start with an
even-numbered BAR (Niklas Cassel)
- Consolidate args of dw_pcie_prog_outbound_atu() into a structure
(Yoshihiro Shimoda)
- Add support for endpoints to send Message TLPs, e.g., for INTx
emulation (Yoshihiro Shimoda)
TI DRA7xx PCIe controller driver:
- Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
address, not a CPU address (Niklas Cassel)
TI Keystone PCIe controller driver:
- Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
(Aleksandr Mishin)
- Work around AM65x/DRA80xM Errata #i2037 that corrupts TLPs and
causes processor hangs by limiting Max_Read_Request_Size (MRRS) and
Max_Payload_Size (MPS) (Kishon Vijay Abraham I)
- Leave BAR 0 disabled for AM654x to fix a regression caused by
6ab15b5e70 ("PCI: dwc: keystone: Convert .scan_bus() callback to
use add_bus"), which caused a 45-second boot delay (Siddharth
Vadapalli)
Xilinx Versal CPM PCIe controller driver:
- Fix overlapping bridge registers and 32-bit BAR addresses in DT
binding (Thippeswamy Havalige)
MicroSemi Switchtec management driver:
- Make struct switchtec_class constant (Greg Kroah-Hartman)
Miscellaneous:
- Remove unused struct acpi_handle_node (Dr. David Alan Gilbert)
- Add missing MODULE_DESCRIPTION() macros (Jeff Johnson)"
* tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (154 commits)
PCI: loongson: Enable MSI in LS7A Root Complex
PCI: Extend ACS configurability
PCI: Add missing bridge lock to pci_bus_lock()
drm/vboxvideo: fix mapping leaks
PCI: Add managed pcim_iomap_range()
PCI: Remove legacy pcim_release()
PCI: Add managed pcim_intx()
PCI: vmd: Create domain symlink before pci_bus_add_devices()
PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
PCI: qcom: Prevent potential error pointer dereference
PCI: qcom: Fix missing error code in qcom_pcie_probe()
PCI: Give pcim_set_mwi() its own devres cleanup callback
PCI: Move struct pci_devres.pinned bit to struct pci_dev
PCI: Remove struct pci_devres.enabled status bit
PCI: Document hybrid devres hazards
PCI: Add managed pcim_request_region()
PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
PCI: Add managed partial-BAR request and map infrastructure
PCI: Add devres helpers for iomap table
PCI: Add and use devres helper for bit masks
...
Commit 50b040ef37 ("PCI/pwrctl: only call of_platform_populate() if
CONFIG_OF is enabled") added the CONFIG_OF guard for the
of_platform_populate() API. But it missed the fact that the CONFIG_OF
platforms can also run on ACPI without devicetree (so dev.of_node will
be NULL). In those cases, of_platform_populate() will fail with below
error messages as seen on the Ampere Altra box:
pci 000c:00:01.0: failed to populate child OF nodes (-22)
pci 000c:00:02.0: failed to populate child OF nodes (-22)
Fix this by checking for the existence of 'dev.of_node' before calling
the of_platform_populate() API. This also warrants the removal of
CONFIG_OF check, since dev_of_node() helper will return NULL if
CONFIG_OF is not enabled.
While at it, let's also use dev_of_node() to pass device OF node pointer
to of_platform_populate().
Fixes: 50b040ef37 ("PCI/pwrctl: only call of_platform_populate() if CONFIG_OF is enabled")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/linux-arm-msm/CAHk-=wjcO_9dkNf-bNda6bzykb5ZXWtAYA97p7oDsXPHmMRi6g@mail.gmail.com
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If of_platform_populate() is called when CONFIG_OF is not defined this
leads to spurious error messages of the following type:
pci 0000:00:01.1: failed to populate child OF nodes (-19)
pci 0000:00:02.1: failed to populate child OF nodes (-19)
Fixes: 8fb18619d9 ("PCI/pwrctl: Create platform devices for child OF nodes of the port node")
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/20240702173255.39932-1-superm1@kernel.org/
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Reported-by: Praveenkumar Patil <PraveenKumar.Patil@amd.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240707183829.41519-1-spasswolf@web.de
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
In preparation for introducing PCI device power control - a set of
library functions that will allow powering-up of PCI devices before
they're detected on the PCI bus - we need to populate the devices
defined on the device-tree.
We are reusing the platform bus as it provides us with all the
infrastructure we need to match the pwrctl drivers against the
compatibles from OF nodes.
These platform devices will be probed by the driver core and bound to
the PCI pwrctl drivers we'll introduce later.
Tested-by: Amit Pundir <amit.pundir@linaro.org>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD, SM8650-QRD & SM8650-HDK
Tested-by: Caleb Connolly <caleb.connolly@linaro.org> # OnePlus 8T
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20240612082019.19161-4-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
To make it simpler to declare resource constraint alignf callbacks, add
typedef for it and document it.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240507102523.57320-5-ilpo.jarvinen@linux.intel.com
Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
A last minute revert in 6.7-final introduced a potential deadlock when
enabling ASPM during probe of Qualcomm PCIe controllers as reported by
lockdep:
============================================
WARNING: possible recursive locking detected
6.7.0 #40 Not tainted
--------------------------------------------
kworker/u16:5/90 is trying to acquire lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
but task is already holding lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(pci_bus_sem);
lock(pci_bus_sem);
*** DEADLOCK ***
Call trace:
print_deadlock_bug+0x25c/0x348
__lock_acquire+0x10a4/0x2064
lock_acquire+0x1e8/0x318
down_read+0x60/0x184
pcie_aspm_pm_state_change+0x58/0xdc
pci_set_full_power_state+0xa8/0x114
pci_set_power_state+0xc4/0x120
qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
pci_walk_bus+0x64/0xbc
qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
X13s by adding a delay to increase the race window during asynchronous
probe where another thread can take a write lock.
Add a new pci_set_power_state_locked() and associated helper functions that
can be called with the PCI bus semaphore held to avoid taking the read lock
twice.
Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org
Fixes: f93e71aea6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 6.7
DT core:
- Add support for generating DT nodes for PCI devices. This is the
groundwork for applying overlays to PCI devices containing
non-discoverable downstream devices.
- DT unittest additions to check reverted changesets, to test for
refcount issues, and to test unresolved symbols. Also, various
clean-ups of the unittest along the way.
- Refactor node and property manipulation functions to better share code
with old API and changeset API
- Refactor changeset print functions to a common implementation
- Move some platform_device specific functions into of_platform.c
Bindings:
- Treewide fixing of typos
- Treewide clean-up of SPDX tags to use 'OR' consistently
- Last chunk of dropping unnecessary quotes. With that, the check
for unnecessary quotes is enabled in yamllint.
- Convert ftgmac100, zynqmp-genpd, pps-gpio, syna,rmi4, and qcom,ssbi
bindings to DT schema format
- Add Allwinner V3s xHCI USB, Saef SF-TC154B display, QCom SM8450 Inline
Crypto Engine, QCom SM6115 UFS, QCom SDM670 PDC interrupt controller,
Arm 2022 Cortex cores, and QCom IPQ9574 Crypto bindings
- Fixes for Rockchip DWC PCI binding
- Ensure all properties are evaluated on USB connector schema
- Fix dt-check-compatible script to find of_device_id instances with
compiler annotations
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmTubyoACgkQ+vtdtY28
YcPamA//feXFYNPiIbSa7XqfAu1PE5XSg3PqCe77QvLBGJU7saTwRJApc88iTjlA
hc5EELnZKp3FE9N7DJdmvEjYxKDqtJOukO+txKy3mFBWo+gZQURthZVcbLxUZmpw
XmYA4b/GrIv5h8YWG1wokyaGTtSfTcf0+RmAtVepiDk5kWQKaC04Let356fKn9xi
ePgLTZV6BJvPoGpMWd08o+1szUAc6Vihs9qWu7g0+mtb5K5xi/l05YMz3REu7kpf
iz06CE/uzYvHpFBJZ6izN+9Qqxh52DnWckXX68v8kStHUON2h1YmZYvjhGrfay4k
rHeDnHoRBrepDDCytXQ/fxzGtURr3b8yBnlhzEQadMLXmf25mm+TRBDmf6GnX5ij
QmHlj+eSARIafcbb4fqF1Hdyv8c7XM0AkEnj1XrIWLtXPuRNSHlS25dngCztbII/
lqmtBaH1ifCKj2VQ8YL8sVX7k208YU9vDNKZHQyA8dPEYwhknrWmp1F0OAnBB+wz
F11kDE7xkZ0/gE7mUHwe9mP94hC6Ceks4IuBvsTzBmSwqXxyCz8gM2KHK4U3gNUr
Sk2hWgZn+k2HM9zLb38FE18C6hqws6RBUWnJwZ4V3qPo2eYJ8Jzkvm7oonxjHgCC
4FmYYAoCQhBEkZPOJ4map0eO5VbShn9Hrgs46Jj4WoXmm7dFDLc=
=kl+z
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DT core:
- Add support for generating DT nodes for PCI devices. This is the
groundwork for applying overlays to PCI devices containing
non-discoverable downstream devices.
- DT unittest additions to check reverted changesets, to test for
refcount issues, and to test unresolved symbols. Also, various
clean-ups of the unittest along the way.
- Refactor node and property manipulation functions to better share
code with old API and changeset API
- Refactor changeset print functions to a common implementation
- Move some platform_device specific functions into of_platform.c
Bindings:
- Treewide fixing of typos
- Treewide clean-up of SPDX tags to use 'OR' consistently
- Last chunk of dropping unnecessary quotes. With that, the check for
unnecessary quotes is enabled in yamllint.
- Convert ftgmac100, zynqmp-genpd, pps-gpio, syna,rmi4, and qcom,ssbi
bindings to DT schema format
- Add Allwinner V3s xHCI USB, Saef SF-TC154B display, QCom SM8450
Inline Crypto Engine, QCom SM6115 UFS, QCom SDM670 PDC interrupt
controller, Arm 2022 Cortex cores, and QCom IPQ9574 Crypto bindings
- Fixes for Rockchip DWC PCI binding
- Ensure all properties are evaluated on USB connector schema
- Fix dt-check-compatible script to find of_device_id instances with
compiler annotations"
* tag 'devicetree-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (64 commits)
dt-bindings: usb: Add V3s compatible string for OHCI
dt-bindings: usb: Add V3s compatible string for EHCI
dt-bindings: display: panel: mipi-dbi-spi: add Saef SF-TC154B
dt-bindings: vendor-prefixes: document Saef Technology
dt-bindings: thermal: lmh: update maintainer address
of: unittest: Fix of_unittest_pci_node() kconfig dependencies
dt-bindings: crypto: ice: Document sm8450 inline crypto engine
dt-bindings: ufs: qcom: Add ICE to sm8450 example
dt-bindings: ufs: qcom: Add sm6115 binding
dt-bindings: ufs: qcom: Add reg-names property for ICE
dt-bindings: yamllint: Enable quoted string check
dt-bindings: Drop remaining unneeded quotes
of: unittest-data: Fix whitespace - angular brackets
of: unittest-data: Fix whitespace - indentation
of: unittest-data: Fix whitespace - blank lines
of: unittest-data: Convert remaining overlay DTS files to sugar syntax
of: overlay: unittest: Add test for unresolved symbol
of: unittest: Add separators to of_unittest_overlay_high_level()
of: unittest: Cleanup partially-applied overlays
of: unittest: Merge of_unittest_apply{,_revert}_overlay_check()
...
The PCI endpoint device such as Xilinx Alveo PCI card maps the register
spaces from multiple hardware peripherals to its PCI BAR. Normally,
the PCI core discovers devices and BARs using the PCI enumeration process.
There is no infrastructure to discover the hardware peripherals that are
present in a PCI device, and which can be accessed through the PCI BARs.
Apparently, the device tree framework requires a device tree node for the
PCI device. Thus, it can generate the device tree nodes for hardware
peripherals underneath. Because PCI is self discoverable bus, there might
not be a device tree node created for PCI devices. Furthermore, if the PCI
device is hot pluggable, when it is plugged in, the device tree nodes for
its parent bridges are required. Add support to generate device tree node
for PCI bridges.
Add an of_pci_make_dev_node() interface that can be used to create device
tree node for PCI devices.
Add a PCI_DYNAMIC_OF_NODES config option. When the option is turned on,
the kernel will generate device tree nodes for PCI bridges unconditionally.
Initially, add the basic properties for the dynamically generated device
tree nodes which include #address-cells, #size-cells, device_type,
compatible, ranges, reg.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/1692120000-46900-3-git-send-email-lizhi.hou@amd.com
Signed-off-by: Rob Herring <robh@kernel.org>
The blamed commit has broken probing on
arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi when &enetc_port0
(PCI function 0) has status = "disabled".
Background: pci_scan_slot() has logic to say that if the function 0 of a
device is absent, the entire device is absent and we can skip the other
functions entirely. Traditionally, this has meant that
pci_bus_read_dev_vendor_id() returns an error code for that function.
However, since the blamed commit, there is an extra confounding
condition: function 0 of the device exists and has a valid vendor id,
but it is disabled in the device tree. In that case, pci_scan_slot()
would incorrectly skip the entire device instead of just that function.
In the case of NXP LS1028A, status = "disabled" does not mean that the
PCI function's config space is not available for reading. It is, but the
Ethernet port is just not functionally useful with a particular SerDes
protocol configuration (0x9999) due to pinmuxing constraints of the Soc.
So, pci_scan_slot() skips all other functions on the ENETC ECAM
(enetc_port1, enetc_port2, enetc_mdio_pf3 etc) when just enetc_port0 had
to not be probed.
There is an additional regression introduced by the change, caused by
its fundamental premise. The enetc driver needs to run code for all PCI
functions, regardless of whether they're enabled or not in the device
tree. That is no longer possible if the driver's probe function is no
longer called. But Rob recommends that we move the of_device_is_available()
detection to dev->match_driver, and this makes the PCI fixups still run
on all functions, while just probing drivers for those functions that
are enabled. So, a separate change in the enetc driver will have to move
the workarounds to a PCI fixup.
Fixes: 6fffbc7ae1 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmRIKooUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxq7A/9G0sInrqvqH2I9/Set/FnmMfCtGDH
YcEjHYYxL+pztSiXTavDV+ib9iaut83oYtcV9p1bUMhJoZdKNZhrNdIGzRFSemI4
0/ShtklPzNEu6nPPL24CnEzgbrODBU56ZvzrIE/tShEoOjkKa1triBnOA/JMxYTL
cUwqDQlDkdpYniCgxy05QfcFZ0mmSOkbl7runGfTMTiUKKC3xSRiaW5YN9KZe3i7
G5YHu1VVCjeQdQSICHYwyFmkyiqosCoajQNp1IHBkWqSwilzyZMg0NWJobVSA7M/
mXXnzLtFcC60oT58/9MaggQwDTaSGDE8mG+sWv05bB2u5TQVyZEZqZ4c2FzmZIZT
WLZYLB6PFRW0zePEuMnVkSLS2npkX+aGaBv28bf88sjorpaYNG01uYijnLEceolQ
yBPFRN3bsRuOyHvYY/tiZX/BP7z/DS++XXwA8zQWZnYsXSlncJdwCNquV0xIwUt+
hij4/Yu7o9SgV1LbuwtkMFAn3C9Szc65Eer+IvRRdnMZYphjVHbA5F2msRFyiCeR
HxECtMQ1jBnVrpQAcBX1Sz+Vu5MrwCqzc2n6tvTQHDvVNjXfkG3NaFhxYPc1IL9Z
NJMeCKfK1qzw7TtbvWXCluTTIM9N/bNJXrJhQbjNY7V6IaBZY1QNYW0ZFfGgj6Gb
UUPgndidRy4/hzw=
=HPXl
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Resource management:
- Add pci_dev_for_each_resource() and pci_bus_for_each_resource()
iterators
PCIe native device hotplug:
- Fix AB-BA deadlock between reset_lock and device_lock
Power management:
- Wait longer for devices to become ready after resume (as we do for
reset) to accommodate Intel Titan Ridge xHCI devices
- Extend D3hot delay for NVIDIA HDA controllers to avoid
unrecoverable devices after a bus reset
Error handling:
- Clear PCIe Device Status after EDR since generic error recovery now
only clears it when AER is native
ASPM:
- Work around Chromebook firmware defect that clobbers Capability
list (including ASPM L1 PM Substates Cap) when returning from
D3cold to D0
Freescale i.MX6 PCIe controller driver:
- Install imprecise external abort handler only when DT indicates
PCIe support
Freescale Layerscape PCIe controller driver:
- Add ls1028a endpoint mode support
Qualcomm PCIe controller driver:
- Add SM8550 DT binding and driver support
- Add SDX55 DT binding and driver support
- Use bulk APIs for clocks of IP 1.0.0, 2.3.2, 2.3.3
- Use bulk APIs for reset of IP 2.1.0, 2.3.3, 2.4.0
- Add DT "mhi" register region for supported SoCs
- Expose link transition counts via debugfs to help debug low power
issues
- Support system suspend and resume; reduce interconnect bandwidth
and turn off clock and PHY if there are no active devices
- Enable async probe by default to reduce boot time
Miscellaneous:
- Sort controller Kconfig entries by vendor"
* tag 'pci-v6.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (56 commits)
PCI: xilinx: Drop obsolete dependency on COMPILE_TEST
PCI: mobiveil: Sort Kconfig entries by vendor
PCI: dwc: Sort Kconfig entries by vendor
PCI: Sort controller Kconfig entries by vendor
PCI: Use consistent controller Kconfig menu entry language
PCI: xilinx-nwl: Add 'Xilinx' to Kconfig prompt
PCI: hv: Add 'Microsoft' to Kconfig prompt
PCI: meson: Add 'Amlogic' to Kconfig prompt
PCI: Use of_property_present() for testing DT property presence
PCI/PM: Extend D3hot delay for NVIDIA HDA controllers
dt-bindings: PCI: qcom: Document msi-map and msi-map-mask properties
PCI: qcom: Add SM8550 PCIe support
dt-bindings: PCI: qcom: Add SM8550 compatible
PCI: qcom: Add support for SDX55 SoC
dt-bindings: PCI: qcom-ep: Fix the unit address used in example
dt-bindings: PCI: qcom: Add SDX55 SoC
dt-bindings: PCI: qcom: Update maintainers entry
PCI: qcom: Enable async probe by default
PCI: qcom: Add support for system suspend and resume
PCI/PM: Drop pci_bridge_wait_for_secondary_bus() timeout parameter
...
Refactor pci_bus_for_each_resource() in the same way as
pci_dev_for_each_resource(). This allows the index to be hidden inside the
implementation so the caller can omit it when it's not used otherwise.
No functional changes intended.
Link: https://lore.kernel.org/r/20230330162434.35055-6-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
On s390 PCI functions may be hotplugged individually even when they
belong to a multi-function device. In particular on an SR-IOV device VFs
may be removed and later re-added.
In commit a50297cf82 ("s390/pci: separate zbus creation from
scanning") it was missed however that struct pci_bus and struct
zpci_bus's resource list retained a reference to the PCI functions MMIO
resources even though those resources are released and freed on
hot-unplug. These stale resources may subsequently be claimed when the
PCI function re-appears resulting in use-after-free.
One idea of fixing this use-after-free in s390 specific code that was
investigated was to simply keep resources around from the moment a PCI
function first appeared until the whole virtual PCI bus created for
a multi-function device disappears. The problem with this however is
that due to the requirement of artificial MMIO addreesses (address
cookies) extra logic is then needed to keep the address cookies
compatible on re-plug. At the same time the MMIO resources semantically
belong to the PCI function so tying their lifecycle to the function
seems more logical.
Instead a simpler approach is to remove the resources of an individually
hot-unplugged PCI function from the PCI bus's resource list while
keeping the resources of other PCI functions on the PCI bus untouched.
This is done by introducing pci_bus_remove_resource() to remove an
individual resource. Similarly the resource also needs to be removed
from the struct zpci_bus's resource list. It turns out however, that
there is really no need to add the MMIO resources to the struct
zpci_bus's resource list at all and instead we can simply use the
zpci_bar_struct's resource pointer directly.
Fixes: a50297cf82 ("s390/pci: separate zbus creation from scanning")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230306151014.60913-2-schnelle@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
pci_bus_alloc_from_region() allocates MMIO space by iterating through all
the resources available on the bus. The available resource might be
reduced if the caller requires 32-bit space or we're avoiding BIOS or E820
areas.
Don't bother calling allocate_resource() if we need more space than is
available in this resource. This prevents some pointless and annoying
messages about avoided areas.
Link: https://lore.kernel.org/r/20221208190341.1560157-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
device_attach() returning failure indicates a driver error while trying to
probe the device. In such a scenario, the PCI device should still be added
in the system and be visible to the user.
When device_attach() fails, merely warn about it and keep the PCI device in
the system.
This partially reverts ab1a187bba ("PCI: Check device_attach() return
value always").
Link: https://lore.kernel.org/r/20200706233240.3245512-1-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org # v4.6+
pci_bus_get() and pci_bus_put() are not used by a loadable kernel module
and do not need to be exported.
These were exported by fe830ef62a ("PCI: Introduce pci_bus_{get|put}() to
manage PCI bus reference count"), but there are no loadable modules in the
tree that use them.
Link: https://lore.kernel.org/r/20190717182353.45557-1-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace dev_printk(KERN_DEBUG) with dev_info(), etc to be more consistent
with other logging and avoid checkpatch warnings.
The KERN_DEBUG messages could be converted to dev_dbg(), but that depends
on CONFIG_DYNAMIC_DEBUG and DEBUG, and we want most of these messages to
*always* be in the dmesg log.
Link: https://lore.kernel.org/lkml/1555733240-19875-1-git-send-email-mohankumar718@gmail.com
Signed-off-by: Mohan Kumar <mohankumar718@gmail.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.
When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.
is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.
A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.
Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master(). As these fields are not
handled atomically, that clears the is_added bit.
Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state. This avoids the
race because is_added no longer shares a memory location with is_busmaster.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
This symbol is now always identical to CONFIG_ARCH_DMA_ADDR_T_64BIT, so
remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Remove pointless comments that tell us the file name, remove blank line
comments, follow multi-line comment conventions. No functional change
intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/spdx:
PCI: Add SPDX GPL-2.0+ to replace implicit GPL v2 or later statement
PCI: Add SPDX GPL-2.0+ to replace GPL v2 or later boilerplate
PCI: Add SPDX GPL-2.0 to replace COPYING boilerplate
PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate
PCI: Add SPDX GPL-2.0 when no license was specified
b24413180f ("License cleanup: add SPDX GPL-2.0 license identifier to
files with no license") added SPDX GPL-2.0 to several PCI files that
previously contained no license information.
Add SPDX GPL-2.0 to all other PCI files that did not contain any license
information and hence were under the default GPL version 2 license of the
kernel.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add PCI-specific dev_printk() wrappers and use them to simplify the code
slightly. No functional change intended.
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: squash into one patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The algorithm to update the flag indicating whether a bridge may go to D3
makes a few optimizations based on whether the update was caused by the
removal of a device on the one hand, versus the addition of a device or the
change of its D3cold flags on the other hand.
The information whether the update pertains to a removal is currently
passed in by the caller, but the function may as well determine that itself
by examining the device in question, thereby allowing for a considerable
simplification and reduction of the code.
Out of several options to determine removal, I've chosen the function
device_is_registered() because it's cheap: It merely returns the
dev->kobj.state_in_sysfs flag. That flag is set through device_add() when
the root bus is scanned and cleared through device_remove(). The call to
pci_bridge_d3_update() happens after each of these calls, respectively, so
the ordering is correct.
No functional change intended.
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>