When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
auxiliary_device_uninit() and falls through to err_aux_dev_init. The
uninit call will trigger put_device(), which invokes the release
callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
iadev. The fall-through then reads adev->id from the freed iadev for
ida_free() and double-frees iadev with kfree().
Free the IDA slot and clear the back-pointer before uninit, while adev
is still valid, then return immediately.
Commit 65637c3a18 ("idpf: fix UAF in RDMA core aux dev deinitialization")
fixed the same use-after-free in the matching unplug path in this file but
missed both probe error paths.
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: stable@kernel.org
Fixes: be91128c57 ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
Fixes: f4312e6bfa ("idpf: implement core RDMA auxiliary dev create, init, and destroy")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-4-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In idpf_ptp_init(), read_dev_clk_lock is initialized after
ptp_schedule_worker() had already been called (and after
idpf_ptp_settime64() could reach the lock). The PTP aux worker
fires immediately upon scheduling and can call into
idpf_ptp_read_src_clk_reg_direct(), which takes
spin_lock(&ptp->read_dev_clk_lock) on an uninitialized lock, triggering
the lockdep "non-static key" warning:
[12973.796587] idpf 0000:83:00.0: Device HW Reset initiated
[12974.094507] INFO: trying to register non-static key.
...
[12974.097208] Call Trace:
[12974.097213] <TASK>
[12974.097218] dump_stack_lvl+0x93/0xe0
[12974.097234] register_lock_class+0x4c4/0x4e0
[12974.097249] ? __lock_acquire+0x427/0x2290
[12974.097259] __lock_acquire+0x98/0x2290
[12974.097272] lock_acquire+0xc6/0x310
[12974.097281] ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097311] ? lockdep_hardirqs_on_prepare+0xde/0x190
[12974.097318] ? finish_task_switch.isra.0+0xd2/0x350
[12974.097330] ? __pfx_ptp_aux_kworker+0x10/0x10 [ptp]
[12974.097343] _raw_spin_lock+0x30/0x40
[12974.097353] ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097373] idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097391] ? kthread_worker_fn+0x88/0x3d0
[12974.097404] ? kthread_worker_fn+0x4e/0x3d0
[12974.097411] idpf_ptp_update_cached_phctime+0x26/0x120 [idpf]
[12974.097428] ? _raw_spin_unlock_irq+0x28/0x50
[12974.097436] idpf_ptp_do_aux_work+0x15/0x20 [idpf]
[12974.097454] ptp_aux_kworker+0x20/0x40 [ptp]
[12974.097464] kthread_worker_fn+0xd5/0x3d0
[12974.097474] ? __pfx_kthread_worker_fn+0x10/0x10
[12974.097482] kthread+0xf4/0x130
[12974.097489] ? __pfx_kthread+0x10/0x10
[12974.097498] ret_from_fork+0x32c/0x410
[12974.097512] ? __pfx_kthread+0x10/0x10
[12974.097519] ret_from_fork_asm+0x1a/0x30
[12974.097540] </TASK>
Move the call to spin_lock_init() up a bit to make sure read_dev_clk_lock
is not touched before it's been initialized.
Fixes: 5cb8805d23 ("idpf: negotiate PTP capabilities and get PTP clock")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-3-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmnfwfMUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxwIRAAlN1h5er8aFDbjON5YMXBZqlQmzaC
bjlUHgwm7HkdErTFozyuqhE8QUO1kCm4uMQzeyJdfY9nRWqMDOuKYxMD5j0exk+o
4tbbJg6Xx4dq7Qrawy9PhxyQm/PDAcvs+FRRlGala+qq9o3fxPDOAZVDE/1C8qFQ
Jd7GGd7NZn/NN4xrqST4RQHjO8fwaMwmksWCStsb79kfesQWP6kLADGfIMcWxNUB
2s+oTnK6Hw0tkBv56n6i8mbb0EzS3/RN1daTevGAta1rmfUVVtWGRZ4paMvv0Owi
Rl5+O5Jz6/c1qiXZbUqu5CRQPIy7Dr3JPvURcZX6qbsV8PzWXZr0Wi+geWefGOnp
55y+3OT0vdBGAuXLJhrcU7Clzq9D/TZOt8oTI8IFArUfDlmrAIdozPn7gr+VGre5
QuKymSk3XWtyIbe4o8UeZ4f9g0y6ZY1XvtvB7K1tze+OOmqlkfq966+z8aZuGOKx
ZvAU/NIat5H02EgB4dEVOP8R5vPZlXGT0RLGl1JWRypPWyZDbVVA3z927qRQG5md
IsVq8WaIrB1zyl9g37lZeEaYwP/qCIQsHkMGPYcP4wdOQEV9AQqi5pmjMXnWyQJD
PR1nvmTKW7USRCJ+pz8xPhZh0cj3ENaddORTD3I/0CGVV0y452bU/5rr4T+K04bK
PCJBpxTIDuWDwXc=
=FFRz
-----END PGP SIGNATURE-----
Merge tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Allow TLP Processing Hints to be enabled for RCiEPs (George Abraham
P)
- Enable AtomicOps only if we know the Root Port supports them (Gerd
Bayer)
- Don't enable AtomicOps for RCiEPs since none of them need Atomic
Ops and we can't tell whether the Root Complex would support them
(Gerd Bayer)
- Leave Precision Time Measurement disabled until a driver enables it
to avoid PCIe errors (Mika Westerberg)
- Make pci_set_vga_state() fail if bridge doesn't support VGA
routing, i.e., PCI_BRIDGE_CTL_VGA is not writable, and return
errors to vga_get() callers including userspace via
/dev/vga_arbiter (Simon Richter)
- Validate max-link-speed from DT in j721e, brcmstb, mediatek-gen3,
rzg3s drivers (where the actual controller constraints are known),
and remove validation from the generic OF DT accessor (Hans Zhang)
- Remove pc110pad driver (no longer useful after 486 CPU support
removed) and no_pci_devices() (pc110pad was the last user) (Dmitry
Torokhov, Heiner Kallweit)
Resource management:
- Prevent assigning space to unimplemented bridge windows; previously
we mistakenly assumed prefetchable window existed and assigned
space and put a BAR there (Ahmed Naseef)
- Avoid shrinking bridge windows to fit in the initial Root Port
window; fixes one problem with devices with large BARs connected
via switches, e.g., Thunderbolt (Ilpo Järvinen)
- Pass full extent of empty space, not just the aligned space, to
resource_alignf callback so free space before the requested
alignment can be used (Ilpo Järvinen)
- Place small resources before larger ones for better utilization of
address space (Ilpo Järvinen)
- Fix alignment calculation for resource size larger than align,
e.g., bridge windows larger than the 1MB required alignment (Ilpo
Järvinen)
Reset:
- Update slot handling so all ARI functions are treated as being in
the same slot. They're all reset by Secondary Bus Reset, but
previously drivers of ARI functions that appeared to be on a
non-zero device weren't notified and fatal hardware errors could
result (Keith Busch)
- Make sysfs reset_subordinate hotplug safe to avoid spurious hotplug
events (Keith Busch)
- Hide Secondary Bus Reset ('bus') from sysfs reset_methods if masked
by CXL because it has no effect (Vidya Sagar)
- Avoid FLR for AMD NPU device, where it causes the device to hang
(Lizhi Hou)
Error handling:
- Clear only error bits in PCIe Device Status to avoid accidentally
clearing Emergency Power Reduction Detected (Shuai Xue)
- Check for AER errors even in devices without drivers (Lukas Wunner)
- Initialize ratelimit info so DPC and EDR paths log AER error
information (Kuppuswamy Sathyanarayanan)
Power control:
- Add UPD720201/UPD720202 USB 3.0 xHCI Host Controller .compatible so
generic pwrctrl driver can control it (Neil Armstrong)
Hotplug:
- Set LED_HW_PLUGGABLE for NPEM hotplug-capable ports so LED core
doesn't complain when setting brightness fails because the endpoint
is gone (Richard Cheng)
Peer-to-peer DMA:
- Allow wildcards in list of host bridges that support peer-to-peer
DMA between hierarchy domains and add all Google SoCs (Jacob
Moroni)
Endpoint framework:
- Advertise dynamic inbound mapping support in pci-epf-test and
update host pci_endpoint_test to skip doorbell testing if not
advertised by endpoint (Koichiro Den)
- Return 0, not remaining timeout, when MHI eDMA ops complete so
mhi_ep_ring_add_element() doesn't interpret non-zero as failure
(Daniel Hodges)
- Remove vntb and ntb duplicate resource teardown that leads to oops
when .allow_link() fails or .drop_link() is called (Koichiro Den)
- Disable vntb delayed work before clearing BAR mappings and
doorbells to avoid oops caused by doing the work after resources
have been torn down (Koichiro Den)
- Add a way to describe reserved subregions within BARs, e.g.,
platform-owned fixed register windows, and use it for the RK3588
BAR4 DMA ctrl window (Koichiro Den)
- Add BAR_DISABLED for BARs that will never be available to an EPF
driver, and change some BAR_RESERVED annotations to BAR_DISABLED
(Niklas Cassel)
- Add NTB .get_dma_dev() callback for cases where DMA API requires a
different device, e.g., vNTB devices (Koichiro Den)
- Add reserved region types for MSI-X Table and PBA so Endpoint
controllers can them as describe hardware-owned regions in a
BAR_RESERVED BAR (Manikanta Maddireddy)
- Make Tegra194/234 BAR0 programmable and remove 1MB size limit
(Manikanta Maddireddy)
- Expose Tegra BAR2 (MSI-X) and BAR4 (DMA) as 64-bit BAR_RESERVED
(Manikanta Maddireddy)
- Add Tegra194 and Tegra234 device table entries to pci_endpoint_test
(Manikanta Maddireddy)
- Skip the BAR subrange selftest if there are not enough inbound
window resources to run the test (Christian Bruel)
New native PCIe controller drivers:
- Add DT binding and driver for Andes QiLai SoC PCIe host controller
(Randolph Lin)
- Add DT binding and driver for ESWIN PCIe Root Complex (Senchuan
Zhang)
Baikal T-1 PCIe controller driver:
- Remove driver since it never quite became usable (Andy Shevchenko)
Cadence PCIe controller driver:
- Implement byte/word config reads with dword (32-bit) reads because
some Cadence controllers don't support sub-dword accesses (Aksh
Garg)
CIX Sky1 PCIe controller driver:
- Add 'power-domains' to DT binding for SCMI power domain (Gary Yang)
Freescale i.MX6 PCIe controller driver:
- Add i.MX94 and i.MX943 to fsl,imx6q-pcie-ep DT binding (Richard
Zhu)
- Delay instead of polling for L2/L3 Ready after PME_Turn_off when
suspending i.MX6SX because LTSSM registers are inaccessible
(Richard Zhu)
- Separate PERST# assertion (for resetting endpoints) from core reset
(for resetting the RC itself) to prepare for new DTs with PERST#
GPIO in per-Root Port nodes (Sherry Sun)
- Retain Root Port MSI capability on i.MX7D, i.MX8MM, and i.MX8MQ so
MSI from downstream devices will work (Richard Zhu)
- Fix i.MX95 reference clock source selection when internal refclk is
used (Franz Schnyder)
Freescale Layerscape PCIe controller driver:
- Allow building as a removable module (Sascha Hauer)
MediaTek PCIe Gen3 controller driver:
- Use dev_err_probe() to simplify error paths and make deferred probe
messages visible in /sys/kernel/debug/devices_deferred (Chen-Yu
Tsai)
- Power off device if setup fails (Chen-Yu Tsai)
- Integrate new pwrctrl API to enable power control for WiFi/BT
adapters on mainboard or in PCIe or M.2 slots (Chen-Yu Tsai)
NVIDIA Tegra194 PCIe controller driver:
- Poll less aggressively and non-atomically for PME_TO_Ack during
transition to L2 (Vidya Sagar)
- Disable LTSSM after transition to Detect on surprise link down to
stop toggling between Polling and Detect (Manikanta Maddireddy)
- Don't force the device into the D0 state before L2 when suspending
or shutting down the controller (Vidya Sagar)
- Disable PERST# IRQ only in Endpoint mode because it's not
registered in Root Port mode (Manikanta Maddireddy)
- Handle 'nvidia,refclk-select' as optional (Vidya Sagar)
- Disable direct speed change in Endpoint mode so link speed change
is controlled by the host (Vidya Sagar)
- Set LTR values before link up to avoid bogus LTR messages with 0
latency (Vidya Sagar)
- Allow system suspend when the Endpoint link is down (Vidya Sagar)
- Use DWC IP core version, not Tegra custom values, to avoid DWC core
version check warnings (Manikanta Maddireddy)
- Apply ECRC workaround to devices based on DesignWare 5.00a as well
as 4.90a (Manikanta Maddireddy)
- Disable PM Substate L1.2 in Endpoint mode to work around Tegra234
erratum (Vidya Sagar)
- Delay post-PERST# cleanup until core is powered on to avoid CBB
timeout (Manikanta Maddireddy)
- Assert CLKREQ# so switches that forward it to their downstream side
can bring up those links successfully (Vidya Sagar)
- Calibrate pipe to UPHY for Endpoint mode to reset stale PLL state
from any previous bad link state (Vidya Sagar)
- Remove IRQF_ONESHOT flag from Endpoint interrupt registration so
DMA driver and Endpoint controller driver can share the interrupt
line (Vidya Sagar)
- Enable DMA interrupt to support DMA in both Root Port and Endpoint
modes (Vidya Sagar)
- Enable hardware link retraining after link goes down in Endpoint
mode (Vidya Sagar)
- Add DT binding and driver support for core clock monitoring (Vidya
Sagar)
Qualcomm PCIe controller driver:
- Advertise 'Hot-Plug Capable' and set 'No Command Completed Support'
since Qcom Root Ports support hotplug events like DL_Up/Down and
can accept writes to Slot Control without delays between writes
(Krishna Chaitanya Chundru)
Renesas R-Car PCIe controller driver:
- Mark Endpoint BAR0 and BAR2 as Resizable (Koichiro Den)
- Reduce EPC BAR alignment requirement to 4K (Koichiro Den)
Renesas RZ/G3S PCIe controller driver:
- Add RZ/G3E to DT binding and to driver (John Madieu)
- Assert (not deassert) resets in probe error path (John Madieu)
- Assert resets in suspend path in reverse order they were deasserted
during probe (John Madieu)
- Rework inbound window algorithm to prevent mapping more than
intended region and enforce alignment on size, to prepare for
RZ/G3E support (John Madieu)
Rockchip DesignWare PCIe controller driver:
- Add tracepoints for PCIe controller LTSSM transitions and link rate
changes (Shawn Lin)
- Trace LTSSM events collected by the dw-rockchip debug FIFO (Shawn
Lin)
SOPHGO PCIe controller driver:
- Disable ASPM L0s and L1 on Sophgo 2042 PCIe Root Ports that
advertise support for them (Yao Zi)
Synopsys DesignWare PCIe controller driver:
- Continue with system suspend even if an Endpoint doesn't respond
with PME_TO_Ack message (Manivannan Sadhasivam)
- Set Endpoint MSI-X Table Size in the correct function of a
multi-function device when configuring MSI-X, not in Function 0
(Aksh Garg)
- Set Max Link Width and Max Link Speed for all functions of a
multi-function device, not just Function 0 (Aksh Garg)
- Expose PCIe event counters in groups 5-7 in debugfs (Hans Zhang)
Miscellaneous:
- Warn only once about invalid ACS kernel parameter format (Richard
Cheng)
- Suppress FW_BUG warning when writing sysfs 'numa_node' with the
current value (Li RongQing)
- Drop redundant 'depends on PCI' from Kconfig (Julian Braha)"
* tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (165 commits)
PCI/P2PDMA: Add Google SoCs to the P2P DMA host bridge list
PCI/P2PDMA: Allow wildcard Device IDs in host bridge list
PCI: sg2042: Avoid L0s and L1 on Sophgo 2042 PCIe Root Ports
PCI: cadence: Add flags for disabling ASPM capability for broken Root Ports
PCI: tegra194: Add core monitor clock support
dt-bindings: PCI: tegra194: Add monitor clock support
PCI: tegra194: Enable hardware hot reset mode in Endpoint mode
PCI: tegra194: Enable DMA interrupt
PCI: tegra194: Remove IRQF_ONESHOT flag during Endpoint interrupt registration
PCI: tegra194: Calibrate pipe to UPHY for Endpoint mode
PCI: tegra194: Assert CLKREQ# explicitly by default
PCI: tegra194: Fix CBB timeout caused by DBI access before core power-on
PCI: tegra194: Disable L1.2 capability of Tegra234 EP
PCI: dwc: Apply ECRC workaround to DesignWare 5.00a as well
PCI: tegra194: Use DWC IP core version
PCI: tegra194: Free up Endpoint resources during remove()
PCI: tegra194: Allow system suspend when the Endpoint link is not up
PCI: tegra194: Set LTR message request before PCIe link up in Endpoint mode
PCI: tegra194: Disable direct speed change for Endpoint mode
PCI: tegra194: Use devm_gpiod_get_optional() to parse "nvidia,refclk-select"
...
Set the payload size before forwarding the reply to the async handler.
Without this, xn->reply_sz will be 0 and idpf_mac_filter_async_handler()
will never get past the size check.
Fixes: 34c21fa894 ("idpf: implement virtchnl transaction manager")
Cc: stable@vger.kernel.org
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Li Li <boolli@google.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Protect the set_bit() operation for the free_xn bitmask in
idpf_vc_xn_push_free(), to make the locking consistent with rest of the
code and avoid potential races in that logic.
Fixes: 34c21fa894 ("idpf: implement virtchnl transaction manager")
Cc: stable@vger.kernel.org
Reported-by: Ray Zhang <sgzhang@google.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As reported by AI review [1], if the refillqs allocation fails, refillqs
will be NULL but num_refillqs will be non-zero. The release function
will then dereference refillqs since it thinks the refillqs are present,
resulting in a NULL ptr dereference.
Only assign the num refillqs if the allocation was successful. This will
prevent the release function from entering the loop and accessing
refillqs.
[1] https://lore.kernel.org/netdev/20260227035625.2632753-1-kuba@kernel.org/
Fixes: 95af467d9a ("idpf: configure resources for RX queues")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Deinit calls idpf_idc_deinit_core_aux_device to free the cdev_info
memory, but leaves the adapter->cdev_info field with a stale pointer
value. This will bypass subsequent "if (!cdev_info)" checks if cdev_info
is not reallocated. For example, if idc_init fails after a reset,
cdev_info will already have been freed during the reset handling, but it
will not have been reallocated. The next reset or rmmod will result in a
crash.
[ +0.000008] BUG: kernel NULL pointer dereference, address: 00000000000000d0
[ +0.000033] #PF: supervisor read access in kernel mode
[ +0.000020] #PF: error_code(0x0000) - not-present page
[ +0.000017] PGD 2097dfa067 P4D 0
[ +0.000017] Oops: Oops: 0000 [#1] SMP NOPTI
...
[ +0.000018] RIP: 0010:device_del+0x3e/0x3d0
[ +0.000010] Call Trace:
[ +0.000010] <TASK>
[ +0.000012] idpf_idc_deinit_core_aux_device+0x36/0x70 [idpf]
[ +0.000034] idpf_vc_core_deinit+0x3e/0x180 [idpf]
[ +0.000035] idpf_remove+0x40/0x1d0 [idpf]
[ +0.000035] pci_device_remove+0x42/0xb0
[ +0.000020] device_release_driver_internal+0x19c/0x200
[ +0.000024] driver_detach+0x48/0x90
[ +0.000018] bus_remove_driver+0x6d/0x100
[ +0.000023] pci_unregister_driver+0x2e/0xb0
[ +0.000022] __do_sys_delete_module.isra.0+0x18c/0x2b0
[ +0.000025] ? kmem_cache_free+0x2c2/0x390
[ +0.000023] do_syscall_64+0x107/0x7d0
[ +0.000023] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Pass the adapter struct into idpf_idc_deinit_core_aux_device instead and
clear the cdev_info ptr.
Fixes: f4312e6bfa ("idpf: implement core RDMA auxiliary dev create, init, and destroy")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The only user of frag_size field in XDP RxQ info is
bpf_xdp_frags_increase_tail(). It clearly expects whole buffer size instead
of DMA write size. Different assumptions in idpf driver configuration lead
to negative tailroom.
To make it worse, buffer sizes are not actually uniform in idpf when
splitq is enabled, as there are several buffer queues, so rxq->rx_buf_size
is meaningless in this case.
Use truesize of the first bufq in AF_XDP ZC, as there is only one. Disable
growing tail for regular splitq.
Fixes: ac8a861f63 ("idpf: prepare structures to support XDP")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-8-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-02-19 (idpf, ice, i40e, ixgbevf, e1000e)
For idpf:
Li Li moves the check for software marker to occur after incrementing
next to clean to avoid re-encountering the same packet. He also adds a
couple of checks to prevent NULL pointer dereferences and NULLs rss_key,
after free, in error path so that later checks are properly evaluated.
Brian Vazquez adjusts IRQ naming to have correlation with netdev naming.
Sreedevi removes validation of action type as part of ntuple rule
deletion.
For ice:
Aaron Ma breaks RDMA initialization into two steps and adjusts calls so
that VSIs are entirely configured before plugging.
Michal Schmidt fixes initialization of loopback VSI to have proper
resources allocated to allow for loopback testing to occur.
For i40e:
Thomas Gleixner fixes a leak of preempt count by replacing get_cpu()
with smp_processor_id().
For ixgbevf:
Jedrzej adds a check for mailbox version before attempting to call an
associated link state call that is supported in that mailbox version.
For e1000e:
Vitaly clears power gating feature for Panther Lake systems to avoid
packet issues.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: clear DPG_EN after reset to avoid autonomous power-gating
e1000e: introduce new board type for Panther Lake PCH
ixgbevf: fix link setup issue
i40e: Fix preempt count leak in napi poll tracepoint
ice: fix crash in ethtool offline loopback test
ice: recap the VSI and QoS info after rebuild
idpf: Fix flow rule delete failure due to invalid validation
idpf: change IRQ naming to match netdev and ethtool queue numbering
idpf: nullify pointers after they are freed
idpf: skip deallocating txq group's txqs if it is NULL
idpf: skip deallocating bufq_sets from rx_qgrp if it is NULL
idpf: increment completion queue next_to_clean in sw marker wait routine
====================
Link: https://patch.msgid.link/20260225211546.1949260-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When deleting a flow rule using "ethtool -N <dev> delete <location>",
idpf_sideband_action_ena() incorrectly validates fsp->ring_cookie even
though ethtool doesn't populate this field for delete operations. The
uninitialized ring_cookie may randomly match RX_CLS_FLOW_DISC or
RX_CLS_FLOW_WAKE, causing validation to fail and preventing legitimate
rule deletions. Remove the unnecessary sideband action enable check and
ring_cookie validation during delete operations since action validation
is not required when removing existing rules.
Fixes: ada3e24b84 ("idpf: add flow steering support")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The code uses the vidx for the IRQ name but that doesn't match ethtool
reporting nor netdev naming, this makes it hard to tune the device and
associate queues with IRQs. Sequentially requesting irqs starting from
'0' makes the output consistent.
This commit changes the interrupt numbering but preserves the name
format, maintaining ABI compatibility. Existing tools relying on the old
numbering are already non-functional, as they lack a useful correlation
to the interrupts.
Before:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-1/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-3/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-4/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-5/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 1002
tx_q-1_pkts: 2679
tx_q-2_pkts: 1113
tx_q-3_pkts: 1192 <----- tx_q-3 vs idpf-eth1-Tx-5
rx_q-0_pkts: 1143
rx_q-1_pkts: 3172
rx_q-2_pkts: 1074
After:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-0/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-1/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-2/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-3/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 118
tx_q-1_pkts: 134
tx_q-2_pkts: 228
tx_q-3_pkts: 138 <--- tx_q-3 matches idpf-eth1-Tx-3
rx_q-0_pkts: 111
rx_q-1_pkts: 366
rx_q-2_pkts: 120
Fixes: d4d5587182 ("idpf: initialize interrupts and enable vport")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
rss_data->rss_key needs to be nullified after it is freed.
Checks like "if (!rss_data->rss_key)" in the code could fail
if it is not nullified.
Tested: built and booted the kernel.
Fixes: 83f38f210b ("idpf: Fix RSS LUT NULL pointer crash on early ethtool operations")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, in idpf_wait_for_sw_marker_completion(), when an
IDPF_TXD_COMPLT_SW_MARKER packet is found, the routine breaks out of
the for loop and does not increment the next_to_clean counter. This
causes the subsequent NAPI polls to run into the same
IDPF_TXD_COMPLT_SW_MARKER packet again and print out the following:
[ 23.261341] idpf 0000:05:00.0 eth1: Unknown TX completion type: 5
Instead, we should increment next_to_clean regardless when an
IDPF_TXD_COMPLT_SW_MARKER packet is found.
Tested: with the patch applied, we do not see the errors above from NAPI
polls anymore.
Fixes: 9d39447051 ("idpf: remove SW marker handling from NAPI")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
No pci_enable_ptm() callers supply the "granularity" pointer where the
clock granularity would be returned.
Drop the unused pci_enable_ptm() parameter.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://patch.msgid.link/20260224111044.3487873-5-mika.westerberg@linux.intel.com
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
The logic is similar to idpf_rx_hwtstamp, but the data is exported
as a BPF kfunc instead of appended to an skb to support grabbing
timestamps in xsk packets.
A idpf_queue_has(PTP, rxq) condition is added to check the queue
supports PTP similar to idpf_rx_process_skb_fields.
Tested using an xsk connection and checking xdp timestamps are
retrievable in received packets.
Cc: intel-wired-lan@lists.osuosl.org
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
refactor IDPF resource access
Pavan Kumar Linga says:
Queue and vector resources for a given vport, are stored in the
idpf_vport structure. At the time of configuration, these
resources are accessed using vport pointer. Meaning, all the
config path functions are tied to the default queue and vector
resources of the vport.
There are use cases which can make use of config path functions
to configure queue and vector resources that are not tied to any
vport. One such use case is PTP secondary mailbox creation
(it would be in a followup series). To configure queue and interrupt
resources for such cases, we can make use of the existing config
infrastructure by passing the necessary queue and vector resources info.
To achieve this, group the existing queue and vector resources into
default resource group and refactor the code to pass the resource
pointer to the config path functions.
This series also includes patches which generalizes the send virtchnl
message APIs and mailbox API that are necessary for the implementation
of PTP secondary mailbox.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: generalize mailbox API
idpf: avoid calling get_rx_ptypes for each vport
idpf: generalize send virtchnl message API
idpf: remove vport pointer from queue sets
idpf: add rss_data field to RSS function parameters
idpf: reshuffle idpf_vport struct members to avoid holes
idpf: move some iterator declarations inside for loops
idpf: move queue resources to idpf_q_vec_rsrc structure
idpf: introduce idpf_q_vec_rsrc struct and move vector resources to it
idpf: introduce local idpf structure to store virtchnl queue chunks
====================
Link: https://patch.msgid.link/20260122223601.2208759-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a control queue parameter to all mailbox APIs in order to make use
of those APIs for non-default mailbox as well.
Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
RX ptypes received from device control plane doesn't depend on vport
info, but might vary based on the queue model. When the driver requests
for ptypes, control plane fills both ptype_id_10 (used for splitq) and
ptype_id_8 (used for singleq) fields of the virtchnl2_ptype response
structure. This allows to call get_rx_ptypes once at the adapter level
instead of each vport.
Parse and store the received ptypes of both splitq and singleq in a
separate lookup table. Respective lookup table is used based on the
queue model info. As part of the changes, pull the ptype protocol
parsing code into a separate function.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
With the previous refactor of passing idpf resource pointer, all of the
virtchnl send message functions do not require full vport structure.
Those functions can be generalized to be able to use for configuring
vport independent queues.
Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Replace vport pointer in queue sets struct with adapter backpointer and
vport_id as those are the primary fields necessary for virtchnl
communication. Otherwise, pass the vport pointer as a separate parameter
where available. Also move xdp_txq_offset to queue vector resource
struct since we no longer have the vport pointer.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Retrieve rss_data field of vport just once and pass it to RSS related
functions instead of retrieving it in each function.
While at it, update s/rss/RSS in the RSS function doc comments.
Reviewed-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The previous refactor of moving queue and vector resources out of the
idpf_vport structure, created few holes. Reshuffle the existing members
to avoid holes as much as possible.
Reviewed-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Move some iterator declarations to their respective for loops; use more
appropriate unsigned type.
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Move both TX and RX queue resources to the newly introduced
idpf_q_vec_rsrc structure.
Reviewed-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
To group all the vector and queue resources, introduce idpf_q_vec_rsrc
structure. This helps to reuse the same config path functions by other
features. For example, PTP implementation can use the existing config
infrastructure to configure secondary mailbox by passing its queue and
vector info. It also helps to avoid any duplication of code.
Existing queue and vector resources are grouped as default resources.
This patch moves vector info to the newly introduced structure.
Following patch moves the queue resources.
While at it, declare the loop iterator for 'num_q_vectors' in loop and
use the correct type.
Include idpf_q_vec_rsrc backpointer in idpf_alloc_queue_set along with
vport.
Reviewed-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Queue ID and register info received from device Control Plane is stored
locally in the same little endian format. As the queue chunks are
retrieved in 3 functions, lexx_to_cpu conversions are done each time.
Instead introduce a new idpf structure to store the received queue info.
It also avoids conditional check to retrieve queue chunks.
With this change, there is no need to store the queue chunks in
'req_qs_chunks' field. So remove that.
Suggested-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-20 (ice, idpf)
For ice:
Cody Haas breaks dependency of needing both RSS key and LUT for
ice_get_rxfh() as ethtool ioctls do not always supply both.
Paul fixes issues related to devlink reload; adding missing deinit HW
call and moving hwmon exit function to the proper call chain.
For idpf:
Mina Almasry moves a register read call into the time sandwich to ensure
the register is properly flushed.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: read lower clock bits inside the time sandwich
ice: fix devlink reload call trace
ice: add missing ice_deinit_hw() in devlink reinit path
ice: Fix persistent failure in ice_get_rxfh
====================
Link: https://patch.msgid.link/20260120224430.410377-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In idpf_net_dim(), some statistics protected by u64_stats_sync, are read
and accumulated in ignorance of possible u64_stats_fetch_retry() events.
The correct way to copy statistics is already illustrated by
idpf_add_queue_stats(). Fix this by reading them into temporary variables
first.
Fixes: c2d548cad1 ("idpf: add TX splitq napi poll support")
Fixes: 3a8845af66 ("idpf: add RX splitq napi poll support")
Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260119162720.1463859-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PCIe reads need to be done inside the time sandwich because PCIe
writes may get buffered in the PCIe fabric and posted to the device
after the _postts completes. Doing the PCIe read inside the time
sandwich guarantees that the write gets flushed before the _postts
timestamp is taken.
Cc: lrizzo@google.com
Cc: namangulati@google.com
Cc: willemb@google.com
Cc: intel-wired-lan@lists.osuosl.org
Cc: milena.olech@intel.com
Cc: jacob.e.keller@intel.com
Fixes: 5cb8805d23 ("idpf: negotiate PTP capabilities and get PTP clock")
Suggested-by: Shachar Raindel <shacharr@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In many functions the Return section is missing. Fix kernel-doc
descriptions to address that and other warnings.
Before the change:
$ scripts/kernel-doc -none -Wreturn drivers/net/ethernet/intel/idpf/idpf_txrx.c 2>&1 | wc -l
85
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
idpf_up_complete() function always returns 0 and no callers use this return
value. Although idpf_vport_open() checks the return value, it only handles
error cases which never occur. Change the return type to void to simplify
the code.
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If vport flags do not contain VIRTCHNL2_VPORT_ENABLE_RDMA, driver does not
allocate vdev_info for this vport. This leads to kernel NULL pointer
dereference in idpf_idc_vport_dev_down(), which references vdev_info for
every vport regardless.
Check, if vdev_info was ever allocated before unplugging aux device.
Fixes: be91128c57 ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The HW only supports a maximum Rx buffer size of 16K-128. On systems
using large pages, the libeth logic can configure the buffer size to be
larger than this. The upper bound is PAGE_SIZE while the lower bound is
MTU rounded up to the nearest power of 2. For example, ARM systems with
a 64K page size and an mtu of 9000 will set the Rx buffer size to 16K,
which will cause the config Rx queues message to fail.
Initialize the bufq/fill queue buf_len field to the maximum supported
size. This will trigger the libeth logic to cap the maximum Rx buffer
size by reducing the upper bound.
Fixes: 74d1412ac8 ("idpf: use libeth Rx buffer management for payload buffer")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: David Decotigny <ddecotig@google.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix error handling to properly cleanup interrupts when
idpf_vport_queue_ids_init() or idpf_rx_bufs_init_all() fail. Jump to
'intr_deinit' instead of 'queues_rel' to ensure interrupts are cleaned up
before releasing other resources.
Fixes: d4d5587182 ("idpf: initialize interrupts and enable vport")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
During soft reset, the RSS LUT is freed and not restored unless the
interface is up. If an ethtool command that accesses the rss lut is
attempted immediately after reset, it will result in NULL ptr
dereference. Also, there is no need to reset the rss lut if the soft reset
does not involve queue count change.
After soft reset, set the RSS LUT to default values based on the updated
queue count only if the reset was a result of a queue count change and
the LUT was not configured by the user. In all other cases, don't touch
the LUT.
Steps to reproduce:
** Bring the interface down (if up)
ifconfig eth1 down
** update the queue count (eg., 27->20)
ethtool -L eth1 combined 20
** display the RSS LUT
ethtool -x eth1
[82375.558338] BUG: kernel NULL pointer dereference, address: 0000000000000000
[82375.558373] #PF: supervisor read access in kernel mode
[82375.558391] #PF: error_code(0x0000) - not-present page
[82375.558408] PGD 0 P4D 0
[82375.558421] Oops: Oops: 0000 [#1] SMP NOPTI
<snip>
[82375.558516] RIP: 0010:idpf_get_rxfh+0x108/0x150 [idpf]
[82375.558786] Call Trace:
[82375.558793] <TASK>
[82375.558804] rss_prepare.isra.0+0x187/0x2a0
[82375.558827] rss_prepare_data+0x3a/0x50
[82375.558845] ethnl_default_doit+0x13d/0x3e0
[82375.558863] genl_family_rcv_msg_doit+0x11f/0x180
[82375.558886] genl_rcv_msg+0x1ad/0x2b0
[82375.558902] ? __pfx_ethnl_default_doit+0x10/0x10
[82375.558920] ? __pfx_genl_rcv_msg+0x10/0x10
[82375.558937] netlink_rcv_skb+0x58/0x100
[82375.558957] genl_rcv+0x2c/0x50
[82375.558971] netlink_unicast+0x289/0x3e0
[82375.558988] netlink_sendmsg+0x215/0x440
[82375.559005] __sys_sendto+0x234/0x240
[82375.559555] __x64_sys_sendto+0x28/0x30
[82375.560068] x64_sys_call+0x1909/0x1da0
[82375.560576] do_syscall_64+0x7a/0xfa0
[82375.561076] ? clear_bhb_loop+0x60/0xb0
[82375.561567] entry_SYSCALL_64_after_hwframe+0x76/0x7e
<snip>
Fixes: 02cbfba1ad ("idpf: add ethtool callbacks")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
RSS LUT provisioning and queries on a down interface currently return
silently without effect. Users should be able to configure RSS settings
even when the interface is down.
Fix by maintaining RSS configuration changes in the driver's soft copy and
deferring HW programming until the interface comes up.
Fixes: 02cbfba1ad ("idpf: add ethtool callbacks")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The RSS LUT is not initialized until the interface comes up, causing
the following NULL pointer crash when ethtool operations like rxhash on/off
are performed before the interface is brought up for the first time.
Move RSS LUT initialization from ndo_open to vport creation to ensure LUT
is always available. This enables RSS configuration via ethtool before
bringing the interface up. Simplify LUT management by maintaining all
changes in the driver's soft copy and programming zeros to the indirection
table when rxhash is disabled. Defer HW programming until the interface
comes up if it is down during rxhash and LUT configuration changes.
Steps to reproduce:
** Load idpf driver; interfaces will be created
modprobe idpf
** Before bringing the interfaces up, turn rxhash off
ethtool -K eth2 rxhash off
[89408.371875] BUG: kernel NULL pointer dereference, address: 0000000000000000
[89408.371908] #PF: supervisor read access in kernel mode
[89408.371924] #PF: error_code(0x0000) - not-present page
[89408.371940] PGD 0 P4D 0
[89408.371953] Oops: Oops: 0000 [#1] SMP NOPTI
<snip>
[89408.372052] RIP: 0010:memcpy_orig+0x16/0x130
[89408.372310] Call Trace:
[89408.372317] <TASK>
[89408.372326] ? idpf_set_features+0xfc/0x180 [idpf]
[89408.372363] __netdev_update_features+0x295/0xde0
[89408.372384] ethnl_set_features+0x15e/0x460
[89408.372406] genl_family_rcv_msg_doit+0x11f/0x180
[89408.372429] genl_rcv_msg+0x1ad/0x2b0
[89408.372446] ? __pfx_ethnl_set_features+0x10/0x10
[89408.372465] ? __pfx_genl_rcv_msg+0x10/0x10
[89408.372482] netlink_rcv_skb+0x58/0x100
[89408.372502] genl_rcv+0x2c/0x50
[89408.372516] netlink_unicast+0x289/0x3e0
[89408.372533] netlink_sendmsg+0x215/0x440
[89408.372551] __sys_sendto+0x234/0x240
[89408.372571] __x64_sys_sendto+0x28/0x30
[89408.372585] x64_sys_call+0x1909/0x1da0
[89408.372604] do_syscall_64+0x7a/0xfa0
[89408.373140] ? clear_bhb_loop+0x60/0xb0
[89408.373647] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[89408.378887] </TASK>
<snip>
Fixes: a251eee621 ("idpf: add SRIOV support and other ndo_ops")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When ethtool -n is executed on an interface to display the flow steering
rules, "rxclass: Unknown flow type" error is generated.
The flow steering list maintained in the driver currently stores only the
location and q_index but other fields of the ethtool_rx_flow_spec are not
stored. This may be enough for the virtchnl command to delete the entry.
However, when the ethtool -n command is used to query the flow steering
rules, the ethtool_rx_flow_spec returned is not complete causing the
error below.
Resolve this by storing the flow spec (fsp) when rules are added and
returning the complete flow spec when rules are queried.
Also, change the return value from EINVAL to ENOENT when flow steering
entry is not found during query by location or when deleting an entry.
Add logic to detect and reject duplicate filter entries at the same
location and change logic to perform upfront validation of all error
conditions before adding flow rules through virtchnl. This avoids the
need for additional virtchnl delete messages when subsequent operations
fail, which was missing in the original upstream code.
Example:
Before the fix:
ethtool -n eth1
2 RX rings available
Total 2 rules
rxclass: Unknown flow type
rxclass: Unknown flow type
After the fix:
ethtool -n eth1
2 RX rings available
Total 2 rules
Filter: 0
Rule Type: TCP over IPv4
Src IP addr: 10.0.0.1 mask: 0.0.0.0
Dest IP addr: 0.0.0.0 mask: 255.255.255.255
TOS: 0x0 mask: 0xff
Src port: 0 mask: 0xffff
Dest port: 0 mask: 0xffff
Action: Direct to queue 0
Filter: 1
Rule Type: UDP over IPv4
Src IP addr: 10.0.0.1 mask: 0.0.0.0
Dest IP addr: 0.0.0.0 mask: 255.255.255.255
TOS: 0x0 mask: 0xff
Src port: 0 mask: 0xffff
Dest port: 0 mask: 0xffff
Action: Direct to queue 0
Fixes: ada3e24b84 ("idpf: add flow steering support")
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Co-developed-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The flow steering list maintains entries that are added and removed as
ethtool creates and deletes flow steering rules. Module removal with active
entries causes memory leak as the list is not properly cleaned up.
Prevent this by iterating through the remaining entries in the list and
freeing the associated memory during module removal. Add a spinlock
(flow_steer_list_lock) to protect the list access from multiple threads.
Fixes: ada3e24b84 ("idpf: add flow steering support")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If the init_task fails during a driver load, we end up without vports and
netdevs, effectively failing the entire process. In that state a
subsequent reset will result in a crash as the service task attempts to
access uninitialized resources. Following trace is from an error in the
init_task where the CREATE_VPORT (op 501) is rejected by the FW:
[40922.763136] idpf 0000:83:00.0: Device HW Reset initiated
[40924.449797] idpf 0000:83:00.0: Transaction failed (op 501)
[40958.148190] idpf 0000:83:00.0: HW reset detected
[40958.161202] BUG: kernel NULL pointer dereference, address: 00000000000000a8
...
[40958.168094] Workqueue: idpf-0000:83:00.0-vc_event idpf_vc_event_task [idpf]
[40958.168865] RIP: 0010:idpf_vc_event_task+0x9b/0x350 [idpf]
...
[40958.177932] Call Trace:
[40958.178491] <TASK>
[40958.179040] process_one_work+0x226/0x6d0
[40958.179609] worker_thread+0x19e/0x340
[40958.180158] ? __pfx_worker_thread+0x10/0x10
[40958.180702] kthread+0x10f/0x250
[40958.181238] ? __pfx_kthread+0x10/0x10
[40958.181774] ret_from_fork+0x251/0x2b0
[40958.182307] ? __pfx_kthread+0x10/0x10
[40958.182834] ret_from_fork_asm+0x1a/0x30
[40958.183370] </TASK>
Fix the error handling in the init_task to make sure the service and
mailbox tasks are disabled if the error happens during load. These are
started in idpf_vc_core_init(), which spawns the init_task and has no way
of knowing if it failed. If the error happens on reset, following
successful driver load, the tasks can still run, as that will allow the
netdevs to attempt recovery through another reset. Stop the PTP callbacks
either way as those will be restarted by the call to idpf_vc_core_init()
during a successful reset.
Fixes: 0fe45467a1 ("idpf: add create vport and netdev configuration")
Reported-by: Vivek Kumar <iamvivekkumar@google.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>