linux/drivers
Shay Drory d466ddda55 net/mlx5e: SD, Fix race condition in secondary device probe/remove
When utilizing Socket-Direct single netdev functionality, the driver
resolves the actual auxiliary device using mlx5_sd_get_adev(). However,
the current implementation returns the primary ETH auxiliary device
without holding the device lock, leading to a potential race condition
where the ETH device could be unbound or removed concurrently during
probe, suspend, resume, or remove operations.[1]

Fix this by introducing mlx5_sd_put_adev() and updating
mlx5_sd_get_adev() so that secondary devices take a reference on the
returned auxiliary device and acquire its device lock. After the lock
is acquired, a second devcom check is needed [2].
In addition, update the callers to pair the get operation with the new
put operation, ensuring the lock is held while the auxiliary device is
being operated on and released afterwards.
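
The get/put pairing and the re-check after locking can be illustrated with
a small user-space model. This is only a sketch under stated assumptions,
not the driver code: struct model_adev, struct model_sd and the model_*()
functions are hypothetical names, a pthread mutex stands in for the device
lock, and an atomic flag stands in for the devcom ready state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for an auxiliary device. */
struct model_adev {
	pthread_mutex_t lock;	/* models the device lock */
	atomic_int refs;	/* models the device reference count */
};

/* Hypothetical stand-in for the SD group state. */
struct model_sd {
	struct model_adev *primary;
	atomic_bool ready;	/* models the devcom "ready" state */
};

static void model_adev_init(struct model_adev *adev)
{
	pthread_mutex_init(&adev->lock, NULL);
	atomic_init(&adev->refs, 0);
}

/*
 * Returns the device to operate on, or NULL if the group is not ready.
 * A secondary caller gets the primary back with a reference taken and
 * the primary's lock held; a primary caller gets itself back unchanged.
 */
static struct model_adev *model_sd_get_adev(struct model_sd *sd,
					    struct model_adev *self)
{
	struct model_adev *primary = sd->primary;

	if (!atomic_load(&sd->ready))
		return NULL;
	if (self == primary)
		return self;	/* primary path: no extra ref or lock */

	atomic_fetch_add(&primary->refs, 1);
	pthread_mutex_lock(&primary->lock);

	/* The group may have been torn down while we waited for the lock. */
	if (!atomic_load(&sd->ready)) {
		pthread_mutex_unlock(&primary->lock);
		atomic_fetch_sub(&primary->refs, 1);
		return NULL;
	}
	return primary;
}

/* Undoes model_sd_get_adev(); a no-op for NULL and for the primary path. */
static void model_sd_put_adev(struct model_adev *got, struct model_adev *self)
{
	if (!got || got == self)
		return;
	assert(atomic_load(&got->refs) > 0);
	pthread_mutex_unlock(&got->lock);
	atomic_fetch_sub(&got->refs, 1);
}
```

In this model a caller brackets its work between model_sd_get_adev() and
model_sd_put_adev(), so the primary cannot go away while a secondary is
operating on it.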

The "primary" designation is determined once in sd_register(). It is set
before devcom is marked ready, and it never changes after that.
In addition, the primary path never locks a secondary: when the primary
device invokes mlx5_sd_get_adev(), it sees dev == primary and returns;
no additional lock is taken.
Therefore lock ordering is always: secondary_lock -> primary_lock. The
reverse never happens, so ABBA deadlock is impossible.
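
The ordering argument can be checked mechanically with a toy lock-order
graph, in the spirit of a lock dependency checker. This is a sketch only:
enum model_lock, struct order_graph and the functions below are
hypothetical, and the check covers two-lock (ABBA) inversions only.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_lock { LOCK_SECONDARY, LOCK_PRIMARY, LOCK_COUNT };

/* edge[a][b] is true when lock b was acquired while lock a was held. */
struct order_graph {
	bool edge[LOCK_COUNT][LOCK_COUNT];
};

/*
 * Models a caller that already holds its own device lock ("own") and then
 * resolves the device to operate on.
 */
static void model_get_path(struct order_graph *g, enum model_lock own)
{
	if (own == LOCK_PRIMARY)
		return;		/* dev == primary: nothing else is locked */

	/* Secondary path: primary lock is taken under the secondary lock. */
	g->edge[own][LOCK_PRIMARY] = true;
}

/* Returns true if some pair of locks was ever taken in both orders. */
static bool model_has_abba(const struct order_graph *g)
{
	for (size_t a = 0; a < LOCK_COUNT; a++)
		for (size_t b = a + 1; b < LOCK_COUNT; b++)
			if (g->edge[a][b] && g->edge[b][a])
				return true;
	return false;
}
```

Running both paths through model_get_path() records only the
secondary -> primary edge, so model_has_abba() stays false; it would turn
true only if some path took a secondary lock while holding the primary.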

[1]
for example:
BUG: kernel NULL pointer dereference, address: 0000000000000370
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 4 UID: 0 PID: 3945 Comm: bash Not tainted 6.19.0-rc3+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 [mlx5_core]
Call Trace:
 <TASK>
 mlx5e_remove+0x82/0x12a [mlx5_core]
 device_release_driver_internal+0x194/0x1f0
 bus_remove_device+0xc6/0x140
 device_del+0x159/0x3c0
 ? devl_param_driverinit_value_get+0x29/0x80
 mlx5_rescan_drivers_locked+0x92/0x160 [mlx5_core]
 mlx5_unregister_device+0x34/0x50 [mlx5_core]
 mlx5_uninit_one+0x43/0xb0 [mlx5_core]
 remove_one+0x4e/0xc0 [mlx5_core]
 pci_device_remove+0x39/0xa0
 device_release_driver_internal+0x194/0x1f0
 unbind_store+0x99/0xa0
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x55/0xe90
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]
    CPU0 (primary)                     CPU1 (secondary)
==========================================================================
mlx5e_remove() (device_lock held)
                                     mlx5e_remove() (2nd device_lock held)
                                      mlx5_sd_get_adev()
                                       mlx5_devcom_comp_is_ready() => true
                                       device_lock(primary)
 mlx5_sd_get_adev() ==> ret adev
 _mlx5e_remove()
 mlx5_sd_cleanup()
 // mlx5e_remove finished
 // releasing device_lock
                                       //need another check here...
                                       mlx5_devcom_comp_is_ready() => false

Fixes: 381978d283 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:13:09 -07:00
accel drm for v7.1-rc1 2026-04-15 08:45:00 -07:00
accessibility
acpi ACPI support fixes for 7.1-rc1 2026-04-23 12:29:22 -07:00
amba
android Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
ata ata: pata_parport: switch to dynamic root device 2026-04-27 11:38:16 +02:00
atm net: remove unused ATM protocols and legacy ATM device drivers 2026-04-23 12:21:14 -07:00
auxdisplay
base regmap: Fixes for v7.1 2026-04-24 12:11:26 -07:00
bcma
block block-7.1-20260424 2026-04-24 15:06:55 -07:00
bluetooth Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling 2026-04-13 09:19:42 -04:00
bus Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
cache
cdrom
cdx
char Here are the accumulated fixes for 7.1-rc1 and a single structural worth of 2026-04-25 16:20:52 -07:00
clk One more fix for the merge window to avoid a boot hang on 2026-04-26 14:03:20 -07:00
clocksource
comedi Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
connector
counter Linux 7.0-rc7 2026-04-06 09:04:53 +02:00
cpufreq Devicetree updates for v7.1: 2026-04-17 14:09:02 -07:00
cpuidle powerpc updates for 7.1 2026-04-14 17:10:15 -07:00
crypto crypto: ccp - copy IV using skcipher ivsize 2026-04-16 17:37:03 +08:00
cxl CXL changes for v7.1 2026-04-17 15:52:58 -07:00
dax dax changes for 7.1 2026-04-21 14:12:01 -07:00
dca
devfreq PM / devfreq: tegra30-devfreq: add support for Tegra114 2026-04-04 03:15:39 +09:00
dibs
dio
dma dmaengine updates for v7.1 2026-04-17 10:29:01 -07:00
dma-buf drm fixes for 7.1-rc1 2026-04-24 11:44:52 -07:00
dpll dpll: export __dpll_pin_change_ntf() for use under dpll_lock 2026-04-30 11:37:39 +02:00
edac - Add new AMD MCA bank names and types to the MCA code, preceded by a clean 2026-04-14 15:32:39 -07:00
eisa
extcon
firewire
firmware LoongArch changes for v7.1 2026-04-24 09:54:45 -07:00
fpga
fsi
fwctl fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal 2026-04-10 11:21:06 -03:00
gnss
gpib Linux 7.0-rc7 2026-04-06 09:04:53 +02:00
gpio gpio fixes for v7.1-rc1 2026-04-24 11:59:46 -07:00
gpu drm fixes for 7.1-rc1 2026-04-24 11:44:52 -07:00
greybus greybus: gb-beagleplay: bound bootloader receive buffering 2026-04-02 15:55:09 +02:00
hid Input updates for v7.1-rc0 2026-04-22 18:36:40 -07:00
hsi HSI: omap_ssi_port: remove depends on ARM 2026-04-02 22:33:44 +02:00
hte hte: tegra194: Add Tegra264 GTE support 2026-04-12 23:29:31 -07:00
hv drm fixes for 7.1-rc1 2026-04-24 11:44:52 -07:00
hwmon hwmon updates for 7.1 2026-04-15 14:37:32 -07:00
hwspinlock hwspinlock: u8500: delete driver 2026-04-06 09:43:18 -05:00
hwtracing Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
i2c i2c-host for v7.1, part 2 2026-04-20 00:03:38 +02:00
i3c i3c: mipi-i3c-hci: fix IBI payload length calculation for final status 2026-04-12 22:06:02 +02:00
idle
iio Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
infiniband SCSI misc on 20260421 2026-04-21 08:22:18 -07:00
input Input updates for v7.1-rc0 2026-04-22 18:36:40 -07:00
interconnect This pull request contains the interconnect changes for the 7.1-rc1 2026-04-07 10:06:50 +02:00
iommu dma-mapping updates for Linux 7.0: 2026-04-17 11:12:42 -07:00
ipack
irqchip Arm: 2026-04-17 07:18:03 -07:00
leds leds: class: Make led_remove_lookup() NULL-aware 2026-04-09 13:49:19 +01:00
macintosh
mailbox mailbox: mailbox-test: make data_ready a per-instance variable 2026-04-18 13:10:14 -05:00
mcb
md - fix metadata corruption in dm-thin 2026-04-27 16:33:23 -07:00
media rpmsg updates for v7.1 2026-04-17 14:18:55 -07:00
memory dma-mapping updates for Linux 7.0: 2026-04-17 11:12:42 -07:00
memstick
message
mfd MFD for v7.1 2026-04-20 11:31:01 -07:00
misc Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
mmc mmc: sdhci-msm: Fix the wrapped key handling 2026-04-10 10:29:58 +02:00
most most: usb: Use kzalloc_objs for endpoint address array 2026-04-02 17:06:09 +02:00
mtd * MTD changes 2026-04-17 17:57:04 -07:00
mux Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
net net/mlx5e: SD, Fix race condition in secondary device probe/remove 2026-05-05 19:13:09 -07:00
nfc NFC: trf7970a: Ignore antenna noise when checking for RF field 2026-04-27 18:00:43 -07:00
ntb pci-v7.1-changes 2026-04-15 14:41:21 -07:00
nubus
nvdimm vfs-7.1-rc1.integrity 2026-04-13 10:40:26 -07:00
nvme for-7.1/io_uring-20260411 2026-04-13 16:22:30 -07:00
nvmem Linux 7.0-rc7 2026-04-06 09:04:53 +02:00
of memblock: updates for 7.0-rc1 2026-04-18 11:29:14 -07:00
opp
parisc parisc: led: fix reference leak on failed device registration 2026-04-17 15:46:46 +02:00
parport parport: Remove completed item from to-do list 2026-04-02 17:05:56 +02:00
pci LoongArch changes for v7.1 2026-04-24 09:54:45 -07:00
pcmcia PCMCIA fixes and cleanups for v7.1 2026-04-23 11:22:16 -07:00
peci
perf arm64 updates for 7.1: 2026-04-14 16:48:56 -07:00
phy phy-for-7.1 2026-04-17 10:22:08 -07:00
pinctrl Pin control changes for the v7.1 kernel cycle: 2026-04-18 16:59:09 -07:00
platform platform-drivers-x86 for v7.1-1 2026-04-20 12:02:24 -07:00
pmdomain pmdomain: qcom: rpmhpd: Add power domains for Hawi SoC 2026-04-08 12:01:37 +02:00
pnp
power USB / Thunderbolt changes for 7.1-rc1 2026-04-19 08:47:40 -07:00
powercap
pps pps: change pps_class to a const struct 2026-04-02 16:33:00 +02:00
ps3
ptp
pwm pwm: Two driver fixes 2026-04-23 08:37:07 -07:00
rapidio
ras
regulator regulator: Fix for v7.1 2026-04-24 13:06:25 -07:00
remoteproc rpmsg updates for v7.1 2026-04-17 14:18:55 -07:00
resctrl arm64 updates for 7.1 (second round): 2026-04-20 16:46:22 -07:00
reset soc: late changes for 7.1 2026-04-23 08:57:24 -07:00
rpmsg rpmsg: Constify buffer passed to send API 2026-04-06 09:37:51 -05:00
rtc RTC for 7.1 2026-04-25 16:39:03 -07:00
s390 s390 updates for 7.1 merge window 2026-04-22 11:13:45 -07:00
sbus
scsi SCSI misc on 20260421 2026-04-21 08:22:18 -07:00
sh
siox
slimbus
soc rpmsg updates for v7.1 2026-04-17 14:18:55 -07:00
soundwire soundwire updates for 7.1 2026-04-17 10:16:53 -07:00
spi spi: Fixes for v7.1 2026-04-24 13:16:36 -07:00
spmi
ssb
staging Char/Misc/IIO/and others driver updates for 7.1-rc1 2026-04-24 13:23:50 -07:00
target SCSI misc on 20260421 2026-04-21 08:22:18 -07:00
tc
tee soc: drivers for 7.1 2026-04-16 20:34:34 -07:00
thermal bitmap updates for v7.1 2026-04-14 08:55:18 -07:00
thunderbolt thunderbolt: Changes for v7.1 merge window 2026-04-10 13:10:28 +02:00
tty TTY/Serial changes for 7.1-rc1 2026-04-19 08:44:41 -07:00
ufs scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP 2026-04-08 22:27:16 -04:00
uio uio: replace deprecated mmap hook with mmap_prepare in uio_info 2026-04-05 13:53:44 -07:00
usb SCSI misc on 20260421 2026-04-21 08:22:18 -07:00
vdpa vdpa: use generic driver_override infrastructure 2026-04-04 00:47:50 +02:00
vfio vfio/cdx: Consolidate MSI configured state onto cdx_irqs 2026-04-21 12:01:22 -06:00
vhost Including fixes from Netfilter. 2026-04-23 16:50:42 -07:00
video fbdev: hgafb: Request memory region before ioremap 2026-04-22 17:02:55 +02:00
virt tsm for 7.1 2026-04-26 09:51:29 -07:00
virtio mm.git review status for linus..mm-stable 2026-04-15 12:59:16 -07:00
w1 w1: ds2490: drop redundant device reference 2026-04-03 10:55:12 +02:00
watchdog watchdog: ni903x_wdt: Convert to a platform driver 2026-04-07 21:06:59 +02:00
xen xen/privcmd: fix double free via VMA splitting 2026-04-23 15:32:59 +02:00
zorro
Kconfig net: remove ISDN subsystem and Bluetooth CMTP 2026-04-23 10:24:02 -07:00
Makefile net: remove ISDN subsystem and Bluetooth CMTP 2026-04-23 10:24:02 -07:00