Commit Graph

1444448 Commits

Author SHA1 Message Date
Shitalkumar Gandhi
701ea57fea net: rtsn: fix mdio_node leak in rtsn_mdio_alloc()
of_get_child_by_name() takes a reference. The rtsn_reset() and
rtsn_change_mode() failure paths jump to out_free_bus and leak
mdio_node.

Add out_put_node to drop it before falling through.

Fixes: b0d3969d2b ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN")
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260505123236.406000-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 17:42:50 -07:00
Jakub Kicinski
e418273936 Merge branch 'netdevsim-psp-fix-init-and-uninit-bugs'
Daniel Zahka says:

====================
netdevsim: psp: fix init and uninit bugs

This series has three fixes. The first is a straightforward NULL
pointer dereference that is reachable by creating and destroying some
vfs on a kernel with INET_PSP enabled.

The last two patches deal with nsim_psp_rereg_write(), which is a
debugfs handler that reregisters netdevsim's psp_dev without
aquiescing and disabling tx/rx processing. This was added to enable
some tests in psp.py where a psp device is unregistered while it still
referenced by tcp socket state.

There are two issues with this code:
1. Calls to nsim_psp_uninit() are not properly serialized
2. netdevsim's psp_dev refcount can be released while nsim_do_psp() is
   reading from it.
====================

Link: https://patch.msgid.link/20260505-psd-rcu-v1-0-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 17:39:22 -07:00
Daniel Zahka
07bdec3fc7 netdevsim: psp: rcu protect psp_dev reference
There are two issues with the way psp_dev is used in nsim_do_psp():

1. There is no check for IS_ERR() on the peers psp_dev, before
   dereferencing.
2. The refcount on this psp_dev can be dropped by
   nsim_psp_rereg_write()

To fix this, we can make netdevsim's reference to its psp_dev an rcu
reference, and then nsim_do_psp() can read the fields it needs from an
rcu critical section.

Fixes: f857478d62 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-3-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 17:39:20 -07:00
Daniel Zahka
24c96a4200 netdevsim: psp: serialize calls to nsim_psp_uninit()
The debugfs write handler, nsim_psp_rereg_write(), can race against
nsim_destroy() and against itself, causing nsim_psp_uninit() to run
more than once concurrently. Two complementary changes serialize all
callers:

1. Delete the psp_rereg debugfs file from nsim_psp_uninit() before
   doing the actual teardown. debugfs_remove() drains any in-flight
   writers and prevents new ones from starting.

2. Add a mutex around the body of nsim_psp_rereg_write() so that two
   concurrent userspace writers cannot both enter the teardown path
   at once.

The teardown work itself is moved into a new __nsim_psp_uninit() that
the rereg handler calls under the mutex, while the public
nsim_psp_uninit() wraps it with the debugfs_remove()/mutex_destroy()
pair so nsim_destroy() doesn't have to know about the psp internals.

Fixes: f857478d62 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-2-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 17:39:20 -07:00
Daniel Zahka
7ce3f1beda netdevsim: psp: only call nsim_psp_uninit() on PFs
VFs go through nsim_init_netdevsim_vf() which never calls
nsim_psp_init(), so ns->psp.dev stays NULL. nsim_psp_uninit() guards
with !IS_ERR(ns->psp.dev), so destroying a VF reaches
psp_dev_unregister(NULL) and dereferences NULL on the first
mutex_lock(&psd->lock):

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  RIP: 0010:mutex_lock+0x1c/0x30
  Call Trace:
   psp_dev_unregister+0x2a/0x1a0
   nsim_psp_uninit+0x1f/0x40 [netdevsim]
   nsim_destroy+0x61/0x1e0 [netdevsim]
   __nsim_dev_port_del+0x47/0x90 [netdevsim]
   nsim_drv_configure_vfs+0xc9/0x130 [netdevsim]
   nsim_bus_dev_numvfs_store+0x79/0xb0 [netdevsim]

Gate nsim_psp_uninit() on nsim_dev_port_is_pf(), matching the pattern
already used for nsim_exit_netdevsim() and the bpf/ipsec/macsec/queue
teardowns.

Reproducer:
  modprobe netdevsim
  echo "10 1" > /sys/bus/netdevsim/new_device
  echo 1 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs
  devlink dev eswitch set netdevsim/netdevsim10 mode switchdev
  echo 0 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs

Fixes: f857478d62 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-1-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 17:39:20 -07:00
Eric Dumazet
7aaa8f5e45 ipv6: fix potential UAF caused by ip6_forward_proxy_check()
ip6_forward_proxy_check() calls pskb_may_pull() which might re-allocate
skb->head.

Reload ipv6_hdr() after the pskb_may_pull() call to avoid using
the freed memory.

Fixes: e21e0b5f19 ("[IPV6] NDISC: Handle NDP messages to proxied addresses.")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260505130056.2927197-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 17:29:23 -07:00
Jakub Kicinski
0e1368a28d selftests: drv-net: fix sort order of makefile and config
Recent changes added configs and tests in the wrong spot.

Link: https://lore.kernel.org/20260506170435.34984dfc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 17:22:24 -07:00
Jakub Kicinski
dc61989e37 ipsec-2026-05-05
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmn57igACgkQrB3Eaf9P
 W7cDig//aXeIEN6VUYPU6lTDYXNCWz2A7sM636rXMMizF1nVjkRtrZlzQFwE9pIm
 LOla+Mu1VLGVsuxaoYfW2NagKt6bUg3xEDrlOt+lL/Bn6hengdjVF9PibvP4XCjt
 5bwtg0xN0AysoktYS2v+2b+fSh5CSnQkcEcn9F2d+3zXmFlLpxuyPJqhHn54nHmI
 JPACVyk9bZdKutdfr86uThgWnTDInPvJ2vMRpRlwpGWx5f2JspJv1g4zzWzc38Ad
 yTcRZQXhZ7zfOaYFGjqMD0eHtFDPC+HqMTi0Ak9ngCBAFpZS8/iBJ3/TlukJjNcy
 q805gPyRqnpiVgm6NH55C8HUguzpD7m8tcjBbVADvIrMA0OzMw3mBxwFsbG2aaCs
 cPXxvtT7crDbKPtxvY5RhVJIvCe4BCMP/uqlmo7wuwPE01arVau5i4miZKGPTzXB
 LRNchWJMDIrwE/+MnAbJBXT5RfiN5RPvPdV5OdTlrofkwDzBjpTev5FeQq7QktSx
 ctPy7I28IRw+eCKlu2FNrUJ4x8C/7Fv1ZPADOSvd3D5PdaOAArUb3RhTGwC9giuo
 qKKv8Q30x5xyOv90MB3M8vQwM7mGUloIfZPN6AhRoaDGikdMyy6gZ8Y5M3noGUUJ
 D4z+kZgHy1ZrdYDM58CdfE1Kz/s96rA5aIHUVZQYonaz35YGRts=
 =WKO1
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2026-05-05

1. Fix an IPv6 encapsulation error path that leaked route references
   when UDPv6 ESP decapsulation resolved to an error route.
   From Yilin Zhu.

2. Fix AH with ESN on async crypto paths by accounting for the extra
   high-order sequence number when reconstructing the temporary
   authentication layout in the completion callbacks.
   From Michael Bomarito.

3. Fix XFRM output so it does not overwrite already-correct inner header
   pointers when a tunnel layer such as VXLAN has already saved them.
   The fix comes with new selftests. From Cosmin Ratiu.

4. Add the missing native payload size entry for XFRM_MSG_MAPPING in the
   compat translation path. From Ruijie Li.

5. Harden __xfrm_state_delete() against repeated or inconsistent unhashing
   of state list nodes by keying the removal on actual list membership and
   using delete-and-init helpers. From Michal Kosiorek.

6. Prevent ESP from decrypting shared splice-backed skb fragments in place
   by marking UDP splice frags as shared and forcing copy-on-write in ESP
   input when needed. From Kuan-Ting Chen.

* tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: esp: avoid in-place decrypt on shared skb frags
  xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
  xfrm: provide message size for XFRM_MSG_MAPPING
  xfrm: Don't clobber inner headers when already set
  tools/selftests: Add a VXLAN+IPsec traffic test
  tools/selftests: Use a sensible timeout value for iperf3 client
  xfrm: ah: account for ESN high bits in async callbacks
  ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()
====================

Link: https://patch.msgid.link/20260505132326.1362733-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 16:49:42 -07:00
Jakub Kicinski
f4eac70d1e Includes changes:
* ensure MAC header offset is reset before delivering packet
 * ensure gro_cells_receive() and dstats_dev_add() are called with BH
   disabled
 * reduce ping count in selftest to ensure it completes within
   timeout
 -----BEGIN PGP SIGNATURE-----
 
 iJEEABYIADkWIQQKU153ubb5unbkl6Gx/ZpNW1HNdwUCafkekRsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMiwyLDIACgkQsf2aTVtRzXdZAAEA2yDBkZdIiALO7V5ul5Ao
 Y2/jFxysR5fyCMiOdxTOruwBAJaX9KExSE2QucHKOBFLmrjIIqwI5Br6whljdZKt
 n1YE
 =UhhZ
 -----END PGP SIGNATURE-----

Merge tag 'ovpn-net-20260504' of https://github.com/OpenVPN/ovpn-net-next

Antonio Quartulli says:

====================
Includes changes:

* ensure MAC header offset is reset before delivering packet
* ensure gro_cells_receive() and dstats_dev_add() are called
  with BH disabled
* reduce ping count in selftest to ensure it completes within
  timeout

* tag 'ovpn-net-20260504' of https://github.com/OpenVPN/ovpn-net-next:
  selftests: ovpn: reduce ping count in test.sh
  ovpn: ensure packet delivery happens with BH disabled
  ovpn: reset MAC header before passing skb up
====================

Link: https://patch.msgid.link/20260504230305.2681646-1-antonio@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 16:10:03 -07:00
Jakub Kicinski
bd75e1003d bluetooth pull request for net:
- hci_conn: fix potential UAF in create_big_sync
  - hci_event: fix memset typo
  - hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
  - L2CAP: fix MPS check in l2cap_ecred_reconf_req
  - L2CAP: defer conn param update to avoid conn->lock/hdev->lock inversion
  - L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
  - L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
  - L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
  - RFCOMM: pull credit byte with skb_pull_data()
  - SCO: fix sleeping under spinlock in sco_conn_ready
  - SCO: hold sk properly in sco_conn_ready
  - ISO: Fix data-race on dst in iso_sock_connect()
  - ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
  - bnep: fix incorrect length parsing in bnep_rx_frame() extension handling
  - hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized
  - virtio_bt: clamp rx length before skb_put
  - virtio_bt: validate rx pkt_type header length
  - HIDP: serialise l2cap_unregister_user via hidp_session_sem
  - btintel_pcie: treat boot stage bit 12 as warning
  - btmtk: validate WMT event SKB length before struct access
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmn7qCwZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKdbuD/wIj4GwiCd/vWz6qEdbK3Xl
 naw2i1HH4W3cLSDbEREQ7pJos+Uti6VqdzgW3yldzpKG3rZRjCx5hh3HxqmpuWmI
 LbCv4cI13ZfPgjfRqyjmX2AhpY8zkeOVy5wFiIVQsqsRm6s30g7lqxPkMPYG0K2G
 FDjS06iZsoRGXRFp2+lqpSk1H/90Bcz78yDyEr0qoHxpxUace2lx5gVmoZQxWasx
 Y5dcuNSVUvnftHMd4Lv2pehllpJDbmuyll1aVrhqEueRqdmyocjINXZRyYTdrECz
 8WR4tiax1zvl/eYgJ6zdVLJ1Iva1HyiTVN5tY0uSM03+u1P/OxSInkoo2VSZoIIK
 bQUFQ92Xml1J0qL6g0rwEHESEYzaJXz9Ai+XdAFzHv1RkziiYRkDqvPFjivqh/JG
 QeOuNosSKGfG9V5m02Ym/GVTdE59xonukNr+RaIdpt6djsybv4go+E8RpnxVyQvy
 5CMKchOvE6TnW3JRcaaXtC9cdMfOAjgBiebnWTguFBLutpPf1z8EhiKNFyYlt1yb
 r8tNhci6jimoD9hKzemEuKwyP07HnBo8B361kHByFYfhBHs+XZANtcVyMo6HtbqE
 94eRdWvKBvG3ixAP5/ujqrmp9HFyMBhZPc3XimZxkBx71/JCqcSpsQHGtQRvAiXy
 4FKJXqDINoOtpkggRQfkyQ==
 =KbxG
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2026-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_conn: fix potential UAF in create_big_sync
 - hci_event: fix memset typo
 - hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
 - L2CAP: fix MPS check in l2cap_ecred_reconf_req
 - L2CAP: defer conn param update to avoid conn->lock/hdev->lock inversion
 - L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
 - L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
 - L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
 - RFCOMM: pull credit byte with skb_pull_data()
 - SCO: fix sleeping under spinlock in sco_conn_ready
 - SCO: hold sk properly in sco_conn_ready
 - ISO: Fix data-race on dst in iso_sock_connect()
 - ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
 - bnep: fix incorrect length parsing in bnep_rx_frame() extension handling
 - hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized
 - virtio_bt: clamp rx length before skb_put
 - virtio_bt: validate rx pkt_type header length
 - HIDP: serialise l2cap_unregister_user via hidp_session_sem
 - btintel_pcie: treat boot stage bit 12 as warning
 - btmtk: validate WMT event SKB length before struct access

* tag 'for-net-2026-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: HIDP: serialise l2cap_unregister_user via hidp_session_sem
  Bluetooth: hci_event: fix memset typo
  Bluetooth: RFCOMM: pull credit byte with skb_pull_data()
  Bluetooth: virtio_bt: validate rx pkt_type header length
  Bluetooth: virtio_bt: clamp rx length before skb_put
  Bluetooth: btmtk: validate WMT event SKB length before struct access
  Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
  Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
  Bluetooth: hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized
  Bluetooth: btintel_pcie: treat boot stage bit 12 as warning
  Bluetooth: SCO: hold sk properly in sco_conn_ready
  Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
  Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
  Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
  Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion
  Bluetooth: l2cap: fix MPS check in l2cap_ecred_reconf_req
  Bluetooth: bnep: fix incorrect length parsing in bnep_rx_frame() extension handling
  Bluetooth: hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
  Bluetooth: hci_conn: fix potential UAF in create_big_sync
  Bluetooth: SCO: fix sleeping under spinlock in sco_conn_ready
====================

Link: https://patch.msgid.link/20260506204553.58686-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 15:43:34 -07:00
Michael Bommarito
c5d415596c Bluetooth: HIDP: serialise l2cap_unregister_user via hidp_session_sem
Commit dbf666e4fc ("Bluetooth: HIDP: Fix possible UAF") made
hidp_session_remove() drop the L2CAP reference and set
session->conn = NULL once the session is considered removed, and
added a bare if (session->conn) guard around the kthread-exit
l2cap_unregister_user() call in hidp_session_thread().  The sibling
ioctl site in hidp_connection_del() still reads session->conn
unlocked and unguarded, and the kthread-exit guard itself is a
lockless double-read.

hidp_session_find() drops hidp_session_sem before returning, so
hidp_session_remove() can null session->conn between the lookup and
the call in hidp_connection_del().  Worse, since commit 752a6c9596
("Bluetooth: L2CAP: Fix use-after-free in l2cap_unregister_user")
takes mutex_lock(&conn->lock) inside l2cap_unregister_user(), a
stale non-NULL snapshot also UAFs on conn->lock.  v1 only added an
if (session->conn) guard at the ioctl site, which doesn't address
either race; Luiz suggested snapshotting session->conn under the
sem and clearing it before the call.

Taking hidp_session_sem across l2cap_unregister_user() would be
wrong: l2cap_conn_del() already establishes the lock order

  conn->lock -> hidp_session_sem

via l2cap_unregister_all_users() -> user->remove ==
hidp_session_remove(), so taking hidp_session_sem before conn->lock
would AB/BA deadlock.

Factor a helper hidp_session_unregister_conn() that under
down_write(&hidp_session_sem) snapshots session->conn and clears
the member, then outside the sem calls l2cap_unregister_user() and
l2cap_conn_put() on the snapshot.  Call it from both
hidp_connection_del() and hidp_session_thread()'s exit path.  At
most one consumer wins the write-sem; later callers observe
session->conn == NULL and skip the unregister and put, so the
reference hidp_session_new() took via l2cap_conn_get() is consumed
exactly once.  session_free() already tolerates a NULL session->conn.

Fixes: dbf666e4fc ("Bluetooth: HIDP: Fix possible UAF")
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Link: https://lore.kernel.org/all/20260422011437.176643-1-michael.bommarito@gmail.com/
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:27:53 -04:00
Jann Horn
72d97cae2a Bluetooth: hci_event: fix memset typo
hci_le_big_sync_established_evt() currently does:

    conn->num_bis = 0;
    memset(conn->bis, 0, sizeof(conn->num_bis));

sizeof(conn->num_bis) is wrong - it would make sense to either use
conn->num_bis (before setting that to 0) or sizeof(conn->bis).
Fix it by using sizeof(conn->bis), the least intrusive change.

Luckily, nothing actually depends on this memset() working properly:
Nothing seems to ever read from conn->bis beyond conn->num_bis, and when
conn->num_bis is increased, the corresponding elements of conn->bis are
initialized. So I think this line could also just be removed.

This is a purely theoretical fix and should have no impact on actual
behavior.

Fixes: 42ecf19471 ("Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:27:29 -04:00
Pengpeng Hou
8f59d17b18 Bluetooth: RFCOMM: pull credit byte with skb_pull_data()
rfcomm_recv_data() treats the first payload byte as a credit field when
the UIH frame carries PF and credit-based flow control is enabled.

After the header has been stripped, the PF/CFC path consumes that byte
with a direct skb->data dereference followed by skb_pull(). A malformed
short frame can reach this path without a byte available.

Use skb_pull_data() so the length check and pull happen together before
the returned credit byte is consumed.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:23:20 -04:00
Michael Bommarito
daf23014e5 Bluetooth: virtio_bt: validate rx pkt_type header length
virtbt_rx_handle() reads the leading pkt_type byte from the RX skb
and forwards the remainder to hci_recv_frame() for every
event/ACL/SCO/ISO type, without checking that the remaining payload
is at least the fixed HCI header for that type.

After the preceding patch bounds the backend-supplied used.len to
[1, VIRTBT_RX_BUF_SIZE], a one-byte completion still reaches
hci_recv_frame() with skb->len already pulled to 0. If the byte
happened to be HCI_ACLDATA_PKT, the ACL-vs-ISO classification
fast-path in hci_dev_classify_pkt_type() dereferences
hci_acl_hdr(skb)->handle whenever the HCI device has an active
CIS_LINK, BIS_LINK, or PA_LINK connection, reading two bytes of
uninitialized RX-buffer data. The same hazard exists for every
packet type the driver accepts because none of the switch cases in
virtbt_rx_handle() check skb->len against the per-type minimum HCI
header size before handing the frame to the core.

After stripping pkt_type, require skb->len to cover the fixed
header size for the selected type (event 2, ACL 4, SCO 3, ISO 4)
before calling hci_recv_frame(); drop ratelimited otherwise.
Unknown pkt_type values still take the original kfree_skb() default
path.

Use bt_dev_err_ratelimited() because both the length and pkt_type
values come from an untrusted backend that can otherwise flood the
kernel log.

Fixes: 160fbcf3bf ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:22:33 -04:00
Michael Bommarito
21bd244b6d Bluetooth: virtio_bt: clamp rx length before skb_put
virtbt_rx_work() calls skb_put(skb, len) where len comes directly
from virtqueue_get_buf() with no validation against the buffer we
posted to the device. The RX skb is allocated in virtbt_add_inbuf()
and exposed to virtio as exactly 1000 bytes via sg_init_one().

Checking len against skb_tailroom(skb) is not sufficient because
alloc_skb() can leave more tailroom than the 1000 bytes actually
handed to the device. A malicious or buggy backend can therefore
report used.len between 1001 and skb_tailroom(skb), causing skb_put()
to include uninitialized kernel heap bytes that were never written by
the device.

The same path also accepts len == 0, in which case skb_put(skb, 0)
leaves the skb empty but virtbt_rx_handle() still reads the pkt_type
byte from skb->data, consuming uninitialized memory.

Define VIRTBT_RX_BUF_SIZE once and reuse it in alloc_skb() and
sg_init_one(), and gate virtbt_rx_work() on that same constant so
the bound checked matches the buffer actually exposed to the device.
Reject used.len == 0 in the same gate so an empty completion can
no longer reach virtbt_rx_handle().

Use bt_dev_err_ratelimited() because the length value comes from an
untrusted backend that can otherwise flood the kernel log.

Same class of bug as commit c04db81cd0 ("net/9p: Fix buffer
overflow in USB transport layer"), which hardened the USB 9p
transport against unchecked device-reported length.

Fixes: 160fbcf3bf ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:22:25 -04:00
Tristan Madani
634a4408c0 Bluetooth: btmtk: validate WMT event SKB length before struct access
btmtk_usb_hci_wmt_sync() casts the WMT event response SKB data to
struct btmtk_hci_wmt_evt (7 bytes) and struct btmtk_hci_wmt_evt_funcc
(9 bytes) without first checking that the SKB contains enough data.
A short firmware response causes out-of-bounds reads from SKB tailroom.

Use skb_pull_data() to validate and advance past the base WMT event
header. For the FUNC_CTRL case, pull the additional status field bytes
before accessing them.

Fixes: d019930b00 ("Bluetooth: btmtk: move btusb_mtk_hci_wmt_sync to btmtk.c")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:22:19 -04:00
SeungJu Cheon
f958c7805b Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
Several iso_pi(sk) fields (qos, qos_user_set, bc_sid, base, base_len,
sync_handle, bc_num_bis) are written under lock_sock in
iso_sock_setsockopt() and iso_sock_bind(), but read and written under
hci_dev_lock only in two other paths:

  - iso_connect_bis() / iso_connect_cis(), invoked from connect(2),
    read qos/base/bc_sid and reset qos to default_qos on the
    qos_user_set validation failure -- all without lock_sock.

  - iso_connect_ind(), invoked from hci_rx_work, writes sync_handle,
    bc_sid, qos.bcast.encryption, bc_num_bis, base and base_len on
    PA_SYNC_ESTABLISHED / PAST_RECEIVED / BIG_INFO_ADV_REPORT /
    PER_ADV_REPORT events. The BIG_INFO handler additionally passes
    &iso_pi(sk)->qos together with sync_handle / bc_num_bis / bc_bis
    to hci_conn_big_create_sync() while setsockopt may be mutating
    them.

Acquire lock_sock around the affected accesses in both paths.

The locking order hci_dev_lock -> lock_sock matches the existing
iso_conn_big_sync() precedent, whose comment documents the same
requirement for hci_conn_big_create_sync(). The HCI connect/bind
helpers do not wait for command completion -- they enqueue work via
hci_cmd_sync_queue{,_once}() / hci_le_create_cis_pending() and
return -- so the added hold time is comparable to iso_conn_big_sync().

KCSAN report:

BUG: KCSAN: data-race in iso_connect_cis / iso_sock_setsockopt

read to 0xffffa3ae8ce3cdc8 of 1 bytes by task 335 on cpu 0:
 iso_connect_cis+0x49f/0xa20
 iso_sock_connect+0x60e/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0xffffa3ae8ce3cdc8 of 60 bytes by task 334 on cpu 1:
 iso_sock_setsockopt+0x69a/0x930
 do_sock_setsockopt+0xc3/0x170
 __sys_setsockopt+0xd1/0x130
 __x64_sys_setsockopt+0x64/0x80
 x64_sys_call+0x1547/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 334 Comm: iso_setup_race Not tainted 7.0.0-10949-g8541d8f725c6 #44 PREEMPT(lazy)

The iso_connect_ind() races were found by inspection.

Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:22:05 -04:00
SeungJu Cheon
ca40d48107 Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
iso_sock_connect() copies the destination address into
iso_pi(sk)->dst under lock_sock, then releases the lock and reads
it back with bacmp() to decide between the CIS and BIS connect
paths:

    lock_sock(sk);
    bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr);
    iso_pi(sk)->dst_type = sa->iso_bdaddr_type;
    release_sock(sk);

    if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY))  // <- no lock held

This read after release_sock() races with any concurrent write to
iso_pi(sk)->dst on the same socket.

Fix by reading the destination address directly from the local
sockaddr argument (sa->iso_bdaddr) instead of iso_pi(sk)->dst.
Since sa is a function-local argument, reading it requires no
locking and avoids the race.

This patch addresses only the bacmp() race in iso_sock_connect();
other unprotected iso_pi(sk) accesses are fixed separately in the
next patch.

KCSAN report:

BUG: KCSAN: data-race in memcmp+0x39/0xb0

race at unknown origin, with read to 0xffff8f96ea66dde3 of 1 bytes by task 549 on cpu 1:
 memcmp+0x39/0xb0
 iso_sock_connect+0x275/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00 -> 0xee

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 549 Comm: iso_race_combin Not tainted 7.0.0-08391-g1d51b370a0f8 #40 PREEMPT(lazy)

Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:21:58 -04:00
Aurelien DESBRIERES
902fe40bce Bluetooth: hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized
When a fault is injected during hci_uart line discipline setup, the
proto open() callback may fail leaving hu->priv as NULL. A subsequent
TIOCSTI ioctl can trigger the recv() callback before priv is
initialized, causing a NULL pointer dereference.

Fix all four affected HCI UART protocol drivers by adding a NULL check
on hu->priv at the start of their recv() callbacks: h4, h5, ath and
bcsp.

Reported-by: syzbot+ff30eeab8e07b37d524e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ff30eeab8e07b37d524e
Signed-off-by: Aurelien DESBRIERES <aurelien@hackers.camp>
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:21:43 -04:00
Sai Teja Aluvala
5917dd39db Bluetooth: btintel_pcie: treat boot stage bit 12 as warning
CSR boot stage register bit 12 is documented as a device warning,
not a fatal error. Rename the bit definition accordingly and stop
including it in btintel_pcie_in_error().

This keeps warning-only boot stage values from being classified as
errors while preserving abort-handler state as the actual error
condition.

Fixes: 190377500f ("Bluetooth: btintel_pcie: Dump debug registers on error")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Sai Teja Aluvala <aluvala.sai.teja@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:21:34 -04:00
Pauli Virtanen
4e37f6452d Bluetooth: SCO: hold sk properly in sco_conn_ready
sk deref in sco_conn_ready must be done either under conn->lock, or
holding a refcount, to avoid concurrent close. conn->sk and parent sk is
currently accessed without either, and without checking parent->sk_state:

    [Task 1]            [Task 2]
                        sco_sock_release
    sco_conn_ready
      sk = conn->sk
                          lock_sock(sk)
                            conn->sk = NULL
      lock_sock(sk)
                          release_sock(sk)
                          sco_sock_kill(sk)
       UAF on sk deref

and similarly for access to sco_get_sock_listen() return value.

Fix possible UAF by holding sk refcount in sco_conn_ready() and making
sco_get_sock_listen() increase refcount. Also recheck after lock_sock
that the socket is still valid.  Adjust conn->sk locking so it's
protected also by lock_sock() of the associated socket if any.

Fixes: 27c24fda62 ("Bluetooth: switch to lock_sock in SCO")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:21:25 -04:00
Siwei Zhang
0a120d9616 Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().

Fixes: 80808e431e ("Bluetooth: Add l2cap_chan_ops abstraction")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:21:09 -04:00
Siwei Zhang
78a88d43da Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().

Fixes: 8d836d71e2 ("Bluetooth: Access sk_sndtimeo indirectly in l2cap_core.c")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:21:07 -04:00
Siwei Zhang
2ff1a41a91 Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().

Fixes: 89bc500e41 ("Bluetooth: Add state tracking to struct l2cap_chan")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:21:04 -04:00
Mikhail Gavrilov
91b5a598b5 Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion
When a BLE peripheral sends an L2CAP Connection Parameter Update Request
the processing path is:

  process_pending_rx()          [takes conn->lock]
    l2cap_le_sig_channel()
      l2cap_conn_param_update_req()
        hci_le_conn_update()    [takes hdev->lock]

Meanwhile other code paths take the locks in the opposite order:

  l2cap_chan_connect()          [takes hdev->lock]
    ...
      mutex_lock(&conn->lock)

  l2cap_conn_ready()            [hdev->lock via hci_cb_list_lock]
    ...
      mutex_lock(&conn->lock)

This is a classic AB/BA deadlock which lockdep reports as a circular
locking dependency when connecting a BLE MIDI keyboard (Carry-On FC-49).

Fix this by making hci_le_conn_update() defer the HCI command through
hci_cmd_sync_queue() so it no longer needs to take hdev->lock in the
caller context.  The sync callback uses __hci_cmd_sync_status_sk() to
wait for the HCI_EV_LE_CONN_UPDATE_COMPLETE event, then updates the
stored connection parameters (hci_conn_params) and notifies userspace
(mgmt_new_conn_param) only after the controller has confirmed the update.

A reference on hci_conn is held via hci_conn_get()/hci_conn_put() for
the lifetime of the queued work to prevent use-after-free, and
hci_conn_valid() is checked before proceeding in case the connection was
removed while the work was pending.  The hci_dev_lock is held across
hci_conn_valid() and all conn field accesses to prevent a concurrent
disconnect from invalidating the connection mid-use.

Fixes: f044eb0524 ("Bluetooth: Store latency and supervision timeout in connection params")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:20:51 -04:00
Dudu Lu
4f42363c81 Bluetooth: l2cap: fix MPS check in l2cap_ecred_reconf_req
The L2CAP specification states that if more than one channel is being
reconfigured, the MPS shall not be decreased. The current check has
two issues:

1) The comparison uses >= (greater-than-or-equal), which incorrectly
   rejects reconfiguration requests where the MPS stays the same.
   Since the spec says MPS "shall be greater than or equal to the
   current MPS", only a strict decrease (remote_mps > mps) should be
   rejected. Keeping the same MPS is valid.

2) The multi-channel guard uses `&& i` (loop index) to approximate
   "more than one channel", but this incorrectly allows MPS decrease
   for the first channel (i==0) even when multiple channels are being
   reconfigured. Replace with `&& num_scid > 1` which correctly
   checks whether the request covers more than one channel.

Fixes: 7accb1c432 ("Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ")
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:20:38 -04:00
Dudu Lu
72b8deccff Bluetooth: bnep: fix incorrect length parsing in bnep_rx_frame() extension handling
In bnep_rx_frame(), the BNEP_FILTER_NET_TYPE_SET and
BNEP_FILTER_MULTI_ADDR_SET extension header parsing has two bugs:

1) The 2-byte length field is read with *(u16 *)(skb->data + 1), which
   performs a native-endian read. The BNEP protocol specifies this field
   in big-endian (network byte order), and the same file correctly uses
   get_unaligned_be16() for the identical fields in
   bnep_ctrl_set_netfilter() and bnep_ctrl_set_mcfilter().

2) The length is multiplied by 2, but unlike BNEP_SETUP_CONN_REQ where
   the length byte counts UUID pairs (requiring * 2 for two UUIDs per
   entry), the filter extension length field already represents the total
   data size in bytes. This is confirmed by bnep_ctrl_set_netfilter()
   which reads the same field as a byte count and divides by 4 to get
   the number of filter entries.

   The bogus * 2 means skb_pull advances twice as far as it should,
   either dropping valid data from the next header or causing the pull
   to fail entirely when the doubled length exceeds the remaining skb.

Fix by splitting the pull into two steps: first use skb_pull_data() to
safely pull and validate the 3-byte fixed header (ctrl type + length),
then pull the variable-length data using the properly decoded length.

Fixes: bf8b9a9cb7 ("Bluetooth: bnep: Add support to extended headers of control frames")
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:19:09 -04:00
Luiz Augusto von Dentz
5ddb801426 Bluetooth: hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
hci_le_create_big_complete_evt() iterates over BT_BOUND connections for
a BIG handle using a while loop, accessing ev->bis_handle[i++] on each
iteration.  However, there is no check that i stays within ev->num_bis
before the array access.

When a controller sends a LE_Create_BIG_Complete event with fewer
bis_handle entries than there are BT_BOUND connections for that BIG,
or with num_bis=0, the loop reads beyond the valid bis_handle[] flex
array into adjacent heap memory.  Since the out-of-bounds values
typically exceed HCI_CONN_HANDLE_MAX (0x0EFF), hci_conn_set_handle()
rejects them and the connection remains in BT_BOUND state.  The same
connection is then found again by hci_conn_hash_lookup_big_state(),
creating an infinite loop with hci_dev_lock held.

Fix this by terminating the BIG if in case not all BIS could be setup
properly.

Fixes: a0bfde167b ("Bluetooth: ISO: Add support for connecting multiple BISes")
Cc: stable@vger.kernel.org
Signed-off-by: ZhiTao Ou <hkbinbinbin@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 16:18:22 -04:00
David Carlier
0beddb0c38 Bluetooth: hci_conn: fix potential UAF in create_big_sync
Add hci_conn_valid() check in create_big_sync() to detect stale
connections before proceeding with BIG creation. Handle the
resulting -ECANCELED in create_big_complete() and re-validate the
connection under hci_dev_lock() before dereferencing, matching the
pattern used by create_le_conn_complete() and create_pa_complete().

Keep the hci_conn object alive across the async boundary by taking
a reference via hci_conn_get() when queueing create_big_sync(), and
dropping it in the completion callback. The refcount and the lock
are complementary: the refcount keeps the object allocated, while
hci_dev_lock() serializes hci_conn_hash_del()'s list_del_rcu() on
hdev->conn_hash, as required by hci_conn_del().

hci_conn_put() is called outside hci_dev_unlock() so the final put
(which resolves to kfree() via bt_link_release) does not run under
hdev->lock, though the release path would be safe either way.

Without this, create_big_complete() would unconditionally
dereference the conn pointer on error, causing a use-after-free
via hci_connect_cfm() and hci_conn_del().

Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Cc: stable@vger.kernel.org
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 15:58:40 -04:00
Pauli Virtanen
b819db93d7 Bluetooth: SCO: fix sleeping under spinlock in sco_conn_ready
sco_conn_ready calls sleeping functions under conn->lock spinlock.

The critical section can be reduced: conn->hcon is modified only with
hdev->lock held. It is guaranteed to be held in sco_conn_ready, so
conn->lock is not needed to guard it.

Move taking conn->lock after lock_sock(parent). This also follows the
lock ordering lock_sock() > conn->lock elsewhere in the file.

Fixes: 27c24fda62 ("Bluetooth: switch to lock_sock in SCO")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06 15:58:29 -04:00
Jakub Kicinski
b89e0100a5 Quite a number of fixes now:
- mac80211
    - remove HT NSS validation to work with broken APs
      (with a kunit fix now)
    - remove 'static' that could cause races
    - check station link lookup before further processing
    - fix use-after-free due to delete in list iteration
    - remove AP station on assoc failures to fix crashes
  - ath12k
    - fix OF node refcount imbalance
    - fix queue flush ("REO update") in MLO
    - fix RCU assert
  - ath12k:
    - fix Kconfig with POWER_SEQUENCING
    - fix WMI buffer leaks on error conditions
    - don't use uninitialized stack data when processing RSSI events
    - fix logic for determining the peer ID in the RX path
  - ath5k: fix a potential stack buffer overwrite
  - rsi: fix thread lifetime race
  - brcmfmac: fix potential UAF
  - nl80211:
    - stricter permissions/checks for PMK and netns
    - fix netlink policy vs. code type confusion
  - cw1200: revert a broken locking change
  - various fixes to not trust values from firmware
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmn7Hj8ACgkQ10qiO8sP
 aACOiA//QZXrPPCxJmCkThP9DUYv2sxRAbTEEJLRJcXZODWJ7nBDn7tdM/nvOJmm
 8Y0v/+DqhqCVJ4Zkvh4u8S6Z7tGt83LKi+j8dAMoZw+Bn2Qq5mFvOK9ENOk4pfmB
 isVKTDFxAgbgtkClo3eygK6z9KuClY5JbQ9wmcE450cjuIYQ4z6UmNTeHcnsombX
 flvKX8JXINwolX1MrL5t+qMZ14bd+BL0Ui9VkvxxuECvdrQemhf7iBfCELfs8Q2r
 PxQcfPq5zIKPTqAcR50HPV/1YDPk66tG3P3NxA/DvAYUXXc7XHr0RYD5IJ/HYafM
 APVHA0XU4aoyDLiWSrSDj1WoRs+MYIlUl7BRdrE4ABenwB2lGTkfdUYuq9dQKPtQ
 Ku92UATOrXdMEZH7TVnme4RbtotWH+nUCw1os/OyoGNfL3czFJIqntUXVjLuUB4e
 Bo3qWtTHVmu3B+OJlNsKiiz8d4L5FxHFnaSKA4MkEfNrZsSuV4N4Sl6PsWMWvJUa
 G/EnUJiDKBaFO0lDAZRtYFr8CVxRBBlCKfCQs+mwxLXHjB7BZW3eA5Ps9qWq4krV
 VbNwDTBwIPdWz12UfHmSpkoIPCbrTcbvdUM7uNwJKr8nuulve2lkZo73LEhEX8k2
 jdXxDndaQyWX9Lmx5b9NpA/xRqxRlgFN7U3ZV8ntH4d9IsQOzZM=
 =RBbh
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Quite a number of fixes now:
 - mac80211
   - remove HT NSS validation to work with broken APs
     (with a kunit fix now)
   - remove 'static' that could cause races
   - check station link lookup before further processing
   - fix use-after-free due to delete in list iteration
   - remove AP station on assoc failures to fix crashes
 - ath12k
   - fix OF node refcount imbalance
   - fix queue flush ("REO update") in MLO
   - fix RCU assert
 - ath12k:
   - fix Kconfig with POWER_SEQUENCING
   - fix WMI buffer leaks on error conditions
   - don't use uninitialized stack data when processing RSSI events
   - fix logic for determining the peer ID in the RX path
 - ath5k: fix a potential stack buffer overwrite
 - rsi: fix thread lifetime race
 - brcmfmac: fix potential UAF
 - nl80211:
   - stricter permissions/checks for PMK and netns
   - fix netlink policy vs. code type confusion
 - cw1200: revert a broken locking change
 - various fixes to not trust values from firmware

* tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (25 commits)
  wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation
  wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS
  wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage
  wifi: mac80211: remove station if connection prep fails
  wifi: mac80211: use safe list iteration in radar detect work
  wifi: libertas: notify firmware load wait on disconnect
  wifi: ath5k: do not access array OOB
  wifi: ath12k: fix peer_id usage in normal RX path
  wifi: ath12k: initialize RSSI dBm conversion event state
  wifi: ath12k: fix leak in some ath12k_wmi_xxx() functions
  wifi: cw1200: Revert "Fix locking in error paths"
  wifi: mac80211: tests: mark HT check strict
  wifi: rsi: fix kthread lifetime race between self-exit and external-stop
  wifi: mac80211: drop stray 'static' from fast-RX rx_result
  wifi: mac80211: check ieee80211_rx_data_set_link return in pubsta MLO path
  wifi: nl80211: require admin perm on SET_PMK / DEL_PMK
  wifi: libertas: fix integer underflow in process_cmdrequest()
  wifi: b43legacy: enforce bounds check on firmware key index in RX path
  wifi: b43: enforce bounds check on firmware key index in b43_rx()
  wifi: brcmfmac: Fix potential use-after-free issue when stopping watchdog task
  ...
====================

Link: https://patch.msgid.link/20260506110325.219675-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06 07:29:31 -07:00
Maoyi Xie
79240f3f6d wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation
NL80211_CMD_GET_SCAN is implemented as a multi-call dumpit. The first
invocation of nl80211_prepare_wdev_dump() validates the requested wdev
against the caller's netns via __cfg80211_wdev_from_attrs(). Subsequent
invocations look up the same wiphy by its global index and do not check
that the wiphy is still in the caller's netns.

Add the same filter to the continuation path. If the wiphy's netns no
longer matches the caller's, return -ENODEV and the netlink dump
machinery terminates the walk cleanly.

Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506064854.2207105-3-maoyixie.tju@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-06 11:08:41 +02:00
Maoyi Xie
15994bb0cb wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS
NL80211_CMD_SET_WIPHY_NETNS dispatches with GENL_UNS_ADMIN_PERM, which
verifies that the caller has CAP_NET_ADMIN for the source netns. It
doesn't verify that the caller has CAP_NET_ADMIN over the target netns
selected by NL80211_ATTR_NETNS_FD or NL80211_ATTR_PID.

This diverges from the convention enforced in
net/core/rtnetlink.c::rtnl_get_net_ns_capable():

    /* For now, the caller is required to have CAP_NET_ADMIN in
     * the user namespace owning the target net ns.
     */
    if (!sk_ns_capable(sk, net->user_ns, CAP_NET_ADMIN))
        return ERR_PTR(-EACCES);

A user with CAP_NET_ADMIN in their own user namespace can therefore
push a wiphy into an arbitrary netns (including init_net) over which
they have no privilege.

Mirror the rtnetlink convention by requiring CAP_NET_ADMIN in the
target netns before calling cfg80211_switch_netns().

Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506064854.2207105-2-maoyixie.tju@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-06 11:05:52 +02:00
Johannes Berg
0f3c0a1973 wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage
This is documented as a u8 and has a policy of NLA_U8, but uses
nla_get_u32() which means it's completely broken on big-endian.
Fix it to use nla_get_u8().

Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Link: https://patch.msgid.link/20260505113837.260159-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-06 11:03:21 +02:00
Johannes Berg
283fc9e44f wifi: mac80211: remove station if connection prep fails
If connection preparation fails for MLO connections, then the
interface is completely reset to non-MLD. In this case, we must
not keep the station since it's related to the link of the vif
being removed. Delete an existing station. Any "new_sta" is
already being removed, so that doesn't need changes.

This fixes a use-after-free/double-free in debugfs if that's
enabled, because a vif going from MLD (and to MLD, but that's
not relevant here) recreates its entire debugfs.

Cc: stable@vger.kernel.org
Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260505151533.c4e52deb06ad.Iafe56cec7de8512626169496b134bce3a6c17010@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-06 11:02:57 +02:00
Jakub Kicinski
3e8ec3440b Merge branch 'xsk-fix-bugs-around-xsk-skb-allocation'
Jason Xing says:

====================
xsk: fix bugs around xsk skb allocation

There are rare issues around xsk_build_skb(). Some of them
were founded by Sashiko[1][2].

[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/
[2]: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/
====================

Link: https://patch.msgid.link/20260502200722.53960-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:55 -07:00
Jason Xing
203cee647f xsk: fix u64 descriptor address truncation on 32-bit architectures
In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
uintptr_t cast:

    skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);

On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
mode the chunk offset is encoded in bits 48-63 of the descriptor
address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
lost entirely. The completion queue then returns a truncated address to
userspace, making buffer recycling impossible.

Fix this by handling the 32-bit case directly in
xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an
xsk_addrs struct (the same path already used for multi-descriptor
SKBs) to store the full u64 address. The existing tagged-pointer logic
in xsk_skb_destructor_is_addr() stays unchanged: slab pointers returned
from kmem_cache_zalloc() are always word-aligned and therefore have
bit 0 clear, which correctly identifies them as a struct pointer
rather than an inline tagged address on every architecture.

Factor the shared kmem_cache_zalloc + destructor_arg assignment into
__xsk_addrs_alloc() and add a wrapper xsk_addrs_alloc() that handles
the inline-to-list upgrade (is_addr check + get_addr + num_descs = 1).
The three former open-coded kmem_cache_zalloc call sites now reduce to
a single call each.

Propagate the -ENOMEM from xsk_skb_destructor_set_addr() through
xsk_skb_init_misc() so the caller can clean up the skb via kfree_skb()
before skb->destructor is installed.

The overhead is one extra kmem_cache_zalloc per first descriptor on
32-bit only; 64-bit builds are completely unchanged.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c6 ("xsk: avoid data corruption on cq descriptor number")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-9-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:51 -07:00
Jason Xing
e0f229025a xsk: fix xsk_addrs slab leak on multi-buffer error path
When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first
continuation descriptor, it promotes destructor_arg from an inlined
address to a freshly allocated xsk_addrs (num_descs = 1). The counter
is bumped to >= 2 only at the very end of a successful build (by calling
xsk_inc_num_desc()).

If the build fails in between (e.g. alloc_page() returns NULL with
-EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip
calling xsk_inc_num_desc() to increment num_descs and leave the half-built
skb attached to xs->skb for the app to retry. The skb now has
1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1

If the app never retries and just close()s the socket, xsk_release()
calls xsk_drop_skb() -> xsk_consume_skb(), which decides whether to
free xsk_addrs by testing num_descs > 1:

    if (unlikely(num_descs > 1))
        kmem_cache_free(xsk_tx_generic_cache, destructor_arg);

Because num_descs is exactly 1 the branch is skipped and the
xsk_addrs object is leaked to the xsk_tx_generic_cache slab.

Fix it by directly testing if destructor_arg is still addr. Or else it
is modified and used to store the newly allocated memory from
xsk_tx_generic_cache regardless of increment of num_desc, which we
need to handle.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c6 ("xsk: avoid data corruption on cq descriptor number")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-8-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:51 -07:00
Jason Xing
8c2cff50af xsk: avoid skb leak in XDP_TX_METADATA case
Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
   NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-7-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:50 -07:00
Jason Xing
3dec153ae4 xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
Once xsk_skb_init_misc() has been called on an skb, its destructor is
set to xsk_destruct_skb(), which submits the descriptor address(es) to
the completion queue and advances the CQ producer. If such an skb is
subsequently freed via kfree_skb() along an error path - before the
skb has ever been handed to the driver - the destructor still runs and
submits a bogus, half-initialized address to the CQ.

Postpone the init phase when we believe the allocation of first frag is
successfully completed. Before this init, skb can be safely freed by
kfree_skb().

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960 ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:50 -07:00
Jason Xing
0f3776583d xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

  if (skb && !skb_shinfo(skb)->nr_frags)
      kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.

Fix by using a !xs->skb check to handle first frag situation, ensuring
we only free skbs that were freshly allocated in this call
(xs->skb is NULL) and never free an in-progress multi-buffer skb that
the caller still references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f ("xsk: remove @first_frag from xsk_build_skb()")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:50 -07:00
Jason Xing
8cd3c1c6e7 xsk: handle NULL dereference of the skb without frags issue
When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy() (e.g., MAX_SKB_FRAGS exceeded), the
free_err -EOVERFLOW handler unconditionally dereferences xs->skb
via xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing
a NULL pointer dereference.

Fix this by guarding the existing xsk_inc_num_desc()/xsk_drop_skb()
calls with an xs->skb check (for the continuation case), and add
an else branch for the first-descriptor case that manually cancels
the one reserved CQ slot and increments invalid_descs by one to
account for the single invalid descriptor.

Fixes: cf24f5a5fe ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:50 -07:00
Jason Xing
0bb7a9caf5 xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
   hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
   is why bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.

Fixes: cf24f5a5fe ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:50 -07:00
Jason Xing
d73a9a63f9 xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
skb_checksum_help() is a common helper that writes the folded
16-bit checksum back via skb->data + csum_start + csum_offset,
i.e. it relies on the skb's linear head and fails (with WARN_ONCE
and -EINVAL) when skb_headlen() is 0.

AF_XDP generic xmit takes two very different paths depending on the
netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
skip the "copy payload into a linear head" step on purpose as a
performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
pages as frags and never calls skb_put(), so skb_headlen() stays 0
for the whole skb. For these skbs there is simply no linear area for
skb_checksum_help() to write the csum into - the sw-csum fallback is
structurally inapplicable.

The patch tries to catch this and reject the combination with error at
setup time. Rejecting at bind() converts this silent per-packet failure
into a synchronous, actionable -EOPNOTSUPP at setup time. HW csum and
launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected
because they do not call skb_checksum_help().

Without the patch, every descriptor carrying 'XDP_TX_METADATA |
XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor
   (invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the
    consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
    the same descriptor and re-hits the same WARN until the socket
    is closed.

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Fixes: 30c3055f9c ("xsk: wrap generic metadata handling onto separate function")
Link: https://patch.msgid.link/20260502200722.53960-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:27:49 -07:00
Jakub Kicinski
22675f0726 Merge branch 'net-mlx5-fixes-for-socket-direct'
Tariq Toukan says:

====================
net/mlx5: Fixes for Socket-Direct

This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.

Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.

Patch 2 fixes the debugfs "multi-pf" directory being stored on the
calling device's sd struct instead of the primary's, which caused
memory leaks and recreation errors when cleanup ran from a different PF.

Patch 3 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound, by holding the primary's
device lock while operating on its auxiliary device.

Patch 4 fixes missing cleanup on ETH probe errors. The analogous gap on
the resume path requires introducing sd_suspend/resume APIs that only
destroy FW resources and is left for a follow-up series.
====================

Link: https://patch.msgid.link/20260504180206.268568-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:13:12 -07:00
Shay Drory
d466ddda55 net/mlx5e: SD, Fix race condition in secondary device probe/remove
When utilizing Socket-Direct single netdev functionality the driver
resolves the actual auxiliary device using mlx5_sd_get_adev(). However,
the current implementation returns the primary ETH auxiliary device
without holding the device lock, leading to a potential race condition
where the ETH device could be unbound or removed concurrently during
probe, suspend, resume, or remove operations.[1]

Fix this by introducing mlx5_sd_put_adev() and updating
mlx5_sd_get_adev() so that secondaries devices would get a ref and
acquire the device lock of the returned auxiliary device. After the lock
is acquired, a second devcom check is needed[2].
In addition, update The callers to pair the get operation with the new
put operation, ensuring the lock is held while the auxiliary device is
being operated on and released afterwards.

The "primary" designation is determined once in sd_register(). It's set
before devcom is marked ready, and it never changes after that.
In Addition, The primary path never locks a secondary: When the primary
device invoke mlx5_sd_get_adev(), it sees dev == primary and returns.
no additional lock is taken.
Therefore lock ordering is always: secondary_lock -> primary_lock. The
reverse never happens, so ABBA deadlock is impossible.

[1]
for example:
BUG: kernel NULL pointer dereference, address: 0000000000000370
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 4 UID: 0 PID: 3945 Comm: bash Not tainted 6.19.0-rc3+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 [mlx5_core]
Call Trace:
 <TASK>
 mlx5e_remove+0x82/0x12a [mlx5_core]
 device_release_driver_internal+0x194/0x1f0
 bus_remove_device+0xc6/0x140
 device_del+0x159/0x3c0
 ? devl_param_driverinit_value_get+0x29/0x80
 mlx5_rescan_drivers_locked+0x92/0x160 [mlx5_core]
 mlx5_unregister_device+0x34/0x50 [mlx5_core]
 mlx5_uninit_one+0x43/0xb0 [mlx5_core]
 remove_one+0x4e/0xc0 [mlx5_core]
 pci_device_remove+0x39/0xa0
 device_release_driver_internal+0x194/0x1f0
 unbind_store+0x99/0xa0
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x55/0xe90
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]
    CPU0 (primary)                     CPU1 (secondary)
==========================================================================
mlx5e_remove() (device_lock held)
                                     mlx5e_remove() (2nd device_lock held)
                                      mlx5_sd_get_adev()
                                       mlx5_devcom_comp_is_ready() => true
                                       device_lock(primary)
 mlx5_sd_get_adev() ==> ret adev
 _mlx5e_remove()
 mlx5_sd_cleanup()
 // mlx5e_remove finished
 // releasing device_lock
                                       //need another check here...
                                       mlx5_devcom_comp_is_ready() => false

Fixes: 381978d283 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:13:09 -07:00
Shay Drory
3564222cfd net/mlx5e: SD, Fix missing cleanup on probe error
When _mlx5e_probe() fails, the preceding successful mlx5_sd_init() is
not undone. Auxiliary bus probe failure skips binding, so mlx5e_remove()
is never called for that adev and the matching mlx5_sd_cleanup() never
runs - leaking the per-dev SD struct.

Call mlx5_sd_cleanup() on the probe error path to balance
mlx5_sd_init().

A similar gap exists on the resume path: mlx5_sd_init() and
mlx5_sd_cleanup() are currently bundled with both probe/remove and
suspend/resume, even though only the FW alias state actually needs to
follow the suspend/resume lifecycle - the sd struct allocation and
devcom membership are software state that should track the full bound
lifetime. As a result, a failed resume can leave a still-bound device
with sd == NULL, which mlx5_sd_get_adev() can't distinguish from a
non-SD device. Fixing this requires sd_suspend/resume APIs which will
only destroy FW resources and is left for a follow-up series.

Fixes: 381978d283 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:13:09 -07:00
Shay Drory
05217e4ffb net/mlx5: SD, Keep multi-pf debugfs entries on primary
mlx5_sd_init() creates the "multi-pf" debugfs directory under the
primary device debugfs root, but stored the dentry in the calling
device's sd struct. When sd_cleanup() run on a different PF,
this leads to using the wrong sd->dfs for removing entries, which
results in memory leak and an error in when re-creating the SD.[1]

Fix it by explicitly storing the debugfs dentry in the primary
device sd struct and use it for all per-group files.

[1]
debugfs: 'multi-pf' already exists in '0000:08:00.1'

Fixes: 4375130bf5 ("net/mlx5: SD, Add debugfs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:13:09 -07:00
Shay Drory
3abcedfdfd net/mlx5: SD: Serialize init/cleanup
mlx5_sd_init() / mlx5_sd_cleanup() may run from multiple PFs in the same
Socket-Direct group. This can cause the SD bring-up/tear-down sequence
to be executed more than once or interleaved across PFs.

Protect SD init/cleanup with mlx5_devcom_comp_lock() and track the SD
group state on the primary device. Skip init if the primary is already
UP, and skip cleanup unless the primary is UP.

The state check on cleanup is needed because sd_register() drops the
devcom comp lock between marking the comp ready and assigning
primary_dev on each peer. A concurrent cleanup that acquires the lock
in this window would observe devcom_is_ready==true while primary_dev
is still NULL (causing mlx5_sd_get_primary() to return NULL) or while
the FW alias setup performed by mlx5_sd_init()'s body has not yet run
(causing sd_cmd_unset_primary() to dereference a NULL tx_ft). Gate the
cleanup body on primary_sd->state == MLX5_SD_STATE_UP, which is set
only at the very end of mlx5_sd_init() under the same comp lock - so
observing UP guarantees primary_dev, secondaries[], tx_ft, and dfs are
all populated. Also bail explicitly if mlx5_sd_get_primary() returns
NULL, in case state is checked on a peer whose primary_dev hasn't been
assigned yet.

In addition, move mlx5_devcom_comp_set_ready(false) from sd_unregister()
into the cleanup's locked section, including the !primary and
state != UP early-exit paths, so the device cannot unregister and free
its struct mlx5_sd while devcom is still marked ready. A concurrent
init acquiring the devcom lock will now observe devcom is no longer
ready and bail out immediately.

Fixes: 381978d283 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:13:09 -07:00
Jakub Kicinski
af0e9b26b9 Merge branch 'net-mlx5e-psp-fixes'
Tariq Toukan says:

====================
net/mlx5e: PSP fixes

This patchset provides bug fixes from Cosmin to the mlx5e PSP feature.
====================

Link: https://patch.msgid.link/20260504181100.269334-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05 19:09:07 -07:00