Commit Graph

3116 Commits

Author SHA1 Message Date
Linus Torvalds
64edfa6506 Delete some obsolete networking code
Old code like amateur radio and NFC have long been a burden
 to core networking developers. syzbot loves to find bugs
 in BKL-era code, and noobs try to fix them.
 
 If we want to have a fighting chance of surviving the LLM-pocalypse
 this code needs to find a dedicated owner or get deleted.
 We've talked about these deletions multiple times in the past
 and every time someone wanted the code to stay. It is never
 very clear to me how many of those people actually use the code
 vs are just nostalgic to see it go. Amateur radio did have
 occasional users (or so I think) but most users switched
 to user space implementations since its all super slow stuff.
 Nobody stepped up to maintain the kernel code.
 
 We were lucky enough to find someone who wants to help with NFC
 so we're giving that a chance. Let's try to put the rest of
 this code behind us.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmnqqWYACgkQMUZtbf5S
 IrtEpQ/9F5+8POE6dg6gJVLDKx1+i6GiaOIweAl8h5DatzhBAAGuGr9JyTw0P/iy
 QX7/SU8WQIhi+LVTYBX9M5bJ3Rf+Iws4dll0CyoTTdOFvGwCAck8Ee/w+1gZdsQY
 aG0mQPmftfMEdZGX3KXt8UPDWG7QX4w1gSqxqYcSs1ohN6Txi1F94tmgqXgzYHzv
 vxWP3cF3XTv4eM6BpQj4tiLT3hvrTUfoCZEn9oF4Hn+miYU/yNlWxh0/pmfNjcxd
 vpNN0VfJVK48uPrj57Ep2x9OjkHPviojrUZT0Y55ENBhn1Lykry4MaxsJVsVYhuC
 OqJHQYTFyxwT/USTJxs1gplFyO0i37oCEEt43BKm2KS7rYHgc4pQgMJz7R2IS3wL
 z1xFl45QFt5kX3pw8BvWPXwBomkbDeFORB40Y1qc8RHMfAUKqOhbhzV8rDq9uKup
 0nJxdijdh3/2qdO+LB1pU5rq/MbfAxOQSnRJmKLoKLVljaZHMAVbm829sdap8OM+
 VMnyPF5hOAuTHV0NZJJ2BbcznI4MFDxM1lNEWFuRC39RQeeGRIHsNMjvs4HMHLaW
 V827UBXpUOK6HR3nGCKX3VpLJByUYAIkdIKvRugbWdynvXAw+FJUHx4wRzvFi6oi
 E7ucUY+FI5YOS1rmQJ+rqBjhThcIAdj2U9SNAykDKRVa7zPEUMU=
 =3vMU
 -----END PGP SIGNATURE-----

Merge tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking deletions from Jakub Kicinski:
 "Delete some obsolete networking code

  Old code like amateur radio and NFC have long been a burden to core
  networking developers. syzbot loves to find bugs in BKL-era code, and
  noobs try to fix them.

  If we want to have a fighting chance of surviving the LLM-pocalypse
  this code needs to find a dedicated owner or get deleted. We've talked
  about these deletions multiple times in the past and every time
  someone wanted the code to stay. It is never very clear to me how many
  of those people actually use the code vs are just nostalgic to see it
  go. Amateur radio did have occasional users (or so I think) but most
  users switched to user space implementations since its all super slow
  stuff. Nobody stepped up to maintain the kernel code.

  We were lucky enough to find someone who wants to help with NFC so
  we're giving that a chance. Let's try to put the rest of this code
  behind us"

* tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next:
  drivers: net: 8390: wd80x3: Remove this driver
  drivers: net: 8390: ultra: Remove this driver
  drivers: net: 8390: AX88190: Remove this driver
  drivers: net: fujitsu: fmvj18x: Remove this driver
  drivers: net: smsc: smc91c92: Remove this driver
  drivers: net: smsc: smc9194: Remove this driver
  drivers: net: amd: nmclan: Remove this driver
  drivers: net: amd: lance: Remove this driver
  drivers: net: 3com: 3c589: Remove this driver
  drivers: net: 3com: 3c574: Remove this driver
  drivers: net: 3com: 3c515: Remove this driver
  drivers: net: 3com: 3c509: Remove this driver
  net: packetengines: remove obsolete yellowfin driver and vendor dir
  net: packetengines: remove obsolete hamachi driver
  net: remove unused ATM protocols and legacy ATM device drivers
  net: remove ax25 and amateur radio (hamradio) subsystem
  net: remove ISDN subsystem and Bluetooth CMTP
  caif: remove CAIF NETWORK LAYER
2026-04-24 09:41:58 -07:00
Andrew Lunn
a3fb9a5bf6 drivers: net: smsc: smc91c92: Remove this driver
The smc91c92 was written by David A Hinds in 1999. It is an PCMCIA
device, so unlikely to be used with modern kernels.

Remove the Documentation as well, since it refers to kernel versions
1.2.13 until 1.3.71 and FTP sites which no longer exist.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260422-v7-0-0-net-next-driver-removal-v1-v2-8-08a5b59784d5@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23 15:57:06 -07:00
Andrew Lunn
91f3a27ae9 drivers: net: 3com: 3c509: Remove this driver
The 3c509 was written by Donald Becker between 1993-2000. It is an ISA
device, so unlikely to be used with modern kernels.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260422-v7-0-0-net-next-driver-removal-v1-v2-1-08a5b59784d5@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23 15:56:43 -07:00
Jakub Kicinski
6deb535950 net: remove unused ATM protocols and legacy ATM device drivers
Remove the ATM protocol modules and PCI/SBUS ATM device drivers
that are no longer in active use.

The ATM core protocol stack, PPPoATM, BR2684, and USB DSL modem
drivers (drivers/usb/atm/) are retained in-tree to maintain PPP
over ATM (PPPoA) and PPPoE-over-BR2684 support for DSL connections.
The Solos ADSL2+ PCI driver is also retained.

Removed ATM protocol modules:
 - net/atm/clip.c - Classical IP over ATM (RFC 2225)
 - net/atm/lec.c - LAN Emulation Client (LANE)
 - net/atm/mpc.c, mpoa_caches.c, mpoa_proc.c - Multi-Protocol Over ATM

Removed PCI/SBUS ATM device drivers (drivers/atm/):
 - adummy, atmtcp - software/testing ATM devices
 - eni - Efficient Networks ENI155P (OC-3, ~1995)
 - fore200e - FORE Systems 200E PCI/SBUS (OC-3, ~1999)
 - he - ForeRunner HE (OC-3/OC-12, ~2000)
 - idt77105 - IDT 77105 25 Mbps ATM PHY
 - idt77252 - IDT 77252 NICStAR II (OC-3, ~2000)
 - iphase - Interphase ATM PCI (OC-3/DS3/E3)
 - lanai - Efficient Networks Speedstream 3010
 - nicstar - IDT 77201 NICStAR (155/25 Mbps, ~1999)
 - suni - PMC S/UNI SONET PHY library

Also clean up references in:
 - net/bridge/ - remove ATM LANE hook (br_fdb_test_addr_hook,
   br_fdb_test_addr)
 - net/core/dev.c - remove br_fdb_test_addr_hook export
 - defconfig files - remove ATM driver config options

The removed code is moved to an out-of-tree module package (mod-orphan).

Acked-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260422041846.2035118-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23 12:21:14 -07:00
Jakub Kicinski
dd8d4bc28a net: remove ax25 and amateur radio (hamradio) subsystem
Remove the amateur radio (AX.25, NET/ROM, ROSE) protocol implementation
and all associated hamradio device drivers from the kernel tree.
This set of protocols has long been a huge bug/syzbot magnet,
and since nobody stepped up to help us deal with the influx
of the AI-generated bug reports we need to move it out of tree
to protect our sanity.

The code is moved to an out-of-tree repo:
https://github.com/linux-netdev/mod-orphan
if it's cleaned up and reworked there we can accept it back.

Minimal stub headers are kept for include/net/ax25.h (AX25_P_IP,
AX25_ADDR_LEN, ax25_address) and include/net/rose.h (ROSE_ADDR_LEN)
so that the conditional integration code in arp.c and tun.c continues
to compile and work when the out-of-tree modules are loaded.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Carlos Bilbao <carlos.bilbao@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260421021824.1293976-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-23 10:24:02 -07:00
Jakub Kicinski
6d5431555d caif: remove CAIF NETWORK LAYER
Remove CAIF (Communication CPU to Application CPU Interface), the
ST-Ericsson modem protocol. The subsystem has been orphaned since 2013.
The last meaningful changes from the maintainers were in March 2013:
  a8c7687bf2 ("caif_virtio: Check that vringh_config is not null")
  b2273be8d2 ("caif_virtio: Use vringh_notify_enable correctly")
  0d2e1a2926 ("caif_virtio: Introduce caif over virtio")

Not-so-coincidentally, according to "the Internet" ST-Ericsson officially
shut down its modem joint venture in Aug 2013.

If anyone is using this code please yell!

In the 13 years since, the code has accumulated 200 non-merge commits,
of which 71 were cross-tree API changes, 21 carried Fixes: tags, and
the remaining ~110 were cleanups, doc conversions, treewide refactors,
and one partial removal (caif_hsi, ca75bcf0a8).

We are still getting fixes to this code, in the last 10 days there were
3 reports on security@ about CAIF that I have been CCed on.

UAPI constants (AF_CAIF, ARPHRD_CAIF, N_CAIF, VIRTIO_ID_CAIF) and the
SELinux classmap entry are intentionally kept for ABI stability.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260416182829.1440262-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23 10:23:44 -07:00
Stanislav Fomichev
7ef83bf171 net: move promiscuity handling into netdev_rx_mode_work
Move unicast promiscuity tracking into netdev_rx_mode_work so it runs
under netdev_ops_lock instead of under the addr_lock spinlock. This
is required because __dev_set_promiscuity calls dev_change_rx_flags
and __dev_notify_flags, both of which may need to sleep.

Change ASSERT_RTNL() to netdev_ops_assert_locked() in
__dev_set_promiscuity, netif_set_allmulti and __dev_change_flags
since these are now called from the work queue under the ops lock.

Link: https://lore.kernel.org/netdev/20260214033859.43857-1-jiayuan.chen@linux.dev/
Fixes: 78cd408356 ("net: add missing instance lock to dev_set_promiscuity")
Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-5-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-21 12:50:24 +02:00
Stanislav Fomichev
3554b4345d net: introduce ndo_set_rx_mode_async and netdev_rx_mode_work
Add ndo_set_rx_mode_async callback that drivers can implement instead
of the legacy ndo_set_rx_mode. The legacy callback runs under the
netif_addr_lock spinlock with BHs disabled, preventing drivers from
sleeping. The async variant runs from a work queue with rtnl_lock and
netdev_lock_ops held, in fully sleepable context.

When __dev_set_rx_mode() sees ndo_set_rx_mode_async, it schedules
netdev_rx_mode_work instead of calling the driver inline. The work
function takes two snapshots of each address list (uc/mc) under
the addr_lock, then drops the lock and calls the driver with the
work copies. After the driver returns, it reconciles the snapshots
back to the real lists under the lock.

Add netif_rx_mode_sync() to opportunistically execute the pending
workqueue update inline, so that rx mode changes are committed
before returning to userspace:
  - dev_change_flags (SIOCSIFFLAGS / RTM_NEWLINK)
  - dev_set_promiscuity
  - dev_set_allmulti
  - dev_ifsioc SIOCADDMULTI / SIOCDELMULTI
  - do_setlink (RTM_SETLINK)

Note that some deep hierarchies still do skip the lower updates via:
  - dev_uc_sync
  - dev_mc_sync

If we do end up hitting user-visible issues, we can add more calls to
netif_rx_mode_sync in specific places. But hopefully we should not,
the actual user-visible lists are still synced, it's that just HW state
that might be lagging.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-3-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-21 12:50:03 +02:00
Jakub Kicinski
03a1569c2b netfilter pull request nf-next-26-04-10
-----BEGIN PGP SIGNATURE-----
 
 iQJdBAABCABHFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmnYzgIbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTIsMiwyDRxmd0BzdHJsZW4uZGUACgkQcJGo2a1f9gCfDw/+
 KWf9fUlsE3uaxK889hfR0QU5ANQ03Ix1eVvr6Vh0Y5Za1glZDUuls0EsH0ej7/36
 ZQqAu2vaevHTVZl3EhAS1vu8KBcldl36YEtvJsQXFkFuOoO3F/dBdttwAif2tzv8
 ammqXOKicRHok1A3cy8R1fkGFAHpfn5BjBc68A0+SY1N2NFVdVNS9BP4p7tuSdkk
 JCj3TdDmBcddZ3SnY/z27S4+8jUL3e7HEAbsMApzIERcxe1w/6gEbb5Oa6AUwtHT
 2SwQlUyhBa6gx2tARgUsHcck5QiW8b1tX7y1tzyo2q6rw78m1Eublib5nYCav/w8
 9pSjRLlzSYBQ22e3wz7WqFXZRaM5+O38s3Moxfn/xrQblTk8CyW/5zGQJKivW9oG
 LEirCPbL6U6ZB/2Uy+3EvzG5TBP3cppB5sXaQfMdSQ03wvYFXMN35hb54ePZW6CX
 Db3lCwimOuXq+hkjVzZIU9ZmGr03oNohFX1GA0gDqrWtc9KsEKW8/KQvX61N8QK3
 YEMIZ6fbMkstCY98fS3j6r6+V1he6wzcZpsqjd9FACYXtf8LQbPvoMA4BfcGR8+X
 iQVEZcrvdGa39VH1TQFlXJIe/Pv+9tZ+CF44MsrNyYH0mD4gTInajklO3lkw/YQj
 RQTHJLal9RCF9gVZRqHgkpE8vUj0mtUkp6Atz6En4mU=
 =C/nr
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

1-3) IPVS updates from Julian Anastasov to enhance visibility into
     IPVS internal state by exposing hash size, load factor etc and
     allows userspace to tune the load factor used for resizing hash
     tables.

4) reject empty/not nul terminated device names from xt_physdev.
   This isn't a bug fix; existing code doesn't require a c-string.
   But clean this up anyway because conceptually the interface name
   definitely should be a c-string.

5) Switch nfnetlink to skb_mac_header helpers that didn't exist back
   when this code was written.  This gives us additional debug checks
   but is not intended to change functionality.

6) Let the xt ttl/hoplimit match reject unknown operator modes.
   This is a cleanup, the evaluation function simply returns false when
   the mode is out of range.  From Marino Dzalto.

7) xt_socket match should enable defrag after all other checks. This
   bug is harmless, historically defrag could not be disabled either
   except by rmmod.

8) remove UDP-Lite conntrack support, from Fernando Fernandez Mancera.

9) Avoid a couple -Wflex-array-member-not-at-end warnings in the old
   xtables 32bit compat code, from Gustavo A. R. Silva.

10) nftables fwd expression should drop packets when their ttl/hl has
    expired.  This is a bug fix deferred, its not deemed important
    enough for -rc8.
11) Add additional checks before assuming the mac header is an ethernet
    header, from Zhengchuan Liang.

* tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: require Ethernet MAC header before using eth_hdr()
  netfilter: nft_fwd_netdev: check ttl/hl before forwarding
  netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings
  netfilter: conntrack: remove UDP-Lite conntrack support
  netfilter: xt_socket: enable defrag after all other checks
  netfilter: xt_HL: add pr_fmt and checkentry validation
  netfilter: nfnetlink: prefer skb_mac_header helpers
  netfilter: x_physdev: reject empty or not-nul terminated device names
  ipvs: add conn_lfactor and svc_lfactor sysctl vars
  ipvs: add ip_vs_status info
  ipvs: show the current conn_tab size to users
====================

Link: https://patch.msgid.link/20260410112352.23599-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12 09:39:21 -07:00
Andy Roulin
c4f2aab121 docs: net: bridge: document stp_mode attribute
Add documentation for the IFLA_BR_STP_MODE bridge attribute in the
"User space STP helper" section of the bridge documentation. Reference
the BR_STP_MODE_* values via kernel-doc and describe the use case for
network namespace environments.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Link: https://patch.msgid.link/20260405205224.3163000-3-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-10 15:52:25 -07:00
Julian Anastasov
8d7de5477e ipvs: add conn_lfactor and svc_lfactor sysctl vars
Allow the default load factor for the connection and service tables
to be configured.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10 12:16:26 +02:00
Daniel Borkmann
d04686d9bc net: Implement netdev_nl_queue_create_doit
Implement netdev_nl_queue_create_doit which creates a new rx queue in a
virtual netdev and then leases it to a rx queue in a physical netdev.

Example with ynl client:

  # ynl --family netdev --output-json --do queue-create \
        --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}'
  {'id': 1}

Note that the netdevice locking order is always from the virtual to
the physical device.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-3-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09 18:21:45 -07:00
Or Har-Toov
78c327c172 devlink: Document resource scope filtering
Document the scope parameter for devlink resource show, which allows
filtering the dump to device-level or port-level resources only.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260407194107.148063-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:55:39 -07:00
Or Har-Toov
170e160a0e devlink: Document port-level resources and full dump
Document the port-level resource support and the option to dump all
resources, including both device-level and port-level entries.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260407194107.148063-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:55:39 -07:00
Vladimir Oltean
b773b99352 net: dsa: remove struct platform_data
This is not used anywhere in the kernel.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260406212158.721806-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:38:52 -07:00
Ryohei Kinugawa
ed8edcd475 docs/mlx5: Fix typo subfuction
Fix two typos:
 - 'Subfunctons' -> 'Subfunctions'
 - 'subfuction' -> 'subfunction'

Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260324053416.70166-1-ryohei.kinugawa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 18:30:47 -07:00
Haiyang Zhang
dc3d720e12 net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS
Add two parameters for drivers supporting Rx CQE coalescing /
descriptor writeback.

ETHTOOL_A_COALESCE_RX_CQE_FRAMES:
Maximum number of frames that can be coalesced into a CQE or
writeback.

ETHTOOL_A_COALESCE_RX_CQE_NSECS:
Max time in nanoseconds after the first packet arrival in a
coalesced CQE or writeback to be sent.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20260317191826.1346111-2-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-18 20:01:10 -07:00
Jiri Pirko
63fff8c0f7 documentation: networking: add shared devlink documentation
Document shared devlink instances for multiple PFs on the same chip.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-13-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14 13:08:50 -07:00
Simon Baatz
0e24d17bd9 tcp: implement RFC 7323 window retraction receiver requirements
By default, the Linux TCP implementation does not shrink the
advertised window (RFC 7323 calls this "window retraction") with the
following exceptions:

- When an incoming segment cannot be added due to the receive buffer
  running out of memory. Since commit 8c670bdfa5 ("tcp: correct
  handling of extreme memory squeeze") a zero window will be
  advertised in this case. It turns out that reaching the required
  memory pressure is easy when window scaling is in use. In the
  simplest case, sending a sufficient number of segments smaller than
  the scale factor to a receiver that does not read data is enough.

- Commit b650d953cd ("tcp: enforce receive buffer memory limits by
  allowing the tcp window to shrink") addressed the "eating memory"
  problem by introducing a sysctl knob that allows shrinking the
  window before running out of memory.

However, RFC 7323 does not only state that shrinking the window is
necessary in some cases, it also formulates requirements for TCP
implementations when doing so (Section 2.4).

This commit addresses the receiver-side requirements: After retracting
the window, the peer may have a snd_nxt that lies within a previously
advertised window but is now beyond the retracted window. This means
that all incoming segments (including pure ACKs) will be rejected
until the application happens to read enough data to let the peer's
snd_nxt be in window again (which may be never).

To comply with RFC 7323, the receiver MUST honor any segment that
would have been in window for any ACK sent by the receiver and, when
window scaling is in effect, SHOULD track the maximum window sequence
number it has advertised. This patch tracks that maximum window
sequence number rcv_mwnd_seq throughout the connection and uses it in
tcp_sequence() when deciding whether a segment is acceptable.

rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
tcp_select_window(). If we count tcp_sequence() as fast path, it is
read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
cacheline group.

The logic for handling received data in tcp_data_queue() is already
sufficient and does not need to be updated.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-1-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14 08:01:49 -07:00
ShravyaPanchagiri
00699d9448 docs: octeontx2: fix typo in documentation
Fix spelling mistake "Crate" to "Create" in the documentation.

Signed-off-by: ShravyaPanchagiri <shravy112@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260311030450.8461-1-shravy112@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-12 17:50:57 -07:00
Eric Dumazet
410593fec7 tcp: add sysctl_tcp_shrink_window to netns_ipv4_sysctl.rst
Add missing entry for sysctl_tcp_shrink_window.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260310073855.564927-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 18:21:08 -07:00
Fernando Fernandez Mancera
7da62262ec inet: add ip_local_port_step_width sysctl to improve port usage distribution
With the current port selection algorithm, ports after a reserved port
range or long time used port are used more often than others [1]. This
causes an uneven port usage distribution. This combines with cloud
environments blocking connections between the application server and the
database server if there was a previous connection with the same source
port, leading to connectivity problems between applications on cloud
environments.

The real issue here is that these firewalls cannot cope with
standards-compliant port reuse. This is a workaround for such situations
and an improvement on the distribution of ports selected.

The proposed solution is to implement a variant of RFC 6056 Algorithm 5.
The step size is selected randomly on every connect() call ensuring it
is a coprime with respect to the size of the range of ports we want to
scan. This way, we can ensure that all ports within the range are
scanned before returning an error. To enable this algorithm, the user
must configure the new sysctl option "net.ipv4.ip_local_port_step_width".

In addition, on graphs generated we can observe that the distribution of
source ports is more even with the proposed approach. [2]

[1] https://0xffsoftware.com/port_graph_current_alg.html

[2] https://0xffsoftware.com/port_graph_random_step_alg.html

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260309023946.5473-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:59:39 -07:00
Kyoji Ogasawara
aa5ec9d03b net/smc: Add documentation for limit_smc_hs and hs_ctrl
Document missing SMC sysctl parameters limit_smc_hs and hs_ctrl

Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: D. Wythe<alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20260309124541.22723-3-sawara04.o@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 17:53:03 -07:00
Kyoji Ogasawara
4a51ac9056 net/smc: fix indentation in smcr_buf_type section
smcr_buf_type section used inconsistent indentation compared
with the rest of this document.

Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: D. Wythe<alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20260309124541.22723-2-sawara04.o@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 17:52:36 -07:00
Eric Dumazet
dd378109d2 net-sysfs: use rps_tag_ptr and remove metadata from rps_sock_flow_table
Instead of storing the @mask at the beginning of rps_sock_flow_table,
use 5 low order bits of the rps_tag_ptr to store the log of the size.

This removes a potential cache line miss to fetch @mask.

More importantly, we can switch to vmalloc_huge() without wasting memory.

Tested with:

numactl --interleave=all bash -c "echo 4194304 >/proc/sys/net/core/rps_sock_flow_entries"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:54:09 -08:00
Leon Kral
57cc8ab3e9 net/handshake: Fixed grammar mistake
The word "a" was used instead of "an" which is grammatically incorrect.
Fixed by changing from "a" to "an". This improves readability of the
documentation.

Signed-off-by: Leon Kral <leon.j.kral@protonmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Link: https://patch.msgid.link/20260227001151.41610-1-leon.j.kral@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 19:24:08 -08:00
Yohei Kojima
5cf47393d9 docs: ethtool: clarify the bit-by-bit bitset format description
Clarify the bit-by-bit bitset format's behavior around mandatory
attributes and bit identification. More specifically, the following
changes are made:

* Rephrase a misleading sentence which implies name and index are
  mutually exclusive
* Describe that ETHTOOL_A_BITSET_BITS nest is mandatory
* Describe that a request fails if inconsistent identifiers are given

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/ef90a56965ca66e57aa177929ce3e10c5ca815fa.1772031974.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 17:55:42 -08:00
Gabriel Goller
d68d21ea6b docs: net: document neigh gc_interval sysctl
Add entry for the neigh/default/gc_interval sysctl. This sysctl is
unused since kernel v2.6.8.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
Link: https://patch.msgid.link/20260225095822.44050-1-g.goller@proxmox.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 17:37:04 -08:00
Eric Dumazet
64db5933c7 icmp: increase net.ipv4.icmp_msgs_{per_sec,burst}
These sysctls were added in 4cdf507d54 ("icmp: add a global rate
limitation") and their default values might be too small.

Some network tools send probes to closed UDP ports from many hosts
to estimate proportion of packet drops on a particular target.

This patch sets both sysctls to 10000.

Note the per-peer rate-limit (as described in RFC 4443 2.4 (f))
intent is still enforced.

This also increases security, see b38e7819ca
("icmp: randomize the global rate limiter") for reference.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223161742.929830-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 17:50:12 -08:00
Gabriel Goller
3197cce4d4 docs: net: document neigh gc_stale_time sysctl
Add missing documentation for a neighbor table garbage collector sysctl
parameter in ip-sysctl.rst:

neigh/default/gc_stale_time: controls how long an unused neighbor entry
is kept before becoming eligible for garbage collection (default: 60
seconds)

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
Link: https://patch.msgid.link/20260223101257.47563-1-g.goller@proxmox.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 17:17:06 -08:00
Eric Dumazet
0201eedb69 ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
Following part was needed before the blamed commit, because
inet_getpeer_v6() second argument was the prefix.

	/* Give more bandwidth to wider prefixes. */
	if (rt->rt6i_dst.plen < 128)
		tmo >>= ((128 - rt->rt6i_dst.plen)>>5);

Now inet_getpeer_v6() retrieves hosts, we need to remove
@tmo adjustement or wider prefixes likes /24 allow 8x
more ICMP to be sent for a given ratelimit.

As we had this issue for a while, this patch changes net.ipv6.icmp.ratelimit
default value from 1000ms to 100ms to avoid potential regressions.

Also add a READ_ONCE() when reading net->ipv6.sysctl.icmpv6_time.

Fixes: fd0273d793 ("ipv6: Remove external dependency on rt6i_dst and rt6i_src")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260216142832.3834174-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-18 16:46:36 -08:00
Linus Torvalds
37a93dd5c4 Networking changes for 7.0
Core & protocols
 ----------------
 
  - A significant effort all around the stack to guide the compiler to
    make the right choice when inlining code, to avoid unneeded calls for
    small helper and stack canary overhead in the fast-path. This
    generates better and faster code with very small or no text size
    increases, as in many cases the call generated more code than the
    actual inlined helper.
 
  - Extend AccECN implementation so that is now functionally complete,
    also allow the user-space enabling it on a per network namespace
    basis.
 
  - Add support for memory providers with large (above 4K) rx buffer.
    Paired with hw-gro, larger rx buffer sizes reduce the number of
    buffers traversing the stack, dincreasing single stream CPU usage by
    up to ~30%.
 
  - Do not add HBH header to Big TCP GSO packets. This simplifies the RX
    path, the TX path and the NIC drivers, and is possible because
    user-space taps can now interpret correctly such packets without the
    HBH hint.
 
  - Allow IPv6 routes to be configured with a gateway address that is
    resolved out of a different interface than the one specified, aligning
    IPv6 to IPv4 behavior.
 
  - Multi-queue aware sch_cake. This makes it possible to scale the rate
    shaper of sch_cake across multiple CPUs, while still enforcing a
    single global rate on the interface.
 
  - Add support for the nbcon (new buffer console) infrastructure to
    netconsole, enabling lock-free, priority-based console operations that
    are safer in crash scenarios.
 
  - Improve the TCP ipv6 output path to cache the flow information, saving
    cpu cycles, reducing cache line misses and stack use.
 
  - Improve netfilter packet tracker to resolve clashes for most protocols,
    avoiding unneeded drops on rare occasions.
 
  - Add IP6IP6 tunneling acceleration to the flowtable infrastructure.
 
  - Reduce tcp socket size by one cache line.
 
  - Notify neighbour changes atomically, avoiding inconsistencies between
    the notification sequence and the actual states sequence.
 
  - Add vsock namespace support, allowing complete isolation of vsocks
    across different network namespaces.
 
  - Improve xsk generic performances with cache-alignment-oriented
    optimizations.
 
  - Support netconsole automatic target recovery, allowing netconsole
    to reestablish targets when underlying low-level interface comes back
    online.
 
 Driver API
 ----------
 
  - Support for switching the working mode (automatic vs manual) of a DPLL
    device via netlink.
 
  - Introduce PHY ports representation to expose multiple front-facing
    media ports over a single MAC.
 
  - Introduce "rx-polarity" and "tx-polarity" device tree properties, to
    generalize polarity inversion requirements for differential signaling.
 
  - Add helper to create, prepare and enable managed clocks.
 
 Device drivers
 --------------
 
  - Add Huawei hinic3 PF etherner driver.
 
  - Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet controller.
 
  - Add ethernet driver for MaxLinear MxL862xx switches
 
  - Remove parallel-port Ethernet driver.
 
  - Convert existing driver timestamp configuration reporting to
    hwtstamp_get and remove legacy ioctl().
 
  - Convert existing drivers to .get_rx_ring_count(), simplifing the RX
    ring count retrieval. Also remove the legacy fallback path.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt, bng):
      - bnxt: add FW interface update to support FEC stats histogram and
        NVRAM defragmentation
      - bng: add TSO and H/W GRO support
    - nVidia/Mellanox (mlx5):
      - improve latency of channel restart operations, reducing the used
        H/W resources
      - add TSO support for UDP over GRE over VLAN
      - add flow counters support for hardware steering (HWS) rules
      - use a static memory area to store headers for H/W GRO, leading to
        12% RX tput improvement
    - Intel (100G, ice, idpf):
      - ice: reorganizes layout of Tx and Rx rings for cacheline
        locality and utilizes __cacheline_group* macros on the new layouts
      - ice: introduces Synchronous Ethernet (SyncE) support
    - Meta (fbnic):
      - adds debugfs for firmware mailbox and tx/rx rings vectors
 
  - Ethernet virtual:
    - geneve: introduce GRO/GSO support for double UDP encapsulation
 
  - Ethernet NICs consumer, and embedded:
    - Synopsys (stmmac):
      - some code refactoring and cleanups
    - RealTek (r8169):
      - add support for RTL8127ATF (10G Fiber SFP)
      - add dash and LTR support
    - Airoha:
      - AN8811HB 2.5 Gbps phy support
    - Freescale (fec):
      - add XDP zero-copy support
    - Thunderbolt:
      - add get link setting support to allow bonding
    - Renesas:
      - add support for RZ/G3L GBETH SoC
 
  - Ethernet switches:
    - Maxlinear:
      - support R(G)MII slow rate configuration
      - add support for Intel GSW150
    - Motorcomm (yt921x):
      - add DCB/QoS support
    - TI:
      - icssm-prueth: support bridging (STP/RSTP) via the switchdev
        framework
 
  - Ethernet PHYs:
    - Realtek:
      - enable SGMII and 2500Base-X in-band auto-negotiation
      - simplify and reunify C22/C45 drivers
    - Micrel: convert bindings to DT schema
 
  - CAN:
    - move skb headroom content into skb extensions, making CAN metadata
      access more robust
 
  - CAN drivers:
    - rcar_canfd:
      - add support for FD-only mode
      - add support for the RZ/T2H SoC
    - sja1000: cleanup the CAN state handling
 
  - WiFi:
    - implement EPPKE/802.1X over auth frames support
    - split up drop reasons better, removing generic RX_DROP
    - additional FTM capabilities: 6 GHz support, supported number of
      spatial streams and supported number of LTF repetitions
    - better mac80211 iterators to enumerate resources
    - initial UHR (Wi-Fi 8) support for cfg80211/mac80211
 
  - WiFi drivers:
    - Qualcomm/Atheros:
      - ath11k: support for Channel Frequency Response measurement
      - ath12k: a significant driver refactor to support
        multi-wiphy devices and and pave the way for future device support
        in the same driver (rather than splitting to ath13k)
      - ath12k: support for the QCC2072 chipset
    - Intel:
      - iwlwifi: partial Neighbor Awareness Networking (NAN) support
      - iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
    - RealTek (rtw89):
      - preparations for RTL8922DE support
 
  - Bluetooth:
    - implement setsockopt(BT_PHY) to set the connection packet type/PHY
    - set link_policy on incoming ACL connections
 
  - Bluetooth drivers:
    - btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
    - btqca: add WCN6855 firmware priority selection feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmmMum4SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkDnMP/3bpHAGj+gylTid3Xsj0TjJ8AkPsQs+W
 uvSMiCB1TvGCTD9kK36Vr+qPoIgJY10UxYMMjt5Gs0A9TvGDDfYnUOVoUIkfkWCH
 grqSdp6dVkyaJVfyLEcuOVQQG2HwEnhC4c3ZOhOxaKNAnsLCP142lYsMR9ktGRuA
 4vDGtz1+y7t8qBk/lyfXDM71KRrtq0HWJZIhmhz8QXTBsgPDfSejbTPNxXQOJoeO
 sKeArsHr/Cmvf89ZtLZ63vbfr4BKDm4PeXqPYR3PrQs2Yu6I1EK4lehygTY2yE2O
 I3MEPlvpa/tiVLxqXNNwEFbYIkMPY6FXS9x05hTxNZM65A6aB3vvdkqPVnVmAlXE
 f+4PYg9paI13lbzZOeQbGfZ5HgPpzQvnginaaX6s9Fp12K3Ll1FkwWdUznFWhzVn
 5LSrGyecR00CdKJByTIw9JGg/1ptz5a57pa8OQmcKRx3WhQ1XeV5TIJQF4QcPgHw
 ApyjmeGDTQMQMzha1fsaVr+i6BK2zgZvKK9uGDTX90xn2JUw/M75tyOlsTtGlnuM
 sZgj0KVGQlG2wLwBB/+D4S9Oi9YlPG00rkCs0E4jk5C/G4NBmMgpEPQg6azkb57h
 Uiy0paohxfwcZ3qbGA9In091ClGqIwOiCBaq+uXRq1ro88Neo6PWkjz5ItNrsD8t
 Ttgd5AVAQyPT
 =O31Y
 -----END PGP SIGNATURE-----

Merge tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core & protocols:

   - A significant effort all around the stack to guide the compiler to
     make the right choice when inlining code, to avoid unneeded calls
     for small helper and stack canary overhead in the fast-path.

     This generates better and faster code with very small or no text
     size increases, as in many cases the call generated more code than
     the actual inlined helper.

   - Extend AccECN implementation so that is now functionally complete,
     also allow the user-space enabling it on a per network namespace
     basis.

   - Add support for memory providers with large (above 4K) rx buffer.
     Paired with hw-gro, larger rx buffer sizes reduce the number of
     buffers traversing the stack, dincreasing single stream CPU usage
     by up to ~30%.

   - Do not add HBH header to Big TCP GSO packets. This simplifies the
     RX path, the TX path and the NIC drivers, and is possible because
     user-space taps can now interpret correctly such packets without
     the HBH hint.

   - Allow IPv6 routes to be configured with a gateway address that is
     resolved out of a different interface than the one specified,
     aligning IPv6 to IPv4 behavior.

   - Multi-queue aware sch_cake. This makes it possible to scale the
     rate shaper of sch_cake across multiple CPUs, while still enforcing
     a single global rate on the interface.

   - Add support for the nbcon (new buffer console) infrastructure to
     netconsole, enabling lock-free, priority-based console operations
     that are safer in crash scenarios.

   - Improve the TCP ipv6 output path to cache the flow information,
     saving cpu cycles, reducing cache line misses and stack use.

   - Improve netfilter packet tracker to resolve clashes for most
     protocols, avoiding unneeded drops on rare occasions.

   - Add IP6IP6 tunneling acceleration to the flowtable infrastructure.

   - Reduce tcp socket size by one cache line.

   - Notify neighbour changes atomically, avoiding inconsistencies
     between the notification sequence and the actual states sequence.

   - Add vsock namespace support, allowing complete isolation of vsocks
     across different network namespaces.

   - Improve xsk generic performances with cache-alignment-oriented
     optimizations.

   - Support netconsole automatic target recovery, allowing netconsole
     to reestablish targets when underlying low-level interface comes
     back online.

  Driver API:

   - Support for switching the working mode (automatic vs manual) of a
     DPLL device via netlink.

   - Introduce PHY ports representation to expose multiple front-facing
     media ports over a single MAC.

   - Introduce "rx-polarity" and "tx-polarity" device tree properties,
     to generalize polarity inversion requirements for differential
     signaling.

   - Add helper to create, prepare and enable managed clocks.

  Device drivers:

   - Add Huawei hinic3 PF etherner driver.

   - Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet
     controller.

   - Add ethernet driver for MaxLinear MxL862xx switches

   - Remove parallel-port Ethernet driver.

   - Convert existing driver timestamp configuration reporting to
     hwtstamp_get and remove legacy ioctl().

   - Convert existing drivers to .get_rx_ring_count(), simplifing the RX
     ring count retrieval. Also remove the legacy fallback path.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt, bng):
         - bnxt: add FW interface update to support FEC stats histogram
           and NVRAM defragmentation
         - bng: add TSO and H/W GRO support
      - nVidia/Mellanox (mlx5):
         - improve latency of channel restart operations, reducing the
           used H/W resources
         - add TSO support for UDP over GRE over VLAN
         - add flow counters support for hardware steering (HWS) rules
         - use a static memory area to store headers for H/W GRO,
           leading to 12% RX tput improvement
      - Intel (100G, ice, idpf):
         - ice: reorganizes layout of Tx and Rx rings for cacheline
           locality and utilizes __cacheline_group* macros on the new
           layouts
         - ice: introduces Synchronous Ethernet (SyncE) support
      - Meta (fbnic):
         - adds debugfs for firmware mailbox and tx/rx rings vectors

   - Ethernet virtual:
      - geneve: introduce GRO/GSO support for double UDP encapsulation

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - some code refactoring and cleanups
      - RealTek (r8169):
         - add support for RTL8127ATF (10G Fiber SFP)
         - add dash and LTR support
      - Airoha:
         - AN8811HB 2.5 Gbps phy support
      - Freescale (fec):
         - add XDP zero-copy support
      - Thunderbolt:
         - add get link setting support to allow bonding
      - Renesas:
         - add support for RZ/G3L GBETH SoC

   - Ethernet switches:
      - Maxlinear:
         - support R(G)MII slow rate configuration
         - add support for Intel GSW150
      - Motorcomm (yt921x):
         - add DCB/QoS support
      - TI:
         - icssm-prueth: support bridging (STP/RSTP) via the switchdev
           framework

   - Ethernet PHYs:
      - Realtek:
         - enable SGMII and 2500Base-X in-band auto-negotiation
         - simplify and reunify C22/C45 drivers
      - Micrel: convert bindings to DT schema

   - CAN:
      - move skb headroom content into skb extensions, making CAN
        metadata access more robust

   - CAN drivers:
      - rcar_canfd:
         - add support for FD-only mode
         - add support for the RZ/T2H SoC
      - sja1000: cleanup the CAN state handling

   - WiFi:
      - implement EPPKE/802.1X over auth frames support
      - split up drop reasons better, removing generic RX_DROP
      - additional FTM capabilities: 6 GHz support, supported number of
        spatial streams and supported number of LTF repetitions
      - better mac80211 iterators to enumerate resources
      - initial UHR (Wi-Fi 8) support for cfg80211/mac80211

   - WiFi drivers:
      - Qualcomm/Atheros:
         - ath11k: support for Channel Frequency Response measurement
         - ath12k: a significant driver refactor to support multi-wiphy
           devices and and pave the way for future device support in the
           same driver (rather than splitting to ath13k)
         - ath12k: support for the QCC2072 chipset
      - Intel:
         - iwlwifi: partial Neighbor Awareness Networking (NAN) support
         - iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
      - RealTek (rtw89):
         - preparations for RTL8922DE support

   - Bluetooth:
      - implement setsockopt(BT_PHY) to set the connection packet type/PHY
      - set link_policy on incoming ACL connections

   - Bluetooth drivers:
      - btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
      - btqca: add WCN6855 firmware priority selection feature"

* tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits)
  bnge/bng_re: Add a new HSI
  net: macb: Fix tx/rx malfunction after phy link down and up
  af_unix: Fix memleak of newsk in unix_stream_connect().
  net: ti: icssg-prueth: Add optional dependency on HSR
  net: dsa: add basic initial driver for MxL862xx switches
  net: mdio: add unlocked mdiodev C45 bus accessors
  net: dsa: add tag format for MxL862xx switches
  dt-bindings: net: dsa: add MaxLinear MxL862xx
  selftests: drivers: net: hw: Modify toeplitz.c to poll for packets
  octeontx2-pf: Unregister devlink on probe failure
  net: renesas: rswitch: fix forwarding offload statemachine
  ionic: Rate limit unknown xcvr type messages
  tcp: inet6_csk_xmit() optimization
  tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
  tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
  ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
  ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
  ipv6: use np->final in inet6_sk_rebuild_header()
  ipv6: add daddr/final storage in struct ipv6_pinfo
  net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup()
  ...
2026-02-11 19:31:52 -08:00
Chia-Yu Chang
1247fb19ca tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
Detect spurious retransmission of a previously sent ACK carrying the
AccECN option after the second retransmission. Since this might be caused
by the middlebox dropping ACK with options it does not recognize, disable
the sending of the AccECN option in all subsequent ACKs. This patch
follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field
(accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was
sent with duplicate SACK info.

Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl:
(TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and
persistently sends AccECN option once it fits into TCP option space.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-13-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03 15:13:25 +01:00
Ilpo Järvinen
7885ce0147 tcp: try to avoid safer when ACKs are thinned
Add newly acked pkts EWMA. When ACK thinning occurs, select
between safer and unsafe cep delta in AccECN processing based
on it. If the packets ACKed per ACK tends to be large, don't
conservatively assume ACE field overflow.

This patch uses the existing 2-byte holes in the rx group for new
u16 variables withtout creating more holes. Below are the pahole
outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    /* XXX 4 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 177 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    u16                        pkts_acked_ewma;        /*  2756     2 */
    /* XXX 2 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 178 */
}

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-2-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03 15:13:24 +01:00
Jakub Kicinski
ba9c5611f0 docs: networking: mention that RSS table should be 4x the queue count
Spell out the recommendation that the RSS table should be
4x the queue count to avoid traffic imbalance. Include minor
rephrasing and removal of the explicit 128 entry example
since a 128 entry table is inadequate on modern machines.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131225454.1225151-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02 17:06:00 -08:00
Ethan Nelson-Moore
aba0138eb7 net: ethernet: neterion: s2io: remove unused driver
The s2io driver supports Exar (formerly Neterion and S2io) PCI-X 10
Gigabit Ethernet cards. Hardware supporting PCI-X has not been
manufactured in years. On x86, it was quickly replaced by PCIe. While
it stuck around longer on POWER hardware, the last POWER hardware to
support it was POWER7, which is not supported by ppc64le Linux
distributions. The last supported mainstream ppc64 Linux distribution
was RHEL 7; while it is still supported under ELS, ELS is only
available for x86 and IBM Z. It is possible to use many PCI-X cards in
standard PCI slots (which are still available on new motherboards), but
it does not make sense to do so for 10 Gigabit Ethernet because the
maximum bandwidth of standard PCI is only 1067 Mbps. It is therefore
highly unlikely that this driver is still being used. Remove the
driver, and move the former maintainer to the CREDITS file (restoring
credit for the vxge driver, which was removed in commit f05643a0f6
("eth: remove neterion/vxge").

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260126031352.22997-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-28 20:08:07 -08:00
Dimitri Daskalakis
3085ff59fe Documentation: net: Fix typos in netdevices.rst
Fixes two minor typos. Specifically, on -> or and Devices -> Device.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20260122225723.2368698-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-25 13:03:43 -08:00
Jani Nikula
a592a36e49 Documentation: use a source-read extension for the index link boilerplate
The root document usually has a special :ref:`genindex` link to the
generated index. This is also the case for Documentation/index.rst. The
other index.rst files deeper in the directory hierarchy usually don't.

For SPHINXDIRS builds, the root document isn't Documentation/index.rst,
but some other index.rst in the hierarchy. Currently they have a
".. only::" block to add the index link when doing SPHINXDIRS html
builds.

This is obviously very tedious and repetitive. The link is also added to
all index.rst files in the hierarchy for SPHINXDIRS builds, not just the
root document.

Put the boilerplate in a sphinx-includes/subproject-index.rst file, and
include it at the end of the root document for subproject builds in an
ad-hoc source-read extension defined in conf.py.

For now, keep having the boilerplate in translations, because this
approach currently doesn't cover translated index link headers.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
[jc: did s/doctree/kern_doc_dir/ ]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260123143149.2024303-1-jani.nikula@intel.com>
2026-01-23 11:59:34 -07:00
Vadim Fedorenko
5062245a5a net: remove legacy way to get/set HW timestamp config
With all drivers converted to use ndo_hwstamp callbacks the legacy way
can be removed, marking ioctl interface as deprecated.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260116062121.1230184-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 18:21:27 -08:00
Jakub Kicinski
677a51790b Merge tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux
Pavel Begunkov says:

====================
Add support for providers with large rx buffer

Many modern NICs support configurable receive buffer lengths, and zcrx and
memory providers can use buffers larger than 4K to improve performance.
When paired with hw-gro larger rx buffer sizes can drastically reduce
the number of buffers traversing the stack and save a lot of processing
time. It also allows to give to users larger contiguous chunks of data.

Single stream benchmarks showed up to ~30% CPU util improvement.
E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC:

packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    1.53    0.00   27.78    2.72    1.31   66.45    0.22
packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    0.69    0.00    8.26   31.65    1.83   57.00    0.57

This series adds net infrastructure for memory providers configuring
the size and implements it for bnxt. It's an opt-in feature for drivers,
they should advertise support for the parameter in the qops and must check
if the hardware supports the given size. It's limited to memory providers
as it drastically simplifies implementation. It doesn't affect the fast
path zcrx uAPI, and the user exposed parameter is defined in zcrx terms,
which allows it to be flexible and adjusted in the future.

A liburing example can be found at [2]

full branch:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v8
Liburing example:
[2] https://github.com/isilence/liburing.git zcrx/rx-buf-len

* tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux:
  io_uring/zcrx: document area chunking parameter
  selftests: iou-zcrx: test large chunk sizes
  eth: bnxt: support qcfg provided rx page size
  eth: bnxt: adjust the fill level of agg queues with larger buffers
  eth: bnxt: store rx buffer size per queue
  net: pass queue rx page size from memory provider
  net: add bare bone queue configs
  net: reduce indent of struct netdev_queue_mgmt_ops members
  net: memzero mp params when closing a queue
====================

Link: https://patch.msgid.link/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 18:10:04 -08:00
Shahar Shitrit
8fc8071041 docs: tls: Enhance TLS resync async process documentation
Expand the tls-offload.rst documentation to provide a more detailed
explanation of the asynchronous resync process, including the role
of struct tls_offload_resync_async in managing resync requests on
the kernel side.

Also, add documentation for helper functions
tls_offload_rx_resync_async_request_start/ _end/ _cancel.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1768298883-1602599-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17 15:26:13 -08:00
Heiner Kallweit
b1b77c82ce net: phy: remove unused fixup unregistering functions
No user of PHY fixups unregisters these. IOW: The fixup unregistering
functions are unused and can be removed. Remove also documentation
for these functions. Whilst at it, remove also mentioning of
phy_register_fixup() from the Documentation, as this function has been
static since ea47e70e47 ("net: phy: remove fixup-related definitions
from phy.h which are not used outside phylib").

Fixup unregistering functions were added with f38e7a32ee
("phy: add phy fixup unregister functions") in 2016, and last user
was removed with 6782d06a47 ("net: usb: lan78xx: Remove KSZ9031 PHY
fixup") in 2024.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ff8ac321-435c-48d0-b376-fbca80c0c22e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15 19:48:26 -08:00
Maxime Chevallier
62518b5b3d Documentation: networking: Document the phy_port infrastructure
This documentation aims at describing the main goal of the phy_port
infrastructure.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-15-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 18:52:36 -08:00
Pavel Begunkov
d1de61db15 io_uring/zcrx: document area chunking parameter
struct io_uring_zcrx_ifq_reg::rx_buf_len is used as a hint specifying
the kernel what buffer size it should use. Document the API and
limitations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2026-01-14 02:13:37 +00:00
Vladimir Oltean
4e4c00f34d Documentation: net: dsa: mention simple HSR offload helpers
Keep the documentation up to date.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251130131657.65080-16-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 16:51:55 -08:00
Vladimir Oltean
977839161f Documentation: net: dsa: mention availability of RedBox
Since commit 5055cccfc2 ("net: hsr: Provide RedBox support
(HSR-SAN)"), RedBox is available (including for offload in DSA).

Update the DSA documentation that states it isn't.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251130131657.65080-15-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 16:51:49 -08:00
Eric Dumazet
9a5e5334ad tcp: remove icsk->icsk_retransmit_timer
Now sk->sk_timer is no longer used by TCP keepalive, we can use
its storage for TCP and MPTCP retransmit timers for better
cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Eric Dumazet
08dfe37023 tcp: introduce icsk->icsk_keepalive_timer
sk->sk_timer has been used for TCP keepalives.

Keepalive timers are not in fast path, we want to use sk->sk_timer
storage for retransmit timers, for better cache locality.

Create icsk->icsk_keepalive_timer and change keepalive
code to no longer use sk->sk_timer.

Added space is reclaimed in the following patch.

This includes changes to MPTCP, which was also using sk_timer.

Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer
for inet_sk_diag_fill() sake.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Daniel Zahka
b11d358bf8 net/mlx5: implement swp_l4_csum_mode via devlink params
swp_l4_csum_mode controls how L4 transmit checksums are computed when
using Software Parser (SWP) hints for header locations.

Supported values:
  1. default: device will choose between full_csum or l4_only. Driver
     will discover the device's choice during initialization.
  2. full_csum: calculate L4 checksum with the pseudo-header.
  3. l4_only: calculate L4 checksum without the pseudo-header. Only
     available when swp_l4_csum_mode_l4_only is set in
     mlx5_ifc_nv_sw_offload_cap_bits.

Note that 'default' might be returned from the device and passed to
userspace, and it might also be set during a
devlink_param::reset_default() call, but attempts to set a value of
default directly with param-set will be rejected.

The l4_only setting is a dependency for PSP initialization in
mlx5e_psp_init().

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-5-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 19:01:22 -08:00
Daniel Zahka
2a367002ed devlink: support default values for param-get and param-set
Support querying and resetting to default param values.

Introduce two new devlink netlink attrs:
DEVLINK_ATTR_PARAM_VALUE_DEFAULT and
DEVLINK_ATTR_PARAM_RESET_DEFAULT. The former is used to contain an
optional parameter value inside of the param_value nested
attribute. The latter is used in param-set requests from userspace to
indicate that the driver should reset the param to its default value.

To implement this, two new functions are added to the devlink driver
api: devlink_param::get_default() and
devlink_param::reset_default(). These callbacks allow drivers to
implement default param actions for runtime and permanent cmodes. For
driverinit params, the core latches the last value set by a driver via
devl_param_driverinit_value_set(), and uses that as the default value
for a param.

Because default parameter values are optional, it would be impossible
to discern whether or not a param of type bool has default value of
false or not provided if the default value is encoded using a netlink
flag type. For this reason, when a DEVLINK_PARAM_TYPE_BOOL has an
associated default value, the default value is encoded using a u8
type.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-4-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 19:01:22 -08:00