linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-30 10:04:04 +02:00

Author	SHA1	Message	Date
Jakub Kicinski	dbbda7dd68	Notable features this time: - cfg80211/mac80211 - finished assoc frame encryption/EPPKE/802.1X-over-auth (also hwsim) - radar detection improvements - 6 GHz incumbent signal detection APIs - multi-link support for FILS, probe response templates and client probling - ath12k: - monitor mode support on IPQ5332 - basic hwmon temperature reporting -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmmoGDYACgkQ10qiO8sP aACl9BAAi4ezTR8jjvBQNjJ9EXJmamjVAitMlHulUaw0DVHnMAMALTgYGq0ZpIva 8EMiH/ksfxmYvu8qFYypYH2WcQAsg9DFuuo2Mcd4MwmJkOyQgme1mqaTpTDuHAWp S+wZBgQQCrnhQkmmNUJmp8m4Edw4cYi94jcct0BRYvAMBdQo4hMctA/7Ja8+ttU5 Q2uhHVZjmNPR2OXBp31INp4vo7RK5AXUFI5l/7XX36o7zIudtqbJJ0GL+1UNeG3f v4an+a0tiunacgZiuWeeL/U1t4cZ5WQiDV31FQPIBiiYQO5M76l7+cuikr3HLkG1 kdqGXs77blW32s7NF3MebswIV+dzmBF69HjwCxdsU0iWzp54y8I3Lgu/cN8O721a 2Pt6IGmcsOm9F9Lbrxn6UNHMjn6VQUYGg40NtbhHGwniheLX4Gi4MBjbgOdD3GJh 9h12h/2CRZcHjA6kg3tcdzluD09510IiWMbPaAtXr456CPJ+hBUJIutuXOszbA+7 d9eecObxoMtMqtesRLkhbyBMt7aNkWLYBvpSQVHaJktqt7c5NmKe0xXXdRHeIqKo XpXsl2q/1NrmSj9lPyyte8LHWWXQ+TVWWujqaHFUJdMDT/IBscKk4ahxGoEBtHOR KHRFCD2oRsyCnsI6tSJ3/IuU5AVmBIzd6wZlPYZUZI/PsWuMwIg= =oNzs -----END PGP SIGNATURE----- Merge tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Notable features this time: - cfg80211/mac80211 - finished assoc frame encryption/EPPKE/802.1X-over-auth (also hwsim) - radar detection improvements - 6 GHz incumbent signal detection APIs - multi-link support for FILS, probe response templates and client probling - ath12k: - monitor mode support on IPQ5332 - basic hwmon temperature reporting * tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (38 commits) wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing wifi: mac80211_hwsim: change hwsim_class to a const struct wifi: mac80211: give the AP more time for EPPKE as well wifi: ath12k: Remove the unused argument from the Rx data path wifi: ath12k: Enable monitor mode support on IPQ5332 wifi: ath12k: Set up MLO after SSR wifi: ath11k: Silence remoteproc probe deferral prints wifi: cfg80211: support key installation on non-netdev wdevs wifi: cfg80211: make cluster id an array wifi: mac80211: update outdated comment wifi: mac80211: Advertise IEEE 802.1X authentication support wifi: mac80211: Add support for IEEE 802.1X authentication protocol in non-AP STA mode wifi: cfg80211: add support for IEEE 802.1X Authentication Protocol wifi: mac80211: Advertise EPPKE support based on driver capabilities wifi: mac80211_hwsim: Advertise support for (Re)Association frame encryption wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO wifi: rt2x00: use generic nvmem_cell_get wifi: mac80211: fetch unsolicited probe response template by link ID wifi: mac80211: fetch FILS discovery template by link ID wifi: nl80211: don't allow DFS channels for NAN ... ==================== Link: https://patch.msgid.link/20260304113707.175181-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-04 15:30:05 -08:00
Julian Anastasov	f20c73b046	ipvs: use more keys for connection hashing Simon Kirby reported long time ago that IPVS connection hashing based only on the client address/port (caddr, cport) as hash keys is not suitable for setups that accept traffic on multiple virtual IPs and ports. It can happen for multiple VIP:VPORT services, for single or many fwmark service(s) that match multiple virtual IPs and ports or even for passive FTP with peristence in DR/TUN mode where we expect traffic on multiple ports for the virtual IP. Fix it by adding virtual addresses and ports to the hash function. This causes the traffic from NAT real servers to clients to use second hashing for the in->out direction. As result: - the IN direction from client will use hash node hn0 where the source/dest addresses and ports used by client will be used as hash keys - the OUT direction from NAT real servers will use hash node hn1 for the traffic from real server to client - the persistence templates are hashed only with parameters based on the IN direction, so they now will also use the virtual address, port and fwmark from the service. OLD: - all methods: c_list node: proto, caddr:cport - persistence templates: c_list node: proto, caddr_net:0 - persistence engine templates: c_list node: per-PE, PE-SIP uses jhash NEW: - all methods: hn0 node (dir 0): proto, caddr:cport -> vaddr:vport - MASQ method: hn1 node (dir 1): proto, daddr:dport -> caddr:cport - persistence templates: hn0 node (dir 0): proto, caddr_net:0 -> vaddr:vport_or_0 proto, caddr_net:0 -> fwmark:0 - persistence engine templates: hn0 node (dir 0): as before Also reorder the ip_vs_conn fields, so that hash nodes are on same read-mostly cache line while write-mostly fields are on separate cache line. Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-03-04 11:45:45 +01:00
Julian Anastasov	2fa7cc9c70	ipvs: switch to per-net connection table Use per-net resizable hash table for connections. The global table is slow to walk when using many namespaces. The table can be resized in the range of [256 - ip_vs_conn_tab_size]. Table is attached only while services are present. Resizing is done by delayed work based on load (the number of connections). Add a hash_key field into the connection to store the table ID in the highest bit and the entry's hash value in the lowest bits. The lowest part of the hash value is used as bucket ID, the remaining part is used to filter the entries in the bucket before matching the keys and as result, helps the lookup operation to access only one cache line. By knowing the table ID and bucket ID for entry, we can unlink it without calculating the hash value and doing lookup by keys. We need only to validate the saved hash_key under lock. For better security switch from jhash to siphash for the default connection hashing but the persistence engines may use their own function. Keeping the hash table loaded with entries below the size (12%) allows to avoid collision for 96+% of the conns. ip_vs_conn_fill_cport() now will rehash the connection with proper locking because unhash+hash is not safe for RCU readers. To invalidate the templates setting just dport to 0xffff is enough, no need to rehash them. As result, ip_vs_conn_unhash() is now unused and removed. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-03-04 11:45:45 +01:00
Julian Anastasov	840aac3d90	ipvs: use resizable hash table for services Make the hash table for services resizable in the bit range of 4-20. Table is attached only while services are present. Resizing is done by delayed work based on load (the number of hashed services). Table grows when load increases 2+ times (above 12.5% with lfactor=-3) and shrinks 8+ times when load decreases 16+ times (below 0.78%). Switch to jhash hashing to reduce the collisions for multiple services. Add a hash_key field into the service to store the table ID in the highest bit and the entry's hash value in the lowest bits. The lowest part of the hash value is used as bucket ID, the remaining part is used to filter the entries in the bucket before matching the keys and as result, helps the lookup operation to access only one cache line. By knowing the table ID and bucket ID for entry, we can unlink it without calculating the hash value and doing lookup by keys. We need only to validate the saved hash_key under lock. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-03-04 11:45:45 +01:00
Julian Anastasov	b655388111	ipvs: add resizable hash tables Add infrastructure for resizable hash tables based on hlist_bl which we will use in followup patches. The tables allow RCU lookups during resizing, bucket modifications are protected with per-bucket bit lock and additional custom locking, the tables are resized when load reaches thresholds determined based on load factor parameter. Compared to other implementations we rely on: * fast entry removal by using node unlinking without pre-lookup * entry rehashing when hash key changes * entries can contain multiple hash nodes * custom locking depending on different contexts * adjustable load factor to customize the grow/shrink process Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-03-04 11:45:45 +01:00
Florian Westphal	831fb31b76	ipv6: make ipv6_anycast_destination logic usable without dst_entry nft_fib_ipv6 uses ipv6_anycast_destination(), but upcoming patch removes the dst_entry usage in favor of fib6_result. Move the 'plen > 127' logic to a new helper and call it from the existing one. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-03-04 11:45:44 +01:00
Yung Chih Su	4ee7fa6cf7	net: ipv4: fix ARM64 alignment fault in multipath hash seed `struct sysctl_fib_multipath_hash_seed` contains two u32 fields (user_seed and mp_seed), making it an 8-byte structure with a 4-byte alignment requirement. In `fib_multipath_hash_from_keys()`, the code evaluates the entire struct atomically via `READ_ONCE()`: mp_seed = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed).mp_seed; While this silently works on GCC by falling back to unaligned regular loads which the ARM64 kernel tolerates, it causes a fatal kernel panic when compiled with Clang and LTO enabled. Commit `e35123d83e` ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y") strengthens `READ_ONCE()` to use Load-Acquire instructions (`ldar` / `ldapr`) to prevent compiler reordering bugs under Clang LTO. Since the macro evaluates the full 8-byte struct, Clang emits a 64-bit `ldar` instruction. ARM64 architecture strictly requires `ldar` to be naturally aligned, thus executing it on a 4-byte aligned address triggers a strict Alignment Fault (FSC = 0x21). Fix the read side by moving the `READ_ONCE()` directly to the `u32` member, which emits a safe 32-bit `ldar Wn`. Furthermore, Eric Dumazet pointed out that `WRITE_ONCE()` on the entire struct in `proc_fib_multipath_hash_set_seed()` is also flawed. Analysis shows that Clang splits this 8-byte write into two separate 32-bit `str` instructions. While this avoids an alignment fault, it destroys atomicity and exposes a tear-write vulnerability. Fix this by explicitly splitting the write into two 32-bit `WRITE_ONCE()` operations. Finally, add the missing `READ_ONCE()` when reading `user_seed` in `proc_fib_multipath_hash_seed()` to ensure proper pairing and concurrency safety. Fixes: `4ee2a8cace` ("net: ipv4: Add a sysctl to set multipath hash seed") Signed-off-by: Yung Chih Su <yuuchihsu@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260302060247.7066-1-yuuchihsu@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-03 17:20:37 -08:00
Dipayaan Roy	2b12ffb669	net: mana: Trigger VF reset/recovery on health check failure due to HWC timeout The GF stats periodic query is used as mechanism to monitor HWC health check. If this HWC command times out, it is a strong indication that the device/SoC is in a faulty state and requires recovery. Today, when a timeout is detected, the driver marks hwc_timeout_occurred, clears cached stats, and stops rescheduling the periodic work. However, the device itself is left in the same failing state. Extend the timeout handling path to trigger the existing MANA VF recovery service by queueing a GDMA_EQE_HWC_RESET_REQUEST work item. This is expected to initiate the appropriate recovery flow by suspende resume first and if it fails then trigger a bus rescan. This change is intentionally limited to HWC command timeouts and does not trigger recovery for errors reported by the SoC as a normal command response. Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/aaFShvKnwR5FY8dH@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-03 11:14:22 +01:00
Jiayuan Chen	479d589b40	bpf/bonding: reject vlan+srcmac xmit_hash_policy change when XDP is loaded bond_option_mode_set() already rejects mode changes that would make a loaded XDP program incompatible via bond_xdp_check(). However, bond_option_xmit_hash_policy_set() has no such guard. For 802.3ad and balance-xor modes, bond_xdp_check() returns false when xmit_hash_policy is vlan+srcmac, because the 802.1q payload is usually absent due to hardware offload. This means a user can: 1. Attach a native XDP program to a bond in 802.3ad/balance-xor mode with a compatible xmit_hash_policy (e.g. layer2+3). 2. Change xmit_hash_policy to vlan+srcmac while XDP remains loaded. This leaves bond->xdp_prog set but bond_xdp_check() now returning false for the same device. When the bond is later destroyed, dev_xdp_uninstall() calls bond_xdp_set(dev, NULL, NULL) to remove the program, which hits the bond_xdp_check() guard and returns -EOPNOTSUPP, triggering: WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL)) Fix this by rejecting xmit_hash_policy changes to vlan+srcmac when an XDP program is loaded on a bond in 802.3ad or balance-xor mode. commit `39a0876d59` ("net, bonding: Disallow vlan+srcmac with XDP") introduced bond_xdp_check() which returns false for 802.3ad/balance-xor modes when xmit_hash_policy is vlan+srcmac. The check was wired into bond_xdp_set() to reject XDP attachment with an incompatible policy, but the symmetric path -- preventing xmit_hash_policy from being changed to an incompatible value after XDP is already loaded -- was left unguarded in bond_option_xmit_hash_policy_set(). Note: commit `094ee6017e` ("bonding: check xdp prog when set bond mode") later added a similar guard to bond_option_mode_set(), but bond_option_xmit_hash_policy_set() remained unprotected. Reported-by: syzbot+5a287bcdc08104bc3132@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6995aff6.050a0220.2eeac1.014e.GAE@google.com/T/ Fixes: `39a0876d59` ("net, bonding: Disallow vlan+srcmac with XDP") Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://patch.msgid.link/20260226080306.98766-2-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-03 10:47:37 +01:00
Kuniyuki Iwashima	425e080a1c	dccp Remove inet_hashinfo2_init_mod(). Commit `c92c81df93` ("net: dccp: fix kernel crash on module load") added inet_hashinfo2_init_mod() for DCCP. Commit `22d6c9eebf` ("net: Unexport shared functions for DCCP.") removed EXPORT_SYMBOL_GPL() it but forgot to remove the function itself. Let's remove inet_hashinfo2_init_mod(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260301063756.1581685-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-02 18:50:28 -08:00
Kuniyuki Iwashima	3c1e53e554	ipmr: Add dedicated mutex for mrt->{mfc_hash,mfc_cache_list}. We will no longer hold RTNL for ipmr_rtm_route() to modify the MFC hash table. Only __dev_get_by_index() in rtm_to_ipmr_mfcc() is the RTNL dependant, otherwise, we just need protection for mrt->mfc_hash and mrt->mfc_cache_list. Let's add a new mutex for ipmr_mfc_add(), ipmr_mfc_delete(), and mroute_clean_tables() (setsockopt(MRT_FLUSH or MRT_DONE)). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-15-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-02 18:49:41 -08:00
Kuniyuki Iwashima	4480d5fa1f	ipmr/ip6mr: Convert net->ipv[46].ipmr_seq to atomic_t. We will no longer hold RTNL for ipmr_mfc_add() and ipmr_mfc_delete(). MFC entry can be loosely connected with VIF by its index for mrt->vif_table[] (stored in mfc_parent), but the two tables are not synchronised. i.e. Even if VIF 1 is removed, MFC for VIF 1 is not automatically removed. The only field that the MFC/VIF interfaces share is net->ipv[46].ipmr_seq, which is protected by RTNL. Adding a new mutex for both just to protect a single field is overkill. Let's convert the field to atomic_t. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-02 18:49:41 -08:00
Kuniyuki Iwashima	1c36d186a0	ipmr: Define net->ipv4.{ipmr_notifier_ops,ipmr_seq} under CONFIG_IP_MROUTE. net->ipv4.ipmr_notifier_ops and net->ipv4.ipmr_seq are used only in net/ipv4/ipmr.c. Let's move these definitions under CONFIG_IP_MROUTE. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-02 18:49:41 -08:00
Eric Dumazet	8341c989ac	net: remove addr_len argument of recvmsg() handlers Use msg->msg_namelen as a place holder instead of a temporary variable, notably in inet[6]_recvmsg(). This removes stack canaries and allows tail-calls. $ scripts/bloat-o-meter -t vmlinux.old vmlinux add/remove: 0/0 grow/shrink: 2/19 up/down: 26/-532 (-506) Function old new delta rawv6_recvmsg 744 767 +23 vsock_dgram_recvmsg 55 58 +3 vsock_connectible_recvmsg 50 47 -3 unix_stream_recvmsg 161 158 -3 unix_seqpacket_recvmsg 62 59 -3 unix_dgram_recvmsg 42 39 -3 tcp_recvmsg 546 543 -3 mptcp_recvmsg 1568 1565 -3 ping_recvmsg 806 800 -6 tcp_bpf_recvmsg_parser 983 974 -9 ip_recv_error 588 576 -12 ipv6_recv_rxpmtu 442 428 -14 udp_recvmsg 1243 1224 -19 ipv6_recv_error 1046 1024 -22 udpv6_recvmsg 1487 1461 -26 raw_recvmsg 465 437 -28 udp_bpf_recvmsg 1027 984 -43 sock_common_recvmsg 103 27 -76 inet_recvmsg 257 175 -82 inet6_recvmsg 257 175 -82 tcp_bpf_recvmsg 663 568 -95 Total: Before=25143834, After=25143328, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260227151120.1346573-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-02 18:17:17 -08:00
Avraham Stern	7c6084d7fa	wifi: cfg80211: support key installation on non-netdev wdevs Currently key installation is only supported for netdev. For NAN, support most key operations (except setting default data key) on wdevs instead of netdevs, and adjust all the APIs and tracing to match. Since nothing currently sets NL80211_EXT_FEATURE_SECURE_NAN, this doesn't change anything (P2P Device already isn't allowed.) Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260107150057.69a0cfad95fa.I00efdf3b2c11efab82ef6ece9f393382bcf33ba8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 11:28:33 +01:00
Miri Korenblit	94d8657392	wifi: cfg80211: make cluster id an array cfg80211_nan_conf::cluster_id is currently a pointer, but there is no real reason to not have it an array. It makes things easier as there is no need to check the pointer validity each time. If a cluster ID wasn't provided by user space it will be randomized. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260302091108.2b12e4ccf5bb.Ib16bf5cca55463d4c89e18099cf1dfe4de95d405@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 11:01:02 +01:00
Sai Pratyusha Magam	a536be9231	wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO Per IEEE Std 802.11be-2024, 12.5.2.3.3, if the MPDU is an individually addressed Data frame between an AP MLD and a non-AP MLD associated with the AP MLD, then A1/A2/A3 will be MLD MAC addresses. Otherwise, Al/A2/A3 will be over-the-air link MAC addresses. Currently, during AAD and Nonce computation for software based encryption/decryption cases, mac80211 directly uses the addresses it receives in the skb frame header. However, after the first authentication, management frame addresses for non-AP MLD stations are translated to MLD addresses from over the air link addresses in software. This means that the skb header could contain translated MLD addresses, which when used as is, can lead to incorrect AAD/Nonce computation. In the following manner, ensure that the right set of addresses are used: In the receive path, stash the pre-translated link addresses in ieee80211_rx_data and use them for the AAD/Nonce computations when required. In the transmit path, offload the encryption for a CCMP/GCMP key to the hwsim driver that can then ensure that encryption and hence the AAD/Nonce computations are performed on the frame containing the right set of addresses, i.e, MLD addresses if unicast data frame and link addresses otherwise. To do so, register the set key handler in hwsim driver so mac80211 is aware that it is the driver that would take care of encrypting the frame. Offload encryption for a CCMP/GCMP key, while keeping the encryption for WEP/TKIP and MMIE generation for a AES_CMAC or a AES_GMAC key still at the SW crypto in mac layer Co-developed-by: Rohan Dutta <quic_drohan@quicinc.com> Signed-off-by: Rohan Dutta <quic_drohan@quicinc.com> Signed-off-by: Sai Pratyusha Magam <sai.magam@oss.qualcomm.com> Link: https://patch.msgid.link/20260226042959.3766157-1-sai.magam@oss.qualcomm.com [only store and apply link_addrs for unicast non-data rather storing always and applying for !unicast_data] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 09:53:19 +01:00
Sriram R	e098c26b35	wifi: mac80211: fetch unsolicited probe response template by link ID Currently, the unsolicited probe response template is always fetched from the default link of a virtual interface in both Multi-Link Operation (MLO) and non-MLO cases. However, in the MLO case there is a need to fetch the unsolicited probe response template from a specific link instead of the default link. Hence, add support for fetching the unsolicited probe response template based on the link ID from the corresponding link data. Signed-off-by: Sriram R <quic_srirrama@quicinc.com> Co-developed-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com> Signed-off-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com> Link: https://patch.msgid.link/20260220-fils-prob-by-link-v1-2-a2746a853f75@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 09:29:15 +01:00
Sriram R	0495b64132	wifi: mac80211: fetch FILS discovery template by link ID Currently, the FILS discovery template is always fetched from the default link of a virtual interface in both Multi-Link Operation (MLO) and non-MLO cases. However, in the MLO case there is a need to fetch the FILS discovery template from a specific link instead of the default link. Hence, add support for fetching the FILS discovery template based on the link ID from the corresponding link data. Signed-off-by: Sriram R <quic_srirrama@quicinc.com> Co-developed-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com> Signed-off-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com> Link: https://patch.msgid.link/20260220-fils-prob-by-link-v1-1-a2746a853f75@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 09:29:15 +01:00
Miri Korenblit	033fe322f5	wifi: nl80211/cfg80211: support stations of non-netdev interfaces Currently, a station can only be added to a netdev interface, mainly because there was no need for a station of a non-netdev interface. But for NAN, we will have stations that belong to the NL80211_IFTYPE_NAN interface. Prepare for adding/changing/deleting a station that belongs to a non-netdev interface. This doesn't actually allow such stations - this will be done in a different patch. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.65c9cc96f814.Ic02066b88bb8ad6b21e15cbea8d720280008c83b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 09:23:03 +01:00
Hari Chandrakanthan	6a584e336c	wifi: cfg80211: add support to handle incumbent signal detected event from mac80211/driver When any incumbent signal is detected by an AP/mesh interface operating in 6 GHz band, FCC mandates the AP/mesh to vacate the channels affected by it [1]. Add a new API cfg80211_incumbent_signal_notify() that can be used by mac80211 or drivers to notify the higher layers about the signal interference event with the interference bitmap in which each bit denotes the affected 20 MHz in the operating channel. Add support for the new nl80211 event and nl80211 attribute as well to notify userspace on the details about the interference event. Userspace is expected to process it and take further action - vacate the channel, or reduce the bandwidth. [1] - https://apps.fcc.gov/kdb/GetAttachment.html?id=nXQiRC%2B4mfiA54Zha%2BrW4Q%3D%3D&desc=987594%20D02%20U-NII%206%20GHz%20EMC%20Measurement%20v03&tracking_number=277034 Signed-off-by: Hari Chandrakanthan <quic_haric@quicinc.com> Signed-off-by: Amith A <amith.a@oss.qualcomm.com> Link: https://patch.msgid.link/20260216032027.2310956-2-amith.a@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 09:14:54 +01:00
Janusz Dziedzic	d69cb039ab	wifi: cfg80211: set and report chandef CAC ongoing Allow to track and check CAC state from user mode by simple check phy channels eg. using iw phy1 channels command. This is done for regular CAC and background CAC. It is important for background CAC while we can start it from any app (eg. iw or hostapd). Signed-off-by: Janusz Dziedzic <janusz.dziedzic@gmail.com> Link: https://patch.msgid.link/20260206171830.553879-3-janusz.dziedzic@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-02 09:10:28 +01:00
Jesper Dangaard Brouer	67713dff63	net: sched: sch_dualpi2: use qdisc_dequeue_drop() for dequeue drops DualPI2 drops packets during dequeue but was using kfree_skb_reason() directly, bypassing trace_qdisc_drop. Convert to qdisc_dequeue_drop() and add QDISC_DROP_L4S_STEP_NON_ECN to the qdisc drop reason enum. - Set TCQ_F_DEQUEUE_DROPS flag in dualpi2_init() - Use enum qdisc_drop_reason in drop_and_retry() - Replace kfree_skb_reason() with qdisc_dequeue_drop() Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211351978.3011628.11267023360997620069.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-28 15:31:35 -08:00
Jesper Dangaard Brouer	9d3e7f9718	net: sched: rename QDISC_DROP_CAKE_FLOOD to QDISC_DROP_FLOOD_PROTECTION Rename QDISC_DROP_CAKE_FLOOD to QDISC_DROP_FLOOD_PROTECTION to use a generic name without embedding the qdisc name. This follows the principle that drop reasons should describe the drop mechanism rather than being tied to a specific qdisc implementation. The flood protection drop reason is used by qdiscs implementing probabilistic drop algorithms (like BLUE) that detect unresponsive flows indicating potential DoS or flood attacks. CAKE uses this via its Cobalt AQM component. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211347537.3011628.13759059534638729639.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-28 15:31:35 -08:00
Jesper Dangaard Brouer	f30d9073ec	net: sched: rename QDISC_DROP_FQ_* to generic names Rename FQ-specific drop reasons to generic names: - QDISC_DROP_FQ_BAND_LIMIT -> QDISC_DROP_BAND_LIMIT - QDISC_DROP_FQ_HORIZON_LIMIT -> QDISC_DROP_HORIZON_LIMIT This follows the principle that drop reasons should describe the drop mechanism rather than being tied to a specific qdisc implementation. These concepts (priority band limits, timestamp horizon) could apply to other qdiscs as well. Remove the local macro define FQDR() and instead use the full QDISC_DROP_* name to make it easier to navigate code. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211346902.3011628.12523261489552097455.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-28 15:31:35 -08:00
Jesper Dangaard Brouer	3e28f8ad47	net: sched: sfq: convert to qdisc drop reasons Convert SFQ to use the new qdisc-specific drop reason infrastructure. This patch demonstrates how to convert a flow-based qdisc to use the new enum qdisc_drop_reason. As part of this conversion: - Add QDISC_DROP_MAXFLOWS for flow table exhaustion - Rename FQ_FLOW_LIMIT to generic FLOW_LIMIT, now shared by FQ and SFQ - Use QDISC_DROP_OVERLIMIT for sfq_drop() when overall limit exceeded - Use QDISC_DROP_FLOW_LIMIT for per-flow depth limit exceeded The FLOW_LIMIT reason is now a common drop reason for per-flow limits, applicable to both FQ and SFQ qdiscs. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211345946.3011628.12770616071857185664.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-28 15:31:34 -08:00
Jesper Dangaard Brouer	ff2998f29f	net: sched: introduce qdisc-specific drop reason tracing Create new enum qdisc_drop_reason and trace_qdisc_drop tracepoint for qdisc layer drop diagnostics with direct qdisc context visibility. The new tracepoint includes qdisc handle, parent, kind (name), and device information. Existing SKB_DROP_REASON_QDISC_DROP is retained for backwards compatibility via kfree_skb_reason(). Convert qdiscs with drop reasons to use the new infrastructure. Change CAKE's cobalt_should_drop() return type from enum skb_drop_reason to enum qdisc_drop_reason to fix implicit enum conversion warnings. Use QDISC_DROP_UNSPEC as the 'not dropped' sentinel instead of SKB_NOT_DROPPED_YET. Both have the same compiled value (0), so the comparison logic remains semantically equivalent. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211345275.3011628.1974310302645218067.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-28 15:31:34 -08:00
Nikhil P. Rao	60abb0ac11	xsk: Fix fragment node deletion to prevent buffer leak After commit `b692bf9a75` ("xsk: Get rid of xdp_buff_xsk::xskb_list_node"), the list_node field is reused for both the xskb pool list and the buffer free list, this causes a buffer leak as described below. xp_free() checks if a buffer is already on the free list using list_empty(&xskb->list_node). When list_del() is used to remove a node from the xskb pool list, it doesn't reinitialize the node pointers. This means list_empty() will return false even after the node has been removed, causing xp_free() to incorrectly skip adding the buffer to the free list. Fix this by using list_del_init() instead of list_del() in all fragment handling paths, this ensures the list node is reinitialized after removal, allowing the list_empty() to work correctly. Fixes: `b692bf9a75` ("xsk: Get rid of xdp_buff_xsk::xskb_list_node") Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260225000456.107806-2-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-28 08:55:11 -08:00
Victor Nogueira	11cb63b0d1	net/sched: Only allow act_ct to bind to clsact/ingress qdiscs and shared blocks As Paolo said earlier [1]: "Since the blamed commit below, classify can return TC_ACT_CONSUMED while the current skb being held by the defragmentation engine. As reported by GangMin Kim, if such packet is that may cause a UaF when the defrag engine later on tries to tuch again such packet." act_ct was never meant to be used in the egress path, however some users are attaching it to egress today [2]. Attempting to reach a middle ground, we noticed that, while most qdiscs are not handling TC_ACT_CONSUMED, clsact/ingress qdiscs are. With that in mind, we address the issue by only allowing act_ct to bind to clsact/ingress qdiscs and shared blocks. That way it's still possible to attach act_ct to egress (albeit only with clsact). [1] https://lore.kernel.org/netdev/674b8cbfc385c6f37fb29a1de08d8fe5c2b0fbee.1771321118.git.pabeni@redhat.com/ [2] https://lore.kernel.org/netdev/cc6bfb4a-4a2b-42d8-b9ce-7ef6644fb22b@ovn.org/ Reported-by: GangMin Kim <km.kim1503@gmail.com> Fixes: `3f14b377d0` ("net/sched: act_ct: fix skb leak and crash on ooo frags") CC: stable@vger.kernel.org Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260225134349.1287037-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-27 19:06:21 -08:00
Eric Dumazet	5151ec54f5	net: use try_cmpxchg() in lock_sock_nested() Add a fast path in lock_sock_nested(), to avoid acquiring the socket spinlock only to set @owned to one: spin_lock_bh(&sk->sk_lock.slock); if (unlikely(sock_owned_by_user_nocheck(sk))) __lock_sock(sk); sk->sk_lock.owned = 1; spin_unlock_bh(&sk->sk_lock.slock); On x86_64 compiler generates something quite efficient: 00000000000077c0 <lock_sock_nested>: 77c0: f3 0f 1e fa endbr64 77c4: e8 00 00 00 00 call __fentry__ 77c9: b9 01 00 00 00 mov $0x1,%ecx 77ce: 31 c0 xor %eax,%eax 77d0: f0 48 0f b1 8f 48 01 00 00 lock cmpxchg %rcx,0x148(%rdi) 77d9: 75 06 jne slow_path 77db: 2e e9 00 00 00 00 cs jmp __x86_return_thunk-0x4 slow_path: ... Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20260226021215.1764237-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-27 17:25:45 -08:00
Eric Dumazet	29252397bc	inet: annotate data-races around isk->inet_num UDP/TCP lookups are using RCU, thus isk->inet_num accesses should use READ_ONCE() and WRITE_ONCE() where needed. Fixes: `3ab5aee7fe` ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260225203545.1512417-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-27 17:16:59 -08:00
Paul Moses	62413a9c3c	net/sched: act_gate: snapshot parameters with RCU on replace The gate action can be replaced while the hrtimer callback or dump path is walking the schedule list. Convert the parameters to an RCU-protected snapshot and swap updates under tcf_lock, freeing the previous snapshot via call_rcu(). When REPLACE omits the entry list, preserve the existing schedule so the effective state is unchanged. Fixes: `a51c328df3` ("net: qos: introduce a gate control flow action") Cc: stable@vger.kernel.org Signed-off-by: Paul Moses <p@1g4.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260223150512.2251594-2-p@1g4.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-27 16:10:36 -08:00
Byungchul Park	fd6dad4e1a	netmem: remove the pp fields from net_iov Now that the pp fields in net_iov have no users, remove them from net_iov and clean up. Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20260224061424.11219-1-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-26 19:45:24 -08:00
Jakub Kicinski	0314e382cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc2). Conflicts: tools/testing/selftests/drivers/net/hw/rss_ctx.py `19c3a2a81d` ("selftests: drv-net: rss: Generate unique ports for RSS context tests") `ce5a0f4612` ("selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up") include/net/inet_connection_sock.h `858d2a4f67` ("tcp: fix potential race in tcp_v6_syn_recv_sock()") `fcd3d039fa` ("tcp: make tcp_v{4,6}_send_check() static") https://lore.kernel.org/aZ8PSFLzBrEU3I89@sirena.org.uk drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c `69050f8d6d` ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") `bf4afc53b7` ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") `8a96b9144f` ("net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk()") Adjacent changes: net/netfilter/ipvs/ip_vs_ctl.c `c59bd9e62e` ("ipvs: use more counters to avoid service lookups") `bf4afc53b7` ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-26 10:23:00 -08:00
Linus Torvalds	b9c8fc2cae	Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: bnxt_en: fix deleting of Ntuple filters - eth: wan: farsync: fix use-after-free bugs caused by unfinished tasklets - eth: xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - eth: gve: fix incorrect buffer cleanup for QPL - eth: team: avoid NETDEV_CHANGEMTU event when unregistering slave - eth: usb: validate USB endpoints Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmmgYU4SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkLBgQAINazHstJ0DoDkvmwXapRSN0Ffauyd46 oX6nfeWOT3BzZbAhZHtGgCSs4aULifJWMevtT7pq7a7PgZwMwfa47BugR1G/u5UE hCqalNjRTB/U2KmFk6eViKSacD4FvUIAyAMOotn1aEdRRAkBIJnIW/o/ZR9ZUkm0 5+UigO64aq57+FOc5EQdGjYDcTVdzW12iOZ8ZqwtSATdNd9aC+gn3voRomTEo+Fm kQinkFEPAy/YyHGmfpC/z87/RTgkYLpagmsT4ZvBJeNPrIRvFEibSpPNhuzTzg81 /BW5M8sJmm3XFiTiRp6Blv+0n6HIpKjAZMHn5c9hzX9cxPZQ24EjkXEex9ClaxLd OMef79rr1HBwqBTpIlK7xfLKCdT5Iex88s8HxXRB/Psqk9pVP469cSoK6cpyiGiP I+4WT0wn9ukTiu/yV2L2byVr1sanlu54P+UBYJpDwqq3lZ1ngWtkJ+SY369jhwAS FYIBmUSKhmWz3FEULaGpgPy4m9Fl/fzN8IFh2Buoc/Puq61HH7MAMjRty2ZSFTqj gbHrRhlkCRqubytgjsnCDPLoJF4ZYcXtpo/8ogG3641H1I+dN+DyGGVZ/ioswkks My1ds0rKqA3BHCmn+pN/qqkuopDCOB95dqOpgDqHG7GePrpa/FJ1guhxexsCd+nL Run2RcgDmd+d =HBOu -----END PGP SIGNATURE----- Merge tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...	2026-02-26 08:00:13 -08:00
Bobby Eshleman	102eab95f0	vsock: lock down child_ns_mode as write-once Two administrator processes may race when setting child_ns_mode as one process sets child_ns_mode to "local" and then creates a namespace, but another process changes child_ns_mode to "global" between the write and the namespace creation. The first process ends up with a namespace in "global" mode instead of "local". While this can be detected after the fact by reading ns_mode and retrying, it is fragile and error-prone. Make child_ns_mode write-once so that a namespace manager can set it once and be sure it won't change. Writing a different value after the first write returns -EBUSY. This applies to all namespaces, including init_net, where an init process can write "local" to lock all future namespaces into local mode. Fixes: `eafb64f40c` ("vsock: add netns to vsock core") Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-26 11:10:03 +01:00
Florian Westphal	6b94d081f8	netfilter: nf_tables: remove register tracking infrastructure This facility was disabled in commit `9e539c5b6d` ("netfilter: nf_tables: disable expression reduction infra"), because not all nft_exprs guarantee they will update the destination register: some may set NFT_BREAK instead to cancel evaluation of the rule. This has been dead code ever since. There are no plans to salvage this at this time, so remove this. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-10-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Julian Anastasov	09b71fb459	ipvs: no_cport and dropentry counters can be per-net Change the no_cport counters to be per-net and address family. This should reduce the extra conn lookups done during present NO_CPORT connections. By changing from global to per-net dropentry counters, one net will not affect the drop rate of another net. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-7-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Julian Anastasov	c59bd9e62e	ipvs: use more counters to avoid service lookups When new connection is created we can lookup for services multiple times to support fallback options. We already have some counters to skip specific lookups because it costs CPU cycles for hash calculation, etc. Add more counters for fwmark/non-fwmark services (fwm_services and nonfwm_services) and make all counters per address family. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-6-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Julian Anastasov	b24ae1a387	ipvs: use single svc table fwmark based services and non-fwmark based services can be hashed in same service table. This reduces the burden of working with two tables. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-4-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:25 -08:00
Jiejian Wu	74455a5b43	ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netns Current ipvs uses one global mutex "__ip_vs_mutex" to keep the global "ip_vs_svc_table" and "ip_vs_svc_fwm_table" safe. But when there are tens of thousands of services from different netns in the table, it takes a long time to look up the table, for example, using "ipvsadm -ln" from different netns simultaneously. We make "ip_vs_svc_table" and "ip_vs_svc_fwm_table" per netns, and we add "service_mutex" per netns to keep these two tables safe instead of the global "__ip_vs_mutex" in current version. To this end, looking up services from different netns simultaneously will not get stuck, shortening the time consumption in large-scale deployment. It can be reproduced using the simple scripts below. init.sh: #!/bin/bash for((i=1;i<=4;i++));do ip netns add ns$i ip netns exec ns$i ip link set dev lo up ip netns exec ns$i sh add-services.sh done add-services.sh: #!/bin/bash for((i=0;i<30000;i++)); do ipvsadm -A -t 10.10.10.10:$((80+$i)) -s rr done runtest.sh: #!/bin/bash for((i=1;i<4;i++));do ip netns exec ns$i ipvsadm -ln > /dev/null & done ip netns exec ns4 ipvsadm -ln > /dev/null Run "sh init.sh" to initiate the network environment. Then run "time ./runtest.sh" to evaluate the time consumption. Our testbed is a 4-core Intel Xeon ECS. The result of the original version is around 8 seconds, while the result of the modified version is only 0.8 seconds. Signed-off-by: Jiejian Wu <jiejian@linux.alibaba.com> Co-developed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-2-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:25 -08:00
Kuniyuki Iwashima	fc1f97929a	bonding: Optimise is_netpoll_tx_blocked(). bond_start_xmit() spends some cycles in is_netpoll_tx_blocked(): if (unlikely(is_netpoll_tx_blocked(dev))) return NETDEV_TX_BUSY; because of the "pushf;pop reg" sequence (aka irqs_disabled()). Let's swap the conditions in is_netpoll_tx_blocked() and convert netpoll_block_tx to a static key. Before: 1.23 │ mov %gs:0x28,%rax 1.24 │ mov %rax,0x18(%rsp) 29.45 │ pushfq 0.50 │ pop %rax 0.47 │ test $0x200,%eax │ ↓ je 1b4 0.49 │ 32: lea 0x980(%rsi),%rbx After: 0.72 │ mov %gs:0x28,%rax 0.81 │ mov %rax,0x18(%rsp) 0.82 │ nop 2.77 │ 2a: lea 0x980(%rsi),%rbx Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260223230749.2376145-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:13:38 -08:00
Eric Dumazet	539a6cf084	tcp: move inet6_csk_update_pmtu() to tcp_ipv6.c This function is only called from tcp_v6_mtu_reduced() and can be (auto)inlined by the compiler. Note that inet6_csk_route_socket() is no longer (auto)inlined, which is a good thing as it is slow path. $ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1 add/remove: 0/2 grow/shrink: 2/0 up/down: 93/-129 (-36) Function old new delta tcp_v6_mtu_reduced 139 228 +89 inet6_csk_route_socket 486 490 +4 __pfx_inet6_csk_update_pmtu 16 - -16 inet6_csk_update_pmtu 113 - -113 Total: Before=25076512, After=25076476, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223153047.886683-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:47:27 -08:00
Eric Dumazet	fcd3d039fa	tcp: make tcp_v{4,6}_send_check() static tcp_v{4,6}_send_check() are only called from tcp_output.c and should be made static so that the compiler does not need to put an out of line copy of them. Remove (struct inet_connection_sock_af_ops) send_check field and use instead @net_header_len. Move @net_header_len close to @queue_xmit for data locality as both are used in TCP tx fast path. $ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3 add/remove: 0/2 grow/shrink: 0/3 up/down: 0/-172 (-172) Function old new delta __tcp_transmit_skb 3426 3423 -3 tcp_v4_send_check 136 132 -4 mptcp_subflow_init 777 763 -14 __pfx_tcp_v6_send_check 16 - -16 tcp_v6_send_check 135 - -135 Total: Before=25143196, After=25143024, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223100729.3761597-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:16:09 -08:00
Eric Dumazet	255688652b	tcp: move tcp_v6_send_check() to tcp_output.c Move tcp_v6_send_check() so that __tcp_transmit_skb() can inline it. $ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2 add/remove: 0/0 grow/shrink: 1/0 up/down: 105/0 (105) Function old new delta __tcp_transmit_skb 3321 3426 +105 Total: Before=25143091, After=25143196, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223100729.3761597-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:16:09 -08:00
Eric Dumazet	bd5e5e1d41	tcp: inline __tcp_v4_send_check() Inline __tcp_v4_send_check(), like __tcp_v6_send_check(). Move tcp_v4_send_check() to tcp_output.c close to its fast path caller (__tcp_transmit_skb()). Note __tcp_v4_send_check() is still out-of-line for tcp4_gso_segment() because it is called in an unlikely() section. $ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1 add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-9 (-9) Function old new delta __tcp_v4_send_check 130 121 -9 Total: Before=25143100, After=25143091, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223100729.3761597-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:16:09 -08:00
Eric Dumazet	f033335937	udp: move udp6_csum_init() back to net/ipv6/udp.c This function has a single caller in net/ipv6/udp.c. Move it there so that the compiler can decide to (auto)inline it if he prefers to. IBT glue is removed anyway. With clang, we can see it was able to inline it and also inlined one other helper at the same time. UDPLITE removal will also help. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 840/-785 (55) Function old new delta __udp6_lib_rcv 1247 2087 +840 __pfx_udp6_csum_init 16 - -16 udp6_csum_init 769 - -769 Total: Before=25074399, After=25074454, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223093445.3696368-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 16:30:40 -08:00
Eric Dumazet	2550def53b	net: __lock_sock() can be static After commit `6511882cdd` ("mptcp: allocate fwd memory separately on the rx and tx path") __lock_sock() can be static again. Make sure __lock_sock() is not inlined, so that lock_sock_nested() no longer needs a stack canary. Add a noinline attribute on lock_sock_nested() so that calls to lock_sock() from net/core/sock.c are not inlined, none of them are fast path to deserve that: - sockopt_lock_sock() - sock_set_reuseport() - sock_set_reuseaddr() - sock_set_mark() - sock_set_keepalive() - sock_no_linger() - sock_bindtoindex() - sk_wait_data() - sock_set_rcvbuf() $ scripts/bloat-o-meter -t vmlinux.old vmlinux add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-312 (-312) Function old new delta __lock_sock 192 188 -4 __lock_sock_fast 239 86 -153 lock_sock_nested 227 72 -155 Total: Before=24888707, After=24888395, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223092716.3673939-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 16:30:33 -08:00
Paolo Abeni	1348659dc9	bluetooth pull request for net: - purge error queues in socket destructors - hci_sync: Fix CIS host feature condition - L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ - L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short - L2CAP: Fix response to L2CAP_ECRED_CONN_REQ - L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ - L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ - hci_qca: Cleanup on all setup failures -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmmcw1EZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKUTyD/4jtQwDrveC19zamF5n7lFY Oils6eftANcLFzLwTrMqGO7IxESga4qdNOf2vc/UgVSUfNqsPIUJ5El+LzpXZXAa sYBP/KudEX53CfU3fEVyPTUaWkZ4CdMRZeiCmgXqW7GxYbGw92SFuaSIHAP6Ep4s Z7Ryd1H0xhX9QPMc4g4IgoMiBiKzNs4GtlLSbDJcivAtbC/34nkMOxK9g+1DbU0F qzW+oPfYCpPzXTf20I1QIAMt5smnSM3Tuvo9u2pZRuEGpKjENxeY4hdAejfjeKA6 RLWXm6JvMP2lUBT68plMQQdYyQ8DxG75sVjgSoQYIu2YTVnsX76t/kD2hhiHXH/Y nQoy4dtA1/5V7Ka0cfMhcvino4Rb9Gh3dsFKJOuWRT+aTY+gNhpyr56SuJh24Y3C 7tUeEDI4fBkJGaRAbreVbaI5vw4kbSfi7IDOM/ccWDSLaG8HGaLOtn0IU8q4AgMa IkYzB5zwtiyM/zaSTO1k0HkpjR0wwftnTd+Fj2mUWdTwSeek64R9enmKYmg5UJrv 14yhfLHFsbAQo+o1B3ZslnCdYQJpgFmyAInV6Jpunc78IE9+g/YA55K22JbDDSzI t9Zy25OWLyYZyuD1PzDkMlYU5OARNYeyRXbJ3w037LrpqRoEuFsK0qTmgi+kR9C7 VR9IpCqgf4SJbL7ge83H8g== =JBaa -----END PGP SIGNATURE----- Merge tag 'for-net-2026-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - purge error queues in socket destructors - hci_sync: Fix CIS host feature condition - L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ - L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short - L2CAP: Fix response to L2CAP_ECRED_CONN_REQ - L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ - L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ - hci_qca: Cleanup on all setup failures * tag 'for-net-2026-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ Bluetooth: L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ Bluetooth: Fix CIS host feature condition Bluetooth: L2CAP: Fix response to L2CAP_ECRED_CONN_REQ Bluetooth: hci_qca: Cleanup on all setup failures Bluetooth: purge error queues in socket destructors Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ ==================== Link: https://patch.msgid.link/20260223211634.3800315-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-24 15:03:08 +01:00
Sebastian Andrzej Siewior	983512f3a8	net: Drop the lock in skb_may_tx_timestamp() skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must not be taken in IRQ context, only softirq is okay. A few drivers receive the timestamp via a dedicated interrupt and complete the TX timestamp from that handler. This will lead to a deadlock if the lock is already write-locked on the same CPU. Taking the lock can be avoided. The socket (pointed by the skb) will remain valid until the skb is released. The ->sk_socket and ->file member will be set to NULL once the user closes the socket which may happen before the timestamp arrives. If we happen to observe the pointer while the socket is closing but before the pointer is set to NULL then we may use it because both pointer (and the file's cred member) are RCU freed. Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a matching WRITE_ONCE() where the pointer are cleared. Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de Fixes: `b245be1f4d` ("net-timestamp: no-payload only sysctl") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260220183858.N4ERjFW6@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-24 11:27:29 +01:00
Luiz Augusto von Dentz	c28d2bff70	Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short Test L2CAP/ECFC/BV-26-C expect the response to L2CAP_ECRED_CONN_REQ with and MTU value < L2CAP_ECRED_MIN_MTU (64) to be L2CAP_CR_LE_INVALID_PARAMS rather than L2CAP_CR_LE_UNACCEPT_PARAMS. Also fix not including the correct number of CIDs in the response since the spec requires all CIDs being rejected to be included in the response. Link: https://github.com/bluez/bluez/issues/1868 Fixes: `15f02b9105` ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-02-23 15:28:56 -05:00
Luiz Augusto von Dentz	7accb1c432	Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ This fixes responding with an invalid result caused by checking the wrong size of CID which should have been (cmd_len - sizeof(*req)) and on top of it the wrong result was use L2CAP_CR_LE_INVALID_PARAMS which is invalid/reserved for reconf when running test like L2CAP/ECFC/BI-03-C: > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 64 MPS: 64 Source CID: 64 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reserved (0x000c) Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003) Fiix L2CAP/ECFC/BI-04-C which expects L2CAP_RECONF_INVALID_MPS (0x0002) when more than one channel gets its MPS reduced: > ACL Data RX: Handle 64 flags 0x02 dlen 16 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 8 MTU: 264 MPS: 99 Source CID: 64 ! Source CID: 65 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Fix L2CAP/ECFC/BI-05-C when SCID is invalid (85 unconnected): > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 65 MPS: 64 ! Source CID: 85 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003) Fix L2CAP/ECFC/BI-06-C when MPS < L2CAP_ECRED_MIN_MPS (64): > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 672 ! MPS: 63 Source CID: 64 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Result: Reconfiguration failed - other unacceptable parameters (0x0004) Fix L2CAP/ECFC/BI-07-C when MPS reduced for more than one channel: > ACL Data RX: Handle 64 flags 0x02 dlen 16 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 3 len 8 MTU: 84 ! MPS: 71 Source CID: 64 ! Source CID: 65 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Link: https://github.com/bluez/bluez/issues/1865 Fixes: `15f02b9105` ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-02-23 15:23:37 -05:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Eric Dumazet	858d2a4f67	tcp: fix potential race in tcp_v6_syn_recv_sock() Code in tcp_v6_syn_recv_sock() after the call to tcp_v4_syn_recv_sock() is done too late. After tcp_v4_syn_recv_sock(), the child socket is already visible from TCP ehash table and other cpus might use it. Since newinet->pinet6 is still pointing to the listener ipv6_pinfo bad things can happen as syzbot found. Move the problematic code in tcp_v6_mapped_child_init() and call this new helper from tcp_v4_syn_recv_sock() before the ehash insertion. This allows the removal of one tcp_sync_mss(), since tcp_v4_syn_recv_sock() will call it with the correct context. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+937b5bbb6a815b3e5d0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69949275.050a0220.2eeac1.0145.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260217161205.2079883-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-19 14:02:19 -08:00
Linus Torvalds	8bf22c33e7	Including fixes from Netfilter. Current release - new code bugs: - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT - eth: mlx5e: XSK, Fix unintended ICOSQ change - phy_port: correctly recompute the port's linkmodes - vsock: prevent child netns mode switch from local to global - couple of kconfig fixes for new symbols Previous releases - regressions: - nfc: nci: fix false-positive parameter validation for packet data - net: do not delay zero-copy skbs in skb_attempt_defer_free() Previous releases - always broken: - mctp: ensure our nlmsg responses to user space are zero-initialised - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data() - fixes for ICMP rate limiting Misc: - intel: fix PCI device ID conflict between i40e and ipw2200 Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmmXUh8ACgkQMUZtbf5S IrufYA//ZVj+4gvegqKwKZYXNBndVW00GGTYqaILbaenK1olUVUelVB91eV2Klc/ dXCeKG/MgEPuT89IjkPzVr2Yg4x6uhjcQL1rsahORn+GuQfSI/P8y7ysDOPnHVeM Rtsg1m8z3EizJcHPeAJe7nEqFzfvZ2m+FCEGe++z8BYaUZUVApytgpIWOHO/aB+p t13bCNzd05XxPphMl610T00Fncj2jCVDHILMgTB5rmFmkeJuQwNrRGXQSoQame46 +g+yCZjT0eVTrBaH1EUssWfrOT3VJj3BEee6gSp7k9mxMkbW18i8shBgmxS+EHjk u19wwBzSrHK+JY1UExim+1E/rZisQVmEE1Gs0ALedxAu9zC/Julzfa2/+BFsc0j7 QTXd4jukG3aTPIX8v3TV2Igu0j+bAT4WdpzvnsXXBMVKy7wFYMd1+aSOLyFH2W9L qRbg50oUATcsz77bZt6YUTJEgua4HXNYGtn15FMZOR7HJVR2L44Q5TK5mQxGp5iM GabeKMzg6bsjE98STM3nbWks3pIb9ptIk++i0913eSqKgn84bDPtp3Gabfgle2SJ 8gjKS61K8rDt5x8StXVod7oGQ4asL8RJyOtE/avgbWUu9BNH8/oKqsE6TQrpXauv 1ndiyim/mPe4fBCxkVAi2+uq5/ph9z8XyleESz9VYwyL3Rl4nsg= =qSCj -----END PGP SIGNATURE----- Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Netfilter. Current release - new code bugs: - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT - eth: mlx5e: XSK, Fix unintended ICOSQ change - phy_port: correctly recompute the port's linkmodes - vsock: prevent child netns mode switch from local to global - couple of kconfig fixes for new symbols Previous releases - regressions: - nfc: nci: fix false-positive parameter validation for packet data - net: do not delay zero-copy skbs in skb_attempt_defer_free() Previous releases - always broken: - mctp: ensure our nlmsg responses to user space are zero-initialised - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data() - fixes for ICMP rate limiting Misc: - intel: fix PCI device ID conflict between i40e and ipw2200" * tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits) net: nfc: nci: Fix parameter validation for packet data net/mlx5e: Use unsigned for mlx5e_get_max_num_channels net/mlx5e: Fix deadlocks between devlink and netdev instance locks net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event net/mlx5: Fix misidentification of write combining CQE during poll loop net/mlx5e: Fix misidentification of ASO CQE during poll loop net/mlx5: Fix multiport device check over light SFs bonding: alb: fix UAF in rlb_arp_recv during bond up/down bnge: fix reserving resources from FW eth: fbnic: Advertise supported XDP features. rds: tcp: fix uninit-value in __inet_bind net/rds: Fix NULL pointer dereference in rds_tcp_accept_one octeontx2-af: Fix default entries mcam entry action net/mlx5e: XSK, Fix unintended ICOSQ change ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow() inet: move icmp_global_{credit,stamp} to a separate cache line icmp: prevent possible overflow in icmp_global_allow() selftests/net: packetdrill: add ipv4-mapped-ipv6 tests ...	2026-02-19 10:39:08 -08:00
Eric Dumazet	87b08913a9	inet: move icmp_global_{credit,stamp} to a separate cache line icmp_global_credit was meant to be changed ~1000 times per second, but if an admin sets net.ipv4.icmp_msgs_per_sec to a very high value, icmp_global_credit changes can inflict false sharing to surrounding fields that are read mostly. Move icmp_global_credit and icmp_global_stamp to a separate cacheline aligned group. Fixes: `b056b4cd91` ("icmp: move icmp_global.credit and icmp_global.stamp to per netns storage") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260216142832.3834174-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-18 16:46:36 -08:00
Linus Torvalds	23b0f90ba8	Summary * Removed macros from proc handler converters Replace the proc converter macros with "regular" functions. Though it is more verbose than the macro version, it helps when debugging and better aligns with coding-style.rst. * General cleanup Remove superfluous ctl_table forward declarations. Const qualify the memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays. Add missing kernel doc to proc_dointvec_conv. * Testing This series was run through sysctl selftests/kunit test suite in x86_64. And went into linux-next after rc4, giving it a good 3 weeks of testing -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmmUabYACgkQupfNUreW QU8y2Qv/d2y35uQPRDh0HKWKWXJy41C2RJzd/rFCWJPCwo150whTSHIHkWYnu76g 10QblBXQmXi9TVqFnJ7Il7PWgqkMPjzA13tfT9eXNWU8j2OB/mcVKNl9X4wm/jWi QxtGmBsIQ/nxb2pUzMCykzgfc5mLi2NQ8qhZ5bOnq7UW3zdYmzEqx+tRdvIacyIk adComi5v8xUDqyEbVFaBovuX2WHQkPyBMnD64nwWG93JpNG/+9PxGzv/DNUXY11Y epVOfSoKdJbSLjYoHEPEhT0aHjSydq3QHru7uF6wzKOFTfHej/XkXXbUnFXPO2Pn c5J0u/HziYG5eN2QTqGfrhECZYuCFPemtUozltbcgGebkl1wKH+k9K5vsCaz/mhk ihUC3mui++W/n9B9HJRYh1XeEpk6C1pWERCOx27XFZ25fSek2YO6ZWkT0q+gceC0 t4+eIFSGJ3OzheJgHNK9XhTMWiQPmHyA6brXYGx4WeRvJFLpVddPF7k3Z89zIAu/ Fut7FGTH =0Z+I -----END PGP SIGNATURE----- Merge tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove macros from proc handler converters Replace the proc converter macros with "regular" functions. Though it is more verbose than the macro version, it helps when debugging and better aligns with coding-style.rst. - General cleanup Remove superfluous ctl_table forward declarations. Const qualify the memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays. Add missing kernel doc to proc_dointvec_conv. - Testing This series was run through sysctl selftests/kunit test suite in x86_64. And went into linux-next after rc4, giving it a good 3 weeks of testing * tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions sysctl: Replace unidirectional INT converter macros with functions sysctl: Add kernel doc to proc_douintvec_conv sysctl: Replace UINT converter macros with functions sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros sysctl: clarify proc_douintvec_minmax doc sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n sysctl: Remove unused ctl_table forward declarations loadpin: Implement custom proc_handler for enforce alloc_tag: move memory_allocation_profiling_sysctls into .rodata sysctl: Add missing kernel-doc for proc_dointvec_conv	2026-02-18 10:45:36 -08:00
Fernando Fernandez Mancera	9e371b0ba7	ipv6: addrconf: reduce default temp_valid_lft to 2 days This is a recommendation from RFC 8981 and it was intended to be changed by commit `969c54646a` ("ipv6: Implement draft-ietf-6man-rfc4941bis") but it only changed the sysctl documentation. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260214172543.5783-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-17 17:12:06 -08:00
Eric Dumazet	452a3eee22	ipv6: fix a race in ip6_sock_set_v6only() It is unlikely that this function will be ever called with isk->inet_num being not zero. Perform the check on isk->inet_num inside the locked section for complete safety. Fixes: `9b115749ac` ("ipv6: add ip6_sock_set_v6only") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260216102202.3343588-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-17 16:45:29 -08:00
Qanux	6db8b56eed	ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data() On the receive path, __ioam6_fill_trace_data() uses trace->nodelen to decide how much data to write for each node. It trusts this field as-is from the incoming packet, with no consistency check against trace->type (the 24-bit field that tells which data items are present). A crafted packet can set nodelen=0 while setting type bits 0-21, causing the function to write ~100 bytes past the allocated region (into skb_shared_info), which corrupts adjacent heap memory and leads to a kernel panic. Add a shared helper ioam6_trace_compute_nodelen() in ioam6.c to derive the expected nodelen from the type field, and use it: - in ioam6_iptunnel.c (send path, existing validation) to replace the open-coded computation; - in exthdrs.c (receive path, ipv6_hop_ioam) to drop packets whose nodelen is inconsistent with the type field, before any data is written. Per RFC 9197, bits 12-21 are each short (4-octet) fields, so they are included in IOAM6_MASK_SHORT_FIELDS (changed from 0xff100000 to 0xff1ffc00). Fixes: `9ee11f0fff` ("ipv6: ioam: Data plane support for Pre-allocated Trace") Cc: stable@vger.kernel.org Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260211040412.86195-1-qjx1298677004@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-13 12:24:05 -08:00
Linus Torvalds	311aa68319	RDMA v7.0 merge window Usual smallish cycle: - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaY44vgAKCRCFwuHvBreF YfiZAP91cMZfogN7r1FMD75xDZu55dI3Jvy8OaixyRxlWLGPcQEAjritdL0o7fZp YrD1OXNS/1XG//rPBVw7xj+54Aa8hAU= =AVcu -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...	2026-02-12 17:05:20 -08:00
Daniel Golle	85ee987429	net: dsa: add tag format for MxL862xx switches Add proprietary special tag format for the MaxLinear MXL862xx family of switches. While using the same Ethertype as MaxLinear's GSW1xx switches, the actual tag format differs significantly, hence we need a dedicated tag driver for that. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/c64e6ddb6c93a4fac39f9ab9b2d8bf551a2b118d.1770433307.git.daniel@makrotopia.org Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-11 11:27:57 +01:00
Eric Dumazet	a6eee39cc2	tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock() As explained in commit `85d05e2817` ("ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6"): TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. After this patch, passive TCP ipv6 flows have correctly initialized inet->cork.fl.u.ip6 structure. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Jakub Kicinski	792aaea994	netfilter pull request nf-next-26-02-06 -----BEGIN PGP SIGNATURE----- iQJdBAABCABHFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmmGB20bFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyDRxmd0BzdHJsZW4uZGUACgkQcJGo2a1f9gC/tQ/7 B7/akiCP/QeGF7go78PZQlpIGmjtoCOcQ9uxymlmpLkArepcIEkgZ04tFH0FClY6 d3QPfT9iNap222aCQxZwCiaWrXqUNynW7RwH72SkqGmO8JTLKlzW8CQC+yGkyznj FxwRKzB8XO5Ohtw0wED3mzcf9DelsvJpX5rCU5gEjsHZjKA/rEwYgovyM+es+xSx JbHHc2tzLQuDZ1BL7rEW8TJDxmJ2bCsFJHKeIvykk3D2nVg01P0AwhUeIy+7ObV7 bQh7B8DhYwKNLtgZvDi8D6o4nWQvkjfF5BadrWusumDCtIupcwbelpcUeCsUWBqC oCjLMcH7TwmT513RXWMId50z93FWciduCHUGrQt4BxLBZmkQ9kE0iamZVIAAzLl8 VYIM9qb+nUk58jnLFl3xTqW2GetSj/p31bp6e78+SQFvqjie2z9/I+nGBr7A8aAB bNd5vpvHSEg5OP7oKk+Dhr26MiCDowtuzvdC4lYR+loFYoI+a1FS6a1w/kcw9/VA XmR6Y8is+CTy4XYTQZ4klYTVpoTkWa/D/t1CTC4IlELzYS49L6qSyef6m91IWeQ6 Way5+3ZON7sA6SM1PZ/zjsKDxYLo/hQz2+dw6YLVflfY62khvuK2Yc56MQcZEjsH 7x0b3MaKvNn9yqKC+Mk7QZ55nCjV3wyGp3GQ+ClAqZ4= =wU6p -----END PGP SIGNATURE----- Merge tag 'nf-next-26-02-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next The following patchset contains Netfilter updates for net-next: 1) Fix net-next-only use-after-free bug in nf_tables rbtree set: Expired elements cannot be released right away after unlink anymore because there is no guarantee that the binary-search blob is going to be updated. Spotted by syzkaller. 2) Fix esoteric bug in nf_queue with udp fraglist gro, broken since 6.11. Patch 3 adds extends the nfqueue selftest for this. 4) Use dedicated slab for flowtable entries, currently the -512 cache is used, which is wasteful. From Qingfang Deng. 5) Recent net-next update extended existing test for ip6ip6 tunnels, add the required /config entry. Test still passed by accident because the previous tests network setup gets re-used, so also update the test so it will fail in case the ip6ip6 tunnel interface cannot be added. 6) Fix 'nft get element mytable myset { 1.2.3.4 }' on big endian platforms, this was broken since code was added in v5.1. 7) Fix nf_tables counter reset support on 32bit platforms, where counter reset may cause huge values to appear due to wraparound. Broken since reset feature was added in v6.11. From Anders Grahn. 8-11) update nf_tables rbtree set type to detect partial operlaps. This will eventually speed up nftables userspace: at this time userspace does a netlink dump of the set content which slows down incremental updates on interval sets. From Pablo Neira Ayuso. * tag 'nf-next-26-02-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nft_set_rbtree: validate open interval overlap netfilter: nft_set_rbtree: validate element belonging to interval netfilter: nft_set_rbtree: check for partial overlaps in anonymous sets netfilter: nft_set_rbtree: fix bogus EEXIST with NLM_F_CREATE with null interval netfilter: nft_counter: fix reset of counters on 32bit archs netfilter: nft_set_hash: fix get operation on big endian selftests: netfilter: add IPV6_TUNNEL to config netfilter: flowtable: dedicated slab for flow entry selftests: netfilter: nft_queue.sh: add udp fraglist gro test case netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation netfilter: nft_set_rbtree: don't gc elements on insert ==================== Link: https://patch.msgid.link/20260206153048.17570-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:25:38 -08:00
Paolo Abeni	dc010e1b4b	xfrm: reduce struct sec_path size The mentioned struct has an hole and uses unnecessary wide type to store MAC length and indexes of very small arrays. It's also embedded into the skb_extensions, and the latter, due to recent CAN changes, may exceeds the 192 bytes mark (3 cachelines on x86_64 arch) on some reasonable configurations. Reordering and the sec_path fields, shrinking xfrm_offload.orig_mac_len to 16 bits and xfrm_offload.{len,olen,verified_cnt} to u8, we can save 16 bytes and keep skb_extensions size under control. Before: struct sec_path { int len; int olen; int verified_cnt; /* XXX 4 bytes hole, try to pack /$ struct xfrm_state xvec[6]; struct xfrm_offload ovec[1]; /* size: 88, cachelines: 2, members: 5 / / sum members: 84, holes: 1, sum holes: 4 / / last cacheline: 24 bytes / }; After: struct sec_path { struct xfrm_state xvec[6]; struct xfrm_offload ovec[1]; /* typedef u8 -> __u8 / unsigned char len; / typedef u8 -> __u8 / unsigned char olen; / typedef u8 -> __u8 / unsigned char verified_cnt; / size: 72, cachelines: 2, members: 5 / / padding: 1 / / last cacheline: 8 bytes */ }; Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://patch.msgid.link/83846bd2e3fa08899bd0162e41bfadfec95e82ef.1770398071.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:21:48 -08:00
Vladimir Oltean	c22ba07c82	net: dsa: eliminate local type for tc policers David Yang is saying that struct flow_action_entry in include/net/flow_offload.h has gained new fields and DSA's struct dsa_mall_policer_tc_entry, derived from that, isn't keeping up. This structure is passed to drivers and they are completely oblivious to the values of fields they don't see. This has happened before, and almost always the solution was to make the DSA layer thinner and use the upstream data structures. Here, the reason why we didn't do that is because struct flow_action_entry :: police is an anonymous structure. That is easily enough fixable, just name those fields "struct flow_action_police" and reference them from DSA. Make the according transformations to the two users (sja1105 and felix): "rate_bytes_per_sec" -> "rate_bytes_ps". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Co-developed-by: David Yang <mmyangfl@gmail.com> Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260206075427.44733-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 15:30:11 +01:00
Alice Mikityanska	35f66ce900	net/ipv6: Remove HBH helpers Now that the HBH jumbo helpers are not used by any driver or GSO, remove them altogether. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-13-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:13 -08:00
Alice Mikityanska	b2936b4fd5	net/ipv6: Introduce payload_len helpers The next commits will transition away from using the hop-by-hop extension header to encode packet length for BIG TCP. Add wrappers around ip6->payload_len that return the actual value if it's non-zero, and calculate it from skb->len if payload_len is set to zero (and a symmetrical setter). The new helpers are used wherever the surrounding code supports the hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h). No behavioral change in this commit. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:03 -08:00
Eric Dumazet	a35b6e4863	tcp: inline tcp_filter() This helper is already (auto)inlined from IPv4 TCP stack. Make it an inline function to benefit IPv6 as well. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 30/-49 (-19) Function old new delta tcp_v6_rcv 3448 3478 +30 __pfx_tcp_filter 16 - -16 tcp_filter 33 - -33 Total: Before=24891904, After=24891885, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205164329.3401481-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:12:11 -08:00
Qiliang Yuan	7acee67a6b	netns: optimize netns cleaning by batching unhash_nsid calls Currently, unhash_nsid() scans the entire system for each netns being killed, leading to O(L_dying_net * M_alive_net * N_id) complexity, as __peernet2id() also performs a linear search in the IDR. Optimize this to O(M_alive_net * N_id) by batching unhash operations. Move unhash_nsid() out of the per-netns loop in cleanup_net() to perform a single-pass traversal over survivor namespaces. Identify dying peers by an 'is_dying' flag, which is set under net_rwsem write lock after the netns is removed from the global list. This batches the unhashing work and eliminates the O(L_dying_net) multiplier. To minimize the impact on struct net size, 'is_dying' is placed in an existing hole after 'hash_mix' in struct net. Use a restartable idr_get_next() loop for iteration. This avoids the unsafe modification issue inherent to idr_for_each() callbacks and allows dropping the nsid_lock to safely call sleepy rtnl_net_notifyid(). Clean up redundant nsid_lock and simplify the destruction loop now that unhashing is centralized. Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204074854.3506916-1-realwujing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:01:31 -08:00
Pablo Neira Ayuso	648946966a	netfilter: nft_set_rbtree: validate open interval overlap Open intervals do not have an end element, in particular an open interval at the end of the set is hard to validate because of it is lacking the end element, and interval validation relies on such end element to perform the checks. This patch adds a new flag field to struct nft_set_elem, this is not an issue because this is a temporary object that is allocated in the stack from the insert/deactivate path. This flag field is used to specify that this is the last element in this add/delete command. The last flag is used, in combination with the start element cookie, to check if there is a partial overlap, eg. Already exists: 255.255.255.0-255.255.255.254 Add interval: 255.255.255.0-255.255.255.255 ~~~~~~~~~~~~~ start element overlap Basically, the idea is to check for an existing end element in the set if there is an overlap with an existing start element. However, the last open interval can come in any position in the add command, the corner case can get a bit more complicated: Already exists: 255.255.255.0-255.255.255.254 Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254 ~~~~~~~~~~~~~ start element overlap To catch this overlap, annotate that the new start element is a possible overlap, then report the overlap if the next element is another start element that confirms that previous element in an open interval at the end of the set. For deletions, do not update the start cookie when deleting an open interval, otherwise this can trigger spurious EEXIST when adding new elements. Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag which would make easier to detect open interval overlaps. Fixes: `7c84d41416` ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-02-06 13:36:07 +01:00
Florian Westphal	207b3ebacb	netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation Ulrich reports a regression with nfqueue: If an application did not set the 'F_GSO' capability flag and a gso packet with an unconfirmed nf_conn entry is received all packets are now dropped instead of queued, because the check happens after skb_gso_segment(). In that case, we did have exclusive ownership of the skb and its associated conntrack entry. The elevated use count is due to skb_clone happening via skb_gso_segment(). Move the check so that its peformed vs. the aggregated packet. Then, annotate the individual segments except the first one so we can do a 2nd check at reinject time. For the normal case, where userspace does in-order reinjects, this avoids packet drops: first reinjected segment continues traversal and confirms entry, remaining segments observe the confirmed entry. While at it, simplify nf_ct_drop_unconfirmed(): We only care about unconfirmed entries with a refcnt > 1, there is no need to special-case dying entries. This only happens with UDP. With TCP, the only unconfirmed packet will be the TCP SYN, those aren't aggregated by GRO. Next patch adds a udpgro test case to cover this scenario. Reported-by: Ulrich Weber <ulrich.weber@gmail.com> Fixes: `7d8dc1c7be` ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks") Signed-off-by: Florian Westphal <fw@strlen.de>	2026-02-06 13:34:55 +01:00
Davide Caratti	a90f6dcefc	net/sched: don't use dynamic lockdep keys with clsact/ingress/noqueue Currently we are registering one dynamic lockdep key for each allocated qdisc, to avoid false deadlock reports when mirred (or TC eBPF) redirects packets to another device while the root lock is acquired [1]. Since dynamic keys are a limited resource, we can save them at least for qdiscs that are not meant to acquire the root lock in the traffic path, or to carry traffic at all, like: - clsact - ingress - noqueue Don't register dynamic keys for the above schedulers, so that we hit MAX_LOCKDEP_KEYS later in our tests. [1] https://github.com/multipath-tcp/mptcp_net-next/issues/451 Changes in v2: - change ordering of spin_lock_init() vs. lockdep_register_key() (Jakub Kicinski) Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/94448f7fa7c4f52d2ce416a4895ec87d456d7417.1770220576.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:32:45 -08:00
Eric Dumazet	22c1264415	tcp: move __reqsk_free() out of line Inlining __reqsk_free() is overkill, let's reclaim 2 Kbytes of text. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/4 grow/shrink: 2/14 up/down: 225/-2338 (-2113) Function old new delta __reqsk_free - 114 +114 sock_edemux 18 82 +64 inet_csk_listen_start 233 264 +31 __pfx___reqsk_free - 16 +16 __pfx_reqsk_queue_alloc 16 - -16 __pfx_reqsk_free 16 - -16 reqsk_queue_alloc 46 - -46 tcp_req_err 272 177 -95 reqsk_fastopen_remove 348 253 -95 cookie_bpf_check 157 62 -95 cookie_tcp_reqsk_alloc 387 290 -97 cookie_v4_check 1568 1465 -103 reqsk_free 105 - -105 cookie_v6_check 1519 1412 -107 sock_gen_put 187 78 -109 sock_pfree 212 82 -130 tcp_try_fastopen 1818 1683 -135 tcp_v4_rcv 3478 3294 -184 reqsk_put 306 90 -216 tcp_get_cookie_sock 551 318 -233 tcp_v6_rcv 3404 3141 -263 tcp_conn_request 2677 2384 -293 Total: Before=24887415, After=24885302, chg -0.01% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:23:06 -08:00
Eric Dumazet	d5c5391554	inet: move reqsk_queue_alloc() to net/ipv4/inet_connection_sock.c Only called once from inet_csk_listen_start(), it can be static. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:23:05 -08:00
David Yang	770e112634	flow_offload: add const qualifiers to function arguments Some functions do not modify the pointed-to data, but lack const qualifiers. Add const qualifiers to the arguments of flow_rule_match_has_control_flags() and flow_cls_offload_flow_rule(). Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260204052839.198602-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-05 16:24:22 +01:00
Oliver Hartkopp	96ea3a1e2d	can: add CAN skb extension infrastructure To remove the private CAN bus skb headroom infrastructure 8 bytes need to be stored in the skb. The skb extensions are a common pattern and an easy and efficient way to hold private data travelling along with the skb. We only need the skb_ext_add() and skb_ext_find() functions to allocate and access CAN specific content as the skb helpers to copy/clone/free skbs automatically take care of skb extensions and their final removal. This patch introduces the complete CAN skb extensions infrastructure: - add struct can_skb_ext in new file include/net/can.h - add include/net/can.h in MAINTAINERS - add SKB_EXT_CAN to skbuff.c and skbuff.h - select SKB_EXTENSIONS in Kconfig when CONFIG_CAN is enabled - check for existing CAN skb extensions in can_rcv() in af_can.c - add CAN skb extensions allocation at every skb_alloc() location - duplicate the skb extensions if cloning outgoing skbs (framelen/gw_hops) - introduce can_skb_ext_add() and can_skb_ext_find() helpers The patch also corrects an indention issue in the original code from 2018: Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602010426.PnGrYAk3-lkp@intel.com/ Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-2-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-05 11:58:39 +01:00
Randy Dunlap	a34b0e4e21	net/iucv: clean up iucv kernel-doc warnings Fix numerous (many) kernel-doc warnings in iucv.[ch]: - convert function documentation comments to a common (kernel-doc) look, even for static functions (without "/*") - use matching parameter and parameter description names - use better wording in function descriptions (Jakub & AI) - remove duplicate kernel-doc comments from the header file (Jakub) Examples: Warning: include/net/iucv/iucv.h:210 missing initial short description on line: iucv_unregister Warning: include/net/iucv/iucv.h:216 function parameter 'handle' not described in 'iucv_unregister' Warning: include/net/iucv/iucv.h:467 function parameter 'answer' not described in 'iucv_message_send2way' Warning: net/iucv/iucv.c:727 missing initial short description on line: * iucv_cleanup_queue Build-tested with both "make htmldocs" and "make ARCH=s390 defconfig all". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20260203075248.1177869-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-04 20:39:58 -08:00
Eric Dumazet	309dd99421	tcp: split tcp_check_space() in two parts tcp_check_space() is fat and not inlined. Move its slow path in (out of line) __tcp_check_space() and make tcp_check_space() an inline function for better TCP performance. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/2 grow/shrink: 4/0 up/down: 708/-582 (126) Function old new delta __tcp_check_space - 521 +521 tcp_rcv_established 1860 1916 +56 tcp_rcv_state_process 3342 3384 +42 tcp_event_new_data_sent 248 286 +38 tcp_data_snd_check 71 106 +35 __pfx___tcp_check_space - 16 +16 __pfx_tcp_check_space 16 - -16 tcp_check_space 566 - -566 Total: Before=24896373, After=24896499, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260203050932.3522221-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-04 20:37:06 -08:00
Jakub Kicinski	333225e1e9	Some more changes, including pulls from drivers: - ath drivers: small features/cleanups - rtw drivers: mostly refactoring for rtw89 RTL8922DE support - mac80211: use hrtimers for CAC to avoid too long delays - cfg80211/mac80211: some initial UHR (Wi-Fi 8) support -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmmDNuEACgkQ10qiO8sP aADhvA/+J35p2CDkffi1KfZxxx1YdHAAlj1zjhjLzshCMCG3oWzLpOL7se5bgN/C axPLPbeCAXtsRXln083lbwtrSRexPHSVhelPDNtybLPEocQYrksV8a6V3eWXCNTR ymN4iDaO/K0gLkDRKH5T8lwZvJttA6iHi+Fm4ir+dsr0O5vwwe4CuAEPA1SuZ2rh 0lQMz6pEzsxq+sZX3p8SoBwXx147l0n6gwMNIgBTKo1tjZha4oaavdvcqq4zaZWV WCcg4YVA/dWHL0UuwtIF8uQADM43quegBBUFx63QgzfgcnHAnBk2Ckeein/bfvnv XOKlI4UJi1cxTkTJkDOrSn5IwBzVSlBXE3qEUKKnu5G3+ZgfdsnWmSPeTtOndvAE rgbwwZb2SKH1kCvL0FDZTwq/iR9KF60ZfhWIq9Sz7m6VZxJoR8QACHglYCysj2JB B1+oT53EIqP7Ob4s/GN2Yg9M0l4Lv3E6J9g6h3b8yeq9qEXVF8MaVN683rtNpec9 mUqLRlcoToB2W/qvEVESKj8jMvajYZ6TDoO7mSP3paTW3HgMC3wlPJlDc4Q/6h7e LAKEljXlv6ofNGCcCL37l6KATqSZpIZn+tpSqbELIirWlc/rnTIDU2qZRb7MA1e1 3lKdrS6pOXGS1GJr7HWuLb4cX1SukyXNeyIcZJlSFoxG4oDPvwI= =/NUu -----END PGP SIGNATURE----- Merge tag 'wireless-next-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Some more changes, including pulls from drivers: - ath drivers: small features/cleanups - rtw drivers: mostly refactoring for rtw89 RTL8922DE support - mac80211: use hrtimers for CAC to avoid too long delays - cfg80211/mac80211: some initial UHR (Wi-Fi 8) support * tag 'wireless-next-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (59 commits) wifi: brcmsmac: phy: Remove unreachable error handling code wifi: mac80211: Add eMLSR/eMLMR action frame parsing support wifi: mac80211: add initial UHR support wifi: cfg80211: add initial UHR support wifi: ieee80211: add some initial UHR definitions wifi: mac80211: use wiphy_hrtimer_work for CAC timeout wifi: mac80211: correct ieee80211-{s1g/eht}.h include guard comments wifi: ath12k: clear stale link mapping of ahvif->links_map wifi: ath12k: Add support TX hardware queue stats wifi: ath12k: Add support RX PDEV stats wifi: ath12k: Fix index decrement when array_len is zero wifi: ath12k: support OBSS PD configuration for AP mode wifi: ath12k: add WMI support for spatial reuse parameter configuration dt-bindings: net: wireless: ath11k-pci: deprecate 'firmware-name' property wifi: ath11k: add usecase firmware handling based on device compatible wifi: ath10k: sdio: add missing lock protection in ath10k_sdio_fw_crashed_dump() wifi: ath10k: fix lock protection in ath10k_wmi_event_peer_sta_ps_state_chg() wifi: ath10k: snoc: support powering on the device via pwrseq wifi: rtw89: pci: warn if SPS OCP happens for RTL8922DE wifi: rtw89: pci: restore LDO setting after device resume ... ==================== Link: https://patch.msgid.link/20260204121143.181112-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-04 20:31:05 -08:00
Chia-Yu Chang	4fa4ac5e58	tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info Add 2-bit tcpi_ecn_mode feild within tcp_info to indicate which ECN mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN, or ECN_MODE_PENDING. This is done by utilizing available bits from tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits). Also, an extra 24-bit tcpi_options2 field is identified to represent newer options and connection features, as all 8 bits of tcpi_options field have been used. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-14-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:25 +01:00
Chia-Yu Chang	1247fb19ca	tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST Detect spurious retransmission of a previously sent ACK carrying the AccECN option after the second retransmission. Since this might be caused by the middlebox dropping ACK with options it does not recognize, disable the sending of the AccECN option in all subsequent ACKs. This patch follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field (accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was sent with duplicate SACK info. Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl: (TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and persistently sends AccECN option once it fits into TCP option space. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-13-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:25 +01:00
Chia-Yu Chang	2ed661248e	tcp: accecn: fallback outgoing half link to non-AccECN According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server is in AccECN mode and in SYN-RCVD state, and if it receives a value of zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the connection the Server MUST NOT set ECT on outgoing packets and MUST NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it MUST NOT disable AccECN feedback. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-12-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:25 +01:00
Chia-Yu Chang	f326f1f17f	tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK For Accurate ECN, the first SYN/ACK sent by the TCP server shall set the ACE flag (Table 1 of RFC9768) and the AccECN option to complete the capability negotiation. However, if the TCP server needs to retransmit such a SYN/ACK (for example, because it did not receive an ACK acknowledging its SYN/ACK, or received a second SYN requesting AccECN support), the TCP server retransmits the SYN/ACK without the AccECN option. This is because the SYN/ACK may be lost due to congestion, or a middlebox may block the AccECN option. Furthermore, if this retransmission also times out, to expedite connection establishment, the TCP server should retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and without the AccECN option, while maintaining AccECN feedback mode. This complies with Section 3.2.3.2.2 of the AccECN spec RFC9768. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-10-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:24 +01:00
Chia-Yu Chang	f1eaea5585	tcp: add TCP_SYNACK_RETRANS synack_type Before this patch, retransmitted SYN/ACK did not have a specific synack_type; however, the upcoming patch needs to distinguish between retransmitted and non-retransmitted SYN/ACK for AccECN negotiation to transmit the fallback SYN/ACK during AccECN negotiation. Therefore, this patch introduces a new synack_type (TCP_SYNACK_RETRANS). Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-9-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:24 +01:00
Chia-Yu Chang	c5ff6b8371	tcp: accecn: handle unexpected AccECN negotiation feedback According to Sections 3.1.2 and 3.1.3 of AccECN spec (RFC9768). In Section 3.1.2, it says an AccECN implementation has no need to recognize or support the Server response labelled 'Nonce' or ECN-nonce feedback more generally, as RFC 3540 has been reclassified as Historic. AccECN is compatible with alternative ECN feedback integrity approaches to the nonce. The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is reserved for future use. A TCP Client (A) that receives such a SYN/ACK follows the procedure for forward compatibility given in Section 3.1.3. Then in Section 3.1.3, it says if a TCP Client has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with the currently reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a combination, the Client MUST enable AccECN mode as if the SYN/ACK onfirmed that the Server supported AccECN and as if it fed back that the IP-ECN field on the SYN had arrived unchanged. Fixes: `3cae34274c` ("tcp: accecn: AccECN negotiation"). Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-7-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:24 +01:00
Chia-Yu Chang	e68c28f22f	tcp: disable RFC3168 fallback identifier for CC modules When AccECN is not successfully negociated for a TCP flow, it defaults fallback to classic ECN (RFC3168). However, L4S service will fallback to non-ECN. This patch enables congestion control module to control whether it should not fallback to classic ECN after unsuccessful AccECN negotiation. A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this behavior expected by the CA. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-6-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:24 +01:00
Chia-Yu Chang	100f946b8d	tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers Two flags for congestion control (CC) module are added in this patch related to AccECN negotiation. First, a new flag (TCP_CONG_NEEDS_ACCECN) defines that the CC expects to negotiate AccECN functionality using the ECE, CWR and AE flags in the TCP header. Second, during ECN negotiation, ECT(0) in the IP header is used. This patch enables CC to control whether ECT(0) or ECT(1) should be used on a per-segment basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the expected ECT value in the IP header by the CA when not-yet initialized for the connection. The detailed AccECN negotiaotn can be found in IETF RFC9768. Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-5-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-03 15:13:24 +01:00
Geliang Tang	2d85088d46	tcp: export tcp_splice_state Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h so that they can be used by MPTCP in the next patch. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-3-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 18:15:32 -08:00
Eric Dumazet	b409a7f717	ipv6: colocate inet6_cork in inet_cork_full All inet6_cork users also use one inet_cork_full. Reduce number of parameters and increase data locality. This saves ~275 bytes of code on x86_64. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:30 -08:00
Eric Dumazet	8776c4ef3a	inet: add dst4_mtu() and dst6_mtu() helpers With CONFIG_MITIGATION_RETPOLINE=y dst_mtu() is a bit fat, because it is generic. Indeed, clang does not always inline it. Add dst4_mtu() and dst6_mtu() helpers for callers that expect either ipv4_mtu() or ip6_mtu() to be called. These helpers are always inlined. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:29 -08:00
Eric Dumazet	1bc46dd209	ipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts() With CONFIG_STACKPROTECTOR_STRONG=y, it is better to avoid passing a pointer to an automatic variable. Change these exported functions to return 'u8 proto' instead of void. - ipv6_push_nfrag_opts() - ipv6_push_frag_opts() For instance, replace ipv6_push_frag_opts(skb, opt, &proto); with: proto = ipv6_push_frag_opts(skb, opt, proto); Note that even after this change, ip6_xmit() has to use a stack canary because of @first_hop variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:28 -08:00
Eric Dumazet	82f35bec11	net: l3mdev: use skb_dst_dev_rcu() in l3mdev_l3_out() Extend the RCU section a bit so that we can use the safer skb_dst_dev_rcu() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130191906.3781856-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:09:11 -08:00
Lorenzo Bianconi	0d95280a2d	wifi: mac80211: Add eMLSR/eMLMR action frame parsing support Introduce support in AP mode for parsing of the Operating Mode Notification frame sent by the client to enable/disable MLO eMLSR or eMLMR if supported by both the AP and the client. Add drv_set_eml_op_mode mac80211 callback in order to configure underlay driver with eMLSR/eMLMR info. Tested-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260129-mac80211-emlsr-v4-1-14bdadf57380@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-02-02 10:11:18 +01:00
Johannes Berg	a108511471	wifi: mac80211: add initial UHR support Add support for making UHR connections and accepting AP stations with UHR support. Link: https://patch.msgid.link/20260130164259.7185980484eb.Ieec940b58dbf8115dab7e1e24cb5513f52c8cb2f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-02-02 10:11:08 +01:00
Johannes Berg	072e6f7f41	wifi: cfg80211: add initial UHR support Add initial support for making UHR connections (or suppressing that), adding UHR capable stations on the AP side, encoding and decoding UHR MCSes (except rate calculation for the new MCSes 17, 19, 20 and 23) as well as regulatory support. Link: https://patch.msgid.link/20260130164259.54cc12fbb307.I26126bebd83c7ab17e99827489f946ceabb3521f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-02-02 10:11:07 +01:00
Ethan Nelson-Moore	82fff3b055	net: ax25: remove plumbing for never-implemented DAMA Master support The AX25_DAMA_MASTER option has been unimplemented and marked broken ever since it was introduced in 2007 in commit `954b2e7f4c` ("[NET] AX.25 Kconfig and docs updates and fixes"). At this point, it is very unlikely it will be implemented. Remove it. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20260129080908.44710-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-30 19:19:39 -08:00
Eric Dumazet	ed9b70040d	tcp: reduce tcp sockets size by one cache line By default, when a kmem_cache is created with SLAB_TYPESAFE_BY_RCU, slub has to use extra storage for the freelist pointer after each object, because slub assumes that any bit in the object can be used by RCU readers. Because proto_register() is also using SLAB_HWCACHE_ALIGN, this forces slub to use one extra cache line per object. We can instead put the slub freelist anywhere in the object, granted the concurrent RCU readers are not supposed to use the pointer value. Add a new (struct sock)sk_freeptr field, in an union with sk_rcu: No RCU readers would need to look at sk_rcu, which is only used at free phase. Tested: grep . /sys/kernel/slab/TCP/{object_size,slab_size,objs_per_slab} grep . /sys/kernel/slab/TCPv6/{object_size,slab_size,objs_per_slab} Before: /sys/kernel/slab/TCP/object_size:2368 /sys/kernel/slab/TCP/slab_size:2432 /sys/kernel/slab/TCP/objs_per_slab:13 /sys/kernel/slab/TCPv6/object_size:2496 /sys/kernel/slab/TCPv6/slab_size:2560 /sys/kernel/slab/TCPv6/objs_per_slab:12 After this patch, we can pack one more TCPv6 object per slab, and object_size == slab_size. /sys/kernel/slab/TCP/object_size:2368 /sys/kernel/slab/TCP/slab_size:2368 /sys/kernel/slab/TCP/objs_per_slab:13 /sys/kernel/slab/TCPv6/object_size:2496 /sys/kernel/slab/TCPv6/slab_size:2496 /sys/kernel/slab/TCPv6/objs_per_slab:13 Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260129153458.4163797-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-30 17:15:51 -08:00
Jakub Kicinski	303c1a66a2	Another fairly large set of changes, notably: - cfg80211/mac80211 - most of EPPKE/802.1X over auth frames support - additional FTM capabilities - split up drop reasons better, removing generic RX_DROP - NAN cleanups/fixes - ath11k: - support for Channel Frequency Response measurement - ath12k: - support for the QCC2072 chipset - iwlwifi: - partial NAN support - UNII-9 support - some UHR/802.11bn FW APIs - remove most of MLO/EHT from iwlmvm (such devices use iwlmld) - rtw89: - preparations for RTL8922DE support -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAml7PQcACgkQ10qiO8sP aAAKiA/6AnyNxa0bX2VFsWYW6KJYnJBVNLlP2ghkV3uIWtJoZdXuQO+W8/cy9Cng yrhfPzNfT+2hqmxasxI0tND3H3tW9CqcwX80J84eP9JCpYuPept9uGpSxPQoQl5J Q2k9gX1NlO/SEa8/mOFDT4EmH0bQobxiN84kxSg6Riaazkj6ZjHVVm/3PgzNhxlA v77m5thlhopzYxKn38qA19E9uHSLcY7XwkeYOZDf00Zhgot29lmDeHOf39IH+HvI +a20q6tW59D7iX2IUyvLnWzFV1iEcJ6ONF/hYJ0r3TlfmX/NDWfOQxx87K8M1Tqh sMa+FGrFdqloE1aYi1l+9m6Wu30pHmh7vhlgskPffPmvG+RkCEQCg1Me7eoFOzTB 81K2CMJ34Cp9se+QdiBtY5GpRPZIOlFmY6ZVyZIoEXHkn6r0R94e6dsMZuFcqjv1 y1dzv7BnraVMAQcqwkE9pQtq6LeJoHl2OUT2JzjbKhQhivMf9YubPBZ2QC1LZdMg NYEX4XSeJ/etpUk1MZFnm5wOw545tMi3U2sAhpYWbE6UBPDrQBvYADqd3lq3DmWe BdCDHTbqMnAJ3C0xFEKTYTmVF8IoFt6eOclFUPw4Uhq+YmU9x8wx1yBQbF9TjyKU a/rDCahmryj5gwD0QFJKhdQjfKaQFVNZWZqaKaokM84+8kIdA2U= =70Rs -----END PGP SIGNATURE----- Merge tag 'wireless-next-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Another fairly large set of changes, notably: - cfg80211/mac80211 - most of EPPKE/802.1X over auth frames support - additional FTM capabilities - split up drop reasons better, removing generic RX_DROP - NAN cleanups/fixes - ath11k: - support for Channel Frequency Response measurement - ath12k: - support for the QCC2072 chipset - iwlwifi: - partial NAN support - UNII-9 support - some UHR/802.11bn FW APIs - remove most of MLO/EHT from iwlmvm (such devices use iwlmld) - rtw89: - preparations for RTL8922DE support * tag 'wireless-next-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (184 commits) wifi: iwlegacy: add missing mutex protection in il4965_store_tx_power() wifi: iwlegacy: add missing mutex protection in il3945_store_measurement() wifi: mac80211: use u64_stats_t with u64_stats_sync properly wifi: p54: Fix memory leak in p54_beacon_update() wifi: cfg80211: treat deprecated INDOOR_SP_AP_OLD control value as LPI mode wifi: rtw88: sdio: Migrate to use sdio specific shutdown function wifi: rsi: sdio: Migrate to use sdio specific shutdown function sdio: Provide a bustype shutdown function wifi: nl80211/cfg80211: support operating as RSTA in PMSR FTM request wifi: nl80211/cfg80211: add negotiated burst period to FTM result wifi: nl80211/cfg80211: clarify periodic FTM parameters for non-EDCA based ranging wifi: nl80211/cfg80211: add new FTM capabilities wifi: iwlwifi: rename struct iwl_mcc_allowed_ap_type_cmd::offset_map wifi: iwlwifi: mvm: Remove link_id from time_events wifi: iwlwifi: mld: change cluster_id type to u8 array wifi: iwlwifi: support V13 of iwl_lari_config_change_cmd wifi: iwlwifi: split bios_value_u32 to separate the header wifi: iwlwifi: uefi: cache the DSM functions wifi: iwlwifi: acpi: cache the DSM functions wifi: iwlwifi: mvm: Cleanup MLO code ... ==================== Link: https://patch.msgid.link/20260129110136.176980-39-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-29 19:17:43 -08:00
Eric Dumazet	b1cd687e3e	ipv6: optimize fl6_update_dst() fl6_update_dst() is called for every TCP (and others) transmit, and is a nop for common cases. Split it in two parts : 1) fl6_update_dst() inline helper, small and fast. 2) __fl6_update_dst() for the exception, out of line. Small size increase to get better TX performance. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/2 grow/shrink: 8/0 up/down: 296/-125 (171) Function old new delta __fl6_update_dst - 104 +104 rawv6_sendmsg 2244 2284 +40 udpv6_sendmsg 3013 3043 +30 tcp_v6_connect 1514 1534 +20 cookie_v6_check 1501 1519 +18 ip6_datagram_dst_update 673 690 +17 inet6_sk_rebuild_header 499 516 +17 inet6_csk_route_socket 507 524 +17 inet6_csk_route_req 343 360 +17 __pfx___fl6_update_dst - 16 +16 __pfx_fl6_update_dst 16 - -16 fl6_update_dst 109 - -109 Total: Before=22570304, After=22570475, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260128185548.3738781-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-29 18:47:21 -08:00
Jakub Kicinski	a010fe8d86	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc8). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c `2c84959167` ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()") `f66086798f` ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-29 17:28:54 -08:00
Luiz Augusto von Dentz	6c3ea155e5	Bluetooth: L2CAP: Fix not tracking outstanding TX ident This attempts to proper track outstanding request by using struct ida and allocating from it in l2cap_get_ident using ida_alloc_range which would reuse ids as they are free, then upon completion release the id using ida_free. This fixes the qualification test case L2CAP/COS/CED/BI-29-C which attempts to check if the host stack is able to work after 256 attempts to connect which requires Ident field to use the full range of possible values in order to pass the test. Link: https://github.com/bluez/bluez/issues/1829 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>	2026-01-29 13:36:35 -05:00
Luiz Augusto von Dentz	0e2a6af810	Bluetooth: Fix using PHYs bitfields as PHY value This renames the PHY fields in bt_iso_io_qos to PHYs (plural) since it represents a bitfield where multiple PHYs can be set and make the same change also to HCI_OP_LE_SET_CIG_PARAMS since both c_phy and p_phy fields are bitfields. This also fixes the assumption that hci_evt_le_cis_established PHYs fields are compatible with bt_iso_io_qos, they are not, the fields in hci_evt_le_cis_established represent just a single PHY value so they need to be converted to bitfield when set in bt_iso_io_qos. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-01-29 13:27:47 -05:00
Luiz Augusto von Dentz	132c0779d4	Bluetooth: L2CAP: Add support for setting BT_PHY This enables client to use setsockopt(BT_PHY) to set the connection packet type/PHY: Example setting BT_PHY_BR_1M_1SLOT: < HCI Command: Change Conne.. (0x01\|0x000f) plen 4 Handle: 1 Address: 00:AA:01:01:00:00 (Intel Corporation) Packet type: 0x331e 2-DH1 may not be used 3-DH1 may not be used DM1 may be used DH1 may be used 2-DH3 may not be used 3-DH3 may not be used 2-DH5 may not be used 3-DH5 may not be used > HCI Event: Command Status (0x0f) plen 4 Change Connection Packet Type (0x01\|0x000f) ncmd 1 Status: Success (0x00) > HCI Event: Connection Packet Typ.. (0x1d) plen 5 Status: Success (0x00) Handle: 1 Address: 00:AA:01:01:00:00 (Intel Corporation) Packet type: 0x331e 2-DH1 may not be used 3-DH1 may not be used DM1 may be used DH1 may be used 2-DH3 may not be used 3-DH3 may not be used 2-DH5 may not be used Example setting BT_PHY_LE_1M_TX and BT_PHY_LE_1M_RX: < HCI Command: LE Set PHY (0x08\|0x0032) plen 7 Handle: 1 Address: 00:AA:01:01:00:00 (Intel Corporation) All PHYs preference: 0x00 TX PHYs preference: 0x01 LE 1M RX PHYs preference: 0x01 LE 1M PHY options preference: Reserved (0x0000) > HCI Event: Command Status (0x0f) plen 4 LE Set PHY (0x08\|0x0032) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 6 LE PHY Update Complete (0x0c) Status: Success (0x00) Handle: 1 Address: 00:AA:01:01:00:00 (Intel Corporation) TX PHY: LE 1M (0x01) RX PHY: LE 1M (0x01) Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-01-29 13:25:34 -05:00
Naga Bhavani Akella	fe05e3c059	Bluetooth: hci_sync: Add LE Channel Sounding HCI Command/event structures 1. Implement LE Event Mask to include events required for LE Channel Sounding 2. Enable Channel Sounding feature bit in the LE Host Supported Features command 3. Define HCI command and event structures necessary for LE Channel Sounding functionality Signed-off-by: Naga Bhavani Akella <naga.akella@oss.qualcomm.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-01-29 13:24:48 -05:00
Luiz Augusto von Dentz	129d1ef3c5	Bluetooth: hci_conn: Fix using conn->le_{tx,rx}_phy as supported PHYs conn->le_{tx,rx}_phy is not actually a bitfield as it set by HCI_EV_LE_PHY_UPDATE_COMPLETE it is actually correspond to the current PHY in use not what is supported by the controller, so this introduces different fields (conn->le_{tx,rx}_def_phys) to track what PHYs are supported by the connection. Fixes: `eab2404ba7` ("Bluetooth: Add BT_PHY socket option") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-01-29 13:21:40 -05:00
Scott Mitchell	e19079adcd	netfilter: nfnetlink_queue: optimize verdict lookup with hash table The current implementation uses a linear list to find queued packets by ID when processing verdicts from userspace. With large queue depths and out-of-order verdicting, this O(n) lookup becomes a significant bottleneck, causing userspace verdict processing to dominate CPU time. Replace the linear search with a hash table for O(1) average-case packet lookup by ID. A global rhashtable spanning all network namespaces attributes hash bucket memory to kernel but is subject to fixed upper bound. Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-29 09:52:07 +01:00
Kuniyuki Iwashima	d2492688bb	nfc: nci: Fix race between rfkill and nci_unregister_device(). syzbot reported the splat below [0] without a repro. It indicates that struct nci_dev.cmd_wq had been destroyed before nci_close_device() was called via rfkill. nci_dev.cmd_wq is only destroyed in nci_unregister_device(), which (I think) was called from virtual_ncidev_close() when syzbot close()d an fd of virtual_ncidev. The problem is that nci_unregister_device() destroys nci_dev.cmd_wq first and then calls nfc_unregister_device(), which removes the device from rfkill by rfkill_unregister(). So, the device is still visible via rfkill even after nci_dev.cmd_wq is destroyed. Let's unregister the device from rfkill first in nci_unregister_device(). Note that we cannot call nfc_unregister_device() before nci_close_device() because 1) nfc_unregister_device() calls device_del() which frees all memory allocated by devm_kzalloc() and linked to ndev->conn_info_list 2) nci_rx_work() could try to queue nci_conn_info to ndev->conn_info_list which could be leaked Thus, nfc_unregister_device() is split into two functions so we can remove rfkill interfaces only before nci_close_device(). [0]: DEBUG_LOCKS_WARN_ON(1) WARNING: kernel/locking/lockdep.c:238 at hlock_class kernel/locking/lockdep.c:238 [inline], CPU#0: syz.0.8675/6349 WARNING: kernel/locking/lockdep.c:238 at check_wait_context kernel/locking/lockdep.c:4854 [inline], CPU#0: syz.0.8675/6349 WARNING: kernel/locking/lockdep.c:238 at __lock_acquire+0x39d/0x2cf0 kernel/locking/lockdep.c:5187, CPU#0: syz.0.8675/6349 Modules linked in: CPU: 0 UID: 0 PID: 6349 Comm: syz.0.8675 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026 RIP: 0010:hlock_class kernel/locking/lockdep.c:238 [inline] RIP: 0010:check_wait_context kernel/locking/lockdep.c:4854 [inline] RIP: 0010:__lock_acquire+0x3a4/0x2cf0 kernel/locking/lockdep.c:5187 Code: 18 00 4c 8b 74 24 08 75 27 90 e8 17 f2 fc 02 85 c0 74 1c 83 3d 50 e0 4e 0e 00 75 13 48 8d 3d 43 f7 51 0e 48 c7 c6 8b 3a de 8d <67> 48 0f b9 3a 90 31 c0 0f b6 98 c4 00 00 00 41 8b 45 20 25 ff 1f RSP: 0018:ffffc9000c767680 EFLAGS: 00010046 RAX: 0000000000000001 RBX: 0000000000040000 RCX: 0000000000080000 RDX: ffffc90013080000 RSI: ffffffff8dde3a8b RDI: ffffffff8ff24ca0 RBP: 0000000000000003 R08: ffffffff8fef35a3 R09: 1ffffffff1fde6b4 R10: dffffc0000000000 R11: fffffbfff1fde6b5 R12: 00000000000012a2 R13: ffff888030338ba8 R14: ffff888030338000 R15: ffff888030338b30 FS: 00007fa5995f66c0(0000) GS:ffff8881256f8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7e72f842d0 CR3: 00000000485a0000 CR4: 00000000003526f0 Call Trace: <TASK> lock_acquire+0x106/0x330 kernel/locking/lockdep.c:5868 touch_wq_lockdep_map+0xcb/0x180 kernel/workqueue.c:3940 __flush_workqueue+0x14b/0x14f0 kernel/workqueue.c:3982 nci_close_device+0x302/0x630 net/nfc/nci/core.c:567 nci_dev_down+0x3b/0x50 net/nfc/nci/core.c:639 nfc_dev_down+0x152/0x290 net/nfc/core.c:161 nfc_rfkill_set_block+0x2d/0x100 net/nfc/core.c:179 rfkill_set_block+0x1d2/0x440 net/rfkill/core.c:346 rfkill_fop_write+0x461/0x5a0 net/rfkill/core.c:1301 vfs_write+0x29a/0xb90 fs/read_write.c:684 ksys_write+0x150/0x270 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa59b39acb9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fa5995f6028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fa59b615fa0 RCX: 00007fa59b39acb9 RDX: 0000000000000008 RSI: 0000200000000080 RDI: 0000000000000007 RBP: 00007fa59b408bf7 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fa59b616038 R14: 00007fa59b615fa0 R15: 00007ffc82218788 </TASK> Fixes: `6a2968aaf5` ("NFC: basic NCI protocol implementation") Reported-by: syzbot+f9c5fd1a0874f9069dce@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/695e7f56.050a0220.1c677c.036c.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260127040411.494931-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-28 19:32:26 -08:00
Eric Dumazet	d5fb143dbe	tcp: move tcp_rack_advance() to tcp_input.c tcp_rack_advance() is called from tcp_ack() and tcp_sacktag_one(). Moving it to tcp_input.c allows the compiler to inline it and save both space and cpu cycles in TCP fast path. $ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2 add/remove: 0/2 grow/shrink: 1/1 up/down: 98/-132 (-34) Function old new delta tcp_ack 5741 5839 +98 tcp_sacktag_one 407 395 -12 __pfx_tcp_rack_advance 16 - -16 tcp_rack_advance 104 - -104 Total: Before=22572680, After=22572646, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260127032147.3498272-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-28 19:31:51 -08:00
Eric Dumazet	629a68865a	tcp: move tcp_rack_update_reo_wnd() to tcp_input.c tcp_rack_update_reo_wnd() is called only once from tcp_ack() Move it to tcp_input.c so that it can be inlined by the compiler to save space and cpu cycles. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 110/-153 (-43) Function old new delta tcp_ack 5631 5741 +110 __pfx_tcp_rack_update_reo_wnd 16 - -16 tcp_rack_update_reo_wnd 137 - -137 Total: Before=22572723, After=22572680, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260127032147.3498272-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-28 19:31:51 -08:00
Konstantin Taranov	a01745ccf7	RDMA/mana_ib: Add device‑memory support Introduce a basic DM implementation that enables creating and registering device memory, and using the associated memory keys for networking operations. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/20260127082649.429018-1-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-27 09:16:11 -05:00
Pagadala Yesu Anjaneyulu	fd5bfcf430	wifi: cfg80211: treat deprecated INDOOR_SP_AP_OLD control value as LPI mode Although value 4 (INDOOR_SP_AP_OLD) is deprecated in IEEE standards, existing APs may still use this control value. Since this value is based on the old specification, we cannot trust such APs implement proper power controls. Therefore, move IEEE80211_6GHZ_CTRL_REG_INDOOR_SP_AP_OLD case from SP_AP to LPI_AP power type handling to prevent potential power limit violations. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260111163601.6b5a36d3601e.I1704ee575fd25edb0d56f48a0a3169b44ef72ad0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-27 13:42:26 +01:00
Avraham Stern	853800c746	wifi: nl80211/cfg80211: support operating as RSTA in PMSR FTM request Add an option to operate as the RSTA in an FTM measurement request. When requested, the device will dwell on the requested channel until the peer starts the FTM negotiation. This option is only valid for trigger-based/non trigger-based measurement with LMR feedback which will allow the RSTA to receive the results of the measurement. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260111190221.1f95fc0afab4.Iae2d32783b8e7c4a29089fec0f4c6bce94d303cc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-27 13:40:38 +01:00
Avraham Stern	cfd46d1c6f	wifi: nl80211/cfg80211: add negotiated burst period to FTM result The FTM result includes some of the periodic measurement negotiated parameters (like the burst duration and number of bursts), but it doesn't include the burst period. Add it to the FTM result notification. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260111190221.e0778f86edef.I3c98c1933eb639963bc3ffdef81a8788b59f2188@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-27 13:40:36 +01:00
Avraham Stern	853ce6943c	wifi: nl80211/cfg80211: clarify periodic FTM parameters for non-EDCA based ranging Periodic FTM request attributes are defined based on the periodic parameters used in EDCA-based ranging negotiation. However, non-EDCA based ranging (trigger-based/non-trigger-based) does not include periodic parameters in the negotiation protocol, even though upper layers may still request periodic measurements. Clarify the semantics of periodic ranging attributes when used with non-EDCA based ranging. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260111190221.b89cb3f68e1a.I7a9d8c6d1c66c77f1b43120a841101c96c3f19ad@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-27 13:40:30 +01:00
Avraham Stern	86c6b6e4d1	wifi: nl80211/cfg80211: add new FTM capabilities Add new capabilities to the PMSR FTM capabilities list. The new capabilities include 6 GHz support, supported number of spatial streams and supported number of LTF repetitions. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Tested-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260111190221.bf43785c18f6.Ic98cf9790ddee84bf88e5720b93c46c23af3c96c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-27 13:40:25 +01:00
Bobby Eshleman	eafb64f40c	vsock: add netns to vsock core Add netns logic to vsock core. Additionally, modify transport hook prototypes to be used by later transport-specific patches (e.g., *_seqpacket_allow()). Namespaces are supported primarily by changing socket lookup functions (e.g., vsock_find_connected_socket()) to take into account the socket namespace and the namespace mode before considering a candidate socket a "match". This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode for new namespaces. Add netns functionality (initialization, passing to transports, procfs, etc...) to the af_vsock socket layer. Later patches that add netns support to transports depend on this patch. This patch changes the allocation of random ports for connectible vsocks in order to avoid leaking the random port range starting point to other namespaces. dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are modified to take a vsk in order to perform logic on namespace modes. In future patches, the net will also be used for socket lookups in these functions. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-1-2859a7512097@meta.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-27 10:45:38 +01:00
Eric Dumazet	df7388b3d7	net: inline get_netmem() and put_netmem() These helpers are used in network fast paths. Only call out-of-line helpers for netmem case. We might consider inlining __get_netmem() and __put_netmem() in the future. $ scripts/bloat-o-meter -t vmlinux.3 vmlinux.4 add/remove: 6/6 grow/shrink: 22/1 up/down: 2614/-646 (1968) Function old new delta pskb_carve 1669 1894 +225 gro_pull_from_frag0 - 206 +206 get_page 190 380 +190 skb_segment 3561 3747 +186 put_page 595 765 +170 skb_copy_ubufs 1683 1822 +139 __pskb_trim_head 276 401 +125 __pskb_copy_fclone 734 858 +124 skb_zerocopy 1092 1215 +123 pskb_expand_head 892 1008 +116 skb_split 828 940 +112 skb_release_data 297 409 +112 ___pskb_trim 829 941 +112 __skb_zcopy_downgrade_managed 120 226 +106 tcp_clone_payload 530 634 +104 esp_ssg_unref 191 294 +103 dev_gro_receive 1464 1514 +50 __put_netmem - 41 +41 __get_netmem - 41 +41 skb_shift 1139 1175 +36 skb_try_coalesce 681 714 +33 __pfx_put_page 112 144 +32 __pfx_get_page 32 64 +32 __pskb_pull_tail 1137 1168 +31 veth_xdp_get 250 267 +17 __pfx_gro_pull_from_frag0 - 16 +16 __pfx___put_netmem - 16 +16 __pfx___get_netmem - 16 +16 __pfx_put_netmem 16 - -16 __pfx_gro_try_pull_from_frag0 16 - -16 __pfx_get_netmem 16 - -16 put_netmem 114 - -114 get_netmem 130 - -130 napi_gro_frags 929 771 -158 gro_try_pull_from_frag0 196 - -196 Total: Before=22565857, After=22567825, chg +0.01% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260122045720.1221017-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-25 13:18:53 -08:00
Eric Dumazet	87918dd4ea	net: inline net_is_devmem_iov() 1) Inline this small helper to reduce code size and decrease cpu costs. 2) Constify its argument. 3) Move it to include/net/netmem.h, as a prereq for the following patch. $ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3 add/remove: 0/2 grow/shrink: 0/4 up/down: 0/-158 (-158) Function old new delta validate_xmit_skb 866 857 -9 __pfx_net_is_devmem_iov 16 - -16 net_is_devmem_iov 22 - -22 get_netmem 152 130 -22 put_netmem 140 114 -26 tcp_recvmsg_locked 3860 3797 -63 Total: Before=22566015, After=22565857, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260122045720.1221017-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-25 13:18:53 -08:00
Eric Dumazet	f6c3665b6d	bonding: annotate data-races around slave->last_rx slave->last_rx and slave->target_last_arp_rx[...] can be read and written locklessly. Add READ_ONCE() and WRITE_ONCE() annotations. syzbot reported: BUG: KCSAN: data-race in bond_rcv_validate / bond_rcv_validate write to 0xffff888149f0d428 of 8 bytes by interrupt on cpu 1: bond_rcv_validate+0x202/0x7a0 drivers/net/bonding/bond_main.c:3335 bond_handle_frame+0xde/0x5e0 drivers/net/bonding/bond_main.c:1533 __netif_receive_skb_core+0x5b1/0x1950 net/core/dev.c:6039 __netif_receive_skb_one_core net/core/dev.c:6150 [inline] __netif_receive_skb+0x59/0x270 net/core/dev.c:6265 netif_receive_skb_internal net/core/dev.c:6351 [inline] netif_receive_skb+0x4b/0x2d0 net/core/dev.c:6410 ... write to 0xffff888149f0d428 of 8 bytes by interrupt on cpu 0: bond_rcv_validate+0x202/0x7a0 drivers/net/bonding/bond_main.c:3335 bond_handle_frame+0xde/0x5e0 drivers/net/bonding/bond_main.c:1533 __netif_receive_skb_core+0x5b1/0x1950 net/core/dev.c:6039 __netif_receive_skb_one_core net/core/dev.c:6150 [inline] __netif_receive_skb+0x59/0x270 net/core/dev.c:6265 netif_receive_skb_internal net/core/dev.c:6351 [inline] netif_receive_skb+0x4b/0x2d0 net/core/dev.c:6410 br_netif_receive_skb net/bridge/br_input.c:30 [inline] NF_HOOK include/linux/netfilter.h:318 [inline] ... value changed: 0x0000000100005365 -> 0x0000000100005366 Fixes: `f5b2b966f0` ("[PATCH] bonding: Validate probe replies in ARP monitor") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://patch.msgid.link/20260122162914.2299312-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-23 13:55:56 -08:00
Jakub Kicinski	8e3245cb30	net: add queue config validation callback I imagine (tm) that as the number of per-queue configuration options grows some of them may conflict for certain drivers. While the drivers can obviously do all the validation locally doing so is fairly inconvenient as the config is fed to drivers piecemeal via different ops (for different params and NIC-wide vs per-queue). Add a centralized callback for validating the queue config in queue ops. The callback gets invoked before memory provider is installed, and in the future should also be called when ring params are modified. The validation is done after each layer of configuration. Since we can't fail MP un-binding we must make sure that the config is valid both before and after MP overrides are applied. This is moot for now since the set of MP and device configs are disjoint. It will matter significantly in the future, so adding it now so that we don't forget.. Link: https://patch.msgid.link/20260122005113.2476634-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-23 11:49:02 -08:00
Jakub Kicinski	fc1a78a25c	net: use netdev_queue_config() for mp restart We should follow the prepare/commit approach for queue configuration. The qcfg struct should be added to dev->cfg rather than directly to queue objects so that we can clone and discard the pending config easily. Remove the qcfg in struct netdev_rx_queue, and switch remaining callers to netdev_queue_config(). netdev_queue_config() will construct the qcfg on the fly based on device defaults and state of the queue. ndo_default_qcfg becomes optional because having the callback itself does not have any meaningful semantics to us. Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://patch.msgid.link/20260122005113.2476634-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-23 11:49:02 -08:00
Jakub Kicinski	b9ac2c60a3	net: introduce a trivial netdev_queue_config() We may choose to extend or reimplement the logic which renders the per-queue config. The drivers should not poke directly into the queue state. Add a helper for drivers to use when they want to query the config for a specific queue. Link: https://patch.msgid.link/20260122005113.2476634-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-23 11:49:01 -08:00
Paolo Abeni	0c09e89f6c	geneve: expose gso partial features for tunnel offload GSO partial features for tunnels do not require any kind of support from the underlying device: we can safely add them to the geneve UDP tunnel. The only point of attention is the skb required features propagation in the device xmit op: partial features must be stripped, except for UDP_TUNNEL*. Keep partial features disabled by default. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/d851ca8e928cf05d68310bcbaeaa5e9e0b01e058.1769011015.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-23 11:31:14 -08:00
Jakub Kicinski	9abf22075d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc7). Conflicts: drivers/net/ethernet/huawei/hinic3/hinic3_irq.c `b35a6fd37a` ("hinic3: Add adaptive IRQ coalescing with DIM") `fb2bb2a1eb` ("hinic3: Fix netif_queue_set_napi queue_index input parameter error") https://lore.kernel.org/fc0a7fdf08789a52653e8ad05281a0a849e79206.1768915707.git.zhuyikai1@h-partners.com drivers/net/wireless/ath/ath12k/mac.c drivers/net/wireless/ath/ath12k/wifi7/hw.c `3170757210` ("wifi: ath12k: Fix wrong P2P device link id issue") `c26f294fef` ("wifi: ath12k: Move ieee80211_ops callback to the arch specific module") https://lore.kernel.org/20260114123751.6a208818@canb.auug.org.au Adjacent changes: drivers/net/wireless/ath/ath12k/mac.c `8b8d6ee53d` ("wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel") `914c890d3b` ("wifi: ath12k: Add framework for hardware specific ieee80211_ops registration") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-22 20:14:36 -08:00
Eric Dumazet	bc1f0b1c98	tcp: move tcp_rate_check_app_limited() to tcp.c tcp_rate_check_app_limited() is used from tcp_sendmsg_locked() fast path and from other callers. Move it to tcp.c so that it can be inlined in tcp_sendmsg_locked(). Small increase of code, for better TCP performance. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 1/0 up/down: 87/0 (87) Function old new delta tcp_sendmsg_locked 4217 4304 +87 Total: Before=22566462, After=22566549, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260121095923.3134639-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-22 18:28:48 -08:00
Eric Dumazet	b814bdcecd	tcp: move tcp_rate_gen to tcp_input.c This function is called from one caller only, in TCP fast path. Move it to tcp_input.c so that compiler can inline it. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 226/-300 (-74) Function old new delta tcp_ack 5405 5631 +226 __pfx_tcp_rate_gen 16 - -16 tcp_rate_gen 284 - -284 Total: Before=22566536, After=22566462, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260121095923.3134639-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-22 18:28:48 -08:00
Pablo Neira Ayuso	f175b46d91	netfilter: nf_tables: add .abort_skip_removal flag for set types The pipapo set backend is the only user of the .abort interface so far. To speed up pipapo abort path, removals are skipped. The follow up patch updates the rbtree to use to build an array of ordered elements, then use binary search. This needs a new .abort interface but, unlike pipapo, it also need to undo/remove elements. Add a flag and use it from the pipapo set backend. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-22 17:18:13 +01:00
Jakub Kicinski	9146fe2829	Another set of updates: - various small fixes for ath10k/ath12k/mwifiex/rsi - cfg80211 fix for HE bitrate overflow - mac80211 fixes - S1G beacon handling in scan - skb tailroom handling for HW encryption - CSA fix for multi-link - handling of disabled links during association -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmlyArkACgkQ10qiO8sP aACZxA/+N/q+DAHhVgqETwqOh80WAFTSuhDZsUXc6PNtFkOHvuZaeHePU+fn8hco +CkUVEnWYoNgUiaVg1697PJubp0psN7H+3+cq8kn8C/oNB2YnEV2kPCk4x7R8LCj TP6mad/Fb+I6Ct7XaCFymCS49eP4Bju7UBgYgLTiKJYvabA+Jim7LavZr8j0Tvra h9TNeA0I8+dgGppAWLTssrnsxp65xfSdq71mtRCFUrpEUHjzCl589PEv6BYcIRwv N50pm6Am5KJ1TZn5sVSYVfKiiG7UtL/pbXbsM5Cj/54yIFIgmE3bGI5MGAXRlWNG o/d/bo0rJg1xppipyZDEN5OJS6S0ijyC5TNKFFRX6IU2eZ8jJs7CWrQw6L8hWCbY G+lnVSh4yPAzLEk80S/zBHQccAPXnONtm+cFyPsPab79oAboxQVauuDdH1t5cxbQ 1HRn0RopyfEoPLmxsCcCSVcdF+hDwRZxAUO735Opnz/amDJNjBKgXTKezXSubfPH 5hvoAs/VZh7xSyJmEViDO5gavW1SX4nKlUixLYLUrXyq4i+eZkEIvqlCkIWuN9EW y/leooWqFzoXY3K8QL8nKQyHWg10WJL1l56tVz+Y+YAh8TvqgWWTB/O/u3g+/ZqO eDkaeKbbexUy6iyN0/TMb2RNnTERO0xOepMmIu3sw0HXhptkl1Q= =njuT -----END PGP SIGNATURE----- Merge tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Another set of updates: - various small fixes for ath10k/ath12k/mwifiex/rsi - cfg80211 fix for HE bitrate overflow - mac80211 fixes - S1G beacon handling in scan - skb tailroom handling for HW encryption - CSA fix for multi-link - handling of disabled links during association * tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: cfg80211: ignore link disabled flag from userspace wifi: mac80211: apply advertised TTLM from association response wifi: mac80211: parse all TTLM entries wifi: mac80211: don't increment crypto_tx_tailroom_needed_cnt twice wifi: mac80211: don't perform DA check on S1G beacon wifi: ath12k: Fix wrong P2P device link id issue wifi: ath12k: fix dead lock while flushing management frames wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel wifi: ath12k: cancel scan only on active scan vdev wifi: mwifiex: Fix a loop in mwifiex_update_ampdu_rxwinsize() wifi: mac80211: correctly check if CSA is active wifi: cfg80211: Fix bitrate calculation overflow for HE rates wifi: rsi: Fix memory corruption due to not set vif driver data size wifi: ath12k: don't force radio frequency check in freq_to_idx() wifi: ath12k: fix dma_free_coherent() pointer wifi: ath10k: fix dma_free_coherent() pointer ==================== Link: https://patch.msgid.link/20260122110248.15450-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-22 07:54:31 -08:00
Tonghao Zhang	11ea9b8a88	net: bonding: use workqueue to make sure peer notify updated in lacp mode The rtnl lock might be locked, preventing ad_cond_set_peer_notif() from acquiring the lock and updating send_peer_notif. This patch addresses the issue by using a workqueue. Since updating send_peer_notif does not require high real-time performance, such delayed updates are entirely acceptable. In fact, checking this value and using it in multiple places, all operations are protected at the same time by rtnl lock, such as - read send_peer_notif - send_peer_notif-- - bond_should_notify_peers By the way, rtnl lock is still required, when accessing bond.params.* for updating send_peer_notif. In lacp mode, resetting send_peer_notif in workqueue is safe, simple and effective way. Additionally, this patch introduces bond_peer_notify_may_events(), which is used to check whether an event should be sent. This function will be used in both patch 1 and 2. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Jason Xing <kerneljasonxing@gmail.com> Suggested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/f95accb5db0b10ce3ed2f834fc70f716c9abbb9c.1768709239.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-22 11:20:33 +01:00
Jakub Kicinski	a7c708dc0d	netfilter pull request nf-next-26-01-20 -----BEGIN PGP SIGNATURE----- iQJdBAABCABHFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmlvnjQbFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyDRxmd0BzdHJsZW4uZGUACgkQcJGo2a1f9gBGtQ// fpGuA96XbcQVHDAkOYYAsjkHk4DPJvTdL4m4Pnw5SO3m5lVq0kw5Cp+6drv3q/Pd pMTuAUliVDOlK7wYemsThv/DgzqSO93uxrqTeX2J4tb/TgVYA9040aAfvWKo76iE GxSqfM255cMOAJ2zpBbgP3WfwiklGBlB7phPDTP3yoxbwH6TdtDCxcpVJ+M8wVN4 7CsRY3P8ZWZR1lW2V7sHERRABdsVfRmEtlFCEP+WARKHtkTyMZeqRsUtk3f50iRB AeEm3Uryj+q5s2Uof+uO8Lu0RxaBezJID4JksbWXEc/bsxaGKroPXx1qUsruwAJP 1TW+HL2yJx1xIydinoKFSD6PE7as0LeRvwCLFNbOqTrGefpPFX7sIdQNb/qMh1IN JpU0O0cwhWPYKjXD8pGcVNqTFs9CABRGSZBRUkSKMhWqwF1Hu0habF4nL70QkCqv FuhrelmNY/pDn7X5EQRII7cZMAxEL2lFtv+HERwZH2uZvDdMjE6Fu/+NdGQT4bCe d95dlnd1UkpMhI2CPsDKACXb5aqA5apWb7+2F5WcXzlI7XmNcHJksY8OVkjB0r7p +6IeBAYLPEUv+PYsR6g7vf6pAHA6I/axkMoK4vFXRnU3POnrVyZh3JHL/CChokWj cy8BYZukYSqvOnEoPJRpWkwO8opWDZmT5gIUSqHgKGk= =QWao -----END PGP SIGNATURE----- Merge tag 'nf-next-26-01-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== Subject: netfilter: updates for net-next 1) Speed up nftables transactions after earlier transaction failed. Due to a (harmeless) bug we remained in slow paranoia mode until a successful transaction completes. 2) Allow generic tracker to resolve clashes, this avoids very rare packet drops. From Yuto Hamaguchi. 3) Increase the cleanup budget to 64 entries in nf_conncount to reap more entries in one go, from Fernando Fernandez Mancera. 4) Allow icmp trackers to resolve clashes, this avoids very rare initial packet drop with test cases that have high-frequency pings. After this all trackers except tcp and sctp allow clash resolution. 5) Disentangle netfilter headers, don't include nftables/xtables headers in subsystems that are unrelated. 6) Don't rely on implicit includes coming from nf_conntrack_proto_gre.h. 7) Allow nfnetlink_queue nfq instance struct to get accounted via memcg, from Scott Mitchell. 8) Reject bogus xt target/match data upfront via netlink policiy in nft_compat interface rather than relying on x_tables API to do it. 9) Fix nf_conncount breakage when trying to limit loopback flows via prerouting rule, from Fernando Fernandez Mancera. This is a recent breakage but not seen as urgent enough to rush this via net tree at this late stage in development cycle. 10) Fix a possible off-by-one when parsing tcp option in xtables tcpmss match. Also handled via -next due to late stage in development cycle. * tag 'nf-next-26-01-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: xt_tcpmss: check remaining length before reading optlen netfilter: nf_conncount: fix tracking of connections from localhost netfilter: nft_compat: add more restrictions on netlink attributes netfilter: nfnetlink_queue: nfqnl_instance GFP_ATOMIC -> GFP_KERNEL_ACCOUNT allocation netfilter: nf_conntrack: don't rely on implicit includes netfilter: don't include xt and nftables.h in unrelated subsystems netfilter: nf_conntrack: enable icmp clash support netfilter: nf_conncount: increase the connection clean up limit to 64 netfilter: nf_conntrack: Add allow_clash to generic protocol handler netfilter: nf_tables: reset table validation state on abort ==================== Link: https://patch.msgid.link/20260120191803.22208-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-21 20:23:12 -08:00
Eric Dumazet	b8d9b7daf0	gro: inline tcp6_gro_complete() Remove one function call from GRO stack for native IPv6 + TCP packets. $ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3 add/remove: 0/0 grow/shrink: 1/1 up/down: 298/-5 (293) Function old new delta ipv6_gro_complete 435 733 +298 tcp6_gro_complete 311 306 -5 Total: Before=22593532, After=22593825, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260120164903.1912995-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-21 19:28:32 -08:00
Eric Dumazet	87737cd76e	gro: inline tcp6_gro_receive() FDO/LTO are unable to inline tcp6_gro_receive() from ipv6_gro_receive() Make sure tcp6_check_fraglist_gro() is only called only when needed, so that compiler can leave it out-of-line. $ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2 add/remove: 2/0 grow/shrink: 3/1 up/down: 1123/-253 (870) Function old new delta ipv6_gro_receive 1069 1846 +777 tcp6_check_fraglist_gro - 272 +272 ipv6_offload_init 218 274 +56 __pfx_tcp6_check_fraglist_gro - 16 +16 ipv6_gro_complete 433 435 +2 tcp6_gro_receive 959 706 -253 Total: Before=22592662, After=22593532, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260120164903.1912995-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-21 19:28:32 -08:00
Eric Dumazet	a4674aa58b	tcp: preserve const qualifier in tcp_rsk() and inet_rsk() We can change tcp_rsk() and inet_rsk() to propagate their argument const qualifier thanks to container_of_const(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260120125353.1470456-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-21 19:20:04 -08:00
Eric Dumazet	670ade3bfa	tcp: move tcp_rate_skb_delivered() to tcp_input.c tcp_rate_skb_delivered() is only called from tcp_input.c. Move it there and make it static. Both gcc and clang are (auto)inlining it, TCP performance is increased at a small space cost. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 3/0 up/down: 509/-187 (322) Function old new delta tcp_sacktag_walk 1682 1867 +185 tcp_ack 5230 5405 +175 tcp_shifted_skb 437 586 +149 __pfx_tcp_rate_skb_delivered 16 - -16 tcp_rate_skb_delivered 171 - -171 Total: Before=22566192, After=22566514, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260118123204.2315993-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-20 19:03:09 -08:00
Jakub Kicinski	677a51790b	Merge tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux Pavel Begunkov says: ==================== Add support for providers with large rx buffer Many modern NICs support configurable receive buffer lengths, and zcrx and memory providers can use buffers larger than 4K to improve performance. When paired with hw-gro larger rx buffer sizes can drastically reduce the number of buffers traversing the stack and save a lot of processing time. It also allows to give to users larger contiguous chunks of data. Single stream benchmarks showed up to ~30% CPU util improvement. E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC: packets=23987040 (MB=2745098), rps=199559 (MB/s=22837) CPU %usr %nice %sys %iowait %irq %soft %idle 0 1.53 0.00 27.78 2.72 1.31 66.45 0.22 packets=24078368 (MB=2755550), rps=200319 (MB/s=22924) CPU %usr %nice %sys %iowait %irq %soft %idle 0 0.69 0.00 8.26 31.65 1.83 57.00 0.57 This series adds net infrastructure for memory providers configuring the size and implements it for bnxt. It's an opt-in feature for drivers, they should advertise support for the parameter in the qops and must check if the hardware supports the given size. It's limited to memory providers as it drastically simplifies implementation. It doesn't affect the fast path zcrx uAPI, and the user exposed parameter is defined in zcrx terms, which allows it to be flexible and adjusted in the future. A liburing example can be found at [2] full branch: [1] https://github.com/isilence/linux.git zcrx/large-buffers-v8 Liburing example: [2] https://github.com/isilence/liburing.git zcrx/rx-buf-len * tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux: io_uring/zcrx: document area chunking parameter selftests: iou-zcrx: test large chunk sizes eth: bnxt: support qcfg provided rx page size eth: bnxt: adjust the fill level of agg queues with larger buffers eth: bnxt: store rx buffer size per queue net: pass queue rx page size from memory provider net: add bare bone queue configs net: reduce indent of struct netdev_queue_mgmt_ops members net: memzero mp params when closing a queue ==================== Link: https://patch.msgid.link/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-20 18:10:04 -08:00
Jakub Kicinski	8766d61a1d	Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'" This reverts commit `77b9c4a438`, reversing changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497: `931420a2fc` ("selftests/net: Add netkit container tests") `ab771c938d` ("selftests/net: Make NetDrvContEnv support queue leasing") `6be87fbb27` ("selftests/net: Add env for container based tests") `61d99ce3df` ("selftests/net: Add bpf skb forwarding program") `920da36341` ("netkit: Add xsk support for af_xdp applications") `eef51113f8` ("netkit: Add netkit notifier to check for unregistering devices") `b5ef109d22` ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create") `b5c3fa4a0b` ("netkit: Add single device mode for netkit") `0073d2fd67` ("xsk: Proxy pool management for leased queues") `1ecea95dd3` ("xsk: Extend xsk_rcv_check validation") `804bf334d0` ("net: Proxy netdev_queue_get_dma_dev for leased queues") `0caa9a8dde` ("net: Proxy net_mp_{open,close}_rxq for leased queues") `ff8889ff91` ("net, ethtool: Disallow leased real rxqs to be resized") `9e2103f361` ("net: Add lease info to queue-get response") `31127dedde` ("net: Implement netdev_nl_queue_create_doit") `a5546e18f7` ("net: Add queue-create operation") The series will conflict with io_uring work, and the code needs more polish. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-20 18:06:01 -08:00
Florian Westphal	d00453b6e3	netfilter: nf_conntrack: don't rely on implicit includes several netfilter compilation units rely on implicit includes coming from nf_conntrack_proto_gre.h. Clean this up and add the required dependencies where needed. nf_conntrack.h requires net_generic() helper. Place various gre/ppp/vlan includes to where they are needed. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-20 16:23:37 +01:00
Florian Westphal	910d271227	netfilter: don't include xt and nftables.h in unrelated subsystems conntrack, xtables and nftables are distinct subsystems, don't use them in other subystems. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-20 16:23:37 +01:00
Fernando Fernandez Mancera	21d033e472	netfilter: nf_conncount: increase the connection clean up limit to 64 After the optimization to only perform one GC per jiffy, a new problem was introduced. If more than 8 new connections are tracked per jiffy the list won't be cleaned up fast enough possibly reaching the limit wrongly. In order to prevent this issue, only skip the GC if it was already triggered during the same jiffy and the increment is lower than the clean up limit. In addition, increase the clean up limit to 64 connections to avoid triggering GC too often and do more effective GCs. This has been tested using a HTTP server and several performance tools while having nft_connlimit/xt_connlimit or OVS limit configured. Output of slowhttptest + OVS limit at 52000 connections: slow HTTP test status on 340th second: initializing: 0 pending: 432 connected: 51998 error: 0 closed: 0 service available: YES Fixes: `d265929930` ("netfilter: nf_conncount: reduce unnecessary GC") Reported-by: Aleksandra Rukomoinikova <ARukomoinikova@k2.cloud> Closes: https://lore.kernel.org/netfilter/b2064e7b-0776-4e14-adb6-c68080987471@k2.cloud/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-20 16:23:37 +01:00
David Wei	0caa9a8dde	net: Proxy net_mp_{open,close}_rxq for leased queues When a process in a container wants to setup a memory provider, it will use the virtual netdev and a leased rxq, and call net_mp_{open,close}_rxq to try and restart the queue. At this point, proxy the queue restart on the real rxq in the physical netdev. For memory providers (io_uring zero-copy rx and devmem), it causes the real rxq in the physical netdev to be filled from a memory provider that has DMA mapped memory from a process within a container. Signed-off-by: David Wei <dw@davidwei.uk> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-6-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Daniel Borkmann	9e2103f361	net: Add lease info to queue-get response Populate nested lease info to the queue-get response that returns the ifindex, queue id with type and optionally netns id if the device resides in a different netns. Example with ynl client: # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:24 qdisc mq state UP group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever inet6 fe80::eaeb:d3ff:fea3:43f6/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ethtool -i enp10s0f0np0 driver: mlx5_core [...] # ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-get \ --json '{"ifindex": 4, "id": 15, "type": "rx"}' {'id': 15, 'ifindex': 4, 'lease': {'ifindex': 8, 'netns-id': 0, 'queue': {'id': 1, 'type': 'rx'}}, 'napi-id': 8227, 'type': 'rx', 'xsk': {}} # ip netns list foo (id: 0) # ip netns exec foo ip a [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff inet6 fe80::200:ff:fe00:0/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ip netns exec foo ethtool -i nk driver: netkit [...] # ip netns exec foo ls /sys/class/net/nk/queues/ rx-0 rx-1 tx-0 # ip netns exec foo ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-get \ --json '{"ifindex": 8, "id": 1, "type": "rx"}' {'id': 1, 'ifindex': 8, 'type': 'rx'} Note that the caller of netdev_nl_queue_fill_one() holds the netdevice lock. For the queue-get we do not lock both devices. When queues get {un,}leased, both devices are locked, thus if __netif_get_rx_queue_peer() returns true, the peer pointer points to a valid device. The netns-id is fetched via peernet2id_alloc() similarly as done in OVS. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-4-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Daniel Borkmann	31127dedde	net: Implement netdev_nl_queue_create_doit Implement netdev_nl_queue_create_doit which creates a new rx queue in a virtual netdev and then leases it to a rx queue in a physical netdev. Example with ynl client: # ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-create \ --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}' {'id': 1} Note that the netdevice locking order is always from the virtual to the physical device. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-3-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Benjamin Berg	50b359896f	wifi: cfg80211: ignore link disabled flag from userspace When the AP has an advertised TID to Link Mapping (TTLM) it shall include the element in the association response. As such, when this element is present it needs to be used for the currently dormant links. See Draft P802.11REVmf_D1.0 section 35.3.7.2.3 ("Negotiation of TTLM") for the details. The flag is also not usable in case userspace wants to specify a negotiated TTLM during association. Note that for the link reconfiguration case, mac80211 did not use the information. Draft P802.11REVmf_D1.0 states in section 35.3.6.4 ("Link reconfiguration to the setup links) that we "shall operate with all the TIDs mapped to the newly added links ..." All this means that the flag is not needed. The implementation should parse the information from the association response. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260118093904.754e057896a5.Ifd06f5ef839a93bfd54d0593dc932870f95f3242@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-20 10:02:01 +01:00
Eric Dumazet	03e9d91dd6	ipv6: annotate data-races in ip6_multipath_hash_{policy,fields}() Add missing READ_ONCE() when reading sysctl values. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	3681282530	ipv6: annotate date-race in ipv6_can_nonlocal_bind() Add a missing READ_ONCE(), and add const qualifiers to the two parameters. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	ded139b59b	ipv6: annotate data-races from ip6_make_flowlabel() Use READ_ONCE() to read sysctl values in ip6_make_flowlabel() and ip6_make_flowlabel() Add a const qualifier to 'struct net' parameters. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	e82a347d92	ipv6: add sysctl_ipv6_flowlabel group Group together following struct netns_sysctl_ipv6 fields: - flowlabel_consistency - auto_flowlabels - flowlabel_state_ranges After this patch, ip6_make_flowlabel() uses a single cache line to fetch auto_flowlabels and flowlabel_state_ranges (instead of two before the patch). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	f10ab9d3a7	tcp: move tcp_rate_skb_sent() to tcp_output.c It is only called from __tcp_transmit_skb() and __tcp_retransmit_skb(). Move it in tcp_output.c and make it static. clang compiler is now able to inline it from __tcp_transmit_skb(). gcc compiler inlines it in the two callers, which is also fine. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260114165109.1747722-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-17 15:43:16 -08:00
Jakub Kicinski	c27022497d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-15 18:02:48 -08:00
Shahar Shitrit	cfbc8b6bab	net: Introduce netif_xmit_timeout_ms() helper Introduce a new helper function netif_xmit_timeout_ms() to check if a TX queue is stopped and has timed out and report the timeout duration. This makes the timeout logic reusable, and will be used in several places in subsequent patches. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768209383-1546791-2-git-send-email-tariqt@nvidia.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-15 11:55:05 +01:00
Jason Xing	a2cb2e23b2	xsk: move cq_cached_prod_lock to avoid touching a cacheline in sending path We (Paolo and I) noticed that in the sending path touching an extra cacheline due to cq_cached_prod_lock will impact the performance. After moving the lock from struct xsk_buff_pool to struct xsk_queue, the performance is increased by ~5% which can be observed by xdpsock. An alternative approach [1] can be using atomic_try_cmpxchg() to have the same effect. But unfortunately I don't have evident performance numbers to prove the atomic approach is better than the current patch. The advantage is to save the contention time among multiple xsks sharing the same pool while the disadvantage is losing good maintenance. The full discussion can be found at the following link. [1]: https://lore.kernel.org/all/20251128134601.54678-1-kerneljasonxing@gmail.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://patch.msgid.link/20260104012125.44003-3-kerneljasonxing@gmail.com Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-15 10:07:45 +01:00
Kavita Kavita	63e7e3b643	wifi: mac80211: allow key installation before association Currently, mac80211 allows key installation only after association completes. However, Enhanced Privacy Protection Key Exchange (EPPKE) requires key installation before association to enable encryption and decryption of (Re)Association Request and Response frames. Add support to install keys prior to association when the peer is an Enhanced Privacy Protection (EPP) peer that requires encryption and decryption of (Re)Association Request and Response frames. Introduce a new boolean parameter "epp_peer" in the "ieee80211_sta" profile to indicate that the peer supports the Enhanced Privacy Protection Key Exchange (EPPKE) protocol. For non-AP STA mode, it is set when the authentication algorithm is WLAN_AUTH_EPPKE during station profile initialization. For AP mode, it is set during NL80211_CMD_NEW_STA and NL80211_CMD_ADD_LINK_STA. When "epp_peer" parameter is set, mac80211 now accepts keys before association and enables encryption of the (Re)Association Request/Response frames. Co-developed-by: Sai Pratyusha Magam <sai.magam@oss.qualcomm.com> Signed-off-by: Sai Pratyusha Magam <sai.magam@oss.qualcomm.com> Signed-off-by: Kavita Kavita <kavita.kavita@oss.qualcomm.com> Link: https://patch.msgid.link/20260114111900.2196941-6-kavita.kavita@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-14 14:34:16 +01:00
Sai Pratyusha Magam	6ee3a22c61	wifi: nl80211: Add support for EPP peer indication Introduce a new netlink attribute NL80211_ATTR_EPP_PEER to be used with NL80211_CMD_NEW_STA and NL80211_CMD_ADD_LINK_STA for the userspace to indicate that a non-AP STA is an Enhanced Privacy Protection (EPP) peer. Co-developed-by: Rohan Dutta <quic_drohan@quicinc.com> Signed-off-by: Rohan Dutta <quic_drohan@quicinc.com> Signed-off-by: Sai Pratyusha Magam <sai.magam@oss.qualcomm.com> Signed-off-by: Kavita Kavita <kavita.kavita@oss.qualcomm.com> Link: https://patch.msgid.link/20260114111900.2196941-5-kavita.kavita@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-14 14:34:16 +01:00
Dipayaan Roy	3b194343c2	net: mana: Implement ndo_tx_timeout and serialize queue resets per port. Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected and a device-controlled port reset for all queues can be scheduled to a ordered workqueue. The reset for all queues on stall detection is recomended by hardware team. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260112130552.GA11785@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-13 19:14:36 -08:00
Pavel Begunkov	c0b709bf43	net: pass queue rx page size from memory provider Allow memory providers to configure rx queues with a custom receive page size. It's passed in struct pp_memory_provider_params, which is copied into the queue, so it's preserved across queue restarts. Then, it's propagated to the driver in a new queue config parameter. Drivers should explicitly opt into using it by setting QCFG_RX_PAGE_SIZE, in which case they should implement ndo_default_qcfg, validate the size on queue restart and honour the current config in case of a reset. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2026-01-14 02:13:36 +00:00
Pavel Begunkov	efcb9a4d32	net: add bare bone queue configs We'll need to pass extra parameters when allocating a queue for memory providers. Define a new structure for queue configurations, and pass it to qapi callbacks. It's empty for now, actual parameters will be added in following patches. Configurations should persist across resets, and for that they're default-initialised on device registration and stored in struct netdev_rx_queue. We also add a new qapi callback for defaulting a given config. It must be implemented if a driver wants to use queue configs and is optional otherwise. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2026-01-14 02:13:36 +00:00
Jakub Kicinski	92d76cf96d	net: reduce indent of struct netdev_queue_mgmt_ops members Trivial change, reduce the indent. I think the original is copied from real NDOs. It's unnecessarily deep, makes passing struct args problematic. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2026-01-14 02:13:36 +00:00
Toke Høiland-Jørgensen	8b27fd66f5	net/sched: Export mq functions for reuse To enable the cake_mq qdisc to reuse code from the mq qdisc, export a bunch of functions from sch_mq. Split common functionality out from some functions so it can be composed with other code, and export other functions wholesale. To discourage wanton reuse, put the symbols into a new NET_SCHED_INTERNAL namespace, and a sch_priv.h header file. No functional change intended. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-1-8d613fece5d8@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-13 11:54:29 +01:00
Eric Dumazet	ffe4ccd359	net: add net.core.qdisc_max_burst In blamed commit, I added a check against the temporary queue built in __dev_xmit_skb(). Idea was to drop packets early, before any spinlock was acquired. if (unlikely(defer_count > READ_ONCE(q->limit))) { kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP); return NET_XMIT_DROP; } It turned out that HTB Qdisc has a zero q->limit. HTB limits packets on a per-class basis. Some of our tests became flaky. Add a new sysctl : net.core.qdisc_max_burst to control how many packets can be stored in the temporary lockless queue. Also add a new QDISC_BURST_DROP drop reason to better diagnose future issues. Thanks Neal ! Fixes: `100dfa74ca` ("net: dev_queue_xmit() llist adoption") Reported-and-bisected-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-13 10:12:11 +01:00
Heiner Kallweit	c4277d21ab	net: phy: realtek: add dummy PHY driver for RTL8127ATF RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and DAC). The list of supported modes was provided by Realtek. According to the r8127 vendor driver also 1G modules are supported, but this needs some more complexity in the driver, and only 10G mode has been tested so far. Therefore mainline support will be limited to 10G for now. The SFP port signals are hidden in the chip IP and driven by firmware. Therefore mainline SFP support can't be used here. This PHY driver is used by the RTL8127ATF support in r8169. RTL8127ATF reports the same PHY ID as the TP version. Therefore use a dummy PHY ID. This PHY driver is used by the RTL8127ATF support in r8169. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/e3d55162-210a-4fab-9abf-99c6954eee10@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-12 19:29:11 -08:00
Jakub Kicinski	669aa3e3fa	First set of changes for the current -next cycle, of note: - ath12k gets an overhaul to support multi-wiphy device wiphy and pave the way for future device support in the same driver (rather than splitting to ath13k) - mac80211 gets some better iteration macros -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmllQ8YACgkQ10qiO8sP aAA5Eg/+Nk1NNSP0+69YdXZECE+cGPoXVUsLZSEalgd7SZ8jKkCfVVshGAhimFp0 qMcRgkjGsQkVrOlKcCdX9GkQTQnI2HIvxfqXaE0pB39KelB3lKnBycD47FuC6OVE mjNZYpUhNs9wuKs8uP+BRO9L/41C/5uE8hyTR3cf2bqkhx+FAasdYZaVFenpaOJ2 lmH5GkeAjxEYfyYk/7I70ixtIZ4oDISj99W97rfqSiTQx7VEOD8NdWZirUpLbpt1 UDR+rCapJQ1wl9p8riSE09hzJALKKVI9YHIDWvfTI81pO+Xt1eyf1wWu0ewz2t6P 5tLk4LChFZeUqV6oJqTyRYWVlSQ8d9wVFNjPF0dJCiqmh48oz4BCFCWE5dJHgh8Q LN8LErrjTBBbgldwbQm7HRHb7llt0MRCmW2qJKKSU4aIBbEC/1sFu0w6smNIPCst GJ+fCBYugFAHZ0cft468FW/TSs49zlpGZShRcy22/Ll2iuGp1+TA5nmMAcfF4kdf GoPIO2c4gdnHUAp046czquU4KnbtzI0AWLdti7jAHSdVNv1+u3uTgudKVh+PGqIj BjOfJvUYCDqK9vkLyZXA0iMfKOdKjSOM+IWT0elYjr4MUy/E9fKK28AmXsze4IzB iNMHowJFlQUEi+/26NxO1+8kZpf1x05rxDGcszvECoDiow2TWJg= =F1n0 -----END PGP SIGNATURE----- Merge tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== First set of changes for the current -next cycle, of note: - ath12k gets an overhaul to support multi-wiphy device wiphy and pave the way for future device support in the same driver (rather than splitting to ath13k) - mac80211 gets some better iteration macros * tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (120 commits) wifi: mac80211: remove width argument from ieee80211_parse_bitrates wifi: mac80211_hwsim: remove NAN by default wifi: mac80211: improve station iteration ergonomics wifi: mac80211: improve interface iteration ergonomics wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel wifi: mac80211: unexport ieee80211_get_bssid() wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version wifi: iwlegacy: 3945-rs: remove redundant pointer check in il3945_rs_tx_status() and il3945_rs_get_rate() wifi: mac80211: don't send an unused argument to ieee80211_check_combinations wifi: libertas: fix WARNING in usb_tx_block wifi: mwifiex: Allocate dev name earlier for interface workqueue name wifi: wlcore: sdio: Use pm_ptr instead of #ifdef CONFIG_PM wifi: cfg80211: Fix use_for flag update on BSS refresh wifi: brcmfmac: rename function that frees vif wifi: brcmfmac: fix/add kernel-doc comments wifi: mac80211: Update csa_finalize to use link_id wifi: cfg80211: add cfg80211_stop_link() for per-link teardown wifi: ath12k: Skip DP peer creation for scan vdev wifi: ath12k: move firmware stats request outside of atomic context wifi: ath12k: add the missing RCU lock in ath12k_dp_tx_free_txbuf() ... ==================== Link: https://patch.msgid.link/20260112185836.378736-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-12 17:02:02 -08:00
Johannes Berg	f813117f20	wifi: mac80211: improve station iteration ergonomics Right now, the only way to iterate stations is to declare an iterator function, possibly data structure to use, and pass all that to the iteration helper function. This is annoying, and there's really no inherent need for it. Add a new for_each_station() macro that does the iteration in a more ergonomic way. To avoid even more exported functions, do the old ieee80211_iterate_stations_mtx() as an inline using the new way, which may also let the compiler optimise it a bit more, e.g. via inlining the iterator function. Link: https://patch.msgid.link/20260108143431.d2b641f6f6af.I4470024f7404446052564b15bcf8b3f1ada33655@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:48:17 +01:00
Johannes Berg	6b3bafa2bd	wifi: mac80211: improve interface iteration ergonomics Right now, the only way to iterate interfaces is to declare an iterator function, possibly data structure to use, and pass all that to the iteration helper function. This is annoying, and there's really no inherent need for it, except it was easier to implement with the iflist mutex, but that's not used much now. Add a new for_each_interface() macro that does the iteration in a more ergonomic way. To avoid even more exported functions, do the old ieee80211_iterate_active_interfaces_mtx() as an inline using the new way, which may also let the compiler optimise it a bit more, e.g. via inlining the iterator function. Also provide for_each_active_interface() for the common case of just iterating active interfaces. Link: https://patch.msgid.link/20260108143431.f2581e0c381a.Ie387227504c975c109c125b3c57f0bb3fdab2835@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:48:17 +01:00
Eric Dumazet	872ac785e7	ipv4: ip_tunnel: spread netdev_lockdep_set_classes() Inspired by yet another syzbot report. IPv6 tunnels call netdev_lockdep_set_classes() for each tunnel type, while IPv4 currently centralizes netdev_lockdep_set_classes() call from ip_tunnel_init(). Make ip_tunnel_init() a macro, so that we have different lockdep classes per tunnel type. Fixes: `0bef512012` ("net: add netdev_lockdep_set_classes() to virtual drivers") Reported-by: syzbot+1240b33467289f5ab50b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/695d439f.050a0220.1c677c.0347.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260106172426.1760721-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-08 18:02:35 -08:00
Manish Dharanenthiran	dc4b176cce	wifi: cfg80211: add cfg80211_stop_link() for per-link teardown Currently, whenever cfg80211_stop_iface() is called, the entire iface is stopped. However, there could be a need in AP/P2P_GO mode, where one would like to stop a single link in MLO operation instead of the whole MLD interface. Hence, introduce cfg80211_stop_link() to allow drivers to tear down only a specified AP/P2P_GO link during MLO operation. Passing -1 preserves the existing behavior of stopping the whole interface. Make cfg80211_stop_iface() call this function by passing -1 to keep the default behavior the same, that is, to stop all links and use cfg80211_stop_link() with the desired link_id for AP/P2P_GO mode, to stop only that link. This brings no behavioral change for single-link/non-MLO interfaces, and enables drivers to stop an AP/P2P_GO link without disrupting other links on the same interface. Signed-off-by: Manish Dharanenthiran <manish.dharanenthiran@oss.qualcomm.com> Link: https://patch.msgid.link/20251127-stop_link-v2-1-43745846c5fd@qti.qualcomm.com [make cfg80211_stop_iface() inline] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:11:01 +01:00
Daniel Sedlak	55ffb0b14a	tcp: clarify tcp_congestion_ops functions comments The optional and required hints in the tcp_congestion_ops are information for the user of this interface to signalize its importance when implementing these functions. However, cong_avoid comment incorrectly tells that it is required, in reality congestion control must provide one of either cong_avoid or cong_control. In addition, min_tso_segs has not had any comment optional/required hints. So mark it as optional since it is used only in BBR. Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Daniel Sedlak <daniel.sedlak@cdn77.com> Link: https://patch.msgid.link/20260105115533.1151442-1-daniel.sedlak@cdn77.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:35:17 -08:00
Eric Dumazet	e9cd04b281	udp: udplite is unlikely Add some unlikely() annotations to speed up the fast path, at least with clang compiler. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260105101719.2378881-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:06:03 -08:00
Gustavo A. R. Silva	c86af46b9c	ipv4/inet_sock.h: Avoid thousands of -Wflex-array-member-not-at-end warnings Use DEFINE_RAW_FLEX() to avoid thousands of -Wflex-array-member-not-at-end warnings. Remove struct ip_options_data, and adjust the rest of the code so that flexible-array member struct ip_options_rcu::opt.__data[] ends last in struct icmp_bxm. Compensate for this by using the DEFINE_RAW_FLEX() helper to define each on-stack struct instance that contained struct ip_options_data as a member, and to define struct ip_options_rcu with a fixed on-stack size for its nested flexible-array member opt.__data[]. Also, add a couple of code comments to prevent people from adding members to a struct after another member that contains a flexible array. With these changes, fix 2600 warnings of the following type: include/net/inet_sock.h:65:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/aVteBadWA6AbTp7X@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:02:52 -08:00
Joel Granados	f7386f545e	sysctl: Remove unused ctl_table forward declarations Remove superfluous forward declarations of ctl_table from header files where they are no longer needed. These declarations were left behind after sysctl code refactoring and cleanup. Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>	2026-01-05 13:54:41 +01:00
Vladimir Oltean	06e219f6a7	net: dsa: properly keep track of conduit reference Problem description ------------------- DSA has a mumbo-jumbo of reference handling of the conduit net device and its kobject which, sadly, is just wrong and doesn't make sense. There are two distinct problems. 1. The OF path, which uses of_find_net_device_by_node(), never releases the elevated refcount on the conduit's kobject. Nominally, the OF and non-OF paths should result in objects having identical reference counts taken, and it is already suspicious that dsa_dev_to_net_device() has a put_device() call which is missing in dsa_port_parse_of(), but we can actually even verify that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run this command "before" and "after" applying this patch: (unbind the conduit driver for net device eno2) echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind we see these lines in the output diff which appear only with the patch applied: kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000) kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000) 2. After we find the conduit interface one way (OF) or another (non-OF), it can get unregistered at any time, and DSA remains with a long-lived, but in this case stale, cpu_dp->conduit pointer. Holding the net device's underlying kobject isn't actually of much help, it just prevents it from being freed (but we never need that kobject directly). What helps us to prevent the net device from being unregistered is the parallel netdev reference mechanism (dev_hold() and dev_put()). Actually we actually use that netdev tracker mechanism implicitly on user ports since commit `2f1e8ea726` ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link(). But time still passes at DSA switch probe time between the initial of_find_net_device_by_node() code and the user port creation time, time during which the conduit could unregister itself and DSA wouldn't know about it. So we have to run of_find_net_device_by_node() under rtnl_lock() to prevent that from happening, and release the lock only with the netdev tracker having acquired the reference. Do we need to keep the reference until dsa_unregister_switch() / dsa_switch_shutdown()? 1: Maybe yes. A switch device will still be registered even if all user ports failed to probe, see commit `86f8b1c01a` ("net: dsa: Do not make user port errors fatal"), and the cpu_dp->conduit pointers remain valid. I haven't audited all call paths to see whether they will actually use the conduit in lack of any user port, but if they do, it seems safer to not rely on user ports for that reference. 2. Definitely yes. We support changing the conduit which a user port is associated to, and we can get into a situation where we've moved all user ports away from a conduit, thus no longer hold any reference to it via the net device tracker. But we shouldn't let it go nonetheless - see the next change in relation to dsa_tree_find_first_conduit() and LAG conduits which disappear. We have to be prepared to return to the physical conduit, so the CPU port must explicitly keep another reference to it. This is also to say: the user ports and their CPU ports may not always keep a reference to the same conduit net device, and both are needed. As for the conduit's kobject for the /sys/class/net/ entry, we don't care about it, we can release it as soon as we hold the net device object itself. History and blame attribution ----------------------------- The code has been refactored so many times, it is very difficult to follow and properly attribute a blame, but I'll try to make a short history which I hope to be correct. We have two distinct probing paths: - one for OF, introduced in 2016 in commit `83c0afaec7` ("net: dsa: Add new binding implementation") - one for non-OF, introduced in 2017 in commit `71e0bbde0d` ("net: dsa: Add support for platform data") These are both complete rewrites of the original probing paths (which used struct dsa_switch_driver and other weird stuff, instead of regular devices on their respective buses for register access, like MDIO, SPI, I2C etc): - one for OF, introduced in 2013 in commit `5e95329b70` ("dsa: add device tree bindings to register DSA switches") - one for non-OF, introduced in 2008 in commit `91da11f870` ("net: Distributed Switch Architecture protocol support") except for tiny bits and pieces like dsa_dev_to_net_device() which were seemingly carried over since the original commit, and used to this day. The point is that the original probing paths received a fix in 2015 in the form of commit `679fb46c57` ("net: dsa: Add missing master netdev dev_put() calls"), but the fix never made it into the "new" (dsa2) probing paths that can still be traced to today, and the fixed probing path was later deleted in 2019 in commit `93e86b3bc8` ("net: dsa: Remove legacy probing support"). That is to say, the new probing paths were never quite correct in this area. The existence of the legacy probing support which was deleted in 2019 explains why dsa_dev_to_net_device() returns a conduit with elevated refcount (because it was supposed to be released during dsa_remove_dst()). After the removal of the legacy code, the only user of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this function returns. This pattern makes no sense today, and can only be interpreted historically to understand why dev_hold() was there in the first place. Change details -------------- Today we have a better netdev tracking infrastructure which we should use. Logically netdev_hold() belongs in common code (dsa_port_parse_cpu(), where dp->conduit is assigned), but there is a tradeoff to be made with the rtnl_lock() section which would become a bit too long if we did that - dsa_port_parse_cpu() also calls request_module(). So we duplicate a bit of logic in order for the callers of dsa_port_parse_cpu() to be the ones responsible of holding the conduit reference and releasing it on error. This shortens the rtnl_lock() section significantly. In the dsa_switch_probe() error path, dsa_switch_release_ports() will be called in a number of situations, one being where dsa_port_parse_cpu() maybe didn't get the chance to run at all (a different port failed earlier, etc). So we have to test for the conduit being NULL prior to calling netdev_put(). There have still been so many transformations to the code since the blamed commits (rename master -> conduit, commit `0650bf52b3` ("net: dsa: be compatible with masters which unregister on shutdown")), that it only makes sense to fix the code using the best methods available today and see how it can be backported to stable later. I suspect the fix cannot even be backported to kernels which lack dsa_switch_shutdown(), and I suspect this is also maybe why the long-lived conduit reference didn't make it into the new DSA probing paths at the time (problems during shutdown). Because dsa_dev_to_net_device() has a single call site and has to be changed anyway, the logic was just absorbed into the non-OF dsa_port_parse(). Tested on the ocelot/felix switch and on dsa_loop, both on the NXP LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y. Reported-by: Ma Ke <make24@iscas.ac.cn> Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/ Fixes: `83c0afaec7` ("net: dsa: Add new binding implementation") Fixes: `71e0bbde0d` ("net: dsa: Add support for platform data") Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251215150236.3931670-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-23 10:32:08 +01:00
Linus Torvalds	7b8e9264f5	Including fixes from netfilter and CAN. Current release - regressions: - netfilter: nf_conncount: fix leaked ct in error paths - sched: act_mirred: fix loop detection - sctp: fix potential deadlock in sctp_clone_sock() - can: fix build dependency - eth: mlx5e: do not update BQL of old txqs during channel reconfiguration Previous releases - regressions: - sched: ets: always remove class from active list before deleting it - inet: frags: flush pending skbs in fqdir_pre_exit() - netfilter: nf_nat: remove bogus direction check - mptcp: - schedule rtx timer only after pushing data - avoid deadlock on fallback while reinjecting - can: gs_usb: fix error handling - eth: mlx5e: - avoid unregistering PSP twice - fix double unregister of HCA_PORTS component - eth: bnxt_en: fix XDP_TX path - eth: mlxsw: fix use-after-free when updating multicast route stats Previous releases - always broken: - ethtool: avoid overflowing userspace buffer on stats query - openvswitch: fix middle attribute validation in push_nsh() action - eth: mlx5: fw_tracer, validate format string parameters - eth: mlxsw: spectrum_router: fix neighbour use-after-free - eth: ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2() Misc: - Jozsef Kadlecsik retires from maintaining netfilter - tools: ynl: fix build on systems with old kernel headers Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmlEPS0SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkHLEP/3TnVI9qLivOGsZ48bn5UUcIaIlX0wiw i5gwpUhbtW3zcLSphO0Nh/CmProme6dFhaMOOkk48bKAdIxRpOMbCC20bfYDLyd0 ZJTUQqheKpuI23vOnhfs2TTOkcqz6cM7txUq681taQ8Nfo8Dbf0fgOO2HT5ljRD+ JXlxrdvicZDtve68sSdnsAj5M15EPQGKTPMyPkymBmCKnCMQK9SQKaeTEtaW6NIO yM0KqDSNoulDc/6LYMhx8DTtyE7yTiTxwe2NixjdyYljXsk0KiRDirCzxnZuSgh8 oh+cFLdFDq4mwdYEKjgC5c3ifQyLEpZvwzlY5MKoobsVnT5SbigeSK53l5rEkO3V sM84lITHfqTJvlid0AF/ixEc6iWwV7nGRBHh2FXNbfoIKt45eF77jPi9YFsq6Z95 vlCzYIbY0f2L1y3mPvZbzGQbh2Z12b5kyK8QA1j7SK+zNzxgXbf4+ZtYxHg7O3Ne gecmIpTKXMWodaZyfsRQPjR/F6UIlMqsgl9Ci9bfUw+XwL8x+7bJxQAQz+yVjLla ng14BItiYKBavcPZBjYlhGKqD1fzGhVZqQecrCkF0VTbMusRd9RcwytU1NG4QGDx V5aL28ht85KtMednEWOBkrg+PeXnNyZHzLAf2Xtx3UkaGgiDC8G4IxUv3orlOLD1 sFPfZnSiGCof =6pla -----END PGP SIGNATURE----- Merge tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and CAN. Current release - regressions: - netfilter: nf_conncount: fix leaked ct in error paths - sched: act_mirred: fix loop detection - sctp: fix potential deadlock in sctp_clone_sock() - can: fix build dependency - eth: mlx5e: do not update BQL of old txqs during channel reconfiguration Previous releases - regressions: - sched: ets: always remove class from active list before deleting it - inet: frags: flush pending skbs in fqdir_pre_exit() - netfilter: nf_nat: remove bogus direction check - mptcp: - schedule rtx timer only after pushing data - avoid deadlock on fallback while reinjecting - can: gs_usb: fix error handling - eth: - mlx5e: - avoid unregistering PSP twice - fix double unregister of HCA_PORTS component - bnxt_en: fix XDP_TX path - mlxsw: fix use-after-free when updating multicast route stats Previous releases - always broken: - ethtool: avoid overflowing userspace buffer on stats query - openvswitch: fix middle attribute validation in push_nsh() action - eth: - mlx5: fw_tracer, validate format string parameters - mlxsw: spectrum_router: fix neighbour use-after-free - ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2() Misc: - Jozsef Kadlecsik retires from maintaining netfilter - tools: ynl: fix build on systems with old kernel headers" * tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) net: hns3: add VLAN id validation before using net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx net: hns3: using the num_tqps in the vf driver to apply for resources net: enetc: do not transmit redirected XDP frames when the link is down selftests/tc-testing: Test case exercising potential mirred redirect deadlock net/sched: act_mirred: fix loop detection sctp: Clear inet_opt in sctp_v6_copy_ip_options(). sctp: Fetch inet6_sk() after setting ->pinet6 in sctp_clone_sock(). net/handshake: duplicate handshake cancellations leak socket net/mlx5e: Don't include PSP in the hard MTU calculations net/mlx5e: Do not update BQL of old txqs during channel reconfiguration net/mlx5e: Trigger neighbor resolution for unresolved destinations net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init net/mlx5: Serialize firmware reset with devlink net/mlx5: fw_tracer, Handle escaped percent properly net/mlx5: fw_tracer, Validate format string parameters net/mlx5: Drain firmware reset in shutdown callback net/mlx5: fw reset, clear reset requested on drain_fw_reset net: dsa: mxl-gsw1xx: manually clear RANEG bit net: dsa: mxl-gsw1xx: fix .shutdown driver operation ...	2025-12-19 07:55:35 +12:00
Florian Westphal	8e1a1bc4f5	netfilter: nf_tables: avoid chain re-validation if possible Hamza Mahfooz reports cpu soft lock-ups in nft_chain_validate(): watchdog: BUG: soft lockup - CPU#1 stuck for 27s! [iptables-nft-re:37547] [..] RIP: 0010:nft_chain_validate+0xcb/0x110 [nf_tables] [..] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_table_validate+0x6b/0xb0 [nf_tables] nf_tables_validate+0x8b/0xa0 [nf_tables] nf_tables_commit+0x1df/0x1eb0 [nf_tables] [..] Currently nf_tables will traverse the entire table (chain graph), starting from the entry points (base chains), exploring all possible paths (chain jumps). But there are cases where we could avoid revalidation. Consider: 1 input -> j2 -> j3 2 input -> j2 -> j3 3 input -> j1 -> j2 -> j3 Then the second rule does not need to revalidate j2, and, by extension j3, because this was already checked during validation of the first rule. We need to validate it only for rule 3. This is needed because chain loop detection also ensures we do not exceed the jump stack: Just because we know that j2 is cycle free, its last jump might now exceed the allowed stack size. We also need to update all reachable chains with the new largest observed call depth. Care has to be taken to revalidate even if the chain depth won't be an issue: chain validation also ensures that expressions are not called from invalid base chains. For example, the masquerade expression can only be called from NAT postrouting base chains. Therefore we also need to keep record of the base chain context (type, hooknum) and revalidate if the chain becomes reachable from a different hook location. Reported-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Closes: https://lore.kernel.org/netfilter-devel/20251118221735.GA5477@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/ Tested-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-15 15:02:44 +01:00
Jakub Kicinski	006a5035b4	inet: frags: flush pending skbs in fqdir_pre_exit() We have been seeing occasional deadlocks on pernet_ops_rwsem since September in NIPA. The stuck task was usually modprobe (often loading a driver like ipvlan), trying to take the lock as a Writer. lockdep does not track readers for rwsems so the read wasn't obvious from the reports. On closer inspection the Reader holding the lock was conntrack looping forever in nf_conntrack_cleanup_net_list(). Based on past experience with occasional NIPA crashes I looked thru the tests which run before the crash and noticed that the crash follows ip_defrag.sh. An immediate red flag. Scouring thru (de)fragmentation queues reveals skbs sitting around, holding conntrack references. The problem is that since conntrack depends on nf_defrag_ipv6, nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its netns exit hooks run _after_ conntrack's netns exit hook. Flush all fragment queue SKBs during fqdir_pre_exit() to release conntrack references before conntrack cleanup runs. Also flush the queues in timer expiry handlers when they discover fqdir->dead is set, in case packet sneaks in while we're running the pre_exit flush. The commit under Fixes is not exactly the culprit, but I think previously the timer firing would eventually unblock the spinning conntrack. Fixes: `d5dd88794a` ("inet: fix various use-after-free in defrags units") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:15:27 -08:00
Jakub Kicinski	1231eec699	inet: frags: add inet_frag_queue_flush() Instead of exporting inet_frag_rbtree_purge() which requires that caller takes care of memory accounting, add a new helper. We will need to call it from a few places in the next patch. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:15:27 -08:00
Linus Torvalds	bbbf7f3284	- fix a bug with O_APPEND in cached mode causing data to be written multiple times on server - use kvmalloc for trans_fd to avoid problems with large msize and fragmented memory This should hopefully be used in more transports when time allows - convert to new mount API - minor cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmk1dvwACgkQq06b7GqY 5nCRWA//a4qCTs/8FRS7N0Mz5Jg84VZ2JPnVN6iydLKbFDkgUL8JXI723VmApb6D wR21yRm7VuWpnGVdfPF6BtjZV7cYXEEwukfLqkXtOwx/WaRREKpN0sOciXMd0htg ZgnhhrabCOiSAYJYb9/29sNwhvfweQi0BeAFdIEAPrVonFUYXRzFS0v4AiBCs5PY n8X6aoshViAG05MZycB2VYaCxT45I+8YNCXtSsT/uX+3BP1FuRYMRluAYCLu/TU8 oKyFjkpIri01211OEORx6gs5CDeCv0LpELfk5EW2QF/mz4oW3/4bAchg22NgNV6x 0OCbgTqwlSJVETCZfZso/TV8efMlk1rLxSA0xjQY9r1lA26BTubrNadnC2W2nhSv GpPbDu6s3Cj3WD7P2InGtXxzUIZDCm8kHfHjzbbzgwOS6jl8SaDSnc6J2HtKNESL T55hqzqzv4POFQKrgznaQcaDW7imftOA+9xv+k+j5DTDKS9LLxiKS7+dyyzYAyyX sCjOd/T7JMvXIp4TxwbROn2+VBVYVO3ZaaKdK8e+qVKvWooP3iorGXAg0O0wZYJV tLE5zioZiRngzhqAczDIpJAe5Qd/SIi6W+sAOpKLSvVVKGf/akr0wH0KxI5z7ox8 uBStVXTOoh52qQr0L7nnBIUJ2VJLJUt9TCzwwvaLM5B1TyPgxh8= =7ST9 -----END PGP SIGNATURE----- Merge tag '9p-for-6.19-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - fix a bug with O_APPEND in cached mode causing data to be written multiple times on server - use kvmalloc for trans_fd to avoid problems with large msize and fragmented memory This should hopefully be used in more transports when time allows - convert to new mount API - minor cleanups * tag '9p-for-6.19-rc1' of https://github.com/martinetd/linux: 9p: fix new mount API cache option handling 9p: fix cache/debug options printing in v9fs_show_options 9p: convert to the new mount API 9p: create a v9fs_context structure to hold parsed options net/9p: move structures and macros to header files fs/fs_parse: add back fsparam_u32hex fs/9p: delete unnnecessary condition fs/9p: Don't open remote file with APPEND mode when writeback cache is used net/9p: cleanup: change p9_trans_module->def to bool 9p: Use kvmalloc for message buffers on supported transports	2025-12-07 08:29:09 -08:00
Linus Torvalds	7203ca412f	Significant patch series in this merge are as follows: - The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from Uladzislau Rezki reworks the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT). - The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec. - The 4 patch series "mm/zswap: misc cleanup of code and documentations" from SeongJae Park does some light maintenance work on the zswap code. - The 5 patch series "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the /sys/kernel/debug/page_owner debug feature. It adds unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time. - The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua Hahn makes some minor alterations to the page allocator's per-cpu-pages feature. - The 2 patch series "Improve UFFDIO_MOVE scalability by removing anon_vma lock" from Lokesh Gidra addresses a scalability issue in userfaultfd's UFFDIO_MOVE operation. - The 2 patch series "kasan: cleanups for kasan_enabled() checks" from Sabyrzhan Tasbolatov performs some cleanup in the KASAN code. - The 2 patch series "drivers/base/node: fold node register and unregister functions" from Donet Tom cleans up the NUMA node handling code a little. - The 4 patch series "mm: some optimizations for prot numa" from Kefeng Wang provides some cleanups and small optimizations to the NUMA allocation hinting code. - The 5 patch series "mm/page_alloc: Batch callers of free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings. - The 2 patch series "optimize the logic for handling dirty file folios during reclaim" from Baolin Wang removes some now-unnecessary work from page reclaim. - The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning feature. - The 2 patch series "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration. - The 15 patch series "expand mmap_prepare functionality, port more users" from Lorenzo Stoakes enhances the new(ish) file_operations.mmap_prepare() method and ports additional callsites from the old ->mmap() over to ->mmap_prepare(). - The 8 patch series "Fix stale IOTLB entries for kernel address space" from Lu Baolu fixes a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry. - The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()" from Wei Yang cleans up and optimizes the folio splitting code. - The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui Song implements some cleanups and a minor fix in the swap discard code. - The 8 patch series "mm/damon: misc documentation fixups" from SeongJae Park does as advertised. - The 9 patch series "mm/damon: support pin-point targets removal" from SeongJae Park permits userspace to remove a specific monitoring target in the middle of the current targets list. - The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h" from Harry Yoo implements a couple of cleanups related to mm header file inclusion. - The 2 patch series "mm/swapfile.c: select swap devices of default priority round robin" from Baoquan He improves the selection of swap devices for NUMA machines. - The 3 patch series "mm: Convert memory block states (MEM_) macros to enums" from Israel Batista changes the memory block labels from macros to enums so they will appear in kernel debug info. - The 3 patch series "ksm: perform a range-walk to jump over holes in break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM unmerges an address range. - The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests" from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON userspace unit tests. - The 2 patch series "some cleanups for pageout()" from Baolin Wang cleans up a couple of minor things in the page scanner's writeback-for-eviction code. - The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file. - The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs. - The 2 patch series "mm: perform guard region install/remove under VMA lock" from Lorenzo Stoakes reduces mmap lock contention for callers performing VMA guard region operations. - The 2 patch series "vma_start_write_killable" from Matthew Wilcox starts work in permitting applications to be killed when they are waiting on a read_lock on the VMA lock. - The 11 patch series "mm/damon/tests: add more tests for online parameters commit" from SeongJae Park adds additional userspace testing of DAMON's "commit" feature. - The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does that. - The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another. - The 16 patch series "mm: support device-private THP" from Balbir Singh introduces support for Transparent Huge Page (THP) migration in zone device-private memory. - The 3 patch series "Optimize folio split in memory failure" from Zi Yan optimizes folio split operations in the memory failure code. - The 2 patch series "mm/huge_memory: Define split_type and consolidate split support checks" from Wei Yang provides some more cleanups in the folio splitting code. - The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" from Lorenzo Stoakes cleans up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t. - The 4 patch series "reparent the THP split queue" from Muchun Song reparents the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources. - The 3 patch series "unify PMD scan results and remove redundant cleanup" from Wei Yang does a little cleanup in the hugepage collapse code. - The 6 patch series "zram: introduce writeback bio batching" from Sergey Senozhatsky improves zram writeback efficiency by introducing batched bio writeback support. - The 4 patch series "memcg: cleanup the memcg stats interfaces" from Shakeel Butt cleans up our handling of the interrupt safety of some memcg stats. - The 4 patch series "make vmalloc gfp flags usage more apparent" from Vishal Moola cleans up vmalloc's handling of incoming GFP flags. - The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V" from Chunyan Zhang teches soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension. - The 5 patch series "mm: swap: small fixes and comment cleanups" from Youngjun Park fixes a small bug and cleans up some of the swap code. - The 4 patch series "initial work on making VMA flags a bitmap" from Lorenzo Stoakes starts work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit. - The 2 patch series "mm/swapfile: fix and cleanup swap list iterations" from Youngjun Park addresses a possible bug in the swap discard code and cleans things up a little. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4= =IUzS -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit `ebb9aeb980` ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...	2025-12-05 13:52:43 -08:00
Jakub Kicinski	4a18b6cd7c	bluetooth-next pull request for net-next: core: - HCI: Add initial support for PAST - hci_core: Introduce HCI_CONN_FLAG_PAST - ISO: Add support to bind to trigger PAST - HCI: Always use the identity address when initializing a connection - ISO: Attempt to resolve broadcast address - MGMT: Allow use of Set Device Flags without Add Device - ISO: Fix not updating BIS sender source address - HCI: Add support for LL Extended Feature Set driver: - btusb: Add new VID/PID 2b89/6275 for RTL8761BUV - btusb: MT7920: Add VID/PID 0489/e135 - btusb: MT7922: Add VID/PID 0489/e170 - btusb: Add new VID/PID 13d3/3533 for RTL8821CE - btusb: Add new VID/PID 0x0489/0xE12F for RTL8852BE-VT - btusb: Add new VID/PID 0x13d3/0x3618 for RTL8852BE-VT - btusb: Add new VID/PID 0x13d3/0x3619 for RTL8852BE-VT - btusb: Reclassify Qualcomm WCN6855 debug packets - btintel_pcie: Introduce HCI Driver protocol - btintel_pcie: Support for S4 (Hibernate) - btintel_pcie: Suspend/Resume: Controller doorbell interrupt handling - dt-bindings: net: Convert Marvell 8897/8997 bindings to DT schema - btbcm: Use kmalloc_array() to prevent overflow - btrtl: Add the support for RTL8761CUV - hci_h5: avoid sending two SYNC messages - hci_h5: implement CRC data integrity MAINTAINERS: - Add Bartosz Golaszewski as Qualcomm hci_qca maintainer -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmkuCjwZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKdOvD/9ReyEoUaUZ6aKC7TO66fT0 eo/sga036O3k9oeeECsEBOOgwOy86gbR7uGol/vV6mocm/Duql/pNqMrgC28Cxhy u+aGX7sf3s8Pc/e82Syx6u+QPKa6xz3Hnx1UdCM/HXb0nOkJUm3VHFmshFxmXE9g xyMWKtLp1DrXOZNLauR/p8fgAEhGQs8muICuWT/SrEbXZ4+coQoz2h6279IA9FGZ 2qj/pcLIBFk0qePkiaZ5LJGjF0P0+S9uX9XlhF9yIsLIckH8qAbPsHpD1+RsbkId R4WYxIBTVeeUhvWQezodsZJa8HRFQendCeBq8QhOo6fEcprpxrb4/8NS6hipblfy rTeyKUoPmPZ/5/nvx9pbRSqqdVOVT/pEOzD5o66bppX7s7Xq3k/WIRqCKpM4znIp iZyUlaoX6J5R+SHaOLVzRun6oUzXnIHbIGbWopfQBtMgpDajTcxQJpsTWdFQW4El RP+N9xoF+OBa4pKumKCe11pzUJOkBislDIAjNvp8wmevfSfkmo8CbqK5WwfL0rar VeBDlkfPYLVNiJ30WKwmLC4ymRvzAFhC0R6LoDPw8OWTZ7VBc7ReCcHDp8GBWbew pKe7jMWCCOo1q9zx0KBAwghQ6ZbABnK6N6Fg/Wpb/kfz7YkqnsRWRJkI2BTotEdu +bIZ2MwuJ59ROnfPy0tNBw== =ndah -----END PGP SIGNATURE----- Merge tag 'for-net-next-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - HCI: Add initial support for PAST - hci_core: Introduce HCI_CONN_FLAG_PAST - ISO: Add support to bind to trigger PAST - HCI: Always use the identity address when initializing a connection - ISO: Attempt to resolve broadcast address - MGMT: Allow use of Set Device Flags without Add Device - ISO: Fix not updating BIS sender source address - HCI: Add support for LL Extended Feature Set driver: - btusb: Add new VID/PID 2b89/6275 for RTL8761BUV - btusb: MT7920: Add VID/PID 0489/e135 - btusb: MT7922: Add VID/PID 0489/e170 - btusb: Add new VID/PID 13d3/3533 for RTL8821CE - btusb: Add new VID/PID 0x0489/0xE12F for RTL8852BE-VT - btusb: Add new VID/PID 0x13d3/0x3618 for RTL8852BE-VT - btusb: Add new VID/PID 0x13d3/0x3619 for RTL8852BE-VT - btusb: Reclassify Qualcomm WCN6855 debug packets - btintel_pcie: Introduce HCI Driver protocol - btintel_pcie: Support for S4 (Hibernate) - btintel_pcie: Suspend/Resume: Controller doorbell interrupt handling - dt-bindings: net: Convert Marvell 8897/8997 bindings to DT schema - btbcm: Use kmalloc_array() to prevent overflow - btrtl: Add the support for RTL8761CUV - hci_h5: avoid sending two SYNC messages - hci_h5: implement CRC data integrity MAINTAINERS: - Add Bartosz Golaszewski as Qualcomm hci_qca maintainer * tag 'for-net-next-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (29 commits) Bluetooth: btusb: Add new VID/PID 13d3/3533 for RTL8821CE Bluetooth: HCI: Add support for LL Extended Feature Set drivers/bluetooth: btbcm: Use kmalloc_array() to prevent overflow Bluetooth: btintel_pcie: Introduce HCI Driver protocol Bluetooth: btusb: add new custom firmwares Bluetooth: btusb: Add new VID/PID 0x13d3/0x3619 for RTL8852BE-VT Bluetooth: btusb: Add new VID/PID 0x13d3/0x3618 for RTL8852BE-VT Bluetooth: btusb: Add new VID/PID 0x0489/0xE12F for RTL8852BE-VT Bluetooth: iso: fix socket matching ambiguity between BIS and CIS Bluetooth: MAINTAINERS: Add Bartosz Golaszewski as Qualcomm hci_qca maintainer Bluetooth: btrtl: Add the support for RTL8761CUV Bluetooth: Remove redundant pm_runtime_mark_last_busy() calls dt-bindings: net: Convert Marvell 8897/8997 bindings to DT schema Bluetooth: btusb: Reclassify Qualcomm WCN6855 debug packets Bluetooth: btusb: Add new VID/PID 2b89/6275 for RTL8761BUV Bluetooth: btintel_pcie: Suspend/Resume: Controller doorbell interrupt handling Bluetooth: btintel_pcie: Support for S4 (Hibernate) Bluetooth: btusb: MT7922: Add VID/PID 0489/e170 Bluetooth: btusb: MT7920: Add VID/PID 0489/e135 Bluetooth: ISO: Fix not updating BIS sender source address ... ==================== Link: https://patch.msgid.link/20251201213818.97249-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-01 17:10:52 -08:00
Vladimir Oltean	0e75bfe340	net: dsa: add simple HSR offload helpers It turns out that HSR offloads are so fine-grained that many DSA switches can do a small part even though they weren't specifically designed for the protocols supported by that driver (HSR and PRP). Specifically NETIF_F_HW_HSR_DUP - it is simple packet duplication on transmit, towards all (aka 2) ports members of the HSR device. For many DSA switches, we know how to duplicate a packet, even though we never typically use that feature. The transmit port mask from the tagging protocol can have multiple bits set, and the switch should send the packet once to every port with a bit set from that mask. Nonetheless, not all tagging protocols are like this, and sometimes the port is a single numeric value rather than a bit mask. For that reason, and also because switches can sometimes change tagging protocols for different ones, we need to make HSR offload helpers opt-in. For devices that can do nothing else HSR-specific, we introduce dsa_port_simple_hsr_join() and dsa_port_simple_hsr_leave(). These functions monitor when two user ports of the same switch are part of the same HSR device, and when that condition is true, they toggle the NETIF_F_HW_HSR_DUP feature flag of both net devices. Normally only dsa_port_simple_hsr_join() and dsa_port_simple_hsr_leave() are needed. The dsa_port_simple_hsr_validate() helper is just to see what kind of configuration could be offloadable using the generic helpers. This is used by switch drivers which are not currently using the right tagging protocol to offload this HSR ring, but could in principle offload it after changing the tagger. Suggested-by: David Yang <mmyangfl@gmail.com> Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk> Cc: Chester A. Unal" <chester.a.unal@arinc9.com> Cc: "Clément Léger" <clement.leger@bootlin.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Florian Fainelli <florian.fainelli@broadcom.com> Cc: George McCollister <george.mccollister@gmail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Jonas Gorski <jonas.gorski@gmail.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: UNGLinuxDriver@microchip.com Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251130131657.65080-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-01 16:45:07 -08:00
Long Li	9bf66036d6	net: mana: Handle hardware recovery events when probing the device When MANA is being probed, it's possible that hardware is in recovery mode and the device may get GDMA_EQE_HWC_RESET_REQUEST over HWC in the middle of the probe. Detect such condition and go through the recovery service procedure. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1764193552-9712-1-git-send-email-longli@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-01 13:53:53 -08:00
Luiz Augusto von Dentz	a106e50be7	Bluetooth: HCI: Add support for LL Extended Feature Set This adds support for emulating LL Extended Feature Set introduced in 6.0 that adds the following: Commands: - HCI_LE_Read_All_Local_Supported_Features(0x2087)(Feature:47,1) - HCI_LE_Read_All_Remote_Features(0x2088)(Feature:47,2) Events: - HCI_LE_Read_All_Remote_Features_Complete(0x2b)(Mask bit:42) Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-12-01 16:21:16 -05:00
Luiz Augusto von Dentz	14b06c3a88	Bluetooth: HCI: Always use the identity address when initializing a connection This makes sure hci_conn is initialized with the identity address if a matching IRK exists which avoids the trouble of having to do it at multiple places which seems to be missing (e.g. CIS, BIS and PA). Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-12-01 16:00:06 -05:00
Luiz Augusto von Dentz	d3413703d5	Bluetooth: ISO: Add support to bind to trigger PAST This makes it possible to bind to a different destination address after being connected (BT_CONNECTED, BT_CONNECT2) which then triggers PAST Sender proceedure to transfer the PA Sync to the destination address. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-12-01 16:00:04 -05:00
Luiz Augusto von Dentz	c530569adc	Bluetooth: hci_core: Introduce HCI_CONN_FLAG_PAST This introduces a new device flag so userspace can indicate if it wants to enable PAST Receiver for a specific device. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-12-01 15:58:54 -05:00
Luiz Augusto von Dentz	33b2835f0b	Bluetooth: HCI: Add initial support for PAST This adds PAST related commands (HCI_OP_LE_PAST, HCI_OP_LE_PAST_SET_INFO and HCI_OP_LE_PAST_PARAMS) and events (HCI_EV_LE_PAST_RECEIVED) along with handling of PAST sender and receiver features bits including new MGMG settings ( HCI_EV_LE_PAST_RECEIVED and MGMT_SETTING_PAST_RECEIVER) which userspace can use to determine if PAST is supported by the controller. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-12-01 15:58:54 -05:00
Jakub Kicinski	840a64710e	netfilter pull request 25-11-28 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmko6jsACgkQ1w0aZmrP KyGIGBAAkmLtKNMnouv2eOJjJb50ERQ1cYvKG3zSI5GrOnkYvfS3MfU5rLuBR/ee L/xRpgNZdXMFAu1nkpFbNIoSwpOe3JaUuizlzLwTRYRmtZeRlGfvzDqiY4CDYKU1 7gBP0EMeTeF0SJRntU6S+zoTY7Xru5w40u5wVTnm0etiigwklv4EgixnzuSLSdkz Av3KLE0BN85cNs6onZ6s4N4dEpIyQ7Ln0imdFiJOLvg42lM6uVNfXB6CxUIo/tIC VzY9vQ5rTfhcNx3lRbaJaDOE6k01x+RsBM15AkkAlafLMfvRIH4zK9qiV9tfT6c+ t7md70+7w6j7zB9sXuI1tSMOCMvtxYfB49RJVomasEJ8J7VZ+x/7vaFYSfvydEVb hy1v9jOuViWWCEQhswLwQw/Xl42MVCE/zReHHBAxIC+I7nAZgEYqOCtYYPex3gZq l5gfiJhWqdg5yOuQepZkNo5TaFbkANgFcDuUp8IfWsbwZ2xdIIqIbHVNmenr0UuS 4ml+t8is/rsLi/gHoKfmfbG64wG1reVcRpVxWQljr9ePkg+04fRtesaOG44k/R+i wdUxHL4D4WV2SnNHznw8J12tgbsIc/VgwU0EFEUxUahc18quxaumZTVuL7enbFw1 3qgN+9qQ5ONDuABR9fedFGIoCFmOVkZLXgJnLgTC7bbZ6v0GvSM= =exaR -----END PGP SIGNATURE----- Merge tag 'nf-next-25-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next: 0) Add sanity check for maximum encapsulations in bridge vlan, reported by the new AI robot. 1) Move the flowtable path discovery code to its own file, the nft_flow_offload.c mixes the nf_tables evaluation with the path discovery logic, just split this in two for clarity. 2) Consolidate flowtable xmit path by using dev_queue_xmit() and the real device behind the layer 2 vlan/pppoe device. This allows to inline encapsulation. After this update, hw_ifidx can be removed since both ifidx and hw_ifidx now point to the same device. 3) Support for IPIP encapsulation in the flowtable, extend selftest to cover for this new layer 3 offload, from Lorenzo Bianconi. 4) Push down the skb into the conncount API to fix duplicates in the conncount list for packets with non-confirmed conntrack entries, this is due to an optimization introduced in `d265929930` ("netfilter: nf_conncount: reduce unnecessary GC"). From Fernando Fernandez Mancera. 5) In conncount, disable BH when performing garbage collection to consolidate existing behaviour in the conncount API, also from Fernando. 6) A matching packet with a confirmed conntrack invokes GC if conncount reaches the limit in an attempt to release slots. This allows the existing extensions to be used for real conntrack counting, not just limiting new connections, from Fernando. 7) Support for updating ct count objects in nf_tables, from Fernando. 8) Extend nft_flowtables.sh selftest to send IPv6 TCP traffic, from Lorenzo Bianconi. 9) Fixes for UAPI kernel-doc documentation, from Randy Dunlap. * tag 'nf-next-25-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: improve UAPI kernel-doc comments netfilter: ip6t_srh: fix UAPI kernel-doc comments format selftests: netfilter: nft_flowtable.sh: Add the capability to send IPv6 TCP traffic netfilter: nft_connlimit: add support to object update operation netfilter: nft_connlimit: update the count if add was skipped netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH netfilter: nf_conncount: rework API to use sk_buff directly selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest netfilter: flowtable: Add IPIP tx sw acceleration netfilter: flowtable: Add IPIP rx sw acceleration netfilter: flowtable: use tuple address to calculate next hop netfilter: flowtable: remove hw_ifidx netfilter: flowtable: inline pppoe encapsulation in xmit path netfilter: flowtable: inline vlan encapsulation in xmit path netfilter: flowtable: consolidate xmit path netfilter: flowtable: move path discovery infrastructure to its own file netfilter: flowtable: check for maximum number of encapsulations in bridge vlan ==================== Link: https://patch.msgid.link/20251128002345.29378-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-28 20:08:39 -08:00
Jakub Kicinski	2c80116b50	Apart from the usual small things just driver updates: - mt76: - WED support for >32-bit DMA - airoha NPU support - regdomain improvements - continued WiFi7/MLO work - rtw89 - support USB devices RTL8852AU and RTL8852CU - initial work for RTL8922DE - improved injection support - rtl8xxxu: 40 MHz connection fixes/support - brcmfmac: Acer A1 840 tablet quirk -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmkoKZoACgkQ10qiO8sP aADi1g//e4/kTZzl8j09V/PU+2xPQ6dqNBwsYjwowl4CPusWEJqny0M5nOs9F1ob 5lpVY3rMl4S6D5yUHY9B1fBkAgj3xuky4Udm0KONpwiGMexIn1CjlND5Qa2XW2fz BaHMoCI6RXzdgQoQqWQNtyxvstsb5PXfAE8h3avO+uoFhfm9zdvmWLw4cjgy76qo YAcUhwgIntc3oouDajOahwnxNDR2ZmZ+ATwDsmoqzhOtvTLoARnZD4+tLD1VFe+L yW3FdrbQYYOAdRyYiIcCIiLfr9AvqeEluCYy06J4Viafkf8io84IgijTuxM8tHpp spA0RA0LWNwcaYG6xf07VwjwbuhnhJEZEAfapEqhF7R6zcH7ZA6Y3vLB9JhB9bPX UnOb+kLrqiwnKHyHbcyW8uVFPj4D9vl9xDKM0wGCKFrv14q4Cwy/uIWW5Vy6GJnh Iyft0RxG83jU4x3uSx9Ywss/ByfhBuRChrBpy3ud6hf5D5dtbnH2310kBvmNMla0 G+y2/EDjmC4uFAglKS7CwoYHYE6KJclg1hxX8jZoKq5EoKBT+n7/uM8a6vDa5lW5 l1Sa3nJHfHHbQCQKc8jSHoC543rYMid36bJpUtnaWse35cN7bQI2v1EFuOmQD4zO W/OSJnH7roGUaFu9k76cCQjXWgF4NEgmRGlnvSrPEJ5YnPDJ6VU= =7c9f -----END PGP SIGNATURE----- Merge tag 'wireless-next-2025-11-27' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Apart from the usual small things just driver updates: - mt76: - WED support for >32-bit DMA - airoha NPU support - regdomain improvements - continued WiFi7/MLO work - rtw89 - support USB devices RTL8852AU and RTL8852CU - initial work for RTL8922DE - improved injection support - rtl8xxxu: 40 MHz connection fixes/support - brcmfmac: Acer A1 840 tablet quirk * tag 'wireless-next-2025-11-27' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (152 commits) wifi: mac80211: allow sharing identical chanctx for S1G interfaces wifi: nl80211: vendor-cmd: intel: fix a blank kernel-doc line warning wifi: cfg80211: include s1g_primary_2mhz when comparing chandefs wifi: cfg80211: include s1g_primary_2mhz when sending chandef wifi: ieee80211: correct FILS status codes mt76: mt7615: Fix memory leak in mt7615_mcu_wtbl_sta_add() wifi: mt76: mt792x: fix wifi init fail by setting MCU_RUNNING after CLC load wifi: mt76: Strip whitespace from build ddate wifi: mt76: mt7996: Add missing locking in mt7996_mac_sta_rc_work() wifi: mt76: mt7996: skip ieee80211_iter_keys() on scanning link remove wifi: mt76: mt7996: skip deflink accounting for offchannel links wifi: mt76: Move mt76_abort_scan out of mt76_reset_device() wifi: mt76: mt7996: move mt7996_update_beacons under mt76 mutex wifi: mt76: mt7996: grab mt76 mutex in mt7996_mac_sta_event() wifi: mt76: mt7925: ensure the 6GHz A-MPDU density cap from the hardware. wifi: mt76: mt7996: fix EMI rings for RRO wifi: mt76: mt7996: fix using wrong phy to start in mt7996_mac_restart() wifi: mt76: mt7996: fix MLO set key and group key issues wifi: mt76: mt7996: fix MLD group index assignment wifi: mt76: mt7996: use correct link_id when filling TXD and TXP ... ==================== Link: https://patch.msgid.link/20251127103806.17776-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-28 19:34:21 -08:00
Fernando Fernandez Mancera	be102eb6a0	netfilter: nf_conncount: rework API to use sk_buff directly When using nf_conncount infrastructure for non-confirmed connections a duplicated track is possible due to an optimization introduced since commit `d265929930` ("netfilter: nf_conncount: reduce unnecessary GC"). In order to fix this introduce a new conncount API that receives directly an sk_buff struct. It fetches the tuple and zone and the corresponding ct from it. It comes with both existing conncount variants nf_conncount_count_skb() and nf_conncount_add_skb(). In addition remove the old API and adjust all the users to use the new one. This way, for each sk_buff struct it is possible to check if there is a ct present and already confirmed. If so, skip the add operation. Fixes: `d265929930` ("netfilter: nf_conncount: reduce unnecessary GC") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-11-28 00:05:49 +00:00
Lorenzo Bianconi	ab427db178	netfilter: flowtable: Add IPIP rx sw acceleration Introduce sw acceleration for rx path of IPIP tunnels relying on the netfilter flowtable infrastructure. Subsequent patches will add sw acceleration for IPIP tunnels tx path. This series introduces basic infrastructure to accelerate other tunnel types (e.g. IP6IP6). IPIP rx sw acceleration can be tested running the following scenario where the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP tunnel is used to access a remote site (using eth1 as the underlay device): ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2) $ip addr show 6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff inet 192.168.0.2/24 scope global eth0 valid_lft forever preferred_lft forever 7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff inet 192.168.1.1/24 scope global eth1 valid_lft forever preferred_lft forever 8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000 link/ipip 192.168.1.1 peer 192.168.1.2 inet 192.168.100.1/24 scope global tun0 valid_lft forever preferred_lft forever $ip route show default via 192.168.100.2 dev tun0 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1 192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1 $nft list ruleset table inet filter { flowtable ft { hook ingress priority filter devices = { eth0, eth1 } } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp } flow add @ft } } Reproducing the scenario described above using veths I got the following results: - TCP stream received from the IPIP tunnel: - net-next: (baseline) ~ 71Gbps - net-next + IPIP flowtbale support: ~101Gbps Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-11-28 00:00:38 +00:00
Pablo Neira Ayuso	030feea309	netfilter: flowtable: remove hw_ifidx hw_ifidx was originally introduced to store the real netdevice as a requirement for the hardware offload support in: `73f97025a9` ("netfilter: nft_flow_offload: use direct xmit if hardware offload is enabled") Since ("netfilter: flowtable: consolidate xmit path"), ifidx and hw_ifidx points to the real device in the xmit path, remove it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-11-28 00:00:22 +00:00
Pablo Neira Ayuso	b5964aac51	netfilter: flowtable: consolidate xmit path Use dev_queue_xmit() for the XMIT_NEIGH case. Store the interface index of the real device behind the vlan/pppoe device, this introduces an extra lookup for the real device in the xmit path because rt->dst.dev provides the vlan/pppoe device. XMIT_NEIGH now looks more similar to XMIT_DIRECT but the check for stale dst and the neighbour lookup still remain in place which is convenient to deal with network topology changes. Note that nft_flow_route() needs to relax the check for _XMIT_NEIGH so the existing basic xfrm offload (which only works in one direction) does not break. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-11-27 23:59:56 +00:00
Pablo Neira Ayuso	93d7a7ed07	netfilter: flowtable: move path discovery infrastructure to its own file This file contains the path discovery that is run from the forward chain for the packet offloading the flow into the flowtable. This consists of a series of calls to dev_fill_forward_path() for each device stack. More topologies may be supported in the future, so move this code to its own file to separate it from the nftables flow_offload expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-11-27 23:59:43 +00:00
Jakub Kicinski	db4029859d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/xdp/xsk.c `0ebc27a4c6` ("xsk: avoid data corruption on cq descriptor number") `8da7bea7db` ("xsk: add indirect call for xsk_destruct_skb") `30ed05adca` ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-27 12:19:08 -08:00
Eric Dumazet	9a5e5334ad	tcp: remove icsk->icsk_retransmit_timer Now sk->sk_timer is no longer used by TCP keepalive, we can use its storage for TCP and MPTCP retransmit timers for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:29 -08:00
Eric Dumazet	08dfe37023	tcp: introduce icsk->icsk_keepalive_timer sk->sk_timer has been used for TCP keepalives. Keepalive timers are not in fast path, we want to use sk->sk_timer storage for retransmit timers, for better cache locality. Create icsk->icsk_keepalive_timer and change keepalive code to no longer use sk->sk_timer. Added space is reclaimed in the following patch. This includes changes to MPTCP, which was also using sk_timer. Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer for inet_sk_diag_fill() sake. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:29 -08:00
Eric Dumazet	27e8257a86	net: move sk_dst_pending_confirm and sk_pacing_status to sock_read_tx group These two fields are mostly read in TCP tx path, move them in an more appropriate group for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:29 -08:00
Eric Dumazet	3a6e8fd0bf	tcp: rename icsk_timeout() to tcp_timeout_expires() In preparation of sk->tcp_timeout_timer introduction, rename icsk_timeout() helper and change its argument to plain 'const struct sock *sk'. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:28 -08:00
Eric Dumazet	191ff13e42	net_sched: add qdisc_dequeue_drop() helper Some qdisc like cake, codel, fq_codel might drop packets in their dequeue() method. This is currently problematic because dequeue() runs with the qdisc spinlock held. Freeing skbs can be extremely expensive. Add qdisc_dequeue_drop() method and a new TCQ_F_DEQUEUE_DROPS so that these qdiscs can opt-in to defer the skb frees after the socket spinlock is released. TCQ_F_DEQUEUE_DROPS is an attempt to not penalize other qdiscs with an extra cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-14-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	0170d7f47c	net_sched: add tcf_kfree_skb_list() helper Using kfree_skb_list_reason() to free list of skbs from qdisc operations seems wrong as each skb might have a different drop reason. Cleanup __dev_xmit_skb() to call tcf_kfree_skb_list() once in preparation of the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-13-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	ad50d5a3fc	net_sched: add Qdisc_read_mostly and Qdisc_write groups It is possible to reorg Qdisc to avoid always dirtying 2 cache lines in fast path by reducing this to a single dirtied cache line. In current layout, we change only four/six fields in the first cache line: - q.spinlock - q.qlen - bstats.bytes - bstats.packets - some Qdisc also change q.next/q.prev In the second cache line we change in the fast path: - running - state - qstats.backlog /* --- cacheline 2 boundary (128 bytes) --- / struct sk_buff_head gso_skb __attribute__((__aligned__(64))); / 0x80 0x18 / struct qdisc_skb_head q; / 0x98 0x18 / struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); / 0xb0 0x10 / / --- cacheline 3 boundary (192 bytes) --- / struct gnet_stats_queue qstats; / 0xc0 0x14 / bool running; / 0xd4 0x1 / / XXX 3 bytes hole, try to pack / unsigned long state; / 0xd8 0x8 / struct Qdisc next_sched; /* 0xe0 0x8 / struct sk_buff_head skb_bad_txq; / 0xe8 0x18 / / --- cacheline 4 boundary (256 bytes) --- / Reorganize things to have a first cache line mostly read, then a mostly written one. This gives a ~3% increase of performance under tx stress. Note that there is an additional hole because @qstats now spans over a third cache line. / --- cacheline 2 boundary (128 bytes) --- / __u8 __cacheline_group_begin__Qdisc_read_mostly[0] __attribute__((__aligned__(64))); / 0x80 0 / struct sk_buff_head gso_skb; / 0x80 0x18 / struct Qdisc next_sched; /* 0x98 0x8 / struct sk_buff_head skb_bad_txq; / 0xa0 0x18 / __u8 __cacheline_group_end__Qdisc_read_mostly[0]; / 0xb8 0 / / XXX 8 bytes hole, try to pack / / --- cacheline 3 boundary (192 bytes) --- / __u8 __cacheline_group_begin__Qdisc_write[0] __attribute__((__aligned__(64))); / 0xc0 0 / struct qdisc_skb_head q; / 0xc0 0x18 / unsigned long state; / 0xd8 0x8 / struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); / 0xe0 0x10 / bool running; / 0xf0 0x1 / / XXX 3 bytes hole, try to pack / struct gnet_stats_queue qstats; / 0xf4 0x14 / / --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- / __u8 __cacheline_group_end__Qdisc_write[0]; / 0x108 0 / / XXX 56 bytes hole, try to pack */ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-8-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	2773cb0b31	net_sched: use qdisc_skb_cb(skb)->pkt_segs in bstats_update() Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. This gives a 5 % improvement in a TX intensive workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-6-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	b2a38f6df9	net_sched: make room for (struct qdisc_skb_cb)->pkt_segs Add a new u16 field, next to pkt_len : pkt_segs This will cache shinfo->gso_segs to speed up qdisc deqeue(). Move slave_dev_queue_mapping at the end of qdisc_skb_cb, and move three bits from tc_skb_cb : - post_ct - post_ct_snat - post_ct_dnat Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-2-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:31 +01:00
Lachlan Hodges	cba1ba11c1	wifi: cfg80211: include s1g_primary_2mhz when comparing chandefs When comparing chandefs, ensure we include s1g_primary_2mhz. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20251125025927.245280-3-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-25 10:31:28 +01:00
Paolo Abeni	075b19c211	net: factor-out _sk_charge() helper Move out of __inet_accept() the code dealing charging newly accepted socket to memcg. MPTCP will soon use it to on a per subflow basis, in different contexts. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Geliang Tang <geliang@kernel.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-1-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-24 19:49:40 -08:00
Eric Dumazet	4fe5a00ec7	net: sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr() syzbot reported that tcf_get_base_ptr() can be called while transport header is not set [1]. Instead of returning a dangling pointer, return NULL. Fix tcf_get_base_ptr() callers to handle this NULL value. [1] WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 skb_transport_header include/linux/skbuff.h:3071 [inline] WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 tcf_get_base_ptr include/net/pkt_cls.h:539 [inline] WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 em_nbyte_match+0x2d8/0x3f0 net/sched/em_nbyte.c:43 Modules linked in: CPU: 1 UID: 0 PID: 6019 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Call Trace: <TASK> tcf_em_match net/sched/ematch.c:494 [inline] __tcf_em_tree_match+0x1ac/0x770 net/sched/ematch.c:520 tcf_em_tree_match include/net/pkt_cls.h:512 [inline] basic_classify+0x115/0x2d0 net/sched/cls_basic.c:50 tc_classify include/net/tc_wrapper.h:197 [inline] __tcf_classify net/sched/cls_api.c:1764 [inline] tcf_classify+0x4cf/0x1140 net/sched/cls_api.c:1860 multiq_classify net/sched/sch_multiq.c:39 [inline] multiq_enqueue+0xfd/0x4c0 net/sched/sch_multiq.c:66 dev_qdisc_enqueue+0x4e/0x260 net/core/dev.c:4118 __dev_xmit_skb net/core/dev.c:4214 [inline] __dev_queue_xmit+0xe83/0x3b50 net/core/dev.c:4729 packet_snd net/packet/af_packet.c:3076 [inline] packet_sendmsg+0x3e33/0x5080 net/packet/af_packet.c:3108 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0x21c/0x270 net/socket.c:742 ____sys_sendmsg+0x505/0x830 net/socket.c:2630 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+f3a497f02c389d86ef16@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6920855a.a70a0220.2ea503.0058.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20251121154100.1616228-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-24 18:53:14 -08:00
Daniel Zahka	2a367002ed	devlink: support default values for param-get and param-set Support querying and resetting to default param values. Introduce two new devlink netlink attrs: DEVLINK_ATTR_PARAM_VALUE_DEFAULT and DEVLINK_ATTR_PARAM_RESET_DEFAULT. The former is used to contain an optional parameter value inside of the param_value nested attribute. The latter is used in param-set requests from userspace to indicate that the driver should reset the param to its default value. To implement this, two new functions are added to the devlink driver api: devlink_param::get_default() and devlink_param::reset_default(). These callbacks allow drivers to implement default param actions for runtime and permanent cmodes. For driverinit params, the core latches the last value set by a driver via devl_param_driverinit_value_set(), and uses that as the default value for a param. Because default parameter values are optional, it would be impossible to discern whether or not a param of type bool has default value of false or not provided if the default value is encoded using a netlink flag type. For this reason, when a DEVLINK_PARAM_TYPE_BOOL has an associated default value, the default value is encoded using a u8 type. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-4-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 19:01:22 -08:00
Daniel Zahka	011d133bb9	devlink: pass extack through to devlink_param::get() Allow devlink_param::get() handlers to report error messages via extack. This function is called in a few different contexts, but not all of them will have an valid extack to use. When devlink_param::get() is called from param_get_doit or param_get_dumpit contexts, pass the extack through so that drivers can report errors when retrieving param values. devlink_param::get() is called from the context of devlink_param_notify(), pass NULL in for the extack. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-2-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 19:01:22 -08:00
Eric Dumazet	ecfea98b7d	tcp: add net.ipv4.tcp_rcvbuf_low_rtt This is a follow up of commit `aa251c8463` ("tcp: fix too slow tcp_rcvbuf_grow() action") which brought again the issue that I tried to fix in commit `65c5287892` ("tcp: fix sk_rcvbuf overshoot") We also recently increased tcp_rmem[2] to 32 MB in commit `572be9bf9d` ("tcp: increase tcp_rmem[2] to 32 MB") Idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf too fast for small RTT flows. If sk->sk_rcvbuf is too big, this can force NIC driver to not recycle pages from their page pool, and also can cause cache evictions for DDIO enabled cpus/NIC, as receivers are usually slower than senders. Add net.ipv4.tcp_rcvbuf_low_rtt sysctl, set by default to 1000 usec (1 ms) If RTT if smaller than the sysctl value, use the RTT/tcp_rcvbuf_low_rtt ratio to control sk_rcvbuf inflation. Tested: Pair of hosts with a 200Gbit IDPF NIC. Using netperf/netserver Client initiates 8 TCP bulk flows, asking netserver to use CPU #10 only. super_netperf 8 -H server -T,10 -l 30 On server, use perf -e tcp:tcp_rcvbuf_grow while test is running. Before: sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1 perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script\|tail -20\|cut -c30-230 1153.051201: tcp:tcp_rcvbuf_grow: time=398 rtt_us=382 copied=6905856 inq=180224 space=6115328 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil 1153.138752: tcp:tcp_rcvbuf_grow: time=446 rtt_us=413 copied=5529600 inq=180224 space=4505600 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil 1153.361484: tcp:tcp_rcvbuf_grow: time=415 rtt_us=380 copied=7061504 inq=204800 space=6725632 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil 1153.457642: tcp:tcp_rcvbuf_grow: time=483 rtt_us=421 copied=5885952 inq=720896 space=4407296 ooo=0 scaling_ratio=240 rcvbuf=23763511 rcv_ssthresh=22223271 window_clamp=22278291 rcv_wnd=21430272 famil 1153.466002: tcp:tcp_rcvbuf_grow: time=308 rtt_us=281 copied=3244032 inq=180224 space=2883584 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41713664 famil 1153.747792: tcp:tcp_rcvbuf_grow: time=394 rtt_us=332 copied=4460544 inq=585728 space=3063808 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41373696 famil 1154.260747: tcp:tcp_rcvbuf_grow: time=652 rtt_us=226 copied=10977280 inq=737280 space=9486336 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29197743 window_clamp=29217691 rcv_wnd=28368896 fami 1154.375019: tcp:tcp_rcvbuf_grow: time=461 rtt_us=443 copied=7573504 inq=507904 space=6856704 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25288704 famil 1154.463072: tcp:tcp_rcvbuf_grow: time=494 rtt_us=408 copied=7983104 inq=200704 space=7065600 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25579520 famil 1154.474658: tcp:tcp_rcvbuf_grow: time=507 rtt_us=459 copied=5586944 inq=540672 space=4718592 ooo=0 scaling_ratio=240 rcvbuf=17852266 rcv_ssthresh=16692999 window_clamp=16736499 rcv_wnd=16056320 famil 1154.584657: tcp:tcp_rcvbuf_grow: time=494 rtt_us=427 copied=8126464 inq=204800 space=7782400 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil 1154.702117: tcp:tcp_rcvbuf_grow: time=480 rtt_us=406 copied=5734400 inq=180224 space=5349376 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil 1155.941595: tcp:tcp_rcvbuf_grow: time=717 rtt_us=670 copied=11042816 inq=3784704 space=7159808 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=14614528 fam 1156.384735: tcp:tcp_rcvbuf_grow: time=529 rtt_us=473 copied=9011200 inq=180224 space=7258112 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=18018304 famil 1157.821676: tcp:tcp_rcvbuf_grow: time=529 rtt_us=272 copied=8224768 inq=602112 space=6545408 ooo=0 scaling_ratio=240 rcvbuf=67000000 rcv_ssthresh=62793576 window_clamp=62812500 rcv_wnd=62115840 famil 1158.906379: tcp:tcp_rcvbuf_grow: time=710 rtt_us=445 copied=11845632 inq=540672 space=10240000 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29205935 window_clamp=29217691 rcv_wnd=28536832 fam 1164.600160: tcp:tcp_rcvbuf_grow: time=841 rtt_us=430 copied=12976128 inq=1290240 space=11304960 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29212591 window_clamp=29217691 rcv_wnd=27856896 fa 1165.163572: tcp:tcp_rcvbuf_grow: time=845 rtt_us=800 copied=12632064 inq=540672 space=7921664 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25912795 window_clamp=25937095 rcv_wnd=25260032 fami 1165.653464: tcp:tcp_rcvbuf_grow: time=388 rtt_us=309 copied=4493312 inq=180224 space=3874816 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41995899 window_clamp=42050919 rcv_wnd=41713664 famil 1166.651211: tcp:tcp_rcvbuf_grow: time=556 rtt_us=553 copied=6328320 inq=540672 space=5554176 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=20946944 famil After: sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1000 perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script\|tail -20\|cut -c30-230 1457.053149: tcp:tcp_rcvbuf_grow: time=128 rtt_us=24 copied=1441792 inq=40960 space=1269760 ooo=0 scaling_ratio=240 rcvbuf=2960741 rcv_ssthresh=2605474 window_clamp=2775694 rcv_wnd=2568192 family=AF_I 1458.000778: tcp:tcp_rcvbuf_grow: time=128 rtt_us=31 copied=1441792 inq=24576 space=1400832 ooo=0 scaling_ratio=240 rcvbuf=3060163 rcv_ssthresh=2810042 window_clamp=2868902 rcv_wnd=2674688 family=AF_I 1458.088059: tcp:tcp_rcvbuf_grow: time=190 rtt_us=110 copied=3227648 inq=385024 space=2781184 ooo=0 scaling_ratio=240 rcvbuf=6728240 rcv_ssthresh=6252705 window_clamp=6307725 rcv_wnd=5799936 family=AF 1458.148549: tcp:tcp_rcvbuf_grow: time=232 rtt_us=129 copied=3956736 inq=237568 space=2842624 ooo=0 scaling_ratio=240 rcvbuf=6731333 rcv_ssthresh=6252705 window_clamp=6310624 rcv_wnd=5918720 family=AF 1458.466861: tcp:tcp_rcvbuf_grow: time=193 rtt_us=83 copied=2949120 inq=180224 space=2457600 ooo=0 scaling_ratio=240 rcvbuf=5751438 rcv_ssthresh=5357689 window_clamp=5391973 rcv_wnd=5054464 family=AF_ 1458.775476: tcp:tcp_rcvbuf_grow: time=257 rtt_us=127 copied=4304896 inq=352256 space=3346432 ooo=0 scaling_ratio=240 rcvbuf=8067131 rcv_ssthresh=7523275 window_clamp=7562935 rcv_wnd=7061504 family=AF 1458.776631: tcp:tcp_rcvbuf_grow: time=200 rtt_us=96 copied=3260416 inq=143360 space=2768896 ooo=0 scaling_ratio=240 rcvbuf=6397256 rcv_ssthresh=5938567 window_clamp=5997427 rcv_wnd=5828608 family=AF_ 1459.707973: tcp:tcp_rcvbuf_grow: time=215 rtt_us=96 copied=2506752 inq=163840 space=1388544 ooo=0 scaling_ratio=240 rcvbuf=3068867 rcv_ssthresh=2768282 window_clamp=2877062 rcv_wnd=2555904 family=AF_ 1460.246494: tcp:tcp_rcvbuf_grow: time=231 rtt_us=80 copied=3756032 inq=204800 space=3117056 ooo=0 scaling_ratio=240 rcvbuf=7288091 rcv_ssthresh=6773725 window_clamp=6832585 rcv_wnd=6471680 family=AF_ 1460.714596: tcp:tcp_rcvbuf_grow: time=270 rtt_us=110 copied=4714496 inq=311296 space=3719168 ooo=0 scaling_ratio=240 rcvbuf=8957739 rcv_ssthresh=8339020 window_clamp=8397880 rcv_wnd=7933952 family=AF 1462.029977: tcp:tcp_rcvbuf_grow: time=101 rtt_us=19 copied=1105920 inq=40960 space=1036288 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=1986560 family=AF_I 1462.802385: tcp:tcp_rcvbuf_grow: time=89 rtt_us=45 copied=1069056 inq=0 space=1064960 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=2035712 family=AF_INET6 1462.918648: tcp:tcp_rcvbuf_grow: time=105 rtt_us=33 copied=1441792 inq=180224 space=1069056 ooo=0 scaling_ratio=240 rcvbuf=2383282 rcv_ssthresh=2091684 window_clamp=2234326 rcv_wnd=1896448 family=AF_ 1463.222533: tcp:tcp_rcvbuf_grow: time=273 rtt_us=144 copied=4603904 inq=385024 space=3469312 ooo=0 scaling_ratio=240 rcvbuf=8422564 rcv_ssthresh=7891053 window_clamp=7896153 rcv_wnd=7409664 family=AF 1466.519312: tcp:tcp_rcvbuf_grow: time=130 rtt_us=23 copied=1343488 inq=0 space=1261568 ooo=0 scaling_ratio=240 rcvbuf=2780158 rcv_ssthresh=2493778 window_clamp=2606398 rcv_wnd=2494464 family=AF_INET6 1466.681003: tcp:tcp_rcvbuf_grow: time=128 rtt_us=21 copied=1441792 inq=12288 space=1343488 ooo=0 scaling_ratio=240 rcvbuf=2932027 rcv_ssthresh=2578555 window_clamp=2748775 rcv_wnd=2568192 family=AF_I 1470.689959: tcp:tcp_rcvbuf_grow: time=255 rtt_us=122 copied=3932160 inq=204800 space=3551232 ooo=0 scaling_ratio=240 rcvbuf=8182038 rcv_ssthresh=7647384 window_clamp=7670660 rcv_wnd=7442432 family=AF 1471.754154: tcp:tcp_rcvbuf_grow: time=188 rtt_us=95 copied=2138112 inq=577536 space=1429504 ooo=0 scaling_ratio=240 rcvbuf=3113650 rcv_ssthresh=2806426 window_clamp=2919046 rcv_wnd=2248704 family=AF_ 1476.813542: tcp:tcp_rcvbuf_grow: time=269 rtt_us=99 copied=3088384 inq=180224 space=2564096 ooo=0 scaling_ratio=240 rcvbuf=6219470 rcv_ssthresh=5771893 window_clamp=5830753 rcv_wnd=5509120 family=AF_ 1477.738309: tcp:tcp_rcvbuf_grow: time=166 rtt_us=54 copied=1777664 inq=180224 space=1417216 ooo=0 scaling_ratio=240 rcvbuf=3117118 rcv_ssthresh=2874958 window_clamp=2922298 rcv_wnd=2613248 family=AF_ We can see sk_rcvbuf values are much smaller, and that rtt_us (estimation of rtt from a receiver point of view) is kept small, instead of being bloated. No difference in throughput. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20251119084813.3684576-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:44:23 -08:00
Eric Dumazet	6d5dea6824	tcp: tcp_moderate_rcvbuf is only used in rx path sysctl_tcp_moderate_rcvbuf is only used from tcp_rcvbuf_grow(). Move it to netns_ipv4_read_rx group. Remove various CACHELINE_ASSERT_GROUP_SIZE() from netns_ipv4_struct_check(), as they have no real benefit but cause pain for all changes. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251119084813.3684576-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:44:23 -08:00
Pauli Virtanen	79a2d4678b	Bluetooth: hci_core: lookup hci_conn on RX path on protocol side The hdev lock/lookup/unlock/use pattern in the packet RX path doesn't ensure hci_conn* is not concurrently modified/deleted. This locking appears to be leftover from before conn_hash started using RCU commit `bf4c632524` ("Bluetooth: convert conn hash to RCU") and not clear if it had purpose since then. Currently, there are code paths that delete hci_conn* from elsewhere than the ordered hdev->workqueue where the RX work runs in. E.g. commit `5af1f84ed1` ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") introduced some of these, and there probably were a few others before it. It's better to do the locking so that even if these run concurrently no UAF is possible. Move the lookup of hci_conn and associated socket-specific conn to protocol recv handlers, and do them within a single critical section to cover hci_conn* usage and lookup. syzkaller has reported a crash that appears to be this issue: [Task hdev->workqueue] [Task 2] hci_disconnect_all_sync l2cap_recv_acldata(hcon) hci_conn_get(hcon) hci_abort_conn_sync(hcon) hci_dev_lock hci_dev_lock hci_conn_del(hcon) v-------------------------------- hci_dev_unlock hci_conn_put(hcon) conn = hcon->l2cap_data (UAF) Fixes: `5af1f84ed1` ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") Reported-by: syzbot+d32d77220b92eddd89ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d32d77220b92eddd89ad Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-20 17:01:09 -05:00
Chris Lu	4015b97976	Bluetooth: btusb: mediatek: Fix kernel crash when releasing mtk iso interface When performing reset tests and encountering abnormal card drop issues that lead to a kernel crash, it is necessary to perform a null check before releasing resources to avoid attempting to release a null pointer. <4>[ 29.158070] Hardware name: Google Quigon sku196612/196613 board (DT) <4>[ 29.158076] Workqueue: hci0 hci_cmd_sync_work [bluetooth] <4>[ 29.158154] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) <4>[ 29.158162] pc : klist_remove+0x90/0x158 <4>[ 29.158174] lr : klist_remove+0x88/0x158 <4>[ 29.158180] sp : ffffffc0846b3c00 <4>[ 29.158185] pmr_save: 000000e0 <4>[ 29.158188] x29: ffffffc0846b3c30 x28: ffffff80cd31f880 x27: ffffff80c1bdc058 <4>[ 29.158199] x26: dead000000000100 x25: ffffffdbdc624ea3 x24: ffffff80c1bdc4c0 <4>[ 29.158209] x23: ffffffdbdc62a3e6 x22: ffffff80c6c07000 x21: ffffffdbdc829290 <4>[ 29.158219] x20: 0000000000000000 x19: ffffff80cd3e0648 x18: 000000031ec97781 <4>[ 29.158229] x17: ffffff80c1bdc4a8 x16: ffffffdc10576548 x15: ffffff80c1180428 <4>[ 29.158238] x14: 0000000000000000 x13: 000000000000e380 x12: 0000000000000018 <4>[ 29.158248] x11: ffffff80c2a7fd10 x10: 0000000000000000 x9 : 0000000100000000 <4>[ 29.158257] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 2d7223ff6364626d <4>[ 29.158266] x5 : 0000008000000000 x4 : 0000000000000020 x3 : 2e7325006465636e <4>[ 29.158275] x2 : ffffffdc11afeff8 x1 : 0000000000000000 x0 : ffffffdc11be4d0c <4>[ 29.158285] Call trace: <4>[ 29.158290] klist_remove+0x90/0x158 <4>[ 29.158298] device_release_driver_internal+0x20c/0x268 <4>[ 29.158308] device_release_driver+0x1c/0x30 <4>[ 29.158316] usb_driver_release_interface+0x70/0x88 <4>[ 29.158325] btusb_mtk_release_iso_intf+0x68/0xd8 [btusb (HASH:e8b6 5)] <4>[ 29.158347] btusb_mtk_reset+0x5c/0x480 [btusb (HASH:e8b6 5)] <4>[ 29.158361] hci_cmd_sync_work+0x10c/0x188 [bluetooth (HASH:a4fa 6)] <4>[ 29.158430] process_scheduled_works+0x258/0x4e8 <4>[ 29.158441] worker_thread+0x300/0x428 <4>[ 29.158448] kthread+0x108/0x1d0 <4>[ 29.158455] ret_from_fork+0x10/0x20 <0>[ 29.158467] Code: 91343000 940139d1 f9400268 927ff914 (f9401297) <4>[ 29.158474] ---[ end trace 0000000000000000 ]--- <0>[ 29.167129] Kernel panic - not syncing: Oops: Fatal exception <2>[ 29.167144] SMP: stopping secondary CPUs <4>[ 29.167158] ------------[ cut here ]------------ Fixes: `ceac1cb025` ("Bluetooth: btusb: mediatek: add ISO data transmission functions") Signed-off-by: Chris Lu <chris.lu@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-20 16:51:14 -05:00
Jakub Kicinski	9e203721ec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile `e1bb28bf13` ("selftest: af_unix: Add test for SO_PEEK_OFF.") `45a1cd8346` ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 09:13:26 -08:00
Pagadala Yesu Anjaneyulu	a77f0ad44f	wifi: cfg80211: Add support for 6GHz AP role not relevant AP type Add IEEE80211_6GHZ_CTRL_REG_AP_ROLE_NOT_RELEVANT and map it to IEEE80211_REG_LPI_AP for safe regulatory compliance when AP role classification is not applicable. Use LPI as safe fallback to prevent power limit violations. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251112110828.856283677cc7.I36138a34847c3b4e680974bf347dde844448f3bc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-20 10:25:10 +01:00
Aditya Garg	45120304e8	net: mana: Drop TX skb on post_work_request failure and unmap resources Drop TX packets when posting the work request fails and ensure DMA mappings are always cleaned up. Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:11:57 -08:00
Aditya Garg	934fa943b5	net: mana: Handle SKB if TX SGEs exceed hardware limit The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs) per TX WQE. Exceeding this limit can cause TX failures. Add ndo_features_check() callback to validate SKB layout before transmission. For GSO SKBs that would exceed the hardware SGE limit, clear NETIF_F_GSO_MASK to enforce software segmentation in the stack. Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still exceed the SGE limit. Also, Add ethtool counter for SKBs linearized Co-developed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:11:57 -08:00
Jakub Kicinski	c3995fc1a8	ipsec-2025-11-18 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmkcEO8ACgkQrB3Eaf9P W7eXjA//ReWvgmIwM87WjEwI0E8y/ChS3GwWOMKo2XVwntkuctW+gvTfKn7WDMcs AuqbhCpoRdA1a3rEUWNBKoMT1PYmWHt4oElC2vEodIKcvrtVpOukyHQg5zaOTRni TCiXUD5kojyCC3YX8J2VXnIsvmHl/0Wo2iEd9MBivOkKXh7UGy/azOqPMhwmQBHx Ds37Mj86tRPylEaVtW9Js7BWTBWBCg5TpUJbJY8DvaYP1TBFduao2ExMo2dFPeYC 495N856k+Pa1OVqW6Ss40I59UXmXbs5WcUd8mOhleqxUaAQoaUqSfQwdw0UErS+2 lttuMH1pnNgpkWMgusXWgs8lxXiwbH74eIthtR6/9k/B80eKaQ5Rwp8sAZ0DV+8M FoL7PBHWQzWvc+/L+8zJ0g78mv5+ymvSdkl2ZQXPJiJF1hdZ31RGQAwlPDYqrq63 WNu19dKwXzASWR/YBXO9vw7pdjljs8BXZcTMNDZcS3FgWonI47nTIpy0vjx9vinm 4KzaIpg+cjEt1SNrO45sPoBmoMj642aEHtkAEhR47U8FHQTBW2/9l/WdpIJhYhjb IrVdVw32Fo55HJby14YlwODPpUJ0t/UcI32KdTXd5kI+UqqyeiIxdtLfaXiNTDGJ RQ80mTeG9AKxfcD7LGK73ndWJxBb+2C+6MPQN7+AF3rh1bFcGQ8= =h/E5 -----END PGP SIGNATURE----- Merge tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-11-18 1) Misc fixes for xfrm_state creation/modification/deletion. Patchset from Sabrina Dubroca. 2) Fix inner packet family determination for xfrm offloads. From Jianbo Liu. 3) Don't push locally generated packets directly to L2 tunnel mode offloading, they still need processing from the standard xfrm path. From Jianbo Liu. 4) Fix memory leaks in xfrm_add_acquire for policy offloads and policy security contexts. From Zilin Guan. * tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix memory leak in xfrm_add_acquire() xfrm: Prevent locally generated packets from direct output in tunnel mode xfrm: Determine inner GSO type from packet inner protocol xfrm: Check inner packet family directly from skb_dst xfrm: check all hash buckets for leftover states during netns deletion xfrm: set err and extack on failure to create pcpu SA xfrm: call xfrm_dev_state_delete when xfrm_state_migrate fails to add the state xfrm: make state as DEAD before final put when migrate fails xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added xfrm: drop SA reference in xfrm_state_update if dir doesn't match ==================== Link: https://patch.msgid.link/20251118085344.2199815-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 17:58:44 -08:00
Erni Sri Satya Vennela	be4f1d67ec	net: mana: Add standard counter rx_missed_errors Report standard counter stats->rx_missed_errors using hc_rx_discards_no_wqe from the hardware. Add a global workqueue to periodically run mana_query_gf_stats every 2 seconds to get the latest info in eth_stats and define a driver capability flag to notify hardware of the periodic queries. To avoid repeated failures and log flooding, the workqueue is not rescheduled if mana_query_gf_stats fails on HWC timeout error and the stats are reset to 0. Other errors are transient which will not need a VF reset for recovery. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763120599-6331-3-git-send-email-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-17 19:52:30 -08:00
Erni Sri Satya Vennela	e275d9091c	net: mana: Move hardware counter stats from per-port to per-VF context Move hardware counter (HC) statistics from mana_port_context to mana_context to enable sharing stats across multiple network ports on the same MANA VF. Previously, each network port queried hardware counters independently using MANA_QUERY_GF_STAT command (GF = Generic Function stats from GDMA hardware), resulting in redundant queries when multiple ports existed on the same device. Isolate hardware counter stats by introducing mana_ethtool_hc_stats in mana_context and update the code to ensure all stats are properly reported via ethtool -S <interface>, maintaining consistency with previous behavior. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763120599-6331-2-git-send-email-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-17 19:52:30 -08:00
Shakeel Butt	d929525c2e	memcg: net: track network throttling due to memcg memory pressure The kernel can throttle network sockets if the memory cgroup associated with the corresponding socket is under memory pressure. The throttling actions include clamping the transmit window, failing to expand receive or send buffers, aggressively prune out-of-order receive queue, FIN deferred to a retransmitted packet and more. Let's add memcg metric to track such throttling actions. At the moment memcg memory pressure is defined through vmpressure and in future it may be defined using PSI or we may add more flexible way for the users to define memory pressure, maybe through ebpf. However the potential throttling actions will remain the same, so this newly introduced metric will continue to track throttling actions irrespective of how memcg memory pressure is defined. Link: https://lkml.kernel.org/r/20251016161035.86161-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Daniel Sedlak <daniel.sedlak@cdn77.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-16 17:28:06 -08:00
Yue Haibing	06ac470658	sctp: Remove unused declaration sctp_auth_init_hmacs() Commit `bf40785fa4` ("sctp: Use HMAC-SHA1 and HMAC-SHA256 library for chunk authentication") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20251113114501.32905-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:00:34 -08:00
Eric Dumazet	6d650ae928	tcp: gro: inline tcp_gro_pull_header() tcp_gro_pull_header() is used in GRO fast path, inline it. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251113140358.58242-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:00:08 -08:00
Heiner Kallweit	4aa73c6051	net: dsa: remove definition of struct dsa_switch_driver Since `93e86b3bc8` ("net: dsa: Remove legacy probing support") this struct has no user any longer. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/4053a98f-052f-4dc1-a3d4-ed9b3d3cc7cb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 17:40:22 -08:00
Jakub Kicinski	c99ebb6132	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c `96a9178a29` ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") `61b7ade9ba` ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 12:35:38 -08:00
Linus Torvalds	d0309c0543	Including fixes from Bluetooth and Wireless. No known outstanding regressions. Current release - regressions: - eth: bonding: fix mii_status when slave is down - eth: mlx5e: fix missing error assignment in mlx5e_xfrm_add_state() Previous releases - regressions: - sched: limit try_bulk_dequeue_skb() batches - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe - af_unix: initialise scc_index in unix_add_edge() - netpoll: fix incorrect refcount handling causing incorrect cleanup - bluetooth: don't hold spin lock over sleeping functions - hsr: Fix supervision frame sending on HSRv0 - sctp: prevent possible shift out-of-bounds - tipc: fix use-after-free in tipc_mon_reinit_self(). - dsa: tag_brcm: do not mark link local traffic as offloaded - eth: virtio-net: fix incorrect flags recording in big mode Previous releases - always broken: - sched: initialize struct tc_ife to fix kernel-infoleak - wifi: - mac80211: reject address change while connecting - iwlwifi: avoid toggling links due to wrong element use - bluetooth: cancel mesh send timer when hdev removed - strparser: fix signed/unsigned mismatch bug - handshake: fix memory leak in tls_handshake_accept() Misc: - selftests: mptcp: fix some flaky tests Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmkV/O8SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkeBQQALgJv+sfJ/8H+BA3US7BU9kwydXtJneo mHKnse3NCiV3kXyJu7p+CIBU4xP8LMMmpVFZQ0dPKnbFyfGWjCX+UOEMU4NBnhG7 dfSGWAxR8iS0gUi/5K4dT55LZjeQ392Zu2OqrBRjAAYG0s5vXbAcCx0jUhfmG+ZD rJupFYuRuF7W3UvWC/TQviY4oJ0goaZnrm6Y5ADX9rblPlzD5xgOTqZzSuUpLxQM Q37IGjW3F12FrNmqabC3MBQcNfWNpqUgkDfFxMJbxGPDe9CjvSAAmMVNmDDkr+EH eL5+M1cdH+L9RRBNSQ4SX9dsNwhyyU04wrEADGf39jcgw/ACXI+t0laj/Hm0O1xg vMOtqiQYSe8lVVdCswmi6BdYxBsNU6l2gx0evq/qGztFbDW9s5rn26uXy5CIqlJa 04k1lmT5EeMpE9opwxPJpw+5LyjjtCsD6AGOlLb8DU6cbXKVUhxCZVHdDNnfxt4J ZfQo8aUx6X3vnDGMzPWEZbYMqd4va6hVPTJdUuqk1enuE6KfhKMhWbP5D9a/qiqM lhinWYdmBP6bKJlxdfsH7kgwhfuoi/jT54VYu33l5LmrOk7+tO7gLJcRhNZW9jsl KkJ+Wk1JQ7/oiQI8tcUCc+LEwwY54F+34HAHjFLNELW+bp/vvoMEVeZKku2YM3Gy xW+7WYdrx2RK =JAyr -----END PGP SIGNATURE----- Merge tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth and Wireless. No known outstanding regressions. Current release - regressions: - eth: - bonding: fix mii_status when slave is down - mlx5e: fix missing error assignment in mlx5e_xfrm_add_state() Previous releases - regressions: - sched: limit try_bulk_dequeue_skb() batches - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe - af_unix: initialise scc_index in unix_add_edge() - netpoll: fix incorrect refcount handling causing incorrect cleanup - bluetooth: don't hold spin lock over sleeping functions - hsr: Fix supervision frame sending on HSRv0 - sctp: prevent possible shift out-of-bounds - tipc: fix use-after-free in tipc_mon_reinit_self(). - dsa: tag_brcm: do not mark link local traffic as offloaded - eth: virtio-net: fix incorrect flags recording in big mode Previous releases - always broken: - sched: initialize struct tc_ife to fix kernel-infoleak - wifi: - mac80211: reject address change while connecting - iwlwifi: avoid toggling links due to wrong element use - bluetooth: cancel mesh send timer when hdev removed - strparser: fix signed/unsigned mismatch bug - handshake: fix memory leak in tls_handshake_accept() Misc: - selftests: mptcp: fix some flaky tests" * tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) hsr: Follow standard for HSRv0 supervision frames hsr: Fix supervision frame sending on HSRv0 virtio-net: fix incorrect flags recording in big mode ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe wifi: iwlwifi: mld: always take beacon ies in link grading wifi: iwlwifi: mvm: fix beacon template/fixed rate wifi: iwlwifi: fix aux ROC time event iterator usage net_sched: limit try_bulk_dequeue_skb() batches selftests: mptcp: join: properly kill background tasks selftests: mptcp: connect: trunc: read all recv data selftests: mptcp: join: userspace: longer transfer selftests: mptcp: join: endpoints: longer transfer selftests: mptcp: join: rm: set backup flag selftests: mptcp: connect: fix fallback note due to OoO ethtool: fix incorrect kernel-doc style comment in ethtool.h mlx5: Fix default values in create CQ Bluetooth: btrtl: Avoid loading the config file on security chips net/mlx5e: Fix potentially misleading debug message net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps net/mlx5e: Fix maxrate wraparound in threshold between units ...	2025-11-13 11:20:25 -08:00
Jakub Kicinski	e949824730	More -next material, notably: - split ieee80211.h file, it's way too big - mac80211: initial chanctx work towards NAN - mac80211: MU-MIMO sniffer improvements - ath12k: statistics improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmkUdDsACgkQ10qiO8sP aABeDhAAnBtrADnRQYo5BMbV6/7JsJL3GV6Zj5/Nrq6si8TE8vquf4kevUbISpgk Mi2tQ0UxFTW6LckT3LTOxzQSZzaPANSPO9AQm/q9/BLtAsdPpgc8yHQTkZlkiatF dS+WZFSZpF8hisKmkYCDvnggaqipnJUvnwtIY6xxSZftn3J4h6B9Wp+XjyCLtDxC 2DvztdQJ3oqYBFsSpk6J0gA0/4lF+jlVmZ+DPpVSYlCRJivFAqcpwdx3vdh4Pib3 dUNgh/MJuPv01RIA783TCsHBnOKPxuD5OfusQzkXdj33yX2bcPL6M2s57FgnFf8q l4B9+R/Q7/8ohp+qMOd4S+SteFa/7WlbJ+5UjJ71y7xScBKOaRZMf6wKKqSZozP1 zOB4AxlC7COrL3tsljC0Vun9CgBL4Ov/XBe7G2WTOUIrZ2KOU328/3atndbLeVJg knwsiNdKJCJJpTkO3zHzaYfDhDghSaINj1fl67hZV7s3Jj7u4lAD8HfV/9CKoMRd X1ltgB84u/nFc2aL2fGQbQg7NJLVqIvyx6iyss6K58nNwQMf0ZFUJOihgkSMCRK3 t4qQrXVdorWnRvioA/roHICkGBZdZw53Jz+0EltRxsjTfmzkki6EbXeGhWtRzlSo 5Bx1L4vtK7143nMlD2H/JuwDopD1fOxAM8L3sTlqJE8nu/3plPA= =N5Tm -----END PGP SIGNATURE----- Merge tag 'wireless-next-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== More -next material, notably: - split ieee80211.h file, it's way too big - mac80211: initial chanctx work towards NAN - mac80211: MU-MIMO sniffer improvements - ath12k: statistics improvements * tag 'wireless-next-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (26 commits) wifi: cw1200: Fix potential memory leak in cw1200_bh_rx_helper() wifi: mac80211: make monitor link info check more specific wifi: mac80211: track MU-MIMO configuration on disabled interfaces wifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connection wifi: cfg80211/mac80211: clean up duplicate ap_power handling wifi: cfg80211: use a C99 initializer in wiphy_register wifi: cfg80211: fix doc of struct key_params wifi: mac80211: remove unnecessary vlan NULL check wifi: mac80211: pass frame type to element parsing wifi: mac80211: remove "disabling VHT" message wifi: mac80211: add and use chanctx usage iteration wifi: mac80211: simplify ieee80211_recalc_chanctx_min_def() API wifi: mac80211: remove chanctx to link back-references wifi: mac80211: make link iteration safe for 'break' wifi: mac80211: fix EHT typo wifi: cfg80211: fix EHT typo wifi: ieee80211: split NAN definitions out wifi: ieee80211: split P2P definitions out wifi: ieee80211: split S1G definitions out wifi: ieee80211: split EHT definitions out ... ==================== Link: https://patch.msgid.link/20251112115126.16223-4-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-12 09:33:24 -08:00
Luiz Augusto von Dentz	485e0626e5	Bluetooth: hci_event: Fix not handling PA Sync Lost event This handles PA Sync Lost event which previously was assumed to be handled with BIG Sync Lost but their lifetime are not the same thus why there are 2 different events to inform when each sync is lost. Fixes: `b2a5f2e1c1` ("Bluetooth: hci_event: Add support for handling LE BIG Sync Lost event") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-11 08:55:18 -05:00
Pagadala Yesu Anjaneyulu	b54cf0f449	wifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connection Implement fallback to LPI mode when SP mode is not permitted by regulatory constraints for INDOOR_SP connections. Limit fallback mechanism to client mode. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110140806.8b43201a34ae.I37fc7bb5892eb9d044d619802e8f2095fde6b296@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-11 11:05:00 +01:00
Pagadala Yesu Anjaneyulu	e18efacc9c	wifi: cfg80211/mac80211: clean up duplicate ap_power handling Move duplicated ap_power type handling code to an inline function in cfg80211. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110140806.959948da1cb5.I893b5168329fb3232f249c182a35c99804112da6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-11 11:05:00 +01:00
Jason Xing	8da7bea7db	xsk: add indirect call for xsk_destruct_skb Since Eric proposed an idea about adding indirect call wrappers for UDP and managed to see a huge improvement[1], the same situation can also be applied in xsk scenario. This patch adds an indirect call for xsk and helps current copy mode improve the performance by around 1% stably which was observed with IXGBE at 10Gb/sec loaded. If the throughput grows, the positive effect will be magnified. I applied this patch on top of batch xmit series[2], and was able to see <5% improvement from our internal application which is a little bit unstable though. Use INDIRECT wrappers to keep xsk_destruct_skb static as it used to be when the mitigation config is off. Be aware of the freeing path that can be very hot since the frequency can reach around 2,000,000 times per second with the xdpsock test. [1]: https://lore.kernel.org/netdev/20251006193103.2684156-2-edumazet@google.com/ [2]: https://lore.kernel.org/all/20251021131209.41491-1-kerneljasonxing@gmail.com/ Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251031103328.95468-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 10:21:08 +01:00
Jakub Kicinski	7fc2bf8d30	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaRJJFwAKCRAraS+Z+3y5 E8eZAQDWQ3D76HOlLK8tBAQ8aSpxwsfr7fpheiMSCEI5r7O5sQEA5LuHrBvEb/Cr zfU7DxhEQEoVeJMTSia2hEJKWhoNpQ0= =n2Ap -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-11-10 We've added 19 non-merge commits during the last 3 day(s) which contain a total of 22 files changed, 1345 insertions(+), 197 deletions(-). The main changes are: 1) Preserve skb metadata after a TC BPF program has changed the skb, from Jakub Sitnicki. This allows a TC program at the end of a TC filter chain to still see the skb metadata, even if another TC program at the front of the chain has changed the skb using BPF helpers. 2) Initial af_smc bpf_struct_ops support to control the smc specific syn/synack options, from D. Wythe. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: bpf/selftests: Add selftest for bpf_smc_hs_ctrl net/smc: bpf: Introduce generic hook for handshake flow bpf: Export necessary symbols for modules with struct_ops selftests/bpf: Cover skb metadata access after bpf_skb_change_proto selftests/bpf: Cover skb metadata access after change_head/tail helper selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room selftests/bpf: Cover skb metadata access after vlan push/pop helper selftests/bpf: Expect unclone to preserve skb metadata selftests/bpf: Dump skb metadata on verification failure selftests/bpf: Verify skb metadata in BPF instead of userspace bpf: Make bpf_skb_change_head helper metadata-safe bpf: Make bpf_skb_change_proto helper metadata-safe bpf: Make bpf_skb_adjust_room metadata-safe bpf: Make bpf_skb_vlan_push helper metadata-safe bpf: Make bpf_skb_vlan_pop helper metadata-safe vlan: Make vlan_remove_tag return nothing bpf: Unclone skb head on bpf_dynptr_write to skb metadata net: Preserve metadata on pskb_expand_head net: Helper to move packet data and metadata after skb_push/pull ==================== Link: https://patch.msgid.link/20251110232427.3929291-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:43:51 -08:00
Kuniyuki Iwashima	73edb26b06	sctp: Don't inherit do_auto_asconf in sctp_clone_sock(). syzbot reported list_del(&sp->auto_asconf_list) corruption in sctp_destroy_sock(). The repro calls setsockopt(SCTP_AUTO_ASCONF, 1) to a SCTP listener, calls accept(), and close()s the child socket. setsockopt(SCTP_AUTO_ASCONF, 1) sets sp->do_auto_asconf to 1 and links sp->auto_asconf_list to a per-netns list. Both fields are placed after sp->pd_lobby in struct sctp_sock, and sctp_copy_descendant() did not copy the fields before the cited commit. Also, sctp_clone_sock() did not set them explicitly. In addition, sctp_auto_asconf_init() is called from sctp_sock_migrate(), but it initialises the fields only conditionally. The two fields relied on __GFP_ZERO added in sk_alloc(), but sk_clone() does not use it. Let's clear newsp->do_auto_asconf in sctp_clone_sock(). [0]: list_del corruption. prev->next should be ffff8880799e9148, but was ffff8880799e8808. (prev=ffff88803347d9f8) kernel BUG at lib/list_debug.c:64! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6008 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 RIP: 0010:__list_del_entry_valid_or_report+0x15a/0x190 lib/list_debug.c:62 Code: e8 7b 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 7c ee 92 fd 49 8b 17 48 c7 c7 80 0a bf 8b 48 89 de 4c 89 f9 e8 07 c6 94 fc 90 <0f> 0b 4c 89 f7 e8 4c 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 4d RSP: 0018:ffffc90003067ad8 EFLAGS: 00010246 RAX: 000000000000006d RBX: ffff8880799e9148 RCX: b056988859ee6e00 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffc90003067807 R09: 1ffff9200060cf00 R10: dffffc0000000000 R11: fffff5200060cf01 R12: 1ffff1100668fb3f R13: dffffc0000000000 R14: ffff88803347d9f8 R15: ffff88803347d9f8 FS: 00005555823e5500(0000) GS:ffff88812613e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000000480 CR3: 00000000741ce000 CR4: 00000000003526f0 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:132 [inline] __list_del_entry include/linux/list.h:223 [inline] list_del include/linux/list.h:237 [inline] sctp_destroy_sock+0xb4/0x370 net/sctp/socket.c:5163 sk_common_release+0x75/0x310 net/core/sock.c:3961 sctp_close+0x77e/0x900 net/sctp/socket.c:1550 inet_release+0x144/0x190 net/ipv4/af_inet.c:437 __sock_release net/socket.c:662 [inline] sock_close+0xc3/0x240 net/socket.c:1455 __fput+0x44c/0xa70 fs/file_table.c:468 task_work_run+0x1d4/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline] do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: `16942cf4d3` ("sctp: Use sk_clone() in sctp_accept().") Reported-by: syzbot+ba535cb417f106327741@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/690d2185.a70a0220.22f260.000e.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251106223418.1455510-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:22:09 -08:00
D. Wythe	15f295f556	net/smc: bpf: Introduce generic hook for handshake flow The introduction of IPPROTO_SMC enables eBPF programs to determine whether to use SMC based on the context of socket creation, such as network namespaces, PID and comm name, etc. As a subsequent enhancement, to introduce a new generic hook that allows decisions on whether to use SMC or not at runtime, including but not limited to local/remote IP address or ports. User can write their own implememtion via bpf_struct_ops now to choose whether to use SMC or not before TCP 3rd handshake to be comleted. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Link: https://patch.msgid.link/20251107035632.115950-3-alibuda@linux.alibaba.com	2025-11-10 11:19:41 -08:00
Chien Wong	473235677a	wifi: cfg80211: fix doc of struct key_params The seq in struct key_params is for many ciphers, including CCMP, GCMP, CMAC, GMAC. In addition to get_key(), it is also used when setting keys. Signed-off-by: Chien Wong <m@xv97.com> Link: https://patch.msgid.link/20251107142332.181308-1-m@xv97.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-10 10:39:14 +01:00
Johannes Berg	1a1cad924e	wifi: mac80211: fix EHT typo This is clearly EHT, not ETH, fix the typo. Link: https://patch.msgid.link/20251105153958.12a04517f7ec.Idcf800817fa30605b1002c3d2287cad016e7aea7@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-10 10:38:37 +01:00
Johannes Berg	30b6089aad	wifi: cfg80211: fix EHT typo This is clearly EHT, not ETH, fix the typo. Link: https://patch.msgid.link/20251105153958.e9d4af3b768e.I5f3378326837e3f62928a2f1fd3403f29cea069b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-10 10:38:36 +01:00
Jakub Kicinski	a0c3aefb08	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-11-06 (i40, ice, iavf) Mohammad Heib introduces a new devlink parameter, max_mac_per_vf, for controlling the maximum number of MAC address filters allowed by a VF. This allows administrators to control the VF behavior in a more nuanced manner. Aleksandr and Przemek add support for Receive Side Scaling of GTP to iAVF for VFs running on E800 series ice hardware. This improves performance and scalability for virtualized network functions in 5G and LTE deployments. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: add RSS support for GTP protocol via ethtool ice: Extend PTYPE bitmap coverage for GTP encapsulated flows ice: improve TCAM priority handling for RSS profiles ice: implement GTP RSS context tracking and configuration ice: add virtchnl definitions and static data for GTP RSS ice: add flow parsing for GTP and new protocol field support i40e: support generic devlink param "max_mac_per_vf" devlink: Add new "max_mac_per_vf" generic device param ==================== Link: https://patch.msgid.link/20251106225321.1609605-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 19:15:36 -08:00
Jakub Kicinski	f05d26198c	psp: add stats from psp spec to driver facing api Provide a driver api for reporting device statistics required by the "Implementation Requirements" section of the PSP Architecture Specification. Use a warning to ensure drivers report stats required by the spec. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251106002608.1578518-4-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:53:57 -08:00
Jakub Kicinski	dae4a92399	psp: report basic stats from the core Track and report stats common to all psp devices from the core. A 'stale-event' is when the core marks the rx state of an active psp_assoc as incapable of authenticating psp encapsulated data. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251106002608.1578518-2-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:53:56 -08:00
Eric Dumazet	416dd649f3	tcp: add net.ipv4.tcp_comp_sack_rtt_percent TCP SACK compression has been added in 2018 in commit `5d9f4262b7` ("tcp: add SACK compression"). It is working great for WAN flows (with large RTT). Wifi in particular gets a significant boost _when_ ACK are suppressed. Add a new sysctl so that we can tune the very conservative 5 % value that has been used so far in this formula, so that small RTT flows can benefit from this feature. delay = min ( 5 % of RTT, 1 ms) This patch adds new tcp_comp_sack_rtt_percent sysctl to ease experiments and tuning. Given that we cap the delay to 1ms (tcp_comp_sack_delay_ns sysctl), set the default value to 33 %. Quoting Neal Cardwell ( https://lore.kernel.org/netdev/CADVnQymZ1tFnEA1Q=vtECs0=Db7zHQ8=+WCQtnhHFVbEOzjVnQ@mail.gmail.com/ ) The rationale for 33% is basically to try to facilitate pipelining, where there are always at least 3 ACKs and 3 GSO/TSO skbs per SRTT, so that the path can maintain a budget for 3 full-sized GSO/TSO skbs "in flight" at all times: + 1 skb in the qdisc waiting to be sent by the NIC next + 1 skb being sent by the NIC (being serialized by the NIC out onto the wire) + 1 skb being received and aggregated by the receiver machine's aggregation mechanism (some combination of LRO, GRO, and sack compression) Note that this is basically the same magic number (3) and the same rationales as: (a) tcp_tso_should_defer() ensuring that we defer sending data for no longer than cwnd/tcp_tso_win_divisor (where tcp_tso_win_divisor = 3), and (b) bbr_quantization_budget() ensuring that cwnd is at least 3 GSO/TSO skbs to maintain pipelining and full throughput at low RTTs Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251106115236.3450026-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:41:44 -08:00
Kuniyuki Iwashima	1e9d3005e0	tcp: Apply max RTO to non-TFO SYN+ACK. Since commit `54a378f434` ("tcp: add the ability to control max RTO"), TFO SYN+ACK RTO is capped by the TFO full sk's inet_csk(sk)->icsk_rto_max. The value is inherited from the parent listener. Let's apply the same cap to non-TFO SYN+ACK. Note that req->rsk_listener is always non-NULL when we call tcp_reqsk_timeout() in reqsk_timer_handler() or tcp_check_req(). It could be NULL for SYN cookie req, but we do not use req->timeout then. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:05:26 -08:00
Kuniyuki Iwashima	207ce0f6bc	tcp: Remove timeout arg from reqsk_timeout(). reqsk_timeout() is always called with @timeout being TCP_RTO_MAX. Let's remove the arg. As a prep for the next patch, reqsk_timeout() is moved to tcp.h and renamed to tcp_reqsk_timeout(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:05:26 -08:00
Kuniyuki Iwashima	3ce5dd8161	tcp: Remove timeout arg from reqsk_queue_hash_req(). inet_csk_reqsk_queue_hash_add() is no longer shared by DCCP. We do not need to pass req->timeout down to reqsk_queue_hash_req(). Let's move tcp_timeout_init() from tcp_conn_request() to reqsk_queue_hash_req(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:05:25 -08:00
Kuniyuki Iwashima	be88c549e9	tcp: Call tcp_syn_ack_timeout() directly. Since DCCP has been removed, we do not need to use request_sock_ops.syn_ack_timeout(). Let's call tcp_syn_ack_timeout() directly. Now other function pointers of request_sock_ops are protocol-dependent. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:05:25 -08:00
Daniel Borkmann	24ab8efb9a	xsk: Move NETDEV_XDP_ACT_ZC into generic header Move NETDEV_XDP_ACT_ZC into xdp_sock_drv.h header such that external code can reuse it, and rename it into more generic NETDEV_XDP_ACT_XSK. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251031212103.310683-7-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 16:46:11 -08:00
Daniel Golle	c6230446b1	net: dsa: add tagging driver for MaxLinear GSW1xx switch family Add support for a new DSA tagging protocol driver for the MaxLinear GSW1xx switch family. The GSW1xx switches use a proprietary 8-byte special tag inserted between the source MAC address and the EtherType field to indicate the source and destination ports for frames traversing the CPU port. Implement the tag handling logic to insert the special tag on transmit and parse it on receive. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Link: https://patch.msgid.link/0e973ebfd9433c30c96f50670da9e9449a0d98f2.1762170107.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 14:16:17 -08:00
Mohammad Heib	9352d40c8b	devlink: Add new "max_mac_per_vf" generic device param Add a new device generic parameter to controls the maximum number of MAC filters allowed per VF. For example, to limit a VF to 3 MAC addresses: $ devlink dev param set pci/0000:3b:00.0 name max_mac_per_vf \ value 3 \ cmode runtime Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2025-11-06 12:57:31 -08:00
Linus Torvalds	c90841db35	hardening fixes for v6.18-rc5 - Introduce __nocfi_generic for arm32 Clang (Nathan Chancellor) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaQz8RwAKCRA2KwveOeQk u51dAP9YhHIttRev7rdRwDWwrlhO+i2fps0vq8G9S6keSndr+AEAv8BtQexexPhI 1oSNLeB/kVUVp5dQWjni48IgMxaG4A8= =zFGT -----END PGP SIGNATURE----- Merge tag 'hardening-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: "This is a work-around for a (now fixed) corner case in the arm32 build with Clang KCFI enabled. - Introduce __nocfi_generic for arm32 Clang (Nathan Chancellor)" * tag 'hardening-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: libeth: xdp: Disable generic kCFI pass for libeth_xdp_tx_xmit_bulk() ARM: Select ARCH_USES_CFI_GENERIC_LLVM_PASS compiler_types: Introduce __nocfi_generic	2025-11-06 11:54:59 -08:00
Jakub Kicinski	1ec9871fbb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c `9222582ec5` ("Revert "wifi: ath12k: Fix missing station power save configuration"") `6917e268c4` ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig `b1d16f7c00` ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") `93f53db9f9` ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 09:27:40 -08:00

... 3 4 5 6 7 ...

19301 Commits