linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-29 17:43:52 +02:00

Author	SHA1	Message	Date
Linus Torvalds	334fbe734e	mm.git review status for linus..mm-stable Everything: Total patches: 368 Reviews/patch: 1.56 Reviewed rate: 74% Excluding DAMON: Total patches: 316 Reviews/patch: 1.77 Reviewed rate: 81% Excluding DAMON and zram: Total patches: 306 Reviews/patch: 1.81 Reviewed rate: 82% Excluding DAMON, zram and maple_tree: Total patches: 276 Reviews/patch: 2.01 Reviewed rate: 91% Significant patch series in this merge: - The 30 patch series "maple_tree: Replace big node with maple copy" from Liam Howlett is mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - The 12 patch series "mm, swap: swap table phase III: remove swap_map" from Kairui Song offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush Yadav adds file seal preservation to LUO's memfd code. - The 2 patch series "mm: zswap: add per-memcg stat for incompressible pages" from Jiayuan Chen adds additional userspace stats reportng to zswap. - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike Rapoport implements some cleanups for our handling of ZERO_PAGE() and zero_pfn. - The 2 patch series "mm/kmemleak: Improve scan_should_stop() implementation" from Zhongqiu Han provides an robustness improvement and some cleanups in the kmemleak code. - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang "improves the khugepaged scan logic and reduces CPU consumption by prioritizing scanning tasks that access memory frequently". - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies Kexec Handover by "transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel" - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's tracepointing. - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation. - The 2 patch series "Fix KASAN support for KHO restored vmalloc regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO restores a vmalloc area. - The 4 patch series "mm: Remove stray references to pagevec" from Tal Zussman provides several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago. - The 17 patch series "mm: Eliminate fake head pages from vmemmap optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page. - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for core layer filters" from SeongJae Park improves two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used. - The 3 patch series "mm/damon: strictly respect min_nr_regions" from SeongJae Park improves DAMON usability by extending the treatment of the min_nr_regions user-settable parameter. - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil Babka is a proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ennsed. - The 16 patch series "mm: cleanups around unmapping / zapping" from David Hildenbrand implements "a bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions". - The 6 patch series "support batched checking of the young flag for MGLRU" from Baolin Wang supports batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64. - The 5 patch series "memcg: obj stock and slab stat caching cleanups" from Johannes Weiner provides memcg cleanup and robustness improvements. - The 5 patch series "Allow order zero pages in page reporting" from Yuvraj Sakshith enhances page_reporting's free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is cleanup work following from the recent conversion of the VMA flags to a bitmap. - The 10 patch series "mm/damon: add optional debugging-purpose sanity checks" from SeongJae Park adds some more developer-facing debug checks into DAMON core. - The 2 patch series "mm/damon: test and document power-of-2 min_region_sz requirement" from SeongJae Park adds an additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling. - The 3 patch series "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time overflow issue in DAMON core. - The 7 patch series "mm/damon: improve/fixup/update ratio calculation, test and documentation" from SeongJae Park is a "batch of misc/minor improvements and fixups" for DAMON. - The 4 patch series "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - The 6 patch series "zram: recompression cleanups and tweaks" from Sergey Senozhatsky provides "a somewhat random mix of fixups, recompression cleanups and improvements" in the zram code. - The 11 patch series "mm/damon: support multiple goal-based quota tuning algorithms" from SeongJae Park extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select. - The 4 patch series "mm: thp: reduce unnecessary start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged. - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes provides some cleanups and slight fixes in the mremap, mmap and vma code. - The 5 patch series "mm/damon: support addr_unit on default monitoring targets for modules" from SeongJae Park extends the use of DAMON core's addr_unit tunable. - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites" from Nico Pache provides cleanups in the khugepaged and is a base for Nico's planned khugepaged mTHP support. - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups" from David Hildenbrand implements code movement and cleanups in the memhotplug and sparsemem code. - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some memhotplug Kconfig support. - The 6 patch series "change young flag check functions to return bool" from Baolin Wang is "a cleanup patchset to change all young flag check functions to return bool". - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL dereference issues" from Josh Law and SeongJae Park fixes a few potential DAMON bugs. - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma code" from "converts a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it". Mainly in the vma code. - The 21 patch series "mm: expand mmap_prepare functionality and usage" from Lorenzo Stoakes "expands the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time". Cleanups, documentation, extension of mmap_prepare into filesystem drivers. - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from Lorenzo Stoakes simplifies and cleans up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ= =1Q7o -----END PGP SIGNATURE----- Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...	2026-04-15 12:59:16 -07:00
Linus Torvalds	91a4855d6c	Networking changes for 7.1. Core & protocols ---------------- - Support HW queue leasing, allowing containers to be granted access to HW queues for zero-copy operations and AF_XDP. - Number of code moves to help the compiler with inlining. Avoid output arguments for returning drop reason where possible. - Rework drop handling within qdiscs to include more metadata about the reason and dropping qdisc in the tracepoints. - Remove the rtnl_lock use from IP Multicast Routing. - Pack size information into the Rx Flow Steering table pointer itself. This allows making the table itself a flat array of u32s, thus making the table allocation size a power of two. - Report TCP delayed ack timer information via socket diag. - Add ip_local_port_step_width sysctl to allow distributing the randomly selected ports more evenly throughout the allowed space. - Add support for per-route tunsrc in IPv6 segment routing. - Start work of switching sockopt handling to iov_iter. - Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid buffer size drifting up. - Support MSG_EOR in MPTCP. - Add stp_mode attribute to the bridge driver for STP mode selection. This addresses concerns about call_usermodehelper() usage. - Remove UDP-Lite support (as announced in 2023). - Remove support for building IPv6 as a module. Remove the now unnecessary function calling indirection. Cross-tree stuff ---------------- - Move Michael MIC code from generic crypto into wireless, it's considered insecure but some WiFi networks still need it. Netfilter --------- - Switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. Florian W reports this gets us ~13% higher packet rate. - Convert IPVS's global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net. Convert some code that walks the service lists to use RCU instead of the service_mutex. - Add more opinionated input validation to lower security exposure. - Make IPVS hash tables to be per-netns and resizable. Wireless -------- - Finished assoc frame encryption/EPPKE/802.1X-over-auth. - Radar detection improvements. - Add 6 GHz incumbent signal detection APIs. - Multi-link support for FILS, probe response templates and client probing. - New APIs and mac80211 support for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware. Driver API ---------- - Add numerical ID for devlink instances (to avoid having to create fake bus/device pairs just to have an ID). Support shared devlink instances which span multiple PFs. - Add standard counters for reporting pause storm events (implement in mlx5 and fbnic). - Add configuration API for completion writeback buffering (implement in mana). - Support driver-initiated change of RSS context sizes. - Support DPLL monitoring input frequency (implement in zl3073x). - Support per-port resources in devlink (implement in mlx5). Misc ---- - Expand the YAML spec for Netfilter. Drivers ------- - Software: - macvlan: support multicast rx for bridge ports with shared source MAC address - team: decouple receive and transmit enablement for IEEE 802.3ad LACP "independent control" - Ethernet high-speed NICs: - nVidia/Mellanox: - support high order pages in zero-copy mode (for payload coalescing) - support multiple packets in a page (for systems with 64kB pages) - Broadcom 25-400GE (bnxt): - implement XDP RSS hash metadata extraction - add software fallback for UDP GSO, lowering the IOMMU cost - Broadcom 800GE (bnge): - add link status and configuration handling - add various HW and SW statistics - Marvell/Cavium: - NPC HW block support for cn20k - Huawei (hinic3): - add mailbox / control queue - add rx VLAN offload - add driver info and link management - Ethernet NICs: - Marvell/Aquantia: - support reading SFP module info on some AQC100 cards - Realtek PCI (r8169): - add support for RTL8125cp - Realtek USB (r8152): - support for the RTL8157 5Gbit chip - add 2500baseT EEE status/configuration support - Ethernet NICs embedded and off-the-shelf IP: - Synopsys (stmmac): - cleanup and reorganize SerDes handling and PCS support - cleanup descriptor handling and per-platform data - cleanup and consolidate MDIO defines and handling - shrink driver memory use for internal structures - improve Tx IRQ coalescing - improve TCP segmentation handling - add support for Spacemit K3 - Cadence (macb): - support PHYs that have inband autoneg disabled with GEM - support IEEE 802.3az EEE - rework usrio capabilities and handling - AMD (xgbe): - improve power management for S0i3 - improve TX resilience for link-down handling - Virtual: - Google cloud vNIC: - support larger ring sizes in DQO-QPL mode - improve HW-GRO handling - support UDP GSO for DQO format - PCIe NTB: - support queue count configuration - Ethernet PHYs: - automatically disable PHY autonomous EEE if MAC is in charge - Broadcom: - add BCM84891/BCM84892 support - Micrel: - support for LAN9645X internal PHY - Realtek: - add RTL8224 pair order support - support PHY LEDs on RTL8211F-VD - support spread spectrum clocking (SSC) - Maxlinear: - add PHY-level statistics via ethtool - Ethernet switches: - Maxlinear (mxl862xx): - support for bridge offloading - support for VLANs - support driver statistics - Bluetooth: - large number of fixes and new device IDs - Mediatek: - support MT6639 (MT7927) - support MT7902 SDIO - WiFi: - Intel (iwlwifi): - UNII-9 and continuing UHR work - MediaTek (mt76): - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - Qualcomm (ath12k): - monitor mode support on IPQ5332 - basic hwmon temperature reporting - support IPQ5424 - Realtek: - add USB RX aggregation to improve performance - add USB TX flow control by tracking in-flight URBs - Cellular: - IPA v5.2 support Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmnelNoACgkQMUZtbf5S IrtWFw//WyiXuEiGawVQONnbu1dtR+3nw/cvNpSYi0IM66vbRUB9n+9fxm2MIyG4 4jI/c/X/fxIvUxEqGez3yPn5P7KqkQR8WRYwkxrMYKRpXeukN0IDk5Euew5DskCe wtBKNJOQWKdKXff0bLQoJ9dHWYuJ2IMRVil5M3fhUbeUOXeyJD7Yn1w2ICvJAbj+ T/Hw7sEtchNaHp6h6SbaQfahkUFHQG5peNoETkZF4UDF6ALGY29WH91GXeO2lrgN IxX203KtaavV0oU8T0oixZgOc57Ns081YfFL/F1JP2HV6lgkwhuq+zxCrRTi1c9M HPTXgwD7Z80Y74nM3YTLrPfoMOP8GLBZgdV3rUpwmteM26+gMTm+O1zHUur5ZoGy D6TaMFguPTIqiRyrARa9xY/J6r9TQkc2Wfu4bIuPndKFg8xPoepuEObODnh0+5Hg 4j4pdFhIo2huENhSg7kVb/yl+1q68SFwM3RqTmx+OhCa0AyjcKIKgt/UBhismdnG r8obxzb+nXeJc2rRDuwNMwlBlcMSbep27uGt64zeHMMXVhTVqOoytNaL/X/ZpH2m A0DscUrpHvb36IoDPtanc6irP+JOh5Xe7Nw5qhkgwsMc7hlf8SyyHB4OUBBaz1qA ETSnHlfwklRmXSpWqH2LyGXjdOQpDKP46+h0W3dttMD2/cRBqYo= =EhQZ -----END PGP SIGNATURE----- Merge tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Support HW queue leasing, allowing containers to be granted access to HW queues for zero-copy operations and AF_XDP - Number of code moves to help the compiler with inlining. Avoid output arguments for returning drop reason where possible - Rework drop handling within qdiscs to include more metadata about the reason and dropping qdisc in the tracepoints - Remove the rtnl_lock use from IP Multicast Routing - Pack size information into the Rx Flow Steering table pointer itself. This allows making the table itself a flat array of u32s, thus making the table allocation size a power of two - Report TCP delayed ack timer information via socket diag - Add ip_local_port_step_width sysctl to allow distributing the randomly selected ports more evenly throughout the allowed space - Add support for per-route tunsrc in IPv6 segment routing - Start work of switching sockopt handling to iov_iter - Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid buffer size drifting up - Support MSG_EOR in MPTCP - Add stp_mode attribute to the bridge driver for STP mode selection. This addresses concerns about call_usermodehelper() usage - Remove UDP-Lite support (as announced in 2023) - Remove support for building IPv6 as a module. Remove the now unnecessary function calling indirection Cross-tree stuff: - Move Michael MIC code from generic crypto into wireless, it's considered insecure but some WiFi networks still need it Netfilter: - Switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. Florian W reports this gets us ~13% higher packet rate - Convert IPVS's global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net. Convert some code that walks the service lists to use RCU instead of the service_mutex - Add more opinionated input validation to lower security exposure - Make IPVS hash tables to be per-netns and resizable Wireless: - Finished assoc frame encryption/EPPKE/802.1X-over-auth - Radar detection improvements - Add 6 GHz incumbent signal detection APIs - Multi-link support for FILS, probe response templates and client probing - New APIs and mac80211 support for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware Driver API: - Add numerical ID for devlink instances (to avoid having to create fake bus/device pairs just to have an ID). Support shared devlink instances which span multiple PFs - Add standard counters for reporting pause storm events (implement in mlx5 and fbnic) - Add configuration API for completion writeback buffering (implement in mana) - Support driver-initiated change of RSS context sizes - Support DPLL monitoring input frequency (implement in zl3073x) - Support per-port resources in devlink (implement in mlx5) Misc: - Expand the YAML spec for Netfilter Drivers - Software: - macvlan: support multicast rx for bridge ports with shared source MAC address - team: decouple receive and transmit enablement for IEEE 802.3ad LACP "independent control" - Ethernet high-speed NICs: - nVidia/Mellanox: - support high order pages in zero-copy mode (for payload coalescing) - support multiple packets in a page (for systems with 64kB pages) - Broadcom 25-400GE (bnxt): - implement XDP RSS hash metadata extraction - add software fallback for UDP GSO, lowering the IOMMU cost - Broadcom 800GE (bnge): - add link status and configuration handling - add various HW and SW statistics - Marvell/Cavium: - NPC HW block support for cn20k - Huawei (hinic3): - add mailbox / control queue - add rx VLAN offload - add driver info and link management - Ethernet NICs: - Marvell/Aquantia: - support reading SFP module info on some AQC100 cards - Realtek PCI (r8169): - add support for RTL8125cp - Realtek USB (r8152): - support for the RTL8157 5Gbit chip - add 2500baseT EEE status/configuration support - Ethernet NICs embedded and off-the-shelf IP: - Synopsys (stmmac): - cleanup and reorganize SerDes handling and PCS support - cleanup descriptor handling and per-platform data - cleanup and consolidate MDIO defines and handling - shrink driver memory use for internal structures - improve Tx IRQ coalescing - improve TCP segmentation handling - add support for Spacemit K3 - Cadence (macb): - support PHYs that have inband autoneg disabled with GEM - support IEEE 802.3az EEE - rework usrio capabilities and handling - AMD (xgbe): - improve power management for S0i3 - improve TX resilience for link-down handling - Virtual: - Google cloud vNIC: - support larger ring sizes in DQO-QPL mode - improve HW-GRO handling - support UDP GSO for DQO format - PCIe NTB: - support queue count configuration - Ethernet PHYs: - automatically disable PHY autonomous EEE if MAC is in charge - Broadcom: - add BCM84891/BCM84892 support - Micrel: - support for LAN9645X internal PHY - Realtek: - add RTL8224 pair order support - support PHY LEDs on RTL8211F-VD - support spread spectrum clocking (SSC) - Maxlinear: - add PHY-level statistics via ethtool - Ethernet switches: - Maxlinear (mxl862xx): - support for bridge offloading - support for VLANs - support driver statistics - Bluetooth: - large number of fixes and new device IDs - Mediatek: - support MT6639 (MT7927) - support MT7902 SDIO - WiFi: - Intel (iwlwifi): - UNII-9 and continuing UHR work - MediaTek (mt76): - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - Qualcomm (ath12k): - monitor mode support on IPQ5332 - basic hwmon temperature reporting - support IPQ5424 - Realtek: - add USB RX aggregation to improve performance - add USB TX flow control by tracking in-flight URBs - Cellular: - IPA v5.2 support" * tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits) net: pse-pd: fix kernel-doc function name for pse_control_find_by_id() wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit wireguard: allowedips: remove redundant space tools: ynl: add sample for wireguard wireguard: allowedips: Use kfree_rcu() instead of call_rcu() MAINTAINERS: Add netkit selftest files selftests/net: Add additional test coverage in nk_qlease selftests/net: Split netdevsim tests from HW tests in nk_qlease tools/ynl: Make YnlFamily closeable as a context manager net: airoha: Add missing PPE configurations in airoha_ppe_hw_init() net: airoha: Fix VIP configuration for AN7583 SoC net: caif: clear client service pointer on teardown net: strparser: fix skb_head leak in strp_abort_strp() net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete() selftests/bpf: add test for xdp_master_redirect with bond not up net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration sctp: disable BH before calling udp_tunnel_xmit_skb() sctp: fix missing encap_port propagation for GSO fragments net: airoha: Rely on net_device pointer in ETS callbacks ...	2026-04-14 18:36:10 -07:00
Jakub Kicinski	35c2c39832	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Merge in late fixes in preparation for the net-next PR. Conflicts: include/net/sch_generic.h `a6bd339dbb` ("net_sched: fix skb memory leak in deferred qdisc drops") `ff2998f29f` ("net: sched: introduce qdisc-specific drop reason tracing") https://lore.kernel.org/adz0iX85FHMz0HdO@sirena.org.uk drivers/net/ethernet/airoha/airoha_eth.c `1acdfbdb51` ("net: airoha: Fix VIP configuration for AN7583 SoC") `bf3471e6e6` ("net: airoha: Make flow control source port mapping dependent on nbq parameter") Adjacent changes: drivers/net/ethernet/airoha/airoha_ppe.c `f44218cd5e` ("net: airoha: Reset PPE cpu port configuration in airoha_ppe_hw_init()") `7da62262ec` ("inet: add ip_local_port_step_width sysctl to improve port usage distribution") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-14 12:04:00 -07:00
Jakub Kicinski	e9dc62f25b	bluetooth-next pull request for net-next: core: - hci_core: Rate limit the logging of invalid ISO handle - hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists - hci_event: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER - hci_event: fix potential UAF in SSP passkey handlers - HCI: Avoid a couple -Wflex-array-member-not-at-end warnings - L2CAP: CoC: Disconnect if received packet size exceeds MPS - L2CAP: Add missing chan lock in l2cap_ecred_reconf_rsp - L2CAP: Fix printing wrong information if SDU length exceeds MTU - SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec drivers: - btusb: MT7922: Add VID/PID 0489/e174 - btusb: Add Lite-On 04ca:3807 for MediaTek MT7921 - btusb: Add MT7927 IDs ASUS ROG Crosshair X870E Hero, Lenovo Legion Pro 7 16ARX9, Gigabyte Z790 AORUS MASTER X, MSI X870E Ace Max, TP-Link Archer TBE550E, ASUS X870E / ProArt X870E-Creator. - btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede - btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede - btusb: MediaTek MT7922: Add VID 0489 & PID e11d - btintel: Add support for Scorpious Peak2 support - btintel: Add support for Scorpious Peak2F support - btintel_pcie: Add device id of Scorpius Peak2, Nova Lake-PCD-H - btintel_pcie: Add device id of Scorpious2, Nova Lake-PCD-S - btmtk: Add reset mechanism if downloading firmware failed - btmtk: Add MT6639 (MT7927) Bluetooth support - btmtk: fix ISO interface setup for single alt setting - btmtk: add MT7902 SDIO support - Bluetooth: btmtk: add MT7902 MCU support - btbcm: Add entry for BCM4343A2 UART Bluetooth - qca: enable pwrseq support for wcn39xx devices - hci_qca: Fix BT not getting powered-off on rmmod - hci_qca: disable power control for WCN7850 when bt_en is not defined - hci_qca: Fix missing wakeup during SSR memdump handling - hci_ldisc: Clear HCI_UART_PROTO_INIT on error - mmc: sdio: add MediaTek MT7902 SDIO device ID - hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmnc7dkZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKXaFEACXu8AL+3U+nx5u7tWcdGrI XGP9LaXsSZBTt7lpYk64P0yArSmoebZ6yKXVPe/kV+hrI91NCP13hBo1kx2UTPY9 hwLuchb/xMZa4a8BZpzDZRSSsAiuI76BpU43eJAmC5qC2bRzzSkphnsh8LSHIm21 tEYn8zDBr4PE3NgEFasaNhOPwXsuFW/AJfjxJ3O4wMALkMud5d2+u2IO0lIska5a DjN6MWY8OCa6bSBn/9ah0qDagIBrhaocAL0kAIaTznBDWFwh7whavuolCS7JARw+ pedDiwh3QraIb/m0qyfmjswJGyVCexyTNc+20HffezEK6yfi3TVUZXMF3OFde3zt UP6Nm8q/Gv8L9v6UsUUj8xQXW7zy1Gdt7LX/z3x1vksxJDc1Iuo148n509Nm8+Pd ecv/kLrA/2hat5PfZSzclcHohAeJhATG3sNeZO0u/resOW+3TwPpi5IwVXzu3SdF 0sm8H7Uv7IuPc5LkG0F9J/reSa5s3lhMH1VXr89NUGmr2dhYsSr4fr6Hlb0Udxi8 TOU0if2JO/e/VlWcyzgXxPzhVFHLEY0IfZoEC8YHec73q+US1rWTXRgm3JZmjA/x 28l7RkXHUUebSHfuMOZ1qi8LxxfF/z5IS0rbLfLcg5yezI61K7hOJTYG6NKnJ+0U mGLUdeNNbLtTEwgU+G7iXA== =jFF1 -----END PGP SIGNATURE----- Merge tag 'for-net-next-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - hci_core: Rate limit the logging of invalid ISO handle - hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists - hci_event: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER - hci_event: fix potential UAF in SSP passkey handlers - HCI: Avoid a couple -Wflex-array-member-not-at-end warnings - L2CAP: CoC: Disconnect if received packet size exceeds MPS - L2CAP: Add missing chan lock in l2cap_ecred_reconf_rsp - L2CAP: Fix printing wrong information if SDU length exceeds MTU - SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec drivers: - btusb: MT7922: Add VID/PID 0489/e174 - btusb: Add Lite-On 04ca:3807 for MediaTek MT7921 - btusb: Add MT7927 IDs ASUS ROG Crosshair X870E Hero, Lenovo Legion Pro 7 16ARX9, Gigabyte Z790 AORUS MASTER X, MSI X870E Ace Max, TP-Link Archer TBE550E, ASUS X870E / ProArt X870E-Creator. - btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede - btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede - btusb: MediaTek MT7922: Add VID 0489 & PID e11d - btintel: Add support for Scorpious Peak2 support - btintel: Add support for Scorpious Peak2F support - btintel_pcie: Add device id of Scorpius Peak2, Nova Lake-PCD-H - btintel_pcie: Add device id of Scorpious2, Nova Lake-PCD-S - btmtk: Add reset mechanism if downloading firmware failed - btmtk: Add MT6639 (MT7927) Bluetooth support - btmtk: fix ISO interface setup for single alt setting - btmtk: add MT7902 SDIO support - Bluetooth: btmtk: add MT7902 MCU support - btbcm: Add entry for BCM4343A2 UART Bluetooth - qca: enable pwrseq support for wcn39xx devices - hci_qca: Fix BT not getting powered-off on rmmod - hci_qca: disable power control for WCN7850 when bt_en is not defined - hci_qca: Fix missing wakeup during SSR memdump handling - hci_ldisc: Clear HCI_UART_PROTO_INIT on error - mmc: sdio: add MediaTek MT7902 SDIO device ID - hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x * tag 'for-net-next-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (59 commits) Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling Bluetooth: btintel_pcie: use strscpy to copy plain strings Bluetooth: hci_event: fix potential UAF in SSP passkey handlers Bluetooth: hci.h: Avoid a couple -Wflex-array-member-not-at-end warnings Bluetooth: SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec Bluetooth: btintel_pcie: Align shared DMA memory to 128 bytes Bluetooth: l2cap: Add missing chan lock in l2cap_ecred_reconf_rsp Bluetooth: hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x Bluetooth: btusb: MediaTek MT7922: Add VID 0489 & PID e11d Bluetooth: btmtk: hide unused btmtk_mt6639_devs[] array Bluetooth: btusb: Add MT7927 ID for ASUS X870E / ProArt X870E-Creator Bluetooth: btusb: Add MT7927 ID for TP-Link Archer TBE550E Bluetooth: btusb: Add MT7927 ID for MSI X870E Ace Max Bluetooth: btusb: Add MT7927 ID for Gigabyte Z790 AORUS MASTER X Bluetooth: btusb: Add MT7927 ID for Lenovo Legion Pro 7 16ARX9 Bluetooth: btusb: Add MT7927 ID for ASUS ROG Crosshair X870E Hero Bluetooth: btmtk: fix ISO interface setup for single alt setting Bluetooth: btmtk: Add MT6639 (MT7927) Bluetooth support Bluetooth: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER Bluetooth: btmtk: refactor endpoint lookup ... ==================== Link: https://patch.msgid.link/20260413132247.320961-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-13 14:26:03 -07:00
Linus Torvalds	b7d74ea0fd	vfs-7.1-rc1.kino Please consider pulling these changes from the signed vfs-7.1-rc1.kino tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc otmnAP4sbsxZQdz2TG2hJuOwnEZOkkxZQOUMc3ERVyZaWXIeTAEA7e5M+8FpoG9n 8ipO76UoaXdGLESrqVdp9EOhLqOW7QY= =uMeJ -----END PGP SIGNATURE----- Merge tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64	2026-04-13 12:19:01 -07:00
Jakub Kicinski	b025461303	tcp: update window_clamp when SO_RCVBUF is set Commit under Fixes moved recomputing the window clamp to tcp_measure_rcv_mss() (when scaling_ratio changes). I suspect it missed the fact that we don't recompute the clamp when rcvbuf is set. Until scaling_ratio changes we are stuck with the old window clamp which may be based on the small initial buffer. scaling_ratio may never change. Inspired by Eric's recent commit `d1361840f8` ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning") plumb the user action thru to TCP and have it update the clamp. A smaller fix would be to just have tcp_rcvbuf_grow() adjust the clamp even if SOCK_RCVBUF_LOCK is set. But IIUC this is what we were trying to get away from in the first place. Fixes: `a2cbb16039` ("tcp: Update window clamping condition") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumaze@google.com> Link: https://patch.msgid.link/20260408001438.129165-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-13 15:32:35 +02:00
Gustavo A. R. Silva	a0cff16d0f	Bluetooth: hci.h: Avoid a couple -Wflex-array-member-not-at-end warnings -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. struct hci_std_codecs and struct hci_std_codecs_v2 are flexible structures, this is structures that contain a flexible-array member (__u8 codec[]; and struct hci_std_codec_v2 codec[];, correspondingly.) Since struct hci_rp_read_local_supported_codecs and struct hci_rp_read_local_supported_codecs_v2 are defined by hardware, we create the new struct hci_std_codecs_hdr and struct hci_std_codecs_v2_hdr types, and use them to replace the object types causing trouble in struct hci_rp_read_local_supported_codecs and struct hci_rp_read_local_supported_codecs_v2, namely struct hci_std_codecs std_codecs; and struct hci_std_codecs_v2_hdr std_codecs;. Also, once -fms-extensions is enabled, we can use transparent struct members in both struct hci_std_codecs and struct hci_std_codecs_v2_hdr. Notice that the newly created types does not contain the flex-array member `codec`, which is the object causing the -Wfamnae warnings. After these changes, the size of struct hci_rp_read_local_supported_codecs and struct hci_rp_read_local_supported_codecs_v2, along with their member's offsets remain the same, hence the memory layouts don't change: Before changes: struct hci_rp_read_local_supported_codecs { __u8 status; /* 0 1 / struct hci_std_codecs std_codecs; / 1 1 / struct hci_vnd_codecs vnd_codecs; / 2 1 / / size: 3, cachelines: 1, members: 3 / / last cacheline: 3 bytes / } __attribute__((__packed__)); struct hci_rp_read_local_supported_codecs_v2 { __u8 status; / 0 1 / struct hci_std_codecs_v2 std_codecs; / 1 1 / struct hci_vnd_codecs_v2 vendor_codecs; / 2 1 / / size: 3, cachelines: 1, members: 3 / / last cacheline: 3 bytes / } __attribute__((__packed__)); After changes: struct hci_rp_read_local_supported_codecs { __u8 status; / 0 1 / struct hci_std_codecs_hdr std_codecs; / 1 1 / struct hci_vnd_codecs vnd_codecs; / 2 1 / / size: 3, cachelines: 1, members: 3 / / last cacheline: 3 bytes / } __attribute__((__packed__)); struct hci_rp_read_local_supported_codecs_v2 { __u8 status; / 0 1 / struct hci_std_codecs_v2_hdr std_codecs; / 1 1 / struct hci_vnd_codecs_v2 vendor_codecs; / 2 1 / / size: 3, cachelines: 1, members: 3 / / last cacheline: 3 bytes */ } __attribute__((__packed__)); With these changes fix the following warnings: include/net/bluetooth/hci.h:1490:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] include/net/bluetooth/hci.h:1525:34: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-13 09:19:42 -04:00
Jakub Kicinski	9336854a59	Merge branch 'net-reduce-sk_filter-and-friends-bloat' Eric Dumazet says: ==================== net: reduce sk_filter() (and friends) bloat Some functions return an error by value, and a drop_reason by an output parameter. This extra parameter can force stack canaries. A drop_reason is enough and more efficient. This series reduces bloat by 678 bytes on x86_64: $ scripts/bloat-o-meter -t vmlinux.old vmlinux.final add/remove: 0/0 grow/shrink: 3/18 up/down: 79/-757 (-678) Function old new delta vsock_queue_rcv_skb 50 79 +29 ipmr_cache_report 1290 1315 +25 ip6mr_cache_report 1322 1347 +25 tcp_v6_rcv 3169 3167 -2 packet_rcv_spkt 329 327 -2 unix_dgram_sendmsg 1731 1726 -5 netlink_unicast 957 945 -12 netlink_dump 1372 1359 -13 sk_filter_trim_cap 889 858 -31 netlink_broadcast_filtered 1633 1595 -38 tcp_v4_rcv 3152 3111 -41 raw_rcv_skb 122 80 -42 ping_queue_rcv_skb 109 61 -48 ping_rcv 215 162 -53 rawv6_rcv_skb 278 224 -54 __sk_receive_skb 690 632 -58 raw_rcv 591 527 -64 udpv6_queue_rcv_one_skb 935 869 -66 udp_queue_rcv_one_skb 919 853 -66 tun_net_xmit 1146 1074 -72 sock_queue_rcv_skb_reason 166 76 -90 Total: Before=29722890, After=29722212, chg -0.00% Future conversions from sock_queue_rcv_skb() to sock_queue_rcv_skb_reason() can be done later. ==================== Link: https://patch.msgid.link/20260409145625.2306224-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 14:30:28 -07:00
Eric Dumazet	fb37aea2a0	net: change sk_filter_trim_cap() to return a drop_reason by value Current return value can be replaced with the drop_reason, reducing kernel bloat: $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/11 up/down: 32/-603 (-571) Function old new delta tcp_v6_rcv 3135 3167 +32 unix_dgram_sendmsg 1731 1726 -5 netlink_unicast 957 945 -12 netlink_dump 1372 1359 -13 sk_filter_trim_cap 882 858 -24 tcp_v4_rcv 3143 3111 -32 __pfx_tcp_filter 32 - -32 netlink_broadcast_filtered 1633 1595 -38 sock_queue_rcv_skb_reason 126 76 -50 tun_net_xmit 1127 1074 -53 __sk_receive_skb 690 632 -58 udpv6_queue_rcv_one_skb 935 869 -66 udp_queue_rcv_one_skb 919 853 -66 tcp_filter 154 - -154 Total: Before=29722783, After=29722212, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260409145625.2306224-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 14:30:25 -07:00
Eric Dumazet	97449a5f1a	tcp: change tcp_filter() to return the reason by value sk_filter_trim_cap() will soon return the reason by value, do the same for tcp_filter(). Note: tcp_filter() is no longer inlined. Following patch will inline it again. $ scripts/bloat-o-meter -t vmlinux.4 vmlinux.5 add/remove: 2/0 grow/shrink: 0/2 up/down: 186/-43 (143) Function old new delta tcp_filter - 154 +154 __pfx_tcp_filter - 32 +32 tcp_v4_rcv 3152 3143 -9 tcp_v6_rcv 3169 3135 -34 Total: Before=29722640, After=29722783, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260409145625.2306224-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 14:30:25 -07:00
Eric Dumazet	900f27fb79	net: change sock_queue_rcv_skb_reason() to return a drop_reason Change sock_queue_rcv_skb_reason() to return the drop_reason directly instead of using a reference. This is part of an effort to remove stack canaries and reduce bloat. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 3/7 up/down: 79/-301 (-222) Function old new delta vsock_queue_rcv_skb 50 79 +29 ipmr_cache_report 1290 1315 +25 ip6mr_cache_report 1322 1347 +25 packet_rcv_spkt 329 327 -2 sock_queue_rcv_skb_reason 166 128 -38 raw_rcv_skb 122 80 -42 ping_queue_rcv_skb 109 61 -48 ping_rcv 215 162 -53 rawv6_rcv_skb 278 224 -54 raw_rcv 591 527 -64 Total: Before=29722890, After=29722668, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260409145625.2306224-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 14:30:25 -07:00
Fernando Fernandez Mancera	a6bd339dbb	net_sched: fix skb memory leak in deferred qdisc drops When the network stack cleans up the deferred list via qdisc_run_end(), it operates on the root qdisc. If the root qdisc do not implement the TCQ_F_DEQUEUE_DROPS flag the packets queue to free are never freed and gets stranded on the child's local to_free list. Fix this by making qdisc_dequeue_drop() aware of the root qdisc. It fetches the root qdisc and check for the TCQ_F_DEQUEUE_DROPS flag. If the flag is present, the packet is appended directly to the root's to_free list. Otherwise, drop it directly as it was done before the optimization was implemented. Fixes: `a6efc273ab` ("net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codel") Reported-by: Damilola Bello <damilola@aterlo.com> Closes: https://lore.kernel.org/netdev/CAPgFtOLaedBMU0f_BxV2bXftTJSmJr018Q5uozOo5vVo6b9tjw@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260408100044.4530-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:38:18 -07:00
Hangbin Liu	1346586a9a	netlink: add a nla_nest_end_safe() helper The nla_len field in struct nlattr is a __u16, which can only hold values up to 65535. If a nested attribute grows beyond this limit, nla_nest_end() silently truncates the length, producing a corrupted netlink message with no indication of the problem. Since nla_nest_end() is used everywhere and this issue rarely happens, let's add a new helper to check the length. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-4-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:50 -07:00
Joe Damato	82db77f6fb	net: tso: Introduce tso_dma_map and helpers Add struct tso_dma_map to tso.h for tracking DMA addresses of mapped GSO payload data and tso_dma_map_completion_state. The tso_dma_map combines DMA mapping storage with iterator state, allowing drivers to walk pre-mapped DMA regions linearly. Includes fields for the DMA IOVA path (iova_state, iova_offset, total_len) and a fallback per-region path (linear_dma, frags[], frag_idx, offset). The tso_dma_map_completion_state makes the IOVA completion state opaque for drivers. Drivers are expected to allocate this and use the added helpers to update the completion state. Adds skb_frag_phys() to skbuff.h, returning the physical address of a paged fragment's data, which is used by the tso_dma_map helpers introduced in this commit described below. The added TSO DMA map helpers are: tso_dma_map_init(): DMA-maps the linear payload region and all frags upfront. Prefers the DMA IOVA API for a single contiguous mapping with one IOTLB sync; falls back to per-region dma_map_phys() otherwise. Returns 0 on success, cleans up partial mappings on failure. tso_dma_map_cleanup(): Handles both IOVA and fallback teardown paths. tso_dma_map_count(): counts how many descriptors the next N bytes of payload will need. Returns 1 if IOVA is used since the mapping is contiguous. tso_dma_map_next(): yields the next (dma_addr, chunk_len) pair. On the IOVA path, each segment is a single contiguous chunk. On the fallback path, indicates when a chunk starts a new DMA mapping so the driver can set dma_unmap_len on that descriptor for completion-time unmapping. tso_dma_map_completion_save(): updates the completion state. Drivers will call this at xmit time. tso_dma_map_complete(): tears down the mapping at completion time and returns true if the IOVA path was used. If it was not used, this is a no-op and returns false. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-2-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 10:54:31 -07:00
Jakub Kicinski	03a1569c2b	netfilter pull request nf-next-26-04-10 -----BEGIN PGP SIGNATURE----- iQJdBAABCABHFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmnYzgIbFIAAAAAABAAO bWFudTIsMi41KzEuMTIsMiwyDRxmd0BzdHJsZW4uZGUACgkQcJGo2a1f9gCfDw/+ KWf9fUlsE3uaxK889hfR0QU5ANQ03Ix1eVvr6Vh0Y5Za1glZDUuls0EsH0ej7/36 ZQqAu2vaevHTVZl3EhAS1vu8KBcldl36YEtvJsQXFkFuOoO3F/dBdttwAif2tzv8 ammqXOKicRHok1A3cy8R1fkGFAHpfn5BjBc68A0+SY1N2NFVdVNS9BP4p7tuSdkk JCj3TdDmBcddZ3SnY/z27S4+8jUL3e7HEAbsMApzIERcxe1w/6gEbb5Oa6AUwtHT 2SwQlUyhBa6gx2tARgUsHcck5QiW8b1tX7y1tzyo2q6rw78m1Eublib5nYCav/w8 9pSjRLlzSYBQ22e3wz7WqFXZRaM5+O38s3Moxfn/xrQblTk8CyW/5zGQJKivW9oG LEirCPbL6U6ZB/2Uy+3EvzG5TBP3cppB5sXaQfMdSQ03wvYFXMN35hb54ePZW6CX Db3lCwimOuXq+hkjVzZIU9ZmGr03oNohFX1GA0gDqrWtc9KsEKW8/KQvX61N8QK3 YEMIZ6fbMkstCY98fS3j6r6+V1he6wzcZpsqjd9FACYXtf8LQbPvoMA4BfcGR8+X iQVEZcrvdGa39VH1TQFlXJIe/Pv+9tZ+CF44MsrNyYH0mD4gTInajklO3lkw/YQj RQTHJLal9RCF9gVZRqHgkpE8vUj0mtUkp6Atz6En4mU= =C/nr -----END PGP SIGNATURE----- Merge tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next 1-3) IPVS updates from Julian Anastasov to enhance visibility into IPVS internal state by exposing hash size, load factor etc and allows userspace to tune the load factor used for resizing hash tables. 4) reject empty/not nul terminated device names from xt_physdev. This isn't a bug fix; existing code doesn't require a c-string. But clean this up anyway because conceptually the interface name definitely should be a c-string. 5) Switch nfnetlink to skb_mac_header helpers that didn't exist back when this code was written. This gives us additional debug checks but is not intended to change functionality. 6) Let the xt ttl/hoplimit match reject unknown operator modes. This is a cleanup, the evaluation function simply returns false when the mode is out of range. From Marino Dzalto. 7) xt_socket match should enable defrag after all other checks. This bug is harmless, historically defrag could not be disabled either except by rmmod. 8) remove UDP-Lite conntrack support, from Fernando Fernandez Mancera. 9) Avoid a couple -Wflex-array-member-not-at-end warnings in the old xtables 32bit compat code, from Gustavo A. R. Silva. 10) nftables fwd expression should drop packets when their ttl/hl has expired. This is a bug fix deferred, its not deemed important enough for -rc8. 11) Add additional checks before assuming the mac header is an ethernet header, from Zhengchuan Liang. * tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: require Ethernet MAC header before using eth_hdr() netfilter: nft_fwd_netdev: check ttl/hl before forwarding netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings netfilter: conntrack: remove UDP-Lite conntrack support netfilter: xt_socket: enable defrag after all other checks netfilter: xt_HL: add pr_fmt and checkentry validation netfilter: nfnetlink: prefer skb_mac_header helpers netfilter: x_physdev: reject empty or not-nul terminated device names ipvs: add conn_lfactor and svc_lfactor sysctl vars ipvs: add ip_vs_status info ipvs: show the current conn_tab size to users ==================== Link: https://patch.msgid.link/20260410112352.23599-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 09:39:21 -07:00
Jakub Kicinski	118cbd428e	Final updates, notably: - crypto: move Michael MIC code into wireless (only) - mac80211: - multi-link 4-addr support - NAN data support (but no drivers yet) - ath10k: DT quirk to make it work on some devices - ath12k: IPQ5424 support - rtw89: USB improvements for performance -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmnYm4gACgkQ10qiO8sP aABbhw/+Jb3JGnUT3PGqE+uyDsP5+uj6/thHVjugbtgFZeeNE6rAdprma1OzBbIF BZCgKivNRA4UHehYlZtgnp5Hm5bYCEF6ntGk5a7SLx6HuVn1JWR4cqc1zfCkmRmH 7PtvShCy+rx2t6q/O5mZ3s2z4cSbauzvJ3s5j/qme/lZV5K6Hrx6HnJ6fudyBHOT vS6Ahl2tfFIyek6Qfs5xfUFzcNY4kEw6O8yMJ8F+ZgV4fXWZybmo4Ld6j/w0veiF 0Wo9XYNQVu0EBqTkprsKwnZ0bmfQq03hivmupTm9b4gtXQjakb9o/wG2gQilKYkF ycS39cW3TxAGQDs/2U4l1CreumeRXtlq9OUywDu9arEICkJ0C7ierR3azo3fNeDB WomDYREtV7g3KOzoC6T0Zxivgdkg66W2ZVWvvWIHdSWCY+hYK+JsGXK6fZ5uAyo2 5BOq/PQtK/m/3DTwR+lF3KYeUB8r+Q/FiJ1kMih6lAK+dZGDGhQPjN+seaVsIpfw cZVHu6hAmt7ks2zxbKpBmdFbyKjQu3Yk1UZNXtOKPuLzwKGdTECiRlJimJsOffcn lcveT93oz5zGSCwH1pEw0nIlsnEVGFQjxCOVln7eOs66i8vnJKMb/8OpvcxFqjsL rvItAvw+cPWrLXLB9VkxqZg2ndrWe9gGh+JJtRkGXhEjC4U9Ll0= =lf+O -----END PGP SIGNATURE----- Merge tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Final updates, notably: - crypto: move Michael MIC code into wireless (only) - mac80211: - multi-link 4-addr support - NAN data support (but no drivers yet) - ath10k: DT quirk to make it work on some devices - ath12k: IPQ5424 support - rtw89: USB improvements for performance * tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (124 commits) wifi: cfg80211: Explicitly include <linux/export.h> in michael-mic.c wifi: ath10k: Add device-tree quirk to skip host cap QMI requests dt-bindings: wireless: ath10k: Add quirk to skip host cap QMI requests crypto: Remove michael_mic from crypto_shash API wifi: ipw2x00: Use michael_mic() from cfg80211 wifi: ath12k: Use michael_mic() from cfg80211 wifi: ath11k: Use michael_mic() from cfg80211 wifi: mac80211, cfg80211: Export michael_mic() and move it to cfg80211 wifi: ipw2x00: Rename michael_mic() to libipw_michael_mic() wifi: libertas_tf: refactor endpoint lookup wifi: libertas: refactor endpoint lookup wifi: at76c50x: refactor endpoint lookup wifi: ath12k: Enable IPQ5424 WiFi device support wifi: ath12k: Add CE remap hardware parameters for IPQ5424 wifi: ath12k: add ath12k_hw_regs for IPQ5424 wifi: ath12k: add ath12k_hw_version_map entry for IPQ5424 wifi: ath12k: Add ath12k_hw_params for IPQ5424 dt-bindings: net: wireless: add ath12k wifi device IPQ5424 wifi: ath10k: fix station lookup failure during disconnect wifi: ath12k: Create symlink for each radio in a wiphy ... ==================== Link: https://patch.msgid.link/20260410064703.735099-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 09:17:42 -07:00
Eric Dumazet	29703d7813	tcp: add indirect call wrapper in tcp_conn_request() Small improvement in SYN processing, to directly call tcp_v6_init_seq_and_ts_off() or tcp_v4_init_seq_and_ts_off(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260410174950.745670-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 09:17:03 -07:00
Eric Dumazet	f5148298b0	tcp: return a drop_reason from tcp_add_backlog() Part of a stack canary removal from tcp_v{4,6}_rcv(). Return a drop_reason instead of a boolean, so that we no longer have to pass the address of a local variable. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-37 (-37) Function old new delta tcp_v6_rcv 3133 3129 -4 tcp_v4_rcv 3206 3202 -4 tcp_add_backlog 1281 1252 -29 Total: Before=25567186, After=25567149, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260409101147.1642967-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 09:07:53 -07:00
Linus Torvalds	7c6c4ed80b	vfs-7.0-rc8.fixes Please consider pulling these changes from the signed vfs-7.0-rc8.fixes tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc om/TAQDsAIxYiJ4hR7rNrKuyL+FP7kuN8WX9DmjU+45Pt/SZNwEA2pSH0y7osa2+ xRGkN0pPQTu6JIlx0rCXlY9PYnXCPQg= =RHx5 -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "The kernfs rbtree is keyed by (hash, ns, name) where the hash is seeded with the raw namespace pointer via init_name_hash(ns). The resulting hash values are exposed to userspace through readdir seek positions, and the pointer-based ordering in kernfs_name_compare() is observable through entry order. Switch from raw pointers to ns_common::ns_id for both hashing and comparison. A preparatory commit first replaces all const void * namespace parameters with const struct ns_common * throughout kernfs, sysfs, and kobject so the code can access ns->ns_id. Also compare the ns_id when hashes match in the rbtree to handle crafted collisions. Also fix eventpoll RCU grace period issue and a cachefiles refcount problem" * tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: kernfs: make directory seek namespace-aware kernfs: use namespace id instead of pointer for hashing and comparison kernfs: pass struct ns_common instead of const void * for namespace tags eventpoll: defer struct eventpoll free to RCU grace period cachefiles: fix incorrect dentry refcount in cachefiles_cull()	2026-04-10 08:40:49 -07:00
Fernando Fernandez Mancera	84dee05d9d	netfilter: conntrack: remove UDP-Lite conntrack support UDP-Lite (RFC 3828) socket support was recently retired from the core networking stack. As a follow-up of that, drop the connection tracker and NAT support for UDP-Lite in Netfilter. This patch removes CONFIG_NF_CT_PROTO_UDPLITE and scrubs UDP-Lite awareness from the conntrack core, NAT core, nft_ct, and ctnetlink. Please note that stateless packet inspection, matching, ipsets or logging support for IPPROTO_UDPLITE is preserved. As conntrack no longer extracts UDP-Lite ports or tracks its L4 state, when performing NAT the UDP-Lite checksum cannot be updated anymore. That is an expected and acceptable consequence of removing UDP-Lite conntrack module. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-10 12:16:26 +02:00
Jakub Kicinski	581d28606c	net: remove the netif_get_rx_queue_lease_locked() helpers The netif_get_rx_queue_lease_locked() API hides the locking and the descend onto the leased queue. Making the code harder to follow (at least to me). Remove the API and open code the descend a bit. Most of the code now looks like: if (!leased) return __helper(x); hw_rxq = .. netdev_lock(hw_rxq->dev); ret = __helper(x); netdev_unlock(hw_rxq->dev); return ret; Of course if we have more code paths that need the wrapping we may need to revisit. For now, IMHO, having to know what netif_get_rx_queue_lease_locked() does is not worth the 20LoC it saves. Link: https://patch.msgid.link/20260408151251.72bd2482@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:26:28 -07:00
Jakub Kicinski	1508922588	Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp' Daniel Borkmann says: ==================== netkit: Support for io_uring zero-copy and AF_XDP Containers use virtual netdevs to route traffic from a physical netdev in the host namespace. They do not have access to the physical netdev in the host and thus can't use memory providers or AF_XDP that require reconfiguring/restarting queues in the physical netdev. This patchset adds the concept of queue leasing to virtual netdevs that allow containers to use memory providers and AF_XDP at native speed. Leased queues are bound to a real queue in a physical netdev and act as a proxy. Memory providers and AF_XDP operations take an ifindex and queue id, so containers would pass in an ifindex for a virtual netdev and a queue id of a leased queue, which then gets proxied to the underlying real queue. We have implemented support for this concept in netkit and tested the latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504 (bnxt_en) 100G NICs. For more details see the individual patches. ==================== Link: https://patch.msgid.link/20260402231031.447597-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:24:35 -07:00
David Wei	222b5566a0	net: Proxy netdev_queue_get_dma_dev for leased queues Extend netdev_queue_get_dma_dev to return the physical device of the real rxq for DMA in case the queue was leased. This allows memory providers like io_uring zero-copy or devmem to bind to the physically leased rxq via virtual devices such as netkit. Signed-off-by: David Wei <dw@davidwei.uk> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-8-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:21:46 -07:00
Daniel Borkmann	1e91c98bc9	net: Slightly simplify net_mp_{open,close}_rxq net_mp_open_rxq is currently not used in the tree as all callers are using __net_mp_open_rxq directly, and net_mp_close_rxq is only used once while all other locations use __net_mp_close_rxq. Consolidate into a single API, netif_mp_{open,close}_rxq, using the netif_ prefix to indicate that the caller is responsible for locking. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-6-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:21:46 -07:00
Daniel Borkmann	21d58b35e5	net: Add lease info to queue-get response Populate nested lease info to the queue-get response that returns the ifindex, queue id with type and optionally netns id if the device resides in a different netns. Example with ynl client when using AF_XDP via queue leasing: # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:24 qdisc mq state UP group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever inet6 fe80::eaeb:d3ff:fea3:43f6/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ethtool -i enp10s0f0np0 driver: mlx5_core [...] # ynl --family netdev --output-json --do queue-get \ --json '{"ifindex": 4, "id": 15, "type": "rx"}' {'id': 15, 'ifindex': 4, 'lease': {'ifindex': 8, 'netns-id': 0, 'queue': {'id': 1, 'type': 'rx'}}, 'napi-id': 8227, 'type': 'rx', 'xsk': {}} # ip netns list foo (id: 0) # ip netns exec foo ip a [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff inet6 fe80::200:ff:fe00:0/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ip netns exec foo ethtool -i nk driver: netkit [...] # ip netns exec foo ls /sys/class/net/nk/queues/ rx-0 rx-1 tx-0 # ip netns exec foo ynl --family netdev --output-json --do queue-get \ --json '{"ifindex": 8, "id": 1, "type": "rx"}' {"id": 1, "type": "rx", "ifindex": 8, "xsk": {}} Note that the caller of netdev_nl_queue_fill_one() holds the netdevice lock. For the queue-get we do not lock both devices. When queues get {un,}leased, both devices are locked, thus if __netif_get_rx_queue_lease() returns a lease pointer, it points to a valid device. The netns-id is fetched via peernet2id_alloc() similarly as done in OVS. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-4-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:21:46 -07:00
Daniel Borkmann	d04686d9bc	net: Implement netdev_nl_queue_create_doit Implement netdev_nl_queue_create_doit which creates a new rx queue in a virtual netdev and then leases it to a rx queue in a physical netdev. Example with ynl client: # ynl --family netdev --output-json --do queue-create \ --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}' {'id': 1} Note that the netdevice locking order is always from the virtual to the physical device. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-3-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:21:45 -07:00
Jakub Kicinski	b6e39e4846	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c `c3812651b5` ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") `78723a62b9` ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c `fde29fd934` ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") `d98adfbdd5` ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c `51f4e090b9` ("net: stmmac: fix integer underflow in chain mode") `6b4286e055` ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 13:20:59 -07:00
Konstantin Taranov	b21058880c	RDMA/mana_ib: Support memory windows Implement .alloc_mw() and .dealloc_mw() for mana device. This is just the basic infrastructure, MW is not practically usable until additional kernel support for allowing user space to submit MW work requests is completed. Link: https://patch.msgid.link/r/20260331090851.2276205-1-kotaranov@linux.microsoft.com Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-04-09 11:22:06 -03:00
Christian Brauner	e3b2cf6e5d	kernfs: pass struct ns_common instead of const void * for namespace tags kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void . Replace all const void namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-04-09 14:36:52 +02:00
Or Har-Toov	6f38acfed5	devlink: Add port-level resource registration infrastructure The current devlink resource infrastructure supports only device-level resources. Some hardware resources are associated with specific ports rather than the entire device, and today we have no way to show resource per-port. Add support for registering resources at the port level. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	7be3163c49	devlink: Refactor resource functions to be generic Currently the resource functions take devlink pointer as parameter and take the resource list from there. Allow resource functions to work with other resource lists that will be added in next patches and not only with the devlink's resource list. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Eric Dumazet	202ab59941	net: dropreason: add MACVLAN_BROADCAST_BACKLOG and IPVLAN_MULTICAST_BACKLOG ipvlan and macvlan use queues to process broadcast/multicast packets from a work queue. Under attack these queues can drop packets. Add MACVLAN_BROADCAST_BACKLOG drop_reason for macvlan broadcast queue. Add IPVLAN_MULTICAST_BACKLOG drop_reason for ipvlan multicast queue. Use different reasons as some deployments use both ipvlan and macvlan. Also change ipvlan_rcv_frame() to use SKB_DROP_REASON_DEV_READY when the device is not UP. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407150710.1640747-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:19:18 -07:00
Eric Dumazet	ea25e03da7	codel: annotate data-races in codel_dump_stats() codel_dump_stats() only runs with RTNL held, reading fields that can be changed in qdisc fast path. Add READ_ONCE()/WRITE_ONCE() annotations. Alternative would be to acquire the qdisc spinlock, but our long-term goal is to make qdisc dump operations lockless as much as we can. tc_codel_xstats fields don't need to be latched atomically, otherwise this bug would have been caught earlier. No change in kernel size: $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 0/0 grow/shrink: 1/1 up/down: 3/-1 (2) Function old new delta codel_qdisc_dequeue 2462 2465 +3 codel_dump_stats 250 249 -1 Total: Before=29739919, After=29739921, chg +0.00% Fixes: `76e3cc126b` ("codel: Controlled Delay AQM") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407143053.1570620-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:18:52 -07:00
Xiang Mei	f81f4e79b1	bonding: remove unused bond_is_first_slave and bond_is_last_slave macros Since commit `2884bf72fb` ("net: bonding: fix use-after-free in bond_xmit_broadcast()"), bond_is_last_slave() was only used in bond_xmit_broadcast(). After the recent fix replaced that usage with a simple index comparison, bond_is_last_slave() has no remaining callers. bond_is_first_slave() likewise has no callers. Remove both unused macros. Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260404220412.444753-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:07:08 -07:00
Florian Westphal	936206e3f6	netfilter: nfnetlink_queue: make hash table per queue Sharing a global hash table among all queues is tempting, but it can cause crash: BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue] [..] nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue] nfnetlink_rcv_msg+0x46a/0x930 kmem_cache_alloc_node_noprof+0x11e/0x450 struct nf_queue_entry is freed via kfree, but parallel cpu can still encounter such an nf_queue_entry when walking the list. Alternative fix is to free the nf_queue_entry via kfree_rcu() instead, but as we have to alloc/free for each skb this will cause more mem pressure. Cc: Scott Mitchell <scott.k.mitch1@gmail.com> Fixes: `e19079adcd` ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table") Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:34:51 +02:00
Tuan Do	f8dca15a1b	netfilter: nft_ct: fix use-after-free in timeout object destroy nft_ct_timeout_obj_destroy() frees the timeout object with kfree() immediately after nf_ct_untimeout(), without waiting for an RCU grace period. Concurrent packet processing on other CPUs may still hold RCU-protected references to the timeout object obtained via rcu_dereference() in nf_ct_timeout_data(). Add an rcu_head to struct nf_ct_timeout and use kfree_rcu() to defer freeing until after an RCU grace period, matching the approach already used in nfnetlink_cttimeout.c. KASAN report: BUG: KASAN: slab-use-after-free in nf_conntrack_tcp_packet+0x1381/0x29d0 Read of size 4 at addr ffff8881035fe19c by task exploit/80 Call Trace: nf_conntrack_tcp_packet+0x1381/0x29d0 nf_conntrack_in+0x612/0x8b0 nf_hook_slow+0x70/0x100 __ip_local_out+0x1b2/0x210 tcp_sendmsg_locked+0x722/0x1580 __sys_sendto+0x2d8/0x320 Allocated by task 75: nft_ct_timeout_obj_init+0xf6/0x290 nft_obj_init+0x107/0x1b0 nf_tables_newobj+0x680/0x9c0 nfnetlink_rcv_batch+0xc29/0xe00 Freed by task 26: nft_obj_destroy+0x3f/0xa0 nf_tables_trans_destroy_work+0x51c/0x5c0 process_one_work+0x2c4/0x5a0 Fixes: `7e0b2b57f0` ("netfilter: nft_ct: add ct timeout support") Cc: stable@vger.kernel.org Signed-off-by: Tuan Do <tuan@calif.io> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:34:16 +02:00
Pablo Neira Ayuso	c6f8557758	netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it Add a new helper function to retrieve the next action entry in flow rule, check if the maximum number of actions is reached, bail out in such case. Replace existing opencoded iteration on the action array by this helper function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Pablo Neira Ayuso	3785091c6c	netfilter: nft_meta: add double-tagged vlan and pppoe support Currently: add rule netdev x y ip saddr 1.1.1.1 does not work with neither double-tagged vlan nor pppoe packets. This is because the network and transport header offset are not pointing to the IP and transport protocol headers in the stack. This patch expands NFT_META_PROTOCOL and NFT_META_L4PROTO to parse double-tagged vlan and pppoe packets so matching network and transport header fields becomes possible with the existing userspace generated bytecode. Note that this parser only supports double-tagged vlan which is composed of vlan offload + vlan header in the skb payload area for simplicity. NFT_META_PROTOCOL is used by bridge and netdev family as an implicit dependency in the bytecode to match on network header fields. Similarly, there is also NFT_META_L4PROTO, which is also used as an implicit dependency when matching on the transport protocol header fields. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Eric Dumazet	7fb4c19670	net: pull headers in qdisc_pkt_len_segs_init() Most ndo_start_xmit() methods expects headers of gso packets to be already in skb->head. net/core/tso.c users are particularly at risk, because tso_build_hdr() does a memcpy(hdr, skb->data, hdr_len); qdisc_pkt_len_segs_init() already does a dissection of gso packets. Use pskb_may_pull() instead of skb_header_pointer() to make sure drivers do not have to reimplement this. Some malicious packets could be fed, detect them so that we can drop them sooner with a new SKB_DROP_REASON_SKB_BAD_GSO drop_reason. Fixes: `e876f208af` ("net: Add a software TSO helper API") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260403221540.3297753-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 19:02:13 -07:00
Miri Korenblit	840492bf33	wifi: mac80211: add NAN peer schedule support Peer schedules specify which channels the peer is available on and when. Add support for configuring peer NAN schedules: - build and store the schedule and maps - for each channel, make sure that it fits into the capabilities, and take the minimum between it and the local compatible nan channel. - configure the driver Note that the removal of a peer schedule should be done by the driver upon NMI station removal. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260326121156.185ff2283fa6.I0345eb665be8ccf4a77eb1aca9a421eb8d2432e2@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 15:36:03 +02:00
Miri Korenblit	27e9b326b6	wifi: mac80211: support NAN stations Add support for both NMI and NDI stations. The NDI station will be linked to the NMI station of the NAN peer for which the NDI station is added. A peer can choose to reuse its NMI address as the NDI address. Since different keys might be in use for NAN management and for data frames, we will have 2 different stations, even if they'll have the same address. Even though there are no links in NAN, sta->deflink will still be used to store the one set of capabilities and SMPS mode. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260326121156.9fdd37b8e755.I7a7bd6e8e751cab49c329419485839afd209cfc6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 15:36:03 +02:00
Miri Korenblit	589c06e8fd	wifi: mac80211: add NAN local schedule support A NAN local schedule consist of a list of NAN channels, and an array that maps time slots to the channel it is scheduled to (or NULL to indicate unscheduled). A NAN channel is the configuration of a channel which is used for NAN operations. It is a new type of chanctx user (before, the only user is a link). A NAN channel may not have a chanctx assigned if it is ULWed out. A NAN channel may or may not be scheduled (for example, user space may want to prepare the resources before the actual schedule is configured). Add management of the NAN local schedule. Since we introduce a new chanctx user, also adjust the different for_each_chanctx_user_* macros to visit also the NAN channels and take those into account. Co-developed-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260326121156.03350fd40630.Id158f815cfc9b5ab1ebdb8ee608bda426e4d7474@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 15:36:02 +02:00
Benjamin Berg	b16df0dacb	wifi: mac80211: export ieee80211_calculate_rx_timestamp The function is quite useful when handling beacon timestamps. Export it so that it can be used by mac80211_hwsim and others. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260326121156.a1abc9c52f37.Ieabfe66768b1bf64c3076d62e73c50794faeacdc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 15:36:02 +02:00
Benjamin Berg	7f0de94ef4	wifi: mac80211: add a TXQ for management frames on NAN devices Currently there is no TXQ for non-data frames. Add a new txq_mgmt for this purpose and create one of these on NAN devices. On NAN devices, these frames may only be transmitted during the discovery window and it is therefore helpful to schedule them using a queue. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260326121156.32eddd986bd2.Iee95758287c276155fbd7779d3f263339308e083@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 15:36:02 +02:00
Geliang Tang	eb477fdd68	tcp: add recv_should_stop helper Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked() and tcp_splice_read() to check whether to stop receiving. And use this helper in mptcp_recvmsg() and mptcp_splice_read() to reduce redundant code. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-3-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 19:14:27 -07:00
Maciej Fijalkowski	93e84fe45b	xsk: fix XDP_UMEM_SG_FLAG issues Currently xp_assign_dev_shared() is missing XDP_USE_SG being propagated to flags so set it in order to preserve mtu check that is supposed to be done only when no multi-buffer setup is in picture. Also, this flag has the same value as XDP_UMEM_TX_SW_CSUM so we could get unexpected SG setups for software Tx checksums. Since csum flag is UAPI, modify value of XDP_UMEM_SG_FLAG. Fixes: `d609f3d228` ("xsk: add multi-buffer support for sockets sharing umem") Reviewed-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-4-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Maciej Fijalkowski	1ee1605138	xsk: respect tailroom for ZC setups Multi-buffer XDP stores information about frags in skb_shared_info that sits at the tailroom of a packet. The storage space is reserved via xdp_data_hard_end(): ((xdp)->data_hard_start + (xdp)->frame_sz - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) and then we refer to it via macro below: static inline struct skb_shared_info * xdp_get_shared_info_from_buff(const struct xdp_buff xdp) { return (struct skb_shared_info )xdp_data_hard_end(xdp); } Currently we do not respect this tailroom space in multi-buffer AF_XDP ZC scenario. To address this, introduce xsk_pool_get_tailroom() and use it within xsk_pool_get_rx_frame_size() which is used in ZC drivers to configure length of HW Rx buffer. Typically drivers on Rx Hw buffers side work on 128 byte alignment so let us align the value returned by xsk_pool_get_rx_frame_size() in order to avoid addressing this on driver's side. This addresses the fact that idpf uses mentioned function before pool->dev being set so we were at risk that after subtracting tailroom we would not provide 128-byte aligned value to HW. Since xsk_pool_get_rx_frame_size() is actively used in xsk_rcv_check() and __xsk_rcv(), add a variant of this routine that will not include 128 byte alignment and therefore old behavior is preserved. Reviewed-by: Björn Töpel <bjorn@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: `24ea50127e` ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-3-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Daniel Golle	f259e08494	net: dsa: add bridge member iteration macro Drivers that offload bridges need to iterate over the ports that are members of a given bridge, for example to rebuild per-port forwarding bitmaps when membership changes. Currently drivers typically open-code this by combining dsa_switch_for_each_user_port() with a dsa_port_offloads_bridge_dev() check, or cache bridge membership within the driver. Add dsa_switch_for_each_bridge_member() macro to express this pattern directly, and use it for the existing dsa_bridge_ports() inline helper. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/e7136aaa26773f39e805a00fe4ecf13cd2b83fc0.1775049897.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:30:33 -07:00
Daniel Golle	b0a79590d1	net: dsa: move dsa_bridge_ports() helper to dsa.h The yt921x driver contains a helper to create a bitmap of ports which are members of a bridge. Move the helper as static inline function into dsa.h, so other driver can make use of it as well. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/4f8bbfce3e4e3a02064fc4dc366263136c6e0383.1775049897.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:30:33 -07:00
Byungchul Park	db359fccf2	mm: introduce a new page type for page pool in page type Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to determine if a page belongs to a page pool. However, with the planned removal of @pp_magic, we should instead leverage the page_type in struct page, such as PGTY_netpp, for this purpose. Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(), and __ClearPageNetpp() instead, and remove the existing APIs accessing @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and netmem_clear_pp_magic(). Plus, add @page_type to struct net_iov at the same offset as struct page so as to use the page_type APIs for struct net_iov as well. While at it, reorder @type and @owner in struct net_iov to avoid a hole and increasing the struct size. This work was inspired by the following link: https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/ While at it, move the sanity check for page pool to on the free path. [byungchul@sk.com: gate the sanity check, per Johannes] Link: https://lkml.kernel.org/r/20260316223113.20097-1-byungchul@sk.com Link: https://lkml.kernel.org/r/20260224051347.19621-1-byungchul@sk.com Co-developed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Byungchul Park <byungchul@sk.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: David Wei <dw@davidwei.uk> Cc: Dragos Tatulea <dtatulea@nvidia.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Mark Bloch <mbloch@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Simon Horman <horms@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:53:19 -07:00
Chris J Arges	77facb3522	net: increase IP_TUNNEL_RECURSION_LIMIT to 5 In configurations with multiple tunnel layers and MPLS lwtunnel routing, a single tunnel hop can increment the counter beyond this limit. This causes packets to be dropped with the "Dead loop on virtual device" message even when a routing loop doesn't exist. Increase IP_TUNNEL_RECURSION_LIMIT from 4 to 5 to handle this use-case. Fixes: `6f1a9140ec` ("net: add xmit recursion limit to tunnel xmit functions") Link: https://lore.kernel.org/netdev/88deb91b-ef1b-403c-8eeb-0f971f27e34f@redhat.com/ Signed-off-by: Chris J Arges <carges@cloudflare.com> Link: https://patch.msgid.link/20260402222401.3408368-1-carges@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:52:10 -07:00
Jakub Kicinski	8ffb33d770	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c `b18c833888` ("vsock: initialize child_ns_mode_locked in vsock_net_init()") `0de607dc4f` ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c `ceee35e567` ("bnxt_en: Refactor some basic ring setup and adjustment logic") `57cdfe0dc7` ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c `4d56037a02` ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") `687a95d204` ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h `b6045c899e` ("wifi: iwlwifi: mld: Refactor scan command handling") `ec66ec6a5a` ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c `078df640ef` ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") `323156c354` ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:03:13 -07:00
Jeremy Kerr	22cb45afd2	net: mctp: perform source address lookups when we populate our dst Rather than querying the output device for its address in mctp_local_output, set up the source address when we're populating the dst structure. If no address is assigned, use MCTP_ADDR_NULL. This will allow us more flexibility when routing for NULL-source-eid cases. For now though, we still reject a NULL source address in the output path. We need to update the tests a little, so that addresses are assigned before we do the dst lookups. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20260331-dev-mctp-null-eids-v1-1-b4d047372eaf@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-02 13:31:36 +02:00
Long Li	dbeb256e8d	RDMA/mana_ib: Disable RX steering on RSS QP destroy When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss() destroys the RX WQ objects but does not disable vPort RX steering in firmware. This leaves stale steering configuration that still points to the destroyed RX objects. If traffic continues to arrive (e.g. peer VM is still transmitting) and the VF interface is subsequently brought up (mana_open), the firmware may deliver completions using stale CQ IDs from the old RX objects. These CQ IDs can be reused by the ethernet driver for new TX CQs, causing RX completions to land on TX CQs: WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana] (is_sq == false) WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup fails) Fix this by disabling vPort RX steering before destroying RX WQ objects. Note that mana_fence_rqs() cannot be used here because the fence completion is delivered on the CQ, which is polled by user-mode (e.g. DPDK) and not visible to the kernel driver. Refactor the disable logic into a shared mana_disable_vport_rx() in mana_en, exported for use by mana_ib, replacing the duplicate code. The ethernet driver's mana_dealloc_queues() is also updated to call this common function. Fixes: `0266a17763` ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Cc: stable@vger.kernel.org Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/20260325194100.1929056-1-longli@microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-03-30 13:47:45 -04:00
Fernando Fernandez Mancera	964870b4b9	ipv6: remove ipv6_stub infrastructure completely As IPv6 is built-in only and there are no more users of ipv6_stub, the ipv6_stub is now entirely obsolete. Remove all the code related to the definition, initialization and usage. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-11-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:24 -07:00
Fernando Fernandez Mancera	ad84b1eefe	bpf: remove ipv6_bpf_stub completely and use direct function calls As IPv6 is built-in only, the ipv6_bpf_stub can be removed completely. Convert all ipv6_bpf_stub usage to direct function calls instead. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260325120928.15848-10-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:24 -07:00
Fernando Fernandez Mancera	d76f6b170a	net: convert remaining ipv6_stub users to direct function calls As IPv6 is built-in only, the ipv6_stub infrastructure is no longer necessary. Convert remaining ipv6_stub users to make direct function calls. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-9-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:23 -07:00
Fernando Fernandez Mancera	4b70b20215	ipv6: prepare headers for ipv6_stub removal In preparation for dropping ipv6_stub and converting its users to direct function calls, introduce static inline dummy functions and fallback macros in the IPv6 networking headers. In addition, introduce checks on fib6_nh_init(), ip6_dst_lookup_flow() and ip6_fragment() to avoid a crash due to ipv6.disable=1 set during booting. The other functions are safe as they cannot be called with ipv6.disable=1 set. These fallbacks ensure that when CONFIG_IPV6 is completely disabled, there are no compiling or linking errors due to code paths not guarded by preprocessor macro IS_ENABLED(CONFIG_IPV6). In addition, export ndisc_send_na(), ip6_route_input() and ip6_fragment(). Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-6-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:23 -07:00
Fernando Fernandez Mancera	fde39f7df1	ipv6: replace IS_BUILTIN(CONFIG_IPV6) with IS_ENABLED(CONFIG_IPV6) As IPv6 is built-in only, it does not make sense to continue using IS_BUILTIN(CONFIG_IPV6). Therefore, replace it with IS_ENABLED() when necessary and drop it if it isn't valid anymore. Notice that there is still one instance related to ICMPv6, as it requires more changes it will be handle separately. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260325120928.15848-4-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:23 -07:00
Fernando Fernandez Mancera	0557a34487	net: remove EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() macros As IPv6 is built-in only, the macro is always evaluating to an empty one. Remove it completely from the code. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260325120928.15848-3-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:22 -07:00
Jiayuan Chen	552994294f	tcp: Fix inconsistent indenting warning Suppress such warning reported by test robot: include/net/tcp.h:1449 tcp_ca_event() warn: inconsistent indenting Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603251430.gQ3VuiKV-lkp@intel.com/ Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260325071854.805-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:44:45 -07:00
Sabrina Dubroca	629ec78ef8	mpls: add seqcount to protect the platform_label{,s} pair The RCU-protected codepaths (mpls_forward, mpls_dump_routes) can have an inconsistent view of platform_labels vs platform_label in case of a concurrent resize (resize_platform_label_table, under platform_mutex). This can lead to OOB accesses. This patch adds a seqcount, so that we get a consistent snapshot. Note that mpls_label_ok is also susceptible to this, so the check against RTA_DST in rtm_to_route_config, done outside platform_mutex, is not sufficient. This value gets passed to mpls_label_ok once more in both mpls_route_add and mpls_route_del, so there is no issue, but that additional check must not be removed. Reported-by: Yuan Tan <tanyuan98@outlook.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Fixes: `7720c01f3f` ("mpls: Add a sysctl to control the size of the mpls label table") Fixes: `dde1b38e87` ("mpls: Convert mpls_dump_routes() to RCU.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/cd8fca15e3eb7e212b094064cd83652e20fd9d31.1774284088.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 18:32:14 -07:00
Jakub Kicinski	dbd94b9831	A fairly big set of changes all over, notably with: - cfg80211: new APIs for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware - mt76: - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - iwlwifi: UNII-9 and continuing UHR work -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmnFTegACgkQ10qiO8sP aABpghAAmcubFELG/ivDfwujEXjeKRU4CGcFPWDnOwBo28w8bQ36SoKRh251BUSL 4XCEwZwPR2gFI77bJ7fLn1gsRNd8Cv+t8wsi2K3TV3bOy6wCxH85A7l4GmN5vGzP 9MLcAAT7R684YAC4gFAi3DqFmSucd/ZodAt93Cw7+ikXq2tvrbR5wgUv9AQ5mUIw f5cqocOOv+4IbSL+r2cQnCAKLGWxVMJpoiWuAPpIQn7odcrncrhvBIG3l9ZC4KOL BKiO+YpK8Yg3+uc9zrz+RwOcQx6TjzgAydFY/AnqOmGfQ2dGaWC/zy/5stCOVrfd mAqw4jr14eAumUoHQoNrOBsWikuDBKmYMjHVObR3cKB9jJ/54CHtSYJVueg9gdhP 4+s5lNkX0zEt76wimYQRpCkYhalBUZMwUv3HFnab99PDDmWvNFS8uHi8i2g7U81i yVdxI3MbQp2SRgJMDbKQPziSad1qJyIzg/LoN9fb6GV1DoNZ3IZabgVMOA2IoB0L zYi3Yuyo63yhDh2Np9uzDsIRQAbTCdbou2fzPqy6CvOyG6JXxCI8PZpZAN7dqYxc u8rljjaxQ4IYfBWrryFdHzIrYHJLo/B4g8kSFE+vzLiFblFmTxBoziHDWpJ4u5im YTyOyBYAtzQf0l8cZPKzRq+AuVgIuJVNV/3zyxnoFxfqg/lUWNk= =zap4 -----END PGP SIGNATURE----- Merge tag 'wireless-next-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== A fairly big set of changes all over, notably with: - cfg80211: new APIs for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware - mt76: - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - iwlwifi: UNII-9 and continuing UHR work * tag 'wireless-next-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (230 commits) wifi: mac80211: ignore reserved bits in reconfiguration status wifi: cfg80211: allow protected action frame TX for NAN wifi: ieee80211: Add some missing NAN definitions wifi: nl80211: Add a notification to notify NAN channel evacuation wifi: nl80211: add NL80211_CMD_NAN_ULW_UPDATE notification wifi: nl80211: allow reporting spurious NAN Data frames wifi: cfg80211: allow ToDS=0/FromDS=0 data frames on NAN data interfaces wifi: nl80211: define an API for configuring the NAN peer's schedule wifi: nl80211: add support for NAN stations wifi: cfg80211: separately store HT, VHT and HE capabilities for NAN wifi: cfg80211: add support for NAN data interface wifi: cfg80211: make sure NAN chandefs are valid wifi: cfg80211: Add an API to configure local NAN schedule wifi: mac80211: cleanup error path of ieee80211_do_open wifi: mac80211: extract channel logic from link logic wifi: iwlwifi: mld: set RX_FLAG_RADIOTAP_TLV_AT_END generically wifi: iwlwifi: reduce the number of prints upon firmware crash wifi: iwlwifi: fix the description of SESSION_PROTECTION_CMD wifi: iwlwifi: mld: introduce iwl_mld_vif_fw_id_valid wifi: iwlwifi: mld: block EMLSR during TDLS connections ... ==================== Link: https://patch.msgid.link/20260326152021.305959-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 18:17:14 -07:00
Jakub Kicinski	9ebcf66cd6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 12:09:57 -07:00
Long Li	45b2b84ac6	net: mana: Set default number of queues to 16 Set the default number of queues per vPort to MANA_DEF_NUM_QUEUES (16), as 16 queues can achieve optimal throughput for typical workloads. The actual number of queues may be lower if it exceeds the hardware reported limit. Users can increase the number of queues up to max_queues via ethtool if needed. Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/20260323194925.1766385-1-longli@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-26 15:04:31 +01:00
Pablo Neira Ayuso	02a3231b6d	netfilter: nf_conntrack_expect: store netns and zone in expectation __nf_ct_expect_find() and nf_ct_expect_find_get() are called under rcu_read_lock() but they dereference the master conntrack via exp->master. Since the expectation does not hold a reference on the master conntrack, this could be dying conntrack or different recycled conntrack than the real master due to SLAB_TYPESAFE_RCU. Store the netns, the master_tuple and the zone in struct nf_conntrack_expect as a safety measure. This patch is required by the follow up fix not to dump expectations that do not belong to this netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:24:40 +01:00
Pablo Neira Ayuso	bffcaad9af	netfilter: ctnetlink: ensure safe access to master conntrack Holding reference on the expectation is not sufficient, the master conntrack object can just go away, making exp->master invalid. To access exp->master safely: - Grab the nf_conntrack_expect_lock, this gets serialized with clean_from_lists() which also holds this lock when the master conntrack goes away. - Hold reference on master conntrack via nf_conntrack_find_get(). Not so easy since the master tuple to look up for the master conntrack is not available in the existing problematic paths. This patch goes for extending the nf_conntrack_expect_lock section to address this issue for simplicity, in the cases that are described below this is just slightly extending the lock section. The add expectation command already holds a reference to the master conntrack from ctnetlink_create_expect(). However, the delete expectation command needs to grab the spinlock before looking up for the expectation. Expand the existing spinlock section to address this to cover the expectation lookup. Note that, the nf_ct_expect_iterate_net() calls already grabs the spinlock while iterating over the expectation table, which is correct. The get expectation command needs to grab the spinlock to ensure master conntrack does not go away. This also expands the existing spinlock section to cover the expectation lookup too. I needed to move the netlink skb allocation out of the spinlock to keep it GFP_KERNEL. For the expectation events, the IPEXP_DESTROY event is already delivered under the spinlock, just move the delivery of IPEXP_NEW under the spinlock too because the master conntrack event cache is reached through exp->master. While at it, add lockdep notations to help identify what codepaths need to grab the spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:32 +01:00
Pablo Neira Ayuso	9c42bc9db9	netfilter: nf_conntrack_expect: honor expectation helper field The expectation helper field is mostly unused. As a result, the netfilter codebase relies on accessing the helper through exp->master. Always set on the expectation helper field so it can be used to reach the helper. nf_ct_expect_init() is called from packet path where the skb owns the ct object, therefore accessing exp->master for the newly created expectation is safe. This saves a lot of updates in all callsites to pass the ct object as parameter to nf_ct_expect_init(). This is a preparation patches for follow up fixes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:31 +01:00
Miri Korenblit	154b0296c0	wifi: nl80211: Add a notification to notify NAN channel evacuation If all available channel resources are used for NAN channels, and one of them is shared with another interface, and that interface needs to move to a different channel (for example STA interface that needs to do a channel or a link switch), then the driver can evacuate one of the NAN channels (i.e. detach it from its channel resource and announce to the peers that this channel is ULWed). In that case, the driver needs to notify user space about the channel evacuation, so the user space can adjust the local schedule accordingly. Add a notification to let userspace know about it. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.d5bebfd5ff73.Iaaf5ef17e1ab7a38c19d60558e68fcf517e2b400@changeid Link: https://patch.msgid.link/20260318123926.206536-11-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	44ea50a5bf	wifi: nl80211: add NL80211_CMD_NAN_ULW_UPDATE notification Add a new notification command that allows drivers to notify user space when the device's ULW (Unaligned Schedule) blob has been updated. This enables user space to attach the updated ULW blob to frames sent to NAN peers. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.32b715af4ebb.Ibdb6e33941afd94abf77245245f87e4338d729d3@changeid Link: https://patch.msgid.link/20260318123926.206536-10-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	f826534483	wifi: nl80211: allow reporting spurious NAN Data frames Currently we have this ability for AP and GO. But it is now needed also for NAN_DATA mode - as per Wi-Fi Aware (TM) 4.0 specification 6.2.5: "If a NAN Device receives a unicast NAN Data frame destined for it, but with A1 address and A2 address that are not assigned to the NDP, it shall discard the frame, and should send a Data Path Termination NAF to the frame transmitter" To allow this, change NL80211_CMD_UNEXPECTED_FRAME to support also NAN_DATA, so drivers can report such cases and the user space can act accordingly. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260108102921.5cf9f1351655.I47c98ce37843730b8b9eb8bd8e9ef62ed6c17613@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219094725.3846371-6-miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20260318123926.206536-9-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	c4aa273ff6	wifi: nl80211: define an API for configuring the NAN peer's schedule Add an NL80211 command to configure the NAN schedule of a NAN peer. Such a schedule contains a list of NAN channels, and a mapping from each time slots to the corresponding channel (or unscheduled). Also contains more information about the schedule, such as sequence ID and map ID. Not all of the restrictions are validated in this patch. In particular, comparison of two maps of the same peer requires storing/retrieving each map of each peer, only for validation. Therefore, it is the responsibilty of the driver to check that. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.5b13fa5af4f6.If0e214ff5b52c9666e985fefa3f7be0ad14d93fb@changeid Link: https://patch.msgid.link/20260318123926.206536-7-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	1f1101c29e	wifi: nl80211: add support for NAN stations There are 2 types of logical links with a NAN peer: - management (NMI), which is used for Tx/Rx of NAN management frames. - data (NDI), which is used for Tx/Rx of data frames, or non-NAN management frames. The NMI station has two roles: - representation of the NAN peer - for example, the peer's schedule and the HT, VHT, HE capabilities - belong to the NMI station, and not to the NDI ones. - Tx/Rx of NAN management frames to/from the peer. The NDI station is used for Tx/Rx data frames of a specific NDP that was established with the NAN peer. Note that a peer can choose to reuse its NMI address as the NDI address. In that case, it is expected that two stations will be added even though they will have the same address. - An NDI station can only be added after the corresponding NMI station was configured with capabilities. - All the NDI stations will be removed before the NDI interface is brought down. - All NMI stations will be removed before NAN is stopped. - Before NMI sta removal, all corresponding NDI stations will be removed Add support for adding, removing, and changing NMI and NDI stations. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.d280936ee832.I6d859eee759bb5824a9ffd2984410faf879ba00e@changeid Link: https://patch.msgid.link/20260318123926.206536-6-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:54 +01:00
Miri Korenblit	bd11c96604	wifi: cfg80211: separately store HT, VHT and HE capabilities for NAN In NAN, unlike in other modes, there is only one set of (HT, VHT, HE) capabilities that is used for all channels (and bands) used in the NAN data path. This set of capabilities will have to be a special one, for example - have the minimum of (HT-for-5 GHz, HT-for-2.4 GHz), careful handling of the bits that have a different meaning for each band, etc. While we could use the exiting sband/iftype capabilities, and require identical capabilities for all bands (makes no sense since this means that we will have VHT capabilities in the 2.4 GHz slot), or require that only one of the sbands will be set, or have logic to extract the minimum and handle the conflicting bits - it seems simpler to add a dedicated set of capabilities which is special for NAN, and is band agnostic, to be populated by the driver. That way we also let the driver decide how it wants to handle the conflicting bits. Add this special set of these capabilities to wiphy:nan_capabilities, to be populated by the driver. Send it to user space. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.4b6f3e4a81b4.I45422adc0df3ad4101d857a92e83f0de5cf241e1@changeid Link: https://patch.msgid.link/20260318123926.206536-5-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:54 +01:00
Miri Korenblit	0e8ec738a7	wifi: cfg80211: add support for NAN data interface This new interface type represents a NAN data interface (NDI). It is used for data communication with NAN peers. Note that the existing NL80211_IFTYPE_NAN interface, which is the NAN Management Interface (NMI), is used for management communication. An NDI interface is started when a new NAN data path is about to be established, and is stopped after the NAN data path is terminated. - An NDI interface can only be started if the NMI is running, and NAN is started. - Before the NMI is stopped, the NDI interfaces will be stopped. Add the new interface type, handle add/remove operations for it, and makes sure of the conditions above. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.0d681335c2e2.I92973483e927820ae2297853c141842fdb262747@changeid Link: https://patch.msgid.link/20260318123926.206536-4-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:53 +01:00
Miri Korenblit	6e78b70c9a	wifi: cfg80211: Add an API to configure local NAN schedule Add an nl80211 API to allow user space to configure the local NAN schedule. The local schedule consists of a list of channel definitions and a schedule map, in which each element covers a time slot and indicates on what channel the device should be in that time slot. Channels can be added to schedule even without being scheduled, for reservation purposes. A schedule can be configured either immedietally or be deferred, in case there are already connected peers. When the deferred flag is set, the command is a request from the device to perform an announced schedule update: send the updated NAN Availability - as set in this command - to the peers, and do the actual switch to the new schedule on the right time (i.e. at the end of the slot after the slot in which the update was sent to the peers). In addition, a notification will be sent to indicate a deferred update completion. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.ecca178a2de0.Ic977ab08b4ed5cf9b849e55d3a59b01ad3fbd08e@changeid Link: https://patch.msgid.link/20260318123926.206536-2-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:05 +01:00
Eric Dumazet	d1e59a4697	tcp: add cwnd_event_tx_start to tcp_congestion_ops (tcp_congestion_ops)->cwnd_event() is called very often, with @event oscillating between CA_EVENT_TX_START and other values. This is not branch prediction friendly. Provide a new cwnd_event_tx_start pointer dedicated for CA_EVENT_TX_START. Both BBR and CUBIC benefit from this change, since they only care about CA_EVENT_TX_START. No change in kernel size: $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 4/4 grow/shrink: 3/1 up/down: 564/-568 (-4) Function old new delta bbr_cwnd_event_tx_start - 450 +450 cubictcp_cwnd_event_tx_start - 70 +70 __pfx_cubictcp_cwnd_event_tx_start - 16 +16 __pfx_bbr_cwnd_event_tx_start - 16 +16 tcp_unregister_congestion_control 93 99 +6 tcp_update_congestion_control 518 521 +3 tcp_register_congestion_control 422 425 +3 __tcp_transmit_skb 3308 3306 -2 __pfx_cubictcp_cwnd_event 16 - -16 __pfx_bbr_cwnd_event 16 - -16 cubictcp_cwnd_event 80 - -80 bbr_cwnd_event 454 - -454 Total: Before=25240512, After=25240508, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260323234920.1097858-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-24 21:00:38 -07:00
Jonas Köppeler	815980fe6d	net_sched: codel: fix stale state for empty flows in fq_codel When codel_dequeue() finds an empty queue, it resets vars->dropping but does not reset vars->first_above_time. The reference CoDel algorithm (Nichols & Jacobson, ACM Queue 2012) resets both: dodeque_result codel_queue_t::dodeque(time_t now) { ... if (r.p == NULL) { first_above_time = 0; // <-- Linux omits this } ... } Note that codel_should_drop() does reset first_above_time when called with a NULL skb, but codel_dequeue() returns early before ever calling codel_should_drop() in the empty-queue case. The post-drop code paths do reach codel_should_drop(NULL) and correctly reset the timer, so a dropped packet breaks the cycle -- but the next delivered packet re-arms first_above_time and the cycle repeats. For sparse flows such as ICMP ping (one packet every 200ms-1s), the first packet arms first_above_time, the flow goes empty, and the second packet arrives after the interval has elapsed and gets dropped. The pattern repeats, producing sustained loss on flows that are not actually congested. Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms target), ping at 5 pps under UDP flood: Before fix: 26% ping packet loss After fix: 0% ping packet loss Fix by resetting first_above_time to zero in the empty-queue path of codel_dequeue(), matching the reference algorithm. Fixes: `76e3cc126b` ("codel: Controlled Delay AQM") Fixes: `d068ca2ae2` ("codel: split into multiple files") Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reported-by: Chris Arges <carges@cloudflare.com> Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/all/20260318134826.1281205-7-hawk@kernel.org/ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260323174920.253526-1-hawk@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-24 20:57:57 -07:00
Paolo Abeni	51a209ee33	ipsec-2026-03-23 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmnA+cgACgkQrB3Eaf9P W7fJgBAAlZKkRki11NUIeI8IjOzEoMRShSsbOMjeCVBUDKc05krfWyln1FLuQbD/ BNSgRNFQ0uT653Cn88CbVRtxuebkmhde7bH29yEpfnsd/duVDlJaHkwjCEH15hvb zIeWrzdn+ct77Kg6i1EsJ5BfC7kADYWfgCFrSAAz2MEerCGNcLn2pKlopAEIGAD9 Ahd7XohBK9uxP8ZhF4GLQAjTImTDEQmJJek0QDdGp6sr+V0PuIh1MQ75SjW+9rZK 4p+rHhsOGCcjobljbksYTJd9/5hC2ThqsYBBbRsxS+g9ibvMvDoal2PCtBA7SnHZ F66PL8Lui555V4jL80Fi80Mu/uquizOX0iMiVjhAtepiqxn9IZleXutddPN/9yCg tHlk7IytBSovGBBT/AdL6F8hOVvwAFa/pnr/6pzjcjmiIkwSLMCU0ge/yjF01vGK tnltSGfuZ9+aF6XEjAmIZ2jMbA7mtKIoc9VOJB5/96yFS3G48/E7Aq6SNYIF8vyB N6xgdbhqp4PfIYuQ+zWcibj2XAGlXW9RF34i2CSbf7BlztetoctS8iuHlUWIlkS3 dcYAp7/ZQWRM779pg9pTKw7kGUwPlS0LbUBr4Z8nvcxdBUULuKc+9PAgRO3nX1v0 7EbIukGdhc+hvM8zC/aok8g6h8cPNvvaaL8CLL+wSYt28/xHrLs= =E39n -----END PGP SIGNATURE----- Merge tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-03-23 1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi. From Sabrina Dubroca. 2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the proper check. From Sabrina Dubroca. 3) Call xdo_dev_state_delete during state update to properly cleanup the xdo device state. From Sabrina Dubroca. 4) Fix a potential skb leak in espintcp when async crypto is used. From Sabrina Dubroca. 5) Validate inner IPv4 header length in IPTFS payload to avoid parsing malformed packets. From Roshan Kumar. 6) Fix skb_put() panic on non-linear skb during IPTFS reassembly. From Fernando Fernandez Mancera. 7) Silence various sparse warnings related to RCU, state, and policy handling. From Sabrina Dubroca. 8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini(). From Hyunwoo Kim. 9) Prevent policy_hthresh.work from racing with netns teardown by using a proper cleanup mechanism. From Minwoo Ra. 10) Validate that the family of the source and destination addresses match in pfkey_send_migrate(). From Eric Dumazet. 11) Only publish mode_data after the clone is setup in the IPTFS receive path. This prevents leaving x->mode_data pointing at freed memory on error. From Paul Moses. Please pull or let me know if there are problems. ipsec-2026-03-23 * tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: iptfs: only publish mode_data after clone setup af_key: validate families in pfkey_send_migrate() xfrm: prevent policy_hthresh.work from racing with netns teardown xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() xfrm: avoid RCU warnings around the per-netns netlink socket xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini} xfrm: state: silence sparse warnings during netns exit xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock xfrm: state: fix sparse warnings around XFRM_STATE_INSERT xfrm: state: fix sparse warnings in xfrm_state_init xfrm: state: fix sparse warnings on xfrm_state_hold_rcu xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly xfrm: iptfs: validate inner IPv4 header length in IPTFS payload esp: fix skb leak with espintcp and async crypto xfrm: call xdo_dev_state_delete during state update xfrm: fix the condition on x->pcpu_num in xfrm_sa_len xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi ==================== Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-24 15:16:28 +01:00
Martin KaFai Lau	e537dd15d0	udp: Fix wildcard bind conflict check when using hash2 When binding a udp_sock to a local address and port, UDP uses two hashes (udptable->hash and udptable->hash2) for collision detection. The current code switches to "hash2" when hslot->count > 10. "hash2" is keyed by local address and local port. "hash" is keyed by local port only. The issue can be shown in the following bind sequence (pseudo code): bind(fd1, "[fd00::1]:8888") bind(fd2, "[fd00::2]:8888") bind(fd3, "[fd00::3]:8888") bind(fd4, "[fd00::4]:8888") bind(fd5, "[fd00::5]:8888") bind(fd6, "[fd00::6]:8888") bind(fd7, "[fd00::7]:8888") bind(fd8, "[fd00::8]:8888") bind(fd9, "[fd00::9]:8888") bind(fd10, "[fd00::10]:8888") /* Correctly return -EADDRINUSE because "hash" is used * instead of "hash2". udp_lib_lport_inuse() detects the * conflict. / bind(fail_fd, "[::]:8888") / After one more socket is bound to "[fd00::11]:8888", * hslot->count exceeds 10 and "hash2" is used instead. / bind(fd11, "[fd00::11]:8888") bind(fail_fd, "[::]:8888") / succeeds unexpectedly */ The same issue applies to the IPv4 wildcard address "0.0.0.0" and the IPv4-mapped wildcard address "::ffff:0.0.0.0". For example, if there are existing sockets bound to "192.168.1.[1-11]:8888", then binding "0.0.0.0:8888" or "[::ffff:0.0.0.0]:8888" can also miss the conflict when hslot->count > 10. TCP inet_csk_get_port() already has the correct check in inet_use_bhash2_on_bind(). Rename it to inet_use_hash2_on_bind() and move it to inet_hashtables.h so udp.c can reuse it in this fix. Fixes: `30fff9231f` ("udp: bind() optimisation") Reported-by: Andrew Onyshchuk <oandrew@meta.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260319181817.1901357-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 18:46:45 -07:00
Kuniyuki Iwashima	4be7b99c25	ipv6: Don't remove permanent routes with exceptions from tb6_gc_hlist. The cited commit mechanically put fib6_remove_gc_list() just after every fib6_clean_expires() call. When a temporary route is promoted to a permanent route, there may already be exception routes tied to it. If fib6_remove_gc_list() removes the route from tb6_gc_hlist, such exception routes will no longer be aged. Let's replace fib6_remove_gc_list() with a new helper fib6_may_remove_gc_list() and use fib6_age_exceptions() there. Note that net->ipv6 is only compiled when CONFIG_IPV6 is enabled, so fib6_{add,remove,may_remove}_gc_list() are guarded. Fixes: `5eb902b8e7` ("net/ipv6: Remove expired routes with a separated list of routes.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260320072317.2561779-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 16:59:31 -07:00
Jakub Kicinski	edab1ca5ec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc5). net/netfilter/nft_set_rbtree.c `598adea720` ("netfilter: revert nft_set_rbtree: validate open interval overlap") `3aea466a43` ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-19 14:16:00 -07:00
Luiz Augusto von Dentz	761fb8ec87	Bluetooth: L2CAP: Fix regressions caused by reusing ident This attempt to fix regressions caused by reusing ident which apparently is not handled well on certain stacks causing the stack to not respond to requests, so instead of simple returning the first unallocated id this stores the last used tx_ident and then attempt to use the next until all available ids are exausted and then cycle starting over to 1. Link: https://bugzilla.kernel.org/show_bug.cgi?id=221120 Link: https://bugzilla.kernel.org/show_bug.cgi?id=221177 Fixes: `6c3ea155e5` ("Bluetooth: L2CAP: Fix not tracking outstanding TX ident") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Christian Eggers <ceggers@arri.de>	2026-03-19 14:44:25 -04:00
Paolo Abeni	9ac76f3d0b	Aside from various small improvements/cleanups, not much: - cfg80211/mac80211: S1G and UHR improvements - hwsim: incumbent signal report test support -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmm7sb4ACgkQ10qiO8sP aAD+iQ/+IYWmM1Z/Iu6eZZx/VPrc4Xnj/8UgbalyjetyLRYFNEvawFSdutqZ23uI FO7vYbzGXMtAlt7fjmxVKMiN4aoX+rISRjG5cnH1qPpeVO8w9fnOZyqmNUFJFboN ibpr4dqPIS2qZDKegvOa9JO+8KkkPerPWl608eOzXPxoZaZAMnXOhWuV4cWdvuTT vEnL+Ma4ckkOV6QdBFazYaxAyTt3Mpqj5ULodixtKPMdgB3P+6mAVipp/icE5R1P R/Vd7Fn+0r7wb/4+1S6DcCBvT6V6Ui94bIRF9DB5LGG/9iLPrGYRD52qQpetzXzA Si238bs7qi/6t6Q5tfzK1LZVnzZXTUqcWGS6ba4JiMxrLTAK1AEmcLved6A48ywt YH9zImLRBRMSANbH27BoWvijT5YZGMetH06cTdFmZ8MMGoYV7CWBxVOaIroH7WMx exMnWEcX6PUVMtlIR4FTGwX/nalGbvnBtoMv9ei3NRb2Dkart8OFT6vIDfy6TBnD BzAUE3pDAW3I7ukbLQGJ3mmanZpHtF/Xgfr5Y9EbZHPjtC08l7cwdd2zn0n3Q2qu JGlzZiut6sJTfnRESbUvJ6fnCMdGARpQxq6p2At3njJW0sncvyV9WFKh4A+ReaDr PQ24fgapG5PNEISevO+/FV1z2qZ0+IbHSmcH+BIoktBnPUBLZFo= =cLVw -----END PGP SIGNATURE----- Merge tag 'wireless-next-2026-03-19' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Aside from various small improvements/cleanups, not much: - cfg80211/mac80211: S1G and UHR improvements - hwsim: incumbent signal report test support * tag 'wireless-next-2026-03-19' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (31 commits) qtnfmac: use alloc_netdev macro for single queue devices wifi: libertas: don't kill URBs in interrupt context wifi: libertas: use USB anchors for tracking in-flight URBs wifi: nl80211: use int for band coming from netlink wifi: rsi_91x_usb: do not pause rfkill polling when stopping mac80211 wifi: mac80211: fix STA link removal during link removal wifi: nl80211: reject S1G/60G with HT chantype wifi: ieee80211: fix definition of EHT-MCS 15 in MRU wifi: cfg80211: check non-S1G width with S1G chandef wifi: cfg80211: restrict cfg80211_chandef_create() to only HT-based bands wifi: mac80211: don't use cfg80211_chandef_create() for default chandef wifi: mac80211: Remove deleted sta links in ieee80211_ml_reconf_work() wifi: b43: use register definitions in nphy_op_software_rfkill wifi: cfg80211: split control freq check from chandef check wifi: mac80211: always use full chanctx compatible check wifi: mac80211: refactor chandef tracing macros wifi: mac80211: validate HE 6 GHz operation when EHT is used wifi: nl80211: split out UHR operation information wifi: mwifiex: drop redundant device reference wifi: rt2x00: drop redundant device reference ... ==================== Link: https://patch.msgid.link/20260319082439.79875-3-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-19 15:30:20 +01:00
Eric Woudstra	96450df197	bridge: No DEV_PATH_BR_VLAN_UNTAG_HW for dsa foreign In network setup as below: fastpath bypass .----------------------------------------. / \ \| IP - forwarding \| \| / \ v \| / wan ... \| / \| \| \| \| \| brlan.1 \| \| \| +-------------------------------+ \| \| vlan 1 \| \| \| \| \| \| brlan (vlan-filtering) \| \| \| +---------------+ \| \| \| DSA-SWITCH \| \| \| vlan 1 \| \| \| \| to \| \| \| \| untagged 1 vlan 1 \| \| +---------------+---------------+ . / \ ----->wlan1 lan0 . . . ^ ^ vlan 1 tagged packets untagged packets br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when filling in from brlan.1 towards wlan1. But it should be set to DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV is not correct. The dsa switchdev adds it as a foreign port. The same problem for all foreignly added dsa vlans on the bridge. First add the vlan, trying only native devices. If this fails, we know this may be a vlan from a foreign device. Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG_HW is set only when there if no foreign device involved. Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Link: https://patch.msgid.link/20260317110347.363875-1-ericwouds@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-19 13:14:00 +01:00
Haiyang Zhang	d01440e10a	net: mana: Add ethtool counters for RX CQEs in coalesced type For RX CQEs with type CQE_RX_COALESCED_4, to measure the coalescing efficiency, add counters to count how many contains 2, 3, 4 packets respectively. Also, add a counter for the error case of first packet with length == 0. Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260317191826.1346111-4-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 20:01:10 -07:00
Haiyang Zhang	c2fe3ff3d6	net: mana: Add support for RX CQE Coalescing Our NIC can have up to 4 RX packets on 1 CQE. To support this feature, check and process the type CQE_RX_COALESCED_4. The default setting is disabled, to avoid possible regression on latency. And, add ethtool handler to switch this feature. To turn it on, run: ethtool -C <nic> rx-cqe-frames 4 To turn it off: ethtool -C <nic> rx-cqe-frames 1 The rx-cqe-nsec is the time out value in nanoseconds after the first packet arrival in a coalesced CQE to be sent. It's read-only for this NIC. Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260317191826.1346111-3-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 20:01:10 -07:00
Jakub Kicinski	7c46bd845d	Just a few updates: - cfg80211: - guarantee pmsr work is cancelled - mac80211: - reject TDLS operations on non-TDLS stations - fix crash in AP_VLAN bandwidth change - fix leak or double-free on some TX preparation failures - remove keys needed for beacons _after_ stopping those - fix debugfs static branch race - avoid underflow in inactive time - fix another NULL dereference in mesh on invalid frames - ti/wlcore: avoid infinite realloc loop -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmm63pMACgkQ10qiO8sP aAD+0w//QZRJt2tOsp/QOqmlEGs5zk8BnutcrU0ov/8OatgCX5sYWr2GD9Eub15P t+NWWSJoOaXrEvlpyhFTDB4RPnKKUbajFVGmQJTgddFvYzXARDupFrQIpBZ+UqYr kwNH/vnHxOuQ5MLaiuvaldbMdzdsH1R9Lr0nBqilg1tL1emQVTFFAfMh6URlbzB/ EaMG7sWKyzjVCvaGNBKsjyrfdWAz4LkyAw47St/MDV9GofSdSA2Oyt7PGM+TYuQ1 ozKsbOBiXuKIQkNVXNFQrrsGePY1hXgj4F0mO1KvjRov+2Wq+Xk+KFFpCCGeZrGt ZTehROtzS3I96UZmpFimJGdLOiiFC/CqP9bDBOn4y87Ink24m0/z2WFyLcp4IpDy KQFaPpvFnigZmuB+crtv+OI1bNuzb04EjfC1+M3AhDgkcMaSUUD/zxczge4DP1tX llYMZh0LL8CdUezTBcB/l3uBMTWh6R7T2bUUIIGLtyMqpMBl4GwncJ7dQFl2wyXr ytXZFE4rJNDXzvxkYOoOrT+JCD1COPiIuddy7xXWdxuC6yzY4H7QXGtljgOZUaqf 0ED6HiTvLG25lep1SLmgbwN2x9+izGxjWrUFqT7DIjxQo9bBulwBUARosoGAAxXW 7pio7oKDtYVD8FYSsFhbmNS/z+9Gs5wqgrfSyjrmvxHZm+rJJFw= =C5Rn -----END PGP SIGNATURE----- Merge tag 'wireless-2026-03-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few updates: - cfg80211: - guarantee pmsr work is cancelled - mac80211: - reject TDLS operations on non-TDLS stations - fix crash in AP_VLAN bandwidth change - fix leak or double-free on some TX preparation failures - remove keys needed for beacons _after_ stopping those - fix debugfs static branch race - avoid underflow in inactive time - fix another NULL dereference in mesh on invalid frames - ti/wlcore: avoid infinite realloc loop * tag 'wireless-2026-03-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: always free skb on ieee80211_tx_prepare_skb() failure wifi: wlcore: Return -ENOMEM instead of -EAGAIN if there is not enough headroom wifi: mac80211: fix NULL deref in mesh_matches_local() wifi: mac80211: check tdls flag in ieee80211_tdls_oper wifi: cfg80211: cancel pmsr_free_wk in cfg80211_pmsr_wdev_down wifi: mac80211: Fix static_branch_dec() underflow for aql_disable. mac80211: fix crash in ieee80211_chan_bw_change for AP_VLAN stations wifi: mac80211: use jiffies_delta_to_msecs() for sta_info inactive times wifi: mac80211: remove keys after disabling beaconing wifi: mac80211_hwsim: fully initialise PMSR capabilities ==================== Link: https://patch.msgid.link/20260318172515.381148-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 19:25:41 -07:00
Xiang Mei	b3a6df291f	udp_tunnel: fix NULL deref caused by udp_sock_create6 when CONFIG_IPV6=n When CONFIG_IPV6 is disabled, the udp_sock_create6() function returns 0 (success) without actually creating a socket. Callers such as fou_create() then proceed to dereference the uninitialized socket pointer, resulting in a NULL pointer dereference. The captured NULL deref crash: BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:fou_nl_add_doit (net/ipv4/fou_core.c:590 net/ipv4/fou_core.c:764) [...] Call Trace: <TASK> genl_family_rcv_msg_doit.constprop.0 (net/netlink/genetlink.c:1114) genl_rcv_msg (net/netlink/genetlink.c:1194 net/netlink/genetlink.c:1209) [...] netlink_rcv_skb (net/netlink/af_netlink.c:2550) genl_rcv (net/netlink/genetlink.c:1219) netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) __sock_sendmsg (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1)) __sys_sendto (./include/linux/file.h:62 (discriminator 1) ./include/linux/file.h:83 (discriminator 1) net/socket.c:2183 (discriminator 1)) __x64_sys_sendto (net/socket.c:2213 (discriminator 1) net/socket.c:2209 (discriminator 1) net/socket.c:2209 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (net/arch/x86/entry/entry_64.S:130) This patch makes udp_sock_create6 return -EPFNOSUPPORT instead, so callers correctly take their error paths. There is only one caller of the vulnerable function and only privileged users can trigger it. Fixes: `fd384412e1` ("udp_tunnel: Seperate ipv6 functions into its own file.") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260317010241.1893893-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 18:00:07 -07:00
Felix Fietkau	d5ad6ab61c	wifi: mac80211: always free skb on ieee80211_tx_prepare_skb() failure ieee80211_tx_prepare_skb() has three error paths, but only two of them free the skb. The first error path (ieee80211_tx_prepare() returning TX_DROP) does not free it, while invoke_tx_handlers() failure and the fragmentation check both do. Add kfree_skb() to the first error path so all three are consistent, and remove the now-redundant frees in callers (ath9k, mt76, mac80211_hwsim) to avoid double-free. Document the skb ownership guarantee in the function's kdoc. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20260314065455.2462900-1-nbd@nbd.name Fixes: `06be6b149f` ("mac80211: add ieee80211_tx_prepare_skb() helper function") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-18 09:09:58 +01:00
Daniel Borkmann	a0671125d4	clsact: Fix use-after-free in init/destroy rollback asymmetry Fix a use-after-free in the clsact qdisc upon init/destroy rollback asymmetry. The latter is achieved by first fully initializing a clsact instance, and then in a second step having a replacement failure for the new clsact qdisc instance. clsact_init() initializes ingress first and then takes care of the egress part. This can fail midway, for example, via tcf_block_get_ext(). Upon failure, the kernel will trigger the clsact_destroy() callback. Commit `1cb6f0bae5` ("bpf: Fix too early release of tcx_entry") details the way how the transition is happening. If tcf_block_get_ext on the q->ingress_block ends up failing, we took the tcx_miniq_inc reference count on the ingress side, but not yet on the egress side. clsact_destroy() tests whether the {ingress,egress}_entry was non-NULL. However, even in midway failure on the replacement, both are in fact non-NULL with a valid egress_entry from the previous clsact instance. What we really need to test for is whether the qdisc instance-specific ingress or egress side previously got initialized. This adds a small helper for checking the miniq initialization called mini_qdisc_pair_inited, and utilizes that upon clsact_destroy() in order to fix the use-after-free scenario. Convert the ingress_destroy() side as well so both are consistent to each other. Fixes: `1cb6f0bae5` ("bpf: Fix too early release of tcx_entry") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260313065531.98639-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-17 12:09:16 +01:00
Jamal Hadi Salim	66360460ca	net/sched: teql: Fix double-free in teql_master_xmit Whenever a TEQL devices has a lockless Qdisc as root, qdisc_reset should be called using the seq_lock to avoid racing with the datapath. Failure to do so may cause crashes like the following: [ 238.028993][ T318] BUG: KASAN: double-free in skb_release_data (net/core/skbuff.c:1139) [ 238.029328][ T318] Free of addr ffff88810c67ec00 by task poc_teql_uaf_ke/318 [ 238.029749][ T318] [ 238.029900][ T318] CPU: 3 UID: 0 PID: 318 Comm: poc_teql_ke Not tainted 7.0.0-rc3-00149-ge5b31d988a41 #704 PREEMPT(full) [ 238.029906][ T318] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 238.029910][ T318] Call Trace: [ 238.029913][ T318] <TASK> [ 238.029916][ T318] dump_stack_lvl (lib/dump_stack.c:122) [ 238.029928][ T318] print_report (mm/kasan/report.c:379 mm/kasan/report.c:482) [ 238.029940][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029944][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) ... [ 238.029957][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029969][ T318] kasan_report_invalid_free (mm/kasan/report.c:221 mm/kasan/report.c:563) [ 238.029979][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029989][ T318] check_slab_allocation (mm/kasan/common.c:231) [ 238.029995][ T318] kmem_cache_free (mm/slub.c:2637 (discriminator 1) mm/slub.c:6168 (discriminator 1) mm/slub.c:6298 (discriminator 1)) [ 238.030004][ T318] skb_release_data (net/core/skbuff.c:1139) ... [ 238.030025][ T318] sk_skb_reason_drop (net/core/skbuff.c:1256) [ 238.030032][ T318] pfifo_fast_reset (./include/linux/ptr_ring.h:171 ./include/linux/ptr_ring.h:309 ./include/linux/skb_array.h:98 net/sched/sch_generic.c:827) [ 238.030039][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) ... [ 238.030054][ T318] qdisc_reset (net/sched/sch_generic.c:1034) [ 238.030062][ T318] teql_destroy (./include/linux/spinlock.h:395 net/sched/sch_teql.c:157) [ 238.030071][ T318] __qdisc_destroy (./include/net/pkt_sched.h:328 net/sched/sch_generic.c:1077) [ 238.030077][ T318] qdisc_graft (net/sched/sch_api.c:1062 net/sched/sch_api.c:1053 net/sched/sch_api.c:1159) [ 238.030089][ T318] ? __pfx_qdisc_graft (net/sched/sch_api.c:1091) [ 238.030095][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030102][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030106][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030114][ T318] tc_get_qdisc (net/sched/sch_api.c:1529 net/sched/sch_api.c:1556) ... [ 238.072958][ T318] Allocated by task 303 on cpu 5 at 238.026275s: [ 238.073392][ T318] kasan_save_stack (mm/kasan/common.c:58) [ 238.073884][ T318] kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5)) [ 238.074230][ T318] __kasan_slab_alloc (mm/kasan/common.c:369) [ 238.074578][ T318] kmem_cache_alloc_node_noprof (./include/linux/kasan.h:253 mm/slub.c:4542 mm/slub.c:4869 mm/slub.c:4921) [ 238.076091][ T318] kmalloc_reserve (net/core/skbuff.c:616 (discriminator 107)) [ 238.076450][ T318] __alloc_skb (net/core/skbuff.c:713) [ 238.076834][ T318] alloc_skb_with_frags (./include/linux/skbuff.h:1383 net/core/skbuff.c:6763) [ 238.077178][ T318] sock_alloc_send_pskb (net/core/sock.c:2997) [ 238.077520][ T318] packet_sendmsg (net/packet/af_packet.c:2926 net/packet/af_packet.c:3019 net/packet/af_packet.c:3108) [ 238.081469][ T318] [ 238.081870][ T318] Freed by task 299 on cpu 1 at 238.028496s: [ 238.082761][ T318] kasan_save_stack (mm/kasan/common.c:58) [ 238.083481][ T318] kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5)) [ 238.085348][ T318] kasan_save_free_info (mm/kasan/generic.c:587 (discriminator 1)) [ 238.085900][ T318] __kasan_slab_free (mm/kasan/common.c:287) [ 238.086439][ T318] kmem_cache_free (mm/slub.c:6168 (discriminator 3) mm/slub.c:6298 (discriminator 3)) [ 238.087007][ T318] skb_release_data (net/core/skbuff.c:1139) [ 238.087491][ T318] consume_skb (net/core/skbuff.c:1451) [ 238.087757][ T318] teql_master_xmit (net/sched/sch_teql.c:358) [ 238.088116][ T318] dev_hard_start_xmit (./include/linux/netdevice.h:5324 ./include/linux/netdevice.h:5333 net/core/dev.c:3871 net/core/dev.c:3887) [ 238.088468][ T318] sch_direct_xmit (net/sched/sch_generic.c:347) [ 238.088820][ T318] __qdisc_run (net/sched/sch_generic.c:420 (discriminator 1)) [ 238.089166][ T318] __dev_queue_xmit (./include/net/sch_generic.h:229 ./include/net/pkt_sched.h:121 ./include/net/pkt_sched.h:117 net/core/dev.c:4196 net/core/dev.c:4802) Workflow to reproduce: 1. Initialize a TEQL topology (dummy0 and ifb0 as slaves, teql0 up). 2. Start multiple sender workers continuously transmitting packets through teql0 to drive teql_master_xmit(). 3. In parallel, repeatedly delete and re-add the root qdisc on dummy0 and ifb0 via RTNETLINK, forcing frequent teardown and reset activity (teql_destroy() / qdisc_reset()). 4. After running both workloads concurrently for several iterations, KASAN reports slab-use-after-free or double-free in the skb free path. Fix this by moving dev_reset_queue to sch_generic.h and calling it, instead of qdisc_reset, in teql_destroy since it handles both the lock and lockless cases correctly for root qdiscs. Fixes: `96009c7d50` ("sched: replace __QDISC_STATE_RUNNING bit with a spin lock") Reported-by: Xianrui Dong <keenanat2000@gmail.com> Tested-by: Xianrui Dong <keenanat2000@gmail.com> Co-developed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260315155422.147256-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-16 19:40:32 -07:00
Maciej Fijalkowski	cc6421acd9	xsk: remove repeated defines Seems we have been carrying around repeated defines for unaligned mode logic. Remove redundant ones. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260313111931.438911-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-16 19:28:21 -07:00
Jiri Pirko	1850e76b38	devlink: introduce shared devlink instance for PFs on same chip Multiple PFs may reside on the same physical chip, running a single firmware. Some of the resources and configurations may be shared among these PFs. Currently, there is no good object to pin the configuration knobs on. Introduce a shared devlink instance, instantiated upon probe of the first PF and removed during remove of the last PF. The shared devlink instance is not backed by any device device, as there is no PCI device related to it. The implementation uses reference counting to manage the lifecycle: each PF that probes calls devlink_shd_get() to get or create the shared instance, and calls devlink_shd_put() when it removes. The shared instance is automatically destroyed when the last PF removes. Example: pci/0000:08:00.0: index 0 nested_devlink: auxiliary/mlx5_core.eth.0 devlink_index/1: index 1 nested_devlink: pci/0000:08:00.0 pci/0000:08:00.1 auxiliary/mlx5_core.eth.0: index 2 pci/0000:08:00.1: index 3 nested_devlink: auxiliary/mlx5_core.eth.1 auxiliary/mlx5_core.eth.1: index 4 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-12-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 13:08:50 -07:00
Jiri Pirko	20b0f383aa	devlink: add devlink_dev_driver_name() helper and use it in trace events In preparation to dev-less devlinks, add devlink_dev_driver_name() that returns the driver name stored in devlink struct, and use it in all trace events. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-9-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 13:08:49 -07:00
Jiri Pirko	0f5531879a	devlink: add helpers to get bus_name/dev_name Introduce devlink_bus_name() and devlink_dev_name() helpers and convert all direct accesses to devlink->dev->bus->name and dev_name(devlink->dev) to use them. This prepares for dev-less devlink instances where these helpers will be extended to handle the missing device. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-3-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 13:08:47 -07:00
Eric Dumazet	d15d3de94a	net: dropreason: add SKB_DROP_REASON_RECURSION_LIMIT ip[6]tunnel_xmit() can drop packets if a too deep recursion level is detected. Add SKB_DROP_REASON_RECURSION_LIMIT drop reason. We will use this reason later in __dev_queue_xmit(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260312201824.203093-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 08:38:06 -07:00
Simon Baatz	0e24d17bd9	tcp: implement RFC 7323 window retraction receiver requirements By default, the Linux TCP implementation does not shrink the advertised window (RFC 7323 calls this "window retraction") with the following exceptions: - When an incoming segment cannot be added due to the receive buffer running out of memory. Since commit `8c670bdfa5` ("tcp: correct handling of extreme memory squeeze") a zero window will be advertised in this case. It turns out that reaching the required memory pressure is easy when window scaling is in use. In the simplest case, sending a sufficient number of segments smaller than the scale factor to a receiver that does not read data is enough. - Commit `b650d953cd` ("tcp: enforce receive buffer memory limits by allowing the tcp window to shrink") addressed the "eating memory" problem by introducing a sysctl knob that allows shrinking the window before running out of memory. However, RFC 7323 does not only state that shrinking the window is necessary in some cases, it also formulates requirements for TCP implementations when doing so (Section 2.4). This commit addresses the receiver-side requirements: After retracting the window, the peer may have a snd_nxt that lies within a previously advertised window but is now beyond the retracted window. This means that all incoming segments (including pure ACKs) will be rejected until the application happens to read enough data to let the peer's snd_nxt be in window again (which may be never). To comply with RFC 7323, the receiver MUST honor any segment that would have been in window for any ACK sent by the receiver and, when window scaling is in effect, SHOULD track the maximum window sequence number it has advertised. This patch tracks that maximum window sequence number rcv_mwnd_seq throughout the connection and uses it in tcp_sequence() when deciding whether a segment is acceptable. rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in tcp_select_window(). If we count tcp_sequence() as fast path, it is read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's cacheline group. The logic for handling received data in tcp_data_queue() is already sufficient and does not need to be updated. Signed-off-by: Simon Baatz <gmbnomis@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-1-4c7f96b1ec69@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 08:01:49 -07:00
Kuniyuki Iwashima	68aeb21ef0	udp: Don't pass udptable to IPv4 socket lookup functions. Since UDP and UDP-Lite had dedicated socket hash tables for each, we have had to pass the pointer down to many socket lookup functions. UDP-Lite gone, and we do not need to do that. Let's fetch net->ipv4.udp_table only where needed in IPv4 stack: __udp4_lib_lookup(), __udp4_lib_mcast_deliver(), and udp_diag_dump(). Some functions are renamed as the wrapper functions are no longer needed. __udp4_lib_err() -> udp_err() __udp_diag_destroy() -> udp_diag_destroy() udp_dump_one() -> udp_diag_dump_one() udp_dump() -> udp_diag_dump() Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-15-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-13 18:57:46 -07:00
Kuniyuki Iwashima	deffb85478	udp: Don't pass udptable to IPv6 socket lookup functions. Since UDP and UDP-Lite had dedicated socket hash tables for each, we have had to pass the pointer down to many socket lookup functions. UDP-Lite gone, and we do not need to do that. Let's fetch net->ipv4.udp_table only where needed in IPv6 stack: __udp6_lib_lookup() and __udp6_lib_mcast_deliver(). __udp6_lib_err() is renamed to udpv6_err() as its wrapper is no longer needed. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-13 18:57:46 -07:00

1 2 3 4 5 ...

19299 Commits