Mirror of https://github.com/torvalds/linux.git (synced 2026-05-13 00:28:54 +02:00)
Branch: master, 23 commits
commit 735a309b4b
net: add net_iov_init() and use it to initialize ->page_type
commit 334fbe734e
Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

mm.git review status for linus..mm-stable:

Everything: 368 patches, 1.56 reviews/patch, 74% reviewed
Excluding DAMON: 316 patches, 1.77 reviews/patch, 81% reviewed
Excluding DAMON and zram: 306 patches, 1.81 reviews/patch, 82% reviewed
Excluding DAMON, zram and maple_tree: 276 patches, 2.01 reviews/patch, 91% reviewed

Pull MM updates from Andrew Morton:

- "maple_tree: Replace big node with maple copy" (Liam Howlett)
  Mainly preparatory work for ongoing development, but it also reduces
  stack usage and is an improvement

- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
  Offers memory savings by removing the static swap_map; also yields
  some CPU savings and implements several cleanups

- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
  Adds file seal preservation to LUO's memfd code

- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen)
  Additional userspace stats reporting for zswap

- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
  Some cleanups for our handling of ZERO_PAGE() and zero_pfn

- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han)
  A robustness improvement and some cleanups in the kmemleak code

- "Improve khugepaged scan logic" (Vernon Yang)
  Improves the khugepaged scan logic and reduces CPU consumption by
  prioritizing scanning of tasks that access memory frequently

- "Make KHO Stateless" (Jason Miu)
  Simplifies Kexec Handover by transitioning KHO from an xarray-based
  metadata tracking system with serialization to a radix tree data
  structure that can be passed directly to the next kernel

- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
  Ballasi and Steven Rostedt)
  Enhances vmscan's tracepoints

- "mm: arch/shstk: Common shadow stack mapping helper and
  VM_NOHUGEPAGE" (Catalin Marinas)
  A cleanup for the shadow stack code: removes per-arch code in favour
  of a generic implementation

- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
  Fixes a WARN() which can be emitted when KHO restores a vmalloc area

- "mm: Remove stray references to pagevec" (Tal Zussman)
  Several cleanups, mainly updating references to "struct pagevec",
  which became folio_batch three years ago

- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
  Shutsemau)
  Simplifies the HugeTLB vmemmap optimization (HVO) by changing how
  tail pages encode their relationship to the head page

- "mm/damon/core: improve DAMOS quota efficiency for core layer
  filters" (SeongJae Park)
  Improves two problematic behaviors of DAMOS that make it less
  efficient when core layer filters are used

- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
  Improves DAMON usability by extending the treatment of the
  min_nr_regions user-settable parameter

- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
  The proper fix for a previously hotfixed SMP=n issue; code
  simplifications and cleanups ensued

- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
  A bunch of cleanups around unmapping and zapping: mostly
  simplifications, code movement, documentation and renaming of
  zapping functions

- "support batched checking of the young flag for MGLRU" (Baolin Wang)
  Batched checking of the young flag for MGLRU; partly cleanups, and
  one benchmark shows large performance benefits on arm64

- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
  memcg cleanups and robustness improvements

- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
  Enhances free page reporting, which presently (and undesirably)
  excludes order-0 pages when reporting free memory

- "mm: vma flag tweaks" (Lorenzo Stoakes)
  Cleanup work following the recent conversion of the VMA flags to a
  bitmap

- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park)
  Adds some more developer-facing debug checks to DAMON core

- "mm/damon: test and document power-of-2 min_region_sz requirement"
  (SeongJae Park)
  An additional DAMON kunit test and some adjustments to the addr_unit
  parameter handling

- "mm/damon/core: make passed_sample_intervals comparisons
  overflow-safe" (SeongJae Park)
  Fixes a hard-to-hit time overflow issue in DAMON core

- "mm/damon: improve/fixup/update ratio calculation, test and
  documentation" (SeongJae Park)
  A batch of misc/minor improvements and fixups for DAMON

- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
  Hildenbrand)
  Fixes a possible issue with dax devices when CONFIG_HUGETLB=n; some
  code movement was required

- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
  A somewhat random mix of fixups, recompression cleanups and
  improvements in the zram code

- "mm/damon: support multiple goal-based quota tuning algorithms"
  (SeongJae Park)
  Extends DAMOS quota goal auto-tuning to support multiple tuning
  algorithms that users can select

- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
  Fixes the khugepaged sysfs handling so we no longer spam the logs
  with reams of junk when starting/stopping khugepaged

- "mm: improve map count checks" (Lorenzo Stoakes)
  Some cleanups and slight fixes in the mremap, mmap and vma code

- "mm/damon: support addr_unit on default monitoring targets for
  modules" (SeongJae Park)
  Extends the use of DAMON core's addr_unit tunable

- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
  Cleanups in khugepaged, forming a base for Nico's planned khugepaged
  mTHP support

- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
  Code movement and cleanups in the memory hotplug and sparsemem code

- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
  CONFIG_MIGRATION" (David Hildenbrand)
  Rationalizes some memory hotplug Kconfig support

- "change young flag check functions to return bool" (Baolin Wang)
  A cleanup patchset changing all young flag check functions to return
  bool

- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
  Law and SeongJae Park)
  Fixes a few potential DAMON bugs

- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
  Stoakes)
  Converts much of the existing use of the legacy vm_flags_t data type
  to the new vma_flags_t type which replaces it, mainly in the vma code

- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
  Expands the mmap_prepare functionality, which is intended to replace
  the deprecated f_op->mmap hook, a source of bugs and security issues
  for some time; cleanups, documentation, and extension of mmap_prepare
  into filesystem drivers

- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
  Simplifies and cleans up zap_huge_pmd(), with additional cleanups
  around vm_normal_folio_pmd() and the softleaf functionality

* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm: fix deferred split queue races during migration
  mm/khugepaged: fix issue with tracking lock
  mm/huge_memory: add and use has_deposited_pgtable()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: simplify vma_is_specal_huge()
  mm: on remap assert that input range within the proposed VMA
  mm: add mmap_action_map_kernel_pages[_full]()
  uio: replace deprecated mmap hook with mmap_prepare in uio_info
  drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
  mm: allow handling of stacked mmap_prepare hooks in more drivers
  ...
commit db359fccf2
mm: introduce a new page type for page pool in page type
Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to determine if a page belongs to a page pool. However, with the planned removal of @pp_magic, we should instead leverage the page_type in struct page, such as PGTY_netpp, for this purpose.

Introduce and use the page type APIs, e.g. PageNetpp(), __SetPageNetpp(), and __ClearPageNetpp(), and remove the existing APIs accessing @pp_magic, e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and netmem_clear_pp_magic().

Plus, add @page_type to struct net_iov at the same offset as in struct page, so that the page_type APIs can be used for struct net_iov as well. While at it, reorder @type and @owner in struct net_iov to avoid a hole and an increase in the struct size.

This work was inspired by the following link:
https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/

While at it, move the sanity check for page pool to the free path.

[byungchul@sk.com: gate the sanity check, per Johannes]
Link: https://lkml.kernel.org/r/20260316223113.20097-1-byungchul@sk.com
Link: https://lkml.kernel.org/r/20260224051347.19621-1-byungchul@sk.com
Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Wei <dw@davidwei.uk>
Cc: Dragos Tatulea <dtatulea@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mark Bloch <mbloch@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
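For illustration, a page_type-based test along these lines could replace the pp_magic comparison. This is a minimal sketch: the PGTY_netpp value and bit placement are assumptions (upstream derives such accessors from mm's page-type machinery), not the literal implementation.

    /* Hypothetical encoding: the type value lives in the top byte of
     * page->page_type, as other PGTY_* users do. The 0xf8 value is an
     * assumption for this sketch. */
    #define PGTY_netpp 0xf8

    static inline bool PageNetpp(const struct page *page)
    {
        return (page->page_type >> 24) == PGTY_netpp;
    }

    static inline void __SetPageNetpp(struct page *page)
    {
        /* Real setters also assert the page has no type yet. */
        page->page_type = (unsigned int)PGTY_netpp << 24;
    }

    static inline void __ClearPageNetpp(struct page *page)
    {
        page->page_type = UINT_MAX;   /* "no type" sentinel */
    }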
commit fd6dad4e1a
netmem: remove the pp fields from net_iov
Now that the pp fields in net_iov have no users, remove them from net_iov and clean up.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260224061424.11219-1-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit df7388b3d7
net: inline get_netmem() and put_netmem()
These helpers are used in network fast paths. Only call the out-of-line helpers for the net_iov case. We might consider inlining __get_netmem() and __put_netmem() in the future.

$ scripts/bloat-o-meter -t vmlinux.3 vmlinux.4
add/remove: 6/6 grow/shrink: 22/1 up/down: 2614/-646 (1968)
Function                        old    new   delta
pskb_carve                     1669   1894    +225
gro_pull_from_frag0               -    206    +206
get_page                        190    380    +190
skb_segment                    3561   3747    +186
put_page                        595    765    +170
skb_copy_ubufs                 1683   1822    +139
__pskb_trim_head                276    401    +125
__pskb_copy_fclone              734    858    +124
skb_zerocopy                   1092   1215    +123
pskb_expand_head                892   1008    +116
skb_split                       828    940    +112
skb_release_data                297    409    +112
___pskb_trim                    829    941    +112
__skb_zcopy_downgrade_managed   120    226    +106
tcp_clone_payload               530    634    +104
esp_ssg_unref                   191    294    +103
dev_gro_receive                1464   1514     +50
__put_netmem                      -     41     +41
__get_netmem                      -     41     +41
skb_shift                      1139   1175     +36
skb_try_coalesce                681    714     +33
__pfx_put_page                  112    144     +32
__pfx_get_page                   32     64     +32
__pskb_pull_tail               1137   1168     +31
veth_xdp_get                    250    267     +17
__pfx_gro_pull_from_frag0         -     16     +16
__pfx___put_netmem                -     16     +16
__pfx___get_netmem                -     16     +16
__pfx_put_netmem                 16      -     -16
__pfx_gro_try_pull_from_frag0    16      -     -16
__pfx_get_netmem                 16      -     -16
put_netmem                      114      -    -114
get_netmem                      130      -    -130
napi_gro_frags                  929    771    -158
gro_try_pull_from_frag0         196      -    -196
Total: Before=22565857, After=22567825, chg +0.01%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260122045720.1221017-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
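A plausible shape for the split is sketched below. __get_netmem() and __put_netmem() are the out-of-line helpers named in the commit message; netmem_is_net_iov() and netmem_to_page() are assumed from the rest of this series, and this is a sketch rather than the literal upstream code.

    /* Inline wrappers keep the common page-backed path cheap; only the
     * net_iov case pays for a function call. */
    static inline void get_netmem(netmem_ref netmem)
    {
        if (netmem_is_net_iov(netmem))
            __get_netmem(netmem);               /* out of line */
        else
            get_page(netmem_to_page(netmem));   /* inlined fast path */
    }

    static inline void put_netmem(netmem_ref netmem)
    {
        if (netmem_is_net_iov(netmem))
            __put_netmem(netmem);               /* out of line */
        else
            put_page(netmem_to_page(netmem));
    }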
commit 87918dd4ea
net: inline net_is_devmem_iov()
1) Inline this small helper to reduce code size and decrease cpu costs.
2) Constify its argument.
3) Move it to include/net/netmem.h, as a prereq for the following patch.

$ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3
add/remove: 0/2 grow/shrink: 0/4 up/down: 0/-158 (-158)
Function                   old    new   delta
validate_xmit_skb          866    857      -9
__pfx_net_is_devmem_iov     16      -     -16
net_is_devmem_iov           22      -     -22
get_netmem                 152    130     -22
put_netmem                 140    114     -26
tcp_recvmsg_locked        3860   3797     -63
Total: Before=22566015, After=22565857, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260122045720.1221017-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
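After the change the helper can be a one-liner in the header; a sketch, where the type field and the NET_IOV_DMABUF value are assumptions carried over from the niov->type entry further down this list:

    static inline bool net_is_devmem_iov(const struct net_iov *niov)
    {
        return niov->type == NET_IOV_DMABUF;   /* assumed enum value */
    }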
commit 8443c31608
net: netmem: remove NET_IOV_MAX from net_iov_type enum
Remove the NET_IOV_MAX workaround from the net_iov_type enum. This entry
was previously added to force the enum size to unsigned long to satisfy
the NET_IOV_ASSERT_OFFSET static assertions.
After commit
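For context, the workaround being removed might have looked like this sketch; the entries other than NET_IOV_MAX are assumptions:

    enum net_iov_type {
        NET_IOV_DMABUF,            /* assumed entry */
        NET_IOV_IOURING,           /* assumed entry */
        /* Force the enum's storage to unsigned long so that the
         * NET_IOV_ASSERT_OFFSET static assertions against struct page
         * hold; this is the workaround removed by this commit. */
        NET_IOV_MAX = ULONG_MAX,
    };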
commit 53615ad26e
netmem: replace __netmem_clear_lsb() with netmem_to_nmdesc()
Now that we have struct netmem_desc, it's better to access the pp fields via struct netmem_desc rather than struct net_iov. Introduce netmem_to_nmdesc() for safely converting a netmem_ref to netmem_desc regardless of the underlying type, i.e. netmem_desc or net_iov. While at it, remove __netmem_clear_lsb() and use netmem_to_nmdesc() instead.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20251013044133.69472-1-byungchul@sk.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
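A minimal sketch of such a conversion, assuming the LSB-tagging scheme from the devmem entries below (the NET_IOV macro name is an assumption):

    #define NET_IOV 0x01UL   /* assumed LSB tag for net_iov-backed netmem */

    static inline struct netmem_desc *netmem_to_nmdesc(netmem_ref netmem)
    {
        /* A page-backed netmem has a clear LSB; a net_iov-backed one
         * has it set. Masking it off yields memory whose leading
         * fields are a struct netmem_desc in either case. */
        return (struct netmem_desc *)((__force unsigned long)netmem & ~NET_IOV);
    }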
commit 89ade7c730
netmem, mlx4: access ->pp_ref_count through netmem_desc instead of page
To eliminate the use of struct page in the page pool, page pool users should use the netmem descriptor and APIs instead. Make mlx4 access ->pp_ref_count through netmem_desc instead of page.

While at it, add helpers, pp_page_to_nmdesc() and __pp_page_to_nmdesc(), that can be used to get the netmem_desc from a page, but only if it is a pp page. For now, since netmem_desc overlays struct page, this can be achieved by a simple cast; use a macro and _Generic to cover const casting as well.

Plus, change page_pool_page_is_pp() to take 'const struct page *' instead of 'struct page *', since it doesn't modify data and this additionally covers the const type.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-4-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 38a436d4e2
netmem: use netmem_desc instead of page to access ->pp in __netmem_get_pp()
To eliminate the use of the page pool fields in struct page, the page pool code should use the netmem descriptor and APIs instead. However, __netmem_get_pp() still accesses ->pp via struct page, so change it to use struct netmem_desc instead, since ->pp will no longer be available in struct page.

While at it, add a helper, __netmem_to_nmdesc(), that can be used to unsafely get a pointer to the netmem_desc backing the netmem_ref, but only when the netmem_ref is always backed by system memory.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-3-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit f3d85c9ee5
netmem: introduce struct netmem_desc mirroring struct page
To simplify struct page, the page pool members of struct page should be moved elsewhere, allowing them to be removed from struct page. Introduce a network memory descriptor, struct netmem_desc, to store those members, and union it with the existing fields in struct net_iov, which also helps organize the fields of struct net_iov.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20250721021835.63939-2-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
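A sketch of the mirrored layout, with the field set taken from the struct quoted in the "page_pool: devmem support" entry below; exact ordering and padding are assumptions:

    struct netmem_desc {
        unsigned long pp_magic;
        struct page_pool *pp;
        unsigned long _pp_mapping_pad;
        unsigned long dma_addr;
        atomic_long_t pp_ref_count;
    };

    struct net_iov {
        union {
            struct netmem_desc desc;    /* new, shared with pages */
            struct {                    /* previously open-coded fields */
                unsigned long pp_magic;
                struct page_pool *pp;
                unsigned long _pp_mapping_pad;
                unsigned long dma_addr;
                atomic_long_t pp_ref_count;
            };
        };
        struct net_iov_area *owner;
    };

    /* The overlay only works if the offsets line up. */
    static_assert(offsetof(struct net_iov, pp) ==
                  offsetof(struct netmem_desc, pp));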
commit 4369d40da2
netmem: use _Generic to cover const casting for page_to_netmem()
The current page_to_netmem() doesn't cover const casting, so trying to cast a const struct page * to a const netmem_ref fails. To cover that case, change page_to_netmem() to use a macro and _Generic.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20250702053256.4594-5-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
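A sketch of the _Generic form, assuming netmem_ref is the usual pointer-sized handle introduced earlier in this history:

    /* Dispatch on the argument's type so both page * and
     * const page * convert, preserving constness of the result. */
    #define page_to_netmem(p)                                               \
        _Generic((p),                                                       \
            const struct page *: (__force const netmem_ref)(unsigned long)(p), \
            struct page *:       (__force netmem_ref)(unsigned long)(p))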
commit c9e1225352
net: Allow const args for of page_to_netmem()
This allows calling page_to_netmem() with a const page * argument.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 0f66b616b8
netmem: fix netmem comments
Trivial fix to a couple of outdated netmem comments. No code changes; just more accurately describing the current code.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 383faec0fd
net: enable driver support for netmem TX
Drivers need to make sure not to pass netmem dma-addrs to the dma-mapping API in order to support netmem TX.

Add netmem_dma_*() helpers that enable special handling of netmem dma-addrs, for drivers to use, and document in netmem.rst what drivers need to do to support netmem TX.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-7-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
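The shape of such a helper might be as below. This is a sketch: the helper name follows the netmem_dma_*() convention named above, and the zero-dma-addr convention for core-managed netmem is an assumption here, not confirmed by the commit text.

    #include <linux/dma-mapping.h>

    static inline void netmem_dma_unmap_page_attrs(struct device *dev,
                                                   dma_addr_t dma_addr,
                                                   size_t size,
                                                   enum dma_data_direction dir,
                                                   unsigned long attrs)
    {
        /* net_iov dma-addrs are owned by the core and must never be
         * fed back into the dma-mapping API; this sketch assumes they
         * are flagged by a zero address. */
        if (!dma_addr)
            return;
        dma_unmap_page_attrs(dev, dma_addr, size, dir, attrs);
    }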
commit e9f3d61db5
net: add get_netmem/put_netmem support
Currently net_iovs support only pp ref counts, and do not support a page ref equivalent. This is fine for the RX path, as net_iovs are used exclusively with the pp and only pp refcounting is needed there. The TX path, however, does not use pp ref counts, so a get_page/put_page equivalent is needed for netmem.

Support get_netmem/put_netmem. Check the type of the netmem before passing it to page- or net_iov-specific code to obtain a page ref equivalent.

For dmabuf net_iovs, we obtain a ref on the underlying binding. This ensures the entire binding doesn't disappear until all the net_iovs have been put_netmem'ed. We do not need to track the refcount of individual dmabuf net_iovs, as we don't allocate/free them from a pool the way the buddy allocator does for pages.

This code is written to be extensible by other net_iov implementers. get_netmem/put_netmem will check the type of the netmem and route it to the correct helper:

  pages            -> [get|put]_page()
  dmabuf net_iovs  -> net_devmem_[get|put]_net_iov()
  new net_iovs     -> new helpers

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-3-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
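The routing above, written out as a sketch. This is the original out-of-line form (the "net: inline get_netmem()/put_netmem()" entry earlier in this list later splits it into an inline wrapper); netmem_to_net_iov() is an assumed conversion helper.

    /* Take a "page ref equivalent" on whatever backs the netmem. */
    void get_netmem(netmem_ref netmem)
    {
        if (netmem_is_net_iov(netmem)) {
            /* dmabuf net_iov: pin the whole binding rather than
             * refcounting the individual net_iov. */
            net_devmem_get_net_iov(netmem_to_net_iov(netmem));
            return;
        }
        get_page(netmem_to_page(netmem));
    }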
commit 03e96b8c11
netmem: add niov->type attribute to distinguish different net_iov types
Later patches in the series add TX net_iovs with no pp associated, so we can't rely on niov->pp->mp_ops to tell the type of a net_iov. Add a type enum to net_iov which records the net_iov type.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-2-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
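A sketch of the tagging; the DMABUF entry follows from the devmem usage in this series, while the other entry and field placement are assumptions:

    enum net_iov_type {
        NET_IOV_DMABUF,     /* device-memory (dmabuf) provider */
        NET_IOV_IOURING,    /* assumed: io_uring zero-copy provider */
    };

    struct net_iov {
        enum net_iov_type type;   /* which provider owns this niov */
        /* ... pp fields, owner area, etc. ... */
    };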
commit 7d60fa9e1a
net: generalise net_iov chunk owners
Currently net_iov stores a pointer to struct dmabuf_genpool_chunk_owner, which serves as a useful abstraction to share data and provide a context. However, it's too devmem-specific, and we want to reuse it for other memory providers; for that we need to decouple net_iov from devmem. Make net_iov point to a new base structure called net_iov_area, which dmabuf_genpool_chunk_owner extends.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250204215622.695511-4-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
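A sketch of the decoupling; field names beyond the struct names given in the text are assumptions:

    /* Generic base: enough for any memory provider to find its niovs. */
    struct net_iov_area {
        struct net_iov *niovs;    /* array of niovs in this area */
        size_t num_niovs;
        unsigned long base_virtual;
    };

    /* The devmem-specific owner now extends the generic base. */
    struct dmabuf_genpool_chunk_owner {
        struct net_iov_area area;   /* kept first for easy upcasting */
        struct net_devmem_dmabuf_binding *binding;
        dma_addr_t base_dma_addr;
    };

    /* Devmem code recovers its owner from the generic area pointer
     * (assumes net_iov gained a struct net_iov_area *owner field). */
    static inline struct dmabuf_genpool_chunk_owner *
    net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
    {
        return container_of(niov->owner,
                            struct dmabuf_genpool_chunk_owner, area);
    }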
commit 9bd9f72a74
netmem: add a couple of page helper wrappers
Add the following netmem counterparts:

  * virt_to_netmem() -- simple page_to_netmem(virt_to_page()) wrapper;
  * netmem_is_pfmemalloc() -- page_is_pfmemalloc() for page-backed netmems, false otherwise;

and the following "unsafe" versions:

  * __netmem_to_page()
  * __netmem_get_pp()
  * __netmem_address()

They do the same as their non-underscored buddies, but assume the netmem is always page-backed. When working with header &page_pools, you don't need to check whether the netmem belongs to host memory, and you can never get NULL instead of &page. Checks for the LSB, clearing of the LSB, and branches take cycles and increase object code size, sometimes significantly. When you're sure your PP is always host, you can avoid this by using the underscored counterparts.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241203173733.3181246-8-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
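Sketches of two of these, assuming the LSB-tagging scheme from the devmem entry below; not the literal upstream code:

    /* Safe wrapper: any kernel virtual address can become a netmem. */
    static inline netmem_ref virt_to_netmem(const void *data)
    {
        return page_to_netmem(virt_to_page(data));
    }

    /* "Unsafe" variant: the caller guarantees the netmem is
     * page-backed, so no LSB test or clearing is emitted. */
    static inline struct page *__netmem_to_page(netmem_ref netmem)
    {
        return (__force struct page *)netmem;
    }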
commit 8ab79ed50c
page_pool: devmem support
Convert netmem to be a union of struct page and struct netmem. Overload
the LSB of struct netmem* to indicate that it's a net_iov, otherwise
it's a page.
Currently these entries in struct page are rented by the page_pool and
used exclusively by the net stack:
    struct {
        unsigned long pp_magic;
        struct page_pool *pp;
        unsigned long _pp_mapping_pad;
        unsigned long dma_addr;
        atomic_long_t pp_ref_count;
    };
Mirror these (and only these) entries into struct net_iov and implement
netmem helpers that can access these common fields regardless of
whether the underlying type is page or net_iov.
Implement checks for net_iov in netmem helpers which delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
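A sketch of the LSB scheme described above; the NET_IOV bit name and the exact warning used on misuse are assumptions:

    #define NET_IOV 0x01UL   /* LSB set: net_iov; LSB clear: page */

    static inline bool netmem_is_net_iov(const netmem_ref netmem)
    {
        return (__force unsigned long)netmem & NET_IOV;
    }

    static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
    {
        return (__force netmem_ref)((unsigned long)niov | NET_IOV);
    }

    /* Helpers that feed mm APIs must reject net_iov, as described. */
    static inline struct page *netmem_to_page(netmem_ref netmem)
    {
        if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
            return NULL;
        return (__force struct page *)netmem;
    }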
commit 170aafe35c
netdev: support binding dma-buf to netdevice
Add a netdev_dmabuf_binding struct which represents the dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to rx queues on the netdevice. On binding, dma_buf_attach & dma_buf_map_attachment occur, and the entries in the sg_table from the mapping are inserted into a genpool to make them ready for allocation.

The chunks in the genpool are owned by a dmabuf_chunk_owner struct which holds the dma-buf offset of the base of the chunk and the dma_addr of the chunk. Both are needed to use allocations that come from this chunk.

We create a new type that represents an allocation from the genpool: net_iov. We set up the net_iov allocation size in the genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally allocated by the page pool and given to the drivers.

The user can unbind the dmabuf from the netdevice by closing the netlink socket that established the binding. We do this so that the binding is automatically unbound even if the userspace process crashes.

The binding and unbinding leave an indicator in struct netdev_rx_queue that the given queue is bound, and the binding is actuated by resetting the rx queue using the queue API.

The netdev_dmabuf_binding struct is refcounted, and releases its resources only when all the refs are released.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # excluding netlink
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-4-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 4dec64c52e
page_pool: convert to use netmem
Abstract the memory type from the page_pool so we can later add support for new memory types. Convert the page_pool to use the new netmem type abstraction rather than struct page directly.

As of this patch the netmem type is a no-op abstraction: it's always a struct page underneath. All the page pool internals are converted to use struct netmem instead of struct page, and the page pool now exports two APIs:

  1. The existing struct page API.
  2. The new struct netmem API.

Keeping the existing API is transitional; we do not want to refactor all the current drivers using the page pool at once.

The netmem abstraction is currently a no-op. The page_pool uses page_to_netmem() to convert allocated pages to netmem, and uses netmem_to_page() to convert the netmem back to pages to pass to mm APIs. Follow-up patches to this series add non-paged netmem support to the page_pool. This change is factored out on its own to limit the code churn to this one patch, for ease of code review.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20240628003253.1694510-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 18ddbf5cf0
net: introduce abstraction for network memory
Add the netmem_ref type, an abstraction for network memory. To add support for new memory types to the net stack, we must first abstract the current memory type. Currently parts of the net stack use struct page directly:

  - page_pool
  - drivers
  - skb_frag_t

Originally the plan was to reuse struct page * for the new memory types, and to set the LSB on the page * to indicate that it's not really a page. However, for compiler type checking we need to introduce a new type.

netmem_ref is introduced to abstract the underlying memory type. Currently it's a no-op abstraction that is always a struct page underneath. In parallel there is an ongoing effort to add support for devmem to the net stack:

https://lore.kernel.org/netdev/20231208005250.2910004-1-almasrymina@google.com/

netmem_refs can point to different underlying memory types, with the low bits set to indicate the memory type. Helpers are provided to convert netmem pointers to the underlying memory type (currently only struct page). In the devmem series, helpers are provided so that calling code can use netmem without worrying about the underlying memory type unless absolutely necessary.

Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
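At this stage the abstraction is the no-op described above; a sketch of its original form, before the LSB tagging shown in the devmem entries earlier in this list:

    /* __bitwise makes netmem_ref a distinct type to sparse, so pages
     * and netmem cannot be mixed up silently. */
    typedef unsigned long __bitwise netmem_ref;

    /* No-op conversions: every netmem_ref is a struct page underneath. */
    static inline struct page *netmem_to_page(netmem_ref netmem)
    {
        return (__force struct page *)netmem;
    }

    static inline netmem_ref page_to_netmem(struct page *page)
    {
        return (__force netmem_ref)page;
    }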