mirror of
https://github.com/torvalds/linux.git
synced 2026-06-05 13:06:59 +02:00
When page_pool_list() fails inside page_pool_create_percpu(), the
function jumps to its err_uninit: label, which calls
page_pool_uninit(). At that point page_pool_init() has already taken
two references when the user requested PP_FLAG_ALLOW_UNREADABLE_NETMEM:

    pool->mp_ops->init(pool)
    static_branch_inc(&page_pool_mem_providers);

Neither is undone by page_pool_uninit(); both are only undone by
__page_pool_destroy() (success-side teardown). The error path
therefore leaks the per-provider reference taken by mp_ops->init
(io_zcrx_ifq->refs in the io_uring zcrx provider, the dmabuf
binding refcount in the devmem provider) plus one increment of
the page_pool_mem_providers static branch on every failure of
xa_alloc_cyclic() inside page_pool_list().
The leaked io_zcrx_ifq->refs in turn pins everything
io_zcrx_ifq_free() would release on cleanup: ifq->user (uid),
ifq->mm_account (mmdrop), ifq->dev (device refcount),
ifq->netdev_tracker (netdev refcount), and the rbuf region.
The leaked static branch increment forces all subsequent
page_pool_alloc_netmems() and page_pool_return_page() callers to
take the slow mp_ops branch for the lifetime of the kernel.
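
The leak described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `model_pool`, `provider_refs`, `branch_count`, `model_create_prefix()` and the `list_fails` flag are hypothetical stand-ins for the page pool, the provider reference taken by mp_ops->init, the page_pool_mem_providers static branch, page_pool_create_percpu(), and a failing xa_alloc_cyclic().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static int provider_refs;	/* models the reference taken by mp_ops->init */
static int branch_count;	/* models page_pool_mem_providers */

struct model_pool {
	bool has_provider;	/* models a pool bound to a memory provider */
};

/* Models page_pool_init(): takes both references when a provider is set. */
static int model_init(struct model_pool *pool, bool has_provider)
{
	pool->has_provider = has_provider;
	if (has_provider) {
		provider_refs++;
		branch_count++;
	}
	return 0;
}

/* Models the pre-fix page_pool_uninit(): neither reference is dropped. */
static void model_uninit_prefix(struct model_pool *pool)
{
	(void)pool;
}

/* Models page_pool_create_percpu() with an injectable listing failure. */
static int model_create_prefix(struct model_pool *pool, bool list_fails)
{
	int err = model_init(pool, true);

	if (err)
		return err;
	if (list_fails) {			/* models xa_alloc_cyclic() failing */
		model_uninit_prefix(pool);	/* models the err_uninit: path */
		return -ENOMEM;
	}
	return 0;
}

/* Runs n failing creations and returns how many branch increments leaked. */
static int model_leaked_after(int n)
{
	struct model_pool pool;
	int before = branch_count;

	for (int i = 0; i < n; i++) {
		int err = model_create_prefix(&pool, true);

		assert(err == -ENOMEM);
	}
	return branch_count - before;
}
```

In this model every failed creation leaves one provider reference and one branch increment behind, which mirrors the per-failure leak described in the text.
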
Reachable via the io_uring zcrx path:

  io_uring_register(IORING_REGISTER_ZCRX_IFQ)   /* CAP_NET_ADMIN */
    -> __io_uring_register
      -> io_register_zcrx
        -> zcrx_register_netdev
          -> netif_mp_open_rxq
            -> driver ndo_queue_mem_alloc
              -> page_pool_create_percpu
                -> page_pool_init  succeeds (mp_ops->init runs, branch++)
                -> page_pool_list  fails (xa_alloc_cyclic -ENOMEM)
                -> goto err_uninit  <-- leak

The same shape applies to the devmem dmabuf provider via
mp_dmabuf_devmem_init()/mp_dmabuf_devmem_destroy().
Restore the cleanup symmetry by moving the mp_ops->destroy() and
static_branch_dec() calls out of __page_pool_destroy() and into
page_pool_uninit(), so page_pool_uninit() is again the strict
inverse of page_pool_init(). page_pool_uninit() has only two
callers (the err_uninit: path and __page_pool_destroy()), so this
preserves the single-call invariant on the success path while
fixing the error path. The error path of page_pool_init() itself
still skips the mp_ops cleanup correctly: mp_ops->init is the
last action that takes a reference before page_pool_init() returns
0, so when it returns an error neither the refcount nor the static
branch has been touched.
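
The restored symmetry can be sketched with the same kind of userspace model. Again, this is an illustration under stated assumptions rather than the actual patch: the `fx_` names are hypothetical, `has_provider` stands in for a non-NULL pool->mp_ops, and the two counters stand in for the provider refcount and the static branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static int fx_provider_refs;	/* models the reference taken by mp_ops->init */
static int fx_branch_count;	/* models page_pool_mem_providers */

struct fx_pool {
	bool has_provider;	/* models a non-NULL pool->mp_ops */
};

/* Models page_pool_init(): the references are the last thing taken. */
static int fx_init(struct fx_pool *pool, bool has_provider)
{
	pool->has_provider = has_provider;
	if (has_provider) {
		fx_provider_refs++;
		fx_branch_count++;
	}
	return 0;
}

/* Models the post-fix page_pool_uninit(): strict inverse of fx_init(). */
static void fx_uninit(struct fx_pool *pool)
{
	if (pool->has_provider) {
		assert(fx_provider_refs > 0 && fx_branch_count > 0);
		fx_branch_count--;	/* models static_branch_dec() */
		fx_provider_refs--;	/* models mp_ops->destroy() */
	}
}

/* Models page_pool_create_percpu() with an injectable listing failure. */
static int fx_create(struct fx_pool *pool, bool list_fails)
{
	int err = fx_init(pool, true);

	if (err)
		return err;
	if (list_fails) {		/* models xa_alloc_cyclic() failing */
		fx_uninit(pool);	/* err_uninit: now drops both references */
		return -ENOMEM;
	}
	return 0;
}

/* Models __page_pool_destroy(): the only other caller of the uninit step. */
static void fx_destroy(struct fx_pool *pool)
{
	fx_uninit(pool);
}

/* Total references still held across both counters. */
static int fx_outstanding(void)
{
	return fx_provider_refs + fx_branch_count;
}
```

Because the uninit step has exactly two callers in this model, each successful init is undone exactly once, whether creation fails at the listing step or the pool is destroyed later.
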
Triggering the bug requires xa_alloc_cyclic() to fail with -ENOMEM,
which is rare under normal GFP_KERNEL retry behaviour. It can be
forced deterministically under CONFIG_FAULT_INJECTION with
fail_page_alloc / xa fault injection, and it can also occur under
sustained memory pressure. The leak is silent: there is no warning,
and the kernel keeps running with a permanently incremented static
branch.
Fixes:
| Name | | |
|---|---|---|
| bpf_sk_storage.c | ||
| datagram.c | ||
| dev_addr_lists_test.c | ||
| dev_addr_lists.c | ||
| dev_api.c | ||
| dev_ioctl.c | ||
| dev.c | ||
| dev.h | ||
| devmem.c | ||
| devmem.h | ||
| drop_monitor.c | ||
| dst_cache.c | ||
| dst.c | ||
| failover.c | ||
| fib_notifier.c | ||
| fib_rules.c | ||
| filter.c | ||
| flow_dissector.c | ||
| flow_offload.c | ||
| gen_estimator.c | ||
| gen_stats.c | ||
| gro_cells.c | ||
| gro.c | ||
| gso.c | ||
| hotdata.c | ||
| hwbm.c | ||
| ieee8021q_helpers.c | ||
| link_watch.c | ||
| lock_debug.c | ||
| lwt_bpf.c | ||
| lwtunnel.c | ||
| Makefile | ||
| mp_dmabuf_devmem.h | ||
| neighbour.c | ||
| net_namespace.c | ||
| net_test.c | ||
| net-procfs.c | ||
| net-sysfs.c | ||
| net-sysfs.h | ||
| net-traces.c | ||
| netclassid_cgroup.c | ||
| netdev_config.c | ||
| netdev_queues.c | ||
| netdev_rx_queue.c | ||
| netdev-genl-gen.c | ||
| netdev-genl-gen.h | ||
| netdev-genl.c | ||
| netevent.c | ||
| netmem_priv.h | ||
| netpoll.c | ||
| netprio_cgroup.c | ||
| of_net.c | ||
| page_pool_priv.h | ||
| page_pool_user.c | ||
| page_pool.c | ||
| pktgen.c | ||
| ptp_classifier.c | ||
| rtnetlink.c | ||
| scm.c | ||
| secure_seq.c | ||
| selftests.c | ||
| skb_fault_injection.c | ||
| skbuff.c | ||
| skmsg.c | ||
| sock_destructor.h | ||
| sock_diag.c | ||
| sock_map.c | ||
| sock_reuseport.c | ||
| sock.c | ||
| stream.c | ||
| sysctl_net_core.c | ||
| timestamping.c | ||
| tso.c | ||
| utils.c | ||
| xdp.c | ||