linux/net/core
Jakub Kicinski 2da4def0f4 netpoll: prevent hanging NAPI when netcons gets enabled
Paolo spotted hangs in NIPA running driver tests against virtio.
The tests hang in virtnet_close() -> virtnet_napi_tx_disable().

The problem is only reproducible if running multiple of our tests
in sequence (I used TEST_PROGS="xdp.py ping.py netcons_basic.sh \
netpoll_basic.py stats.py"). Initial suspicion was that this is
a simple case of double-disable of NAPI, but instrumenting the
code reveals:

 Deadlocked on NAPI ffff888007cd82c0 (virtnet_poll_tx):
   state: 0x37, disabled: false, owner: 0, listed: false, weight: 64

The NAPI was not in fact disabled, owner is 0 (rather than -1),
so the NAPI "thinks" it's scheduled for CPU 0 but it's not listed
(!list_empty(&n->poll_list) => false). It seems odd that normal NAPI
processing would wedge itself like this.

A better suspicion is that netpoll gets enabled while NAPI is polling,
and also grabs the NAPI instance. This confuses napi_complete_done():

  [netpoll]                                   [normal NAPI]
                                        napi_poll()
                                          have = netpoll_poll_lock()
                                            rcu_access_pointer(dev->npinfo)
                                              return NULL # no netpoll
                                          __napi_poll()
                                            ->poll(->weight)
  poll_napi()
    cmpxchg(->poll_owner, -1, cpu)
      poll_one_napi()
        set_bit(NAPI_STATE_NPSVC, ->state)
                                              napi_complete_done()
                                                if (NAPIF_STATE_NPSVC)
                                                  return false
                                           # exit without clearing SCHED
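
The interleaving above can be replayed in a small userspace model. This is
only a sketch to illustrate the suspected race, not kernel code: every name
prefixed with model_ / MODEL_ is hypothetical, the flag values are not the
kernel's NAPI_STATE_* bits, and the two "CPUs" are replayed sequentially on
one thread. It also ignores everything netpoll does after claiming the
instance.

```c
#include <assert.h>
#include <stdbool.h>

/* Model-only flag values; they are NOT the kernel's NAPI_STATE_* bits. */
#define MODEL_SCHED (1u << 0)
#define MODEL_NPSVC (1u << 1)

struct model_napi {
	unsigned int state;
	int poll_owner;	/* -1: nobody owns the instance */
	bool npinfo;	/* stands in for dev->npinfo being non-NULL */
};

/* Normal path: takes ownership only if netpoll is already enabled. */
static bool model_poll_lock(struct model_napi *n, int cpu)
{
	if (!n->npinfo)
		return false;
	assert(n->poll_owner == -1);
	n->poll_owner = cpu;
	return true;
}

/* netpoll path: claims the instance if it is unowned, then marks NPSVC. */
static bool model_netpoll_claim(struct model_napi *n, int cpu)
{
	if (n->poll_owner != -1)
		return false;
	n->poll_owner = cpu;
	n->state |= MODEL_NPSVC;
	return true;
}

/* Completion refuses to clear SCHED while NPSVC is set. */
static bool model_complete_done(struct model_napi *n)
{
	if (n->state & MODEL_NPSVC)
		return false;
	n->state &= ~MODEL_SCHED;
	return true;
}

/*
 * Replay the diagram's ordering. Returns true if SCHED is still set
 * once the normal poll has finished, i.e. the instance is wedged.
 */
static bool model_sched_stuck(bool netpoll_enabled_before_poll)
{
	struct model_napi n = {
		.state = MODEL_SCHED,
		.poll_owner = -1,
		.npinfo = netpoll_enabled_before_poll,
	};

	(void)model_poll_lock(&n, 0);	/* napi_poll() entry on CPU 0 */
	n.npinfo = true;		/* netpoll gets enabled mid-poll */
	(void)model_netpoll_claim(&n, 1);	/* poll_napi() on CPU 1 */
	(void)model_complete_done(&n);	/* driver's ->poll() completes */

	return (n.state & MODEL_SCHED) != 0;
}
```

In this model, model_sched_stuck(false) leaves SCHED set because the normal
path never took ownership, while model_sched_stuck(true) clears it because
the netpoll claim is refused.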

This feels very unlikely, but perhaps virtio has some interactions
with the hypervisor in the NAPI ->poll that makes the race window
larger?

Best I could do to prove the theory was to add and trigger this
warning in napi_poll (just before netpoll_poll_unlock()):

      WARN_ONCE(!have && rcu_access_pointer(n->dev->npinfo) &&
                napi_is_scheduled(n) && list_empty(&n->poll_list),
                "NAPI race with netpoll %px", n);

If this warning hits, the next virtnet_close() will hang.

This patch survived 30 test iterations without a hang (without it
the longest clean run was around 10). Credit for triggering this
goes to Breno's recent netconsole tests.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/c5a93ed1-9abe-4880-a3bb-8d1678018b1d@redhat.com
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20250726010846.1105875-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-30 18:05:52 -07:00
bpf_sk_storage.c
datagram.c net: devmem: support single IOV with sendmsg 2025-05-26 10:00:48 +01:00
dev_addr_lists_test.c
dev_addr_lists.c net: s/dev_pre_changeaddr_notify/netif_pre_changeaddr_notify/ 2025-07-18 17:27:47 -07:00
dev_api.c net: define an enum for the napi threaded state 2025-07-24 18:34:55 -07:00
dev_ioctl.c net: s/dev_get_flags/netif_get_flags/ 2025-07-18 17:27:47 -07:00
dev.c bpf-next-6.17 2025-07-30 09:58:50 -07:00
dev.h net: define an enum for the napi threaded state 2025-07-24 18:34:55 -07:00
devmem.c net: devmem: move list_add to net_devmem_bind_dmabuf. 2025-05-27 19:19:35 -07:00
devmem.h net: Fix net_devmem_bind_dmabuf for non-devmem configs 2025-05-30 19:23:36 -07:00
drop_monitor.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
dst_cache.c net: dst: annotate data-races around dst->obsolete 2025-07-02 14:32:29 -07:00
dst.c net: dst: add four helpers to annotate data-races around dst->dev 2025-07-02 14:32:30 -07:00
failover.c
fib_notifier.c
fib_rules.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-17 12:26:50 -07:00
filter.c bpf: Reject narrower access to pointer ctx fields 2025-07-23 19:33:49 -07:00
flow_dissector.c
flow_offload.c
gen_estimator.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
gen_stats.c
gro_cells.c
gro.c
gso.c
hotdata.c tcp: move tcp_memory_allocated into net_aligned_data 2025-07-02 14:22:02 -07:00
hwbm.c
ieee8021q_helpers.c net: ieee8021q: fix insufficient table-size assertion 2025-07-01 12:55:49 +02:00
link_watch.c
lock_debug.c netdev: fix the locking for netdev notifications 2025-04-17 18:55:14 -07:00
lwt_bpf.c
lwtunnel.c inet: Remove rtnl_is_held arg of lwtunnel_valid_encap_type(_attr)?(). 2025-05-20 19:18:24 -07:00
Makefile
mp_dmabuf_devmem.h
neighbour.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-26 11:49:45 -07:00
net_namespace.c Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
net_test.c
net-procfs.c
net-sysfs.c net: s/dev_set_threaded/netif_set_threaded/ 2025-07-18 17:27:47 -07:00
net-sysfs.h net: remove RTNL use for /proc/sys/net/core/rps_default_mask 2025-07-07 18:42:12 -07:00
net-traces.c
netclassid_cgroup.c net, bpf: Fix RCU usage in task_cls_state() for BPF programs 2025-06-11 21:30:29 +02:00
netdev_rx_queue.c net: Reoder rxq_idx check in __net_mp_open_rxq() 2025-06-25 16:53:51 -07:00
netdev-genl-gen.c net: define an enum for the napi threaded state 2025-07-24 18:34:55 -07:00
netdev-genl-gen.h net: devmem: TCP tx netlink api 2025-05-13 11:12:48 +02:00
netdev-genl.c net: define an enum for the napi threaded state 2025-07-24 18:34:55 -07:00
netevent.c
netmem_priv.h page_pool: Track DMA-mapped pages and unmap them when destroying the pool 2025-04-14 16:30:29 -07:00
netpoll.c netpoll: prevent hanging NAPI when netcons gets enabled 2025-07-30 18:05:52 -07:00
netprio_cgroup.c
of_net.c
page_pool_priv.h
page_pool_user.c
page_pool.c page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() 2025-07-07 18:40:09 -07:00
pktgen.c net: pktgen: fix code style (WARNING: Prefer strscpy over strcpy) 2025-04-17 13:02:41 +02:00
ptp_classifier.c
request_sock.c
rtnetlink.c net: s/dev_get_flags/netif_get_flags/ 2025-07-18 17:27:47 -07:00
scm.c af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD 2025-07-04 09:32:35 +02:00
secure_seq.c
selftests.c net: selftests: add PHY-loopback test for bad TCP checksums 2025-07-18 17:19:46 -07:00
skb_fault_injection.c
skbuff.c skbuff: Add MSG_MORE flag to optimize tcp large packet transmission 2025-07-09 19:25:57 -07:00
skmsg.c bpf, sockmap: Fix psock incorrectly pointing to sk 2025-06-10 18:16:15 +02:00
sock_destructor.h
sock_diag.c
sock_map.c bpf: Remove attach_type in sockmap_link 2025-07-11 10:51:55 -07:00
sock_reuseport.c
sock.c net: track pfmemalloc drops via SKB_DROP_REASON_PFMEMALLOC 2025-07-18 16:59:05 -07:00
stream.c net: stream: add description for sk_stream_write_space() 2025-07-18 16:57:21 -07:00
sysctl_net_core.c net: remove RTNL use for /proc/sys/net/core/rps_default_mask 2025-07-07 18:42:12 -07:00
timestamping.c
tso.c
utils.c net: Fix checksum update for ILA adj-transport 2025-05-30 19:53:51 -07:00
xdp.c xsk: add missing virtual address conversion for page 2025-05-27 11:46:47 +02:00