linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-29 17:43:52 +02:00

Author	SHA1	Message	Date
Brett Creeley	3bc06da858	virtio_net: sync rss_trailer.max_tx_vq on queue_pairs change via VQ_PAIRS_SET When netif_is_rxfh_configured() is true (i.e., the user has explicitly configured the RSS indirection table), virtnet_set_queues() skips the RSS update path and falls through to the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command to change the number of queue pairs. However, it does not update vi->rss_trailer.max_tx_vq to reflect the new queue_pairs value. This causes a mismatch between vi->curr_queue_pairs and vi->rss_trailer.max_tx_vq. Any subsequent RSS reconfiguration (e.g., via ethtool -X) calls virtnet_commit_rss_command(), which sends the stale max_tx_vq to the device, silently reverting the queue count. Reproduction: 1. User configured RSS ethtool -X eth0 equal 8 2. VQ_PAIRS_SET path; max_tx_vq stays 16 ethtool -L eth0 combined 12 3. RSS commit uses max_tx_vq=16 instead of 12 ethtool -X eth0 equal 4 Fix this by updating vi->rss_trailer.max_tx_vq after a successful VQ_PAIRS_SET command when RSS is enabled, keeping it in sync with curr_queue_pairs. Fixes: `50bfcaedd7` ("virtio_net: Update rss when set queue") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260416212121.29073-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-23 09:35:53 -07:00
Jakub Kicinski	8ffb33d770	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c `b18c833888` ("vsock: initialize child_ns_mode_locked in vsock_net_init()") `0de607dc4f` ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c `ceee35e567` ("bnxt_en: Refactor some basic ring setup and adjustment logic") `57cdfe0dc7` ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c `4d56037a02` ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") `687a95d204` ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h `b6045c899e` ("wifi: iwlwifi: mld: Refactor scan command handling") `ec66ec6a5a` ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c `078df640ef` ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") `323156c354` ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:03:13 -07:00
Srujana Challa	b4e5f04c58	virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN rss_max_key_size in the virtio spec is the maximum key size supported by the device, not a mandatory size the driver must use. Also the value 40 is a spec minimum, not a spec maximum. The current code rejects RSS and can fail probe when the device reports a larger rss_max_key_size than the driver buffer limit. Instead, clamp the effective key length to min(device rss_max_key_size, NETDEV_RSS_KEY_LEN) and keep RSS enabled. This keeps probe working on devices that advertise larger maximum key sizes while respecting the netdev RSS key buffer size limit. Fixes: `3f7d9c1964` ("virtio_net: Add hash_key_length check") Cc: stable@vger.kernel.org Signed-off-by: Srujana Challa <schalla@marvell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260326142344.1171317-1-schalla@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 19:47:44 -07:00
Di Zhu	f8844dfeea	virtio-net: enable NETIF_F_GRO_HW only if GRO-related offloads are supported Negotiating VIRTIO_NET_F_CTRL_GUEST_OFFLOADS indicates the device allows control over offload support, but the offloads that can be controlled may have nothing to do with GRO (e.g., if neither GUEST_TSO4 nor GUEST_TSO6 is supported). In such a setup, reporting NETIF_F_GRO_HW as available for the device is too optimistic and misleading to the user. Improve the situation by masking off NETIF_F_GRO_HW unless the device possesses actual GRO-related offload capabilities. Out of an abundance of caution, this does not change the current behaviour for hardware with just v6 or just v4 GRO: current interfaces do not allow distinguishing between v6/v4 GRO, so we can't expose them to userspace precisely. Signed-off-by: Di Zhu <zhud@hygon.cn> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20260323041730.986351-1-zhud@hygon.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:28:49 -07:00
Michael S. Tsirkin	fe3e54253f	virtio_net: sync RX buffer before reading the header receive_buf() reads the virtio header through buf before page_pool_dma_sync_for_cpu() runs in receive_small() or receive_mergeable(). The header buffer is thus unsynchronized at the point where flags and, for mergeable buffers, num_buffers are consumed. Omar Elghoul reported that on s390x Secure Execution this showed up as greatly reduced virtio-net performance together with "bad gso" and "bad csum" messages in dmesg. This is because with SE sync actually copies data, so the header is uninitialized. Move the sync into receive_buf() so the header is synchronized before any access through buf. Tool use: Cursor with GPT-5.4 drafted the initial code move from prompt: "in drivers/net/virtio_net.c, move page_pool_dma_sync_for_cpu on receive path to before memory is accessed through buf". The result and the commit log were reviewed and edited manually. Fixes: `24fbd3967f` ("virtio_net: add page_pool support for buffer allocation") Reported-by: Omar Elghoul <oelghoul@linux.ibm.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Tested-by: Omar Elghoul <oelghoul@linux.ibm.com> Link: https://lore.kernel.org/r/20260323150136.14452-1-oelghoul@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Vishwanath Seshagiri <vishs@meta.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/f4caa9be9e5addae7851c012cab0a733be7f0974.1774365273.git.mst@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:07:45 -07:00
Jakub Kicinski	9ebcf66cd6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 12:09:57 -07:00
xietangxin	ba8bda9a08	virtio_net: Fix UAF on dst_ops when IFF_XMIT_DST_RELEASE is cleared and napi_tx is false A UAF issue occurs when the virtio_net driver is configured with napi_tx=N and the device's IFF_XMIT_DST_RELEASE flag is cleared (e.g., during the configuration of tc route filter rules). When IFF_XMIT_DST_RELEASE is removed from the net_device, the network stack expects the driver to hold the reference to skb->dst until the packet is fully transmitted and freed. In virtio_net with napi_tx=N, skbs may remain in the virtio transmit ring for an extended period. If the network namespace is destroyed while these skbs are still pending, the corresponding dst_ops structure has freed. When a subsequent packet is transmitted, free_old_xmit() is triggered to clean up old skbs. It then calls dst_release() on the skb associated with the stale dst_entry. Since the dst_ops (referenced by the dst_entry) has already been freed, a UAF kernel paging request occurs. fix it by adds skb_dst_drop(skb) in start_xmit to explicitly release the dst reference before the skb is queued in virtio_net. Call Trace: Unable to handle kernel paging request at virtual address ffff80007e150000 CPU: 2 UID: 0 PID: 6236 Comm: ping Kdump: loaded Not tainted 7.0.0-rc1+ #6 PREEMPT ... percpu_counter_add_batch+0x3c/0x158 lib/percpu_counter.c:98 (P) dst_release+0xe0/0x110 net/core/dst.c:177 skb_release_head_state+0xe8/0x108 net/core/skbuff.c:1177 sk_skb_reason_drop+0x54/0x2d8 net/core/skbuff.c:1255 dev_kfree_skb_any_reason+0x64/0x78 net/core/dev.c:3469 napi_consume_skb+0x1c4/0x3a0 net/core/skbuff.c:1527 __free_old_xmit+0x164/0x230 drivers/net/virtio_net.c:611 [virtio_net] free_old_xmit drivers/net/virtio_net.c:1081 [virtio_net] start_xmit+0x7c/0x530 drivers/net/virtio_net.c:3329 [virtio_net] ... Reproduction Steps: NETDEV="enp3s0" config_qdisc_route_filter() { tc qdisc del dev $NETDEV root tc qdisc add dev $NETDEV root handle 1: prio tc filter add dev $NETDEV parent 1:0 \ protocol ip prio 100 route to 100 flowid 1:1 ip route add 192.168.1.100/32 dev $NETDEV realm 100 } test_ns() { ip netns add testns ip link set $NETDEV netns testns ip netns exec testns ifconfig $NETDEV 10.0.32.46/24 ip netns exec testns ping -c 1 10.0.32.1 ip netns del testns } config_qdisc_route_filter test_ns sleep 2 test_ns Fixes: `f2fc6a5458` ("[NETNS][IPV6] route6 - move ip6_dst_ops inside the network namespace") Cc: stable@vger.kernel.org Signed-off-by: xietangxin <xietangxin@yeah.net> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Fixes: `0287587884` ("net: better IFF_XMIT_DST_RELEASE support") Link: https://patch.msgid.link/20260312025406.15641-1-xietangxin@yeah.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-24 20:46:08 -07:00
Xuan Zhuo	38ec410b99	virtio-net: correct hdr_len handling for VIRTIO_NET_F_GUEST_HDRLEN The commit `be50da3e9d` ("net: virtio_net: implement exact header length guest feature") introduces support for the VIRTIO_NET_F_GUEST_HDRLEN feature in virtio-net. This feature requires virtio-net to set hdr_len to the actual header length of the packet when transmitting, the number of bytes from the start of the packet to the beginning of the transport-layer payload. However, in practice, hdr_len was being set using skb_headlen(skb), which is clearly incorrect. This commit fixes that issue. Fixes: `be50da3e9d` ("net: virtio_net: implement exact header length guest feature") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20260320021818.111741-2-xuanzhuo@linux.alibaba.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-24 11:12:07 +01:00
Vishwanath Seshagiri	24fbd3967f	virtio_net: add page_pool support for buffer allocation Use page_pool for RX buffer allocation in mergeable and small buffer modes to enable page recycling and avoid repeated page allocator calls. skb_mark_for_recycle() enables page reuse in the network stack. Big packets mode is unchanged because it uses page->private for linked list chaining of multiple pages per buffer, which conflicts with page_pool's internal use of page->private. Implement conditional DMA premapping using virtqueue_dma_dev(): - When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool handling DMA mapping, submit via virtqueue_add_inbuf_premapped() - When NULL (VDUSE, direct physical): page_pool handles allocation only, submit via virtqueue_add_inbuf_ctx() This preserves the DMA premapping optimization from commit `31f3cd4e57` ("virtio-net: rq submits premapped per-buffer") while adding page_pool support as a prerequisite for future zero-copy features (devmem TCP, io_uring ZCRX). Page pools are created in probe and destroyed in remove (not open/close), following existing driver behavior where RX buffers remain in virtqueues across interface state changes. Signed-off-by: Vishwanath Seshagiri <vishs@meta.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260310183107.2822016-1-vishs@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-16 19:19:39 -07:00
Bui Quang Minh	e3f8800aa2	virtio-net: xsk: Support wakeup on RX side When XDP_USE_NEED_WAKEUP is used and the fill ring is empty so no buffer is allocated on RX side, allow RX NAPI to be descheduled. This avoids wasting CPU cycles on polling. Users will be notified and they need to make a wakeup call after refilling the ring. Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20260304154317.7506-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-06 17:38:23 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Gustavo A. R. Silva	4156c3745f	virtio_net: Fix misalignment bug in struct virtnet_info Use the new TRAILING_OVERLAP() helper to fix a misalignment bug along with the following warning: drivers/net/virtio_net.c:429:46: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] This helper creates a union between a flexible-array member (FAM) and a set of members that would otherwise follow it (in this case `u8 rss_hash_key_data[VIRTIO_NET_RSS_MAX_KEY_SIZE];`). This overlays the trailing members (rss_hash_key_data) onto the FAM (hash_key_data) while keeping the FAM and the start of MEMBERS aligned. The static_assert() ensures this alignment remains. Notice that due to tail padding in flexible `struct virtio_net_rss_config_trailer`, `rss_trailer.hash_key_data` (at offset 83 in struct virtnet_info) and `rss_hash_key_data` (at offset 84 in struct virtnet_info) are misaligned by one byte. See below: struct virtio_net_rss_config_trailer { __le16 max_tx_vq; /* 0 2 / __u8 hash_key_length; / 2 1 / __u8 hash_key_data[]; / 3 0 / / size: 4, cachelines: 1, members: 3 / / padding: 1 / / last cacheline: 4 bytes / }; struct virtnet_info { ... struct virtio_net_rss_config_trailer rss_trailer; / 80 4 / / XXX last struct has 1 byte of padding / u8 rss_hash_key_data[40]; / 84 40 / ... / size: 832, cachelines: 13, members: 48 / / sum members: 801, holes: 8, sum holes: 31 / / paddings: 2, sum paddings: 5 / }; After changes, those members are correctly aligned at offset 795: struct virtnet_info { ... union { struct virtio_net_rss_config_trailer rss_trailer; / 792 4 / struct { unsigned char __offset_to_hash_key_data[3]; / 792 3 / u8 rss_hash_key_data[40]; / 795 40 / }; / 792 43 / }; / 792 44 / ... / size: 840, cachelines: 14, members: 47 / / sum members: 801, holes: 8, sum holes: 35 / / padding: 4 / / paddings: 1, sum paddings: 4 / / last cacheline: 8 bytes / }; As a result, the RSS key passed to the device is shifted by 1 byte: the last byte is cut off, and instead a (possibly uninitialized) byte is added at the beginning. As a last note `struct virtio_net_rss_config_hdr rss_hdr;` is also moved to the end, since it seems those three members should stick around together. :) Cc: stable@vger.kernel.org Fixes: `ed3100e90d` ("virtio_net: Use new RSS config structs") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/aWIItWq5dV9XTTCJ@kspp Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-15 10:18:42 +01:00
Bui Quang Minh	a0c159647e	virtio-net: clean up __virtnet_rx_pause/resume The delayed refill worker is removed which makes virtnet_rx_pause/resume quite the same as __virtnet_rx_pause/resume. So remove __virtnet_rx_pause/resume and move the code to virtnet_rx_pause/resume. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20260106150438.7425-4-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-10 11:13:02 -08:00
Bui Quang Minh	1e7b90aa79	virtio-net: remove unused delayed refill worker Since we switched to retry refilling receive buffer in NAPI poll instead of delayed worker, remove all now unused delayed refill worker code. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20260106150438.7425-3-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-10 11:13:02 -08:00
Bui Quang Minh	fcdef3bcbb	virtio-net: don't schedule delayed refill worker When we fail to refill the receive buffers, we schedule a delayed worker to retry later. However, this worker creates some concurrency issues. For example, when the worker runs concurrently with virtnet_xdp_set, both need to temporarily disable queue's NAPI before enabling again. Without proper synchronization, a deadlock can happen when napi_disable() is called on an already disabled NAPI. That napi_disable() call will be stuck and so will the subsequent napi_enable() call. To simplify the logic and avoid further problems, we will instead retry refilling in the next NAPI poll. Fixes: `4bc12818b3` ("virtio-net: disable delayed refill when pausing rx") Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/526b5396-459d-4d02-8635-a222d07b46d7@redhat.com Cc: stable@vger.kernel.org Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260106150438.7425-2-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-10 11:12:48 -08:00
Kommula Shiva Shankar	acb4bc6e1b	virtio_net: fix device mismatch in devm_kzalloc/devm_kfree Initial rss_hdr allocation uses virtio_device->device, but virtnet_set_queues() frees using net_device->device. This device mismatch causing below devres warning [ 3788.514041] ------------[ cut here ]------------ [ 3788.514044] WARNING: drivers/base/devres.c:1095 at devm_kfree+0x84/0x98, CPU#16: vdpa/1463 [ 3788.514054] Modules linked in: octep_vdpa virtio_net virtio_vdpa [last unloaded: virtio_vdpa] [ 3788.514064] CPU: 16 UID: 0 PID: 1463 Comm: vdpa Tainted: G W 6.18.0 #10 PREEMPT [ 3788.514067] Tainted: [W]=WARN [ 3788.514069] Hardware name: Marvell CN106XX board (DT) [ 3788.514071] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 3788.514074] pc : devm_kfree+0x84/0x98 [ 3788.514076] lr : devm_kfree+0x54/0x98 [ 3788.514079] sp : ffff800084e2f220 [ 3788.514080] x29: ffff800084e2f220 x28: ffff0003b2366000 x27: 000000000000003f [ 3788.514085] x26: 000000000000003f x25: ffff000106f17c10 x24: 0000000000000080 [ 3788.514089] x23: ffff00045bb8ab08 x22: ffff00045bb8a000 x21: 0000000000000018 [ 3788.514093] x20: ffff0004355c3080 x19: ffff00045bb8aa00 x18: 0000000000080000 [ 3788.514098] x17: 0000000000000040 x16: 000000000000001f x15: 000000000007ffff [ 3788.514102] x14: 0000000000000488 x13: 0000000000000005 x12: 00000000000fffff [ 3788.514106] x11: ffffffffffffffff x10: 0000000000000005 x9 : ffff800080c8c05c [ 3788.514110] x8 : ffff800084e2eeb8 x7 : 0000000000000000 x6 : 000000000000003f [ 3788.514115] x5 : ffff8000831bafe0 x4 : ffff800080c8b010 x3 : ffff0004355c3080 [ 3788.514119] x2 : ffff0004355c3080 x1 : 0000000000000000 x0 : 0000000000000000 [ 3788.514123] Call trace: [ 3788.514125] devm_kfree+0x84/0x98 (P) [ 3788.514129] virtnet_set_queues+0x134/0x2e8 [virtio_net] [ 3788.514135] virtnet_probe+0x9c0/0xe00 [virtio_net] [ 3788.514139] virtio_dev_probe+0x1e0/0x338 [ 3788.514144] really_probe+0xc8/0x3a0 [ 3788.514149] __driver_probe_device+0x84/0x170 [ 3788.514152] driver_probe_device+0x44/0x120 [ 3788.514155] __device_attach_driver+0xc4/0x168 [ 3788.514158] bus_for_each_drv+0x8c/0xf0 [ 3788.514161] __device_attach+0xa4/0x1c0 [ 3788.514164] device_initial_probe+0x1c/0x30 [ 3788.514168] bus_probe_device+0xb4/0xc0 [ 3788.514170] device_add+0x614/0x828 [ 3788.514173] register_virtio_device+0x214/0x258 [ 3788.514175] virtio_vdpa_probe+0xa0/0x110 [virtio_vdpa] [ 3788.514179] vdpa_dev_probe+0xa8/0xd8 [ 3788.514183] really_probe+0xc8/0x3a0 [ 3788.514186] __driver_probe_device+0x84/0x170 [ 3788.514189] driver_probe_device+0x44/0x120 [ 3788.514192] __device_attach_driver+0xc4/0x168 [ 3788.514195] bus_for_each_drv+0x8c/0xf0 [ 3788.514197] __device_attach+0xa4/0x1c0 [ 3788.514200] device_initial_probe+0x1c/0x30 [ 3788.514203] bus_probe_device+0xb4/0xc0 [ 3788.514206] device_add+0x614/0x828 [ 3788.514209] _vdpa_register_device+0x58/0x88 [ 3788.514211] octep_vdpa_dev_add+0x104/0x228 [octep_vdpa] [ 3788.514215] vdpa_nl_cmd_dev_add_set_doit+0x2d0/0x3c0 [ 3788.514218] genl_family_rcv_msg_doit+0xe4/0x158 [ 3788.514222] genl_rcv_msg+0x218/0x298 [ 3788.514225] netlink_rcv_skb+0x64/0x138 [ 3788.514229] genl_rcv+0x40/0x60 [ 3788.514233] netlink_unicast+0x32c/0x3b0 [ 3788.514237] netlink_sendmsg+0x170/0x3b8 [ 3788.514241] __sys_sendto+0x12c/0x1c0 [ 3788.514246] __arm64_sys_sendto+0x30/0x48 [ 3788.514249] invoke_syscall.constprop.0+0x58/0xf8 [ 3788.514255] do_el0_svc+0x48/0xd0 [ 3788.514259] el0_svc+0x48/0x210 [ 3788.514264] el0t_64_sync_handler+0xa0/0xe8 [ 3788.514268] el0t_64_sync+0x198/0x1a0 [ 3788.514271] ---[ end trace 0000000000000000 ]--- Fix by using virtio_device->device consistently for allocation and deallocation Fixes: `4944be2f5a` ("virtio_net: Allocate rss_hdr with devres") Signed-off-by: Kommula Shiva Shankar <kshankar@marvell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20260102101900.692770-1-kshankar@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 10:55:25 -08:00
Jakub Kicinski	db4029859d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/xdp/xsk.c `0ebc27a4c6` ("xsk: avoid data corruption on cq descriptor number") `8da7bea7db` ("xsk: add indirect call for xsk_destruct_skb") `30ed05adca` ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-27 12:19:08 -08:00
Jon Kohler	1cd1c47234	virtio-net: avoid unnecessary checksum calculation on guest RX Commit `a2fb4bc4e2` ("net: implement virtio helpers to handle UDP GSO tunneling.") inadvertently altered checksum offload behavior for guests not using UDP GSO tunneling. Before, tun_put_user called tun_vnet_hdr_from_skb, which passed has_data_valid = true to virtio_net_hdr_from_skb. After, tun_put_user began calling tun_vnet_hdr_tnl_from_skb instead, which passes has_data_valid = false into both call sites. This caused virtio hdr flags to not include VIRTIO_NET_HDR_F_DATA_VALID for SKBs where skb->ip_summed == CHECKSUM_UNNECESSARY. As a result, guests are forced to recalculate checksums unnecessarily. Restore the previous behavior by ensuring has_data_valid = true is passed in the !tnl_gso_type case, but only from tun side, as virtio_net_hdr_tnl_from_skb() is used also by the virtio_net driver, which in turn must not use VIRTIO_NET_HDR_F_DATA_VALID on tx. cc: stable@vger.kernel.org Fixes: `a2fb4bc4e2` ("net: implement virtio helpers to handle UDP GSO tunneling.") Signed-off-by: Jon Kohler <jon@nutanix.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20251125222754.1737443-1-jon@nutanix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-26 19:45:54 -08:00
Liming Wu	cfeb7cd80f	virtio_net: enhance wake/stop tx queue statistics accounting This patch refines and strengthens the statistics collection of TX queue wake/stop events introduced by commit `c39add9b24` ("virtio_net: Add TX stopped and wake counters"). Previously, the driver only recorded partial wake/stop statistics for TX queues. Some wake events triggered by 'skb_xmit_done()' or resume operations were not counted, which made the per-queue metrics incomplete. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251120015320.1418-1-liming.wu@jaguarmicro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:38:01 -08:00
Jakub Kicinski	c99ebb6132	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c `96a9178a29` ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") `61b7ade9ba` ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 12:35:38 -08:00
Xuan Zhuo	0eff2eaa53	virtio-net: fix incorrect flags recording in big mode The purpose of commit `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") is to record the flags in advance, as their value may be overwritten in the XDP case. However, the flags recorded under big mode are incorrect, because in big mode, the passed buf does not point to the rx buffer, but rather to the page of the submitted buffer. This commit fixes this issue. For the small mode, the commit `c11a49d58a` ("virtio_net: Fix mismatched buf address when unmapping for small packets") fixed it. Tested-by: Alyssa Ross <hi@alyssa.is> Fixes: `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251111090828.23186-1-xuanzhuo@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-13 13:16:30 +01:00
Jakub Kicinski	1ec9871fbb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c `9222582ec5` ("Revert "wifi: ath12k: Fix missing station power save configuration"") `6917e268c4` ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig `b1d16f7c00` ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") `93f53db9f9` ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 09:27:40 -08:00
Bui Quang Minh	0c71670396	virtio-net: fix received length check in big packets Since commit `4959aebba8` ("virtio-net: use mtu size as buffer length for big packets"), when guest gso is off, the allocated size for big packets is not MAX_SKB_FRAGS * PAGE_SIZE anymore but depends on negotiated MTU. The number of allocated frags for big packets is stored in vi->big_packets_num_skbfrags. Because the host announced buffer length can be malicious (e.g. the host vhost_net driver's get_rx_bufs is modified to announce incorrect length), we need a check in virtio_net receive path. Currently, the check is not adapted to the new change which can lead to NULL page pointer dereference in the below while loop when receiving length that is larger than the allocated one. This commit fixes the received length check corresponding to the new change. Fixes: `4959aebba8` ("virtio-net: use mtu size as buffer length for big packets") Cc: stable@vger.kernel.org Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20251030144438.7582-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 18:49:29 -08:00
Michael S. Tsirkin	c3838262b8	virtio_net: fix alignment for virtio_net_hdr_v1_hash Changing alignment of header would mean it's no longer safe to cast a 2 byte aligned pointer between formats. Use two 16 bit fields to make it 2 byte aligned as previously. This fixes the performance regression since commit ("virtio_net: enable gso over UDP tunnel support.") as it uses virtio_net_hdr_v1_hash_tunnel which embeds virtio_net_hdr_v1_hash. Pktgen in guest + XDP_DROP on TAP + vhost_net shows the TX PPS is recovered from 2.4Mpps to 4.45Mpps. Fixes: `56a06bd40f` ("virtio_net: enable gso over UDP tunnel support.") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20251031060551.126-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:14:07 -08:00
Chu Guangqing	f4b2786fb1	virtio_net: Fix a typo error in virtio_net Fix the spelling error of "separate". Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251103074305.4727-1-chuguangqing@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:00:37 -08:00
Bui Quang Minh	1ab6658174	virtio-net: drop the multi-buffer XDP packet in zerocopy In virtio-net, we have not yet supported multi-buffer XDP packet in zerocopy mode when there is a binding XDP program. However, in that case, when receiving multi-buffer XDP packet, we skip the XDP program and return XDP_PASS. As a result, the packet is passed to normal network stack which is an incorrect behavior (e.g. a XDP program for packet count is installed, multi-buffer XDP packet arrives and does go through XDP program. As a result, the packet count does not increase but the packet is still received from network stack).This commit instead returns XDP_ABORTED in that case. Fixes: `99c861b44e` ("virtio_net: xsk: rx: support recv merge mode") Cc: stable@vger.kernel.org Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20251022155630.49272-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-23 17:30:40 -07:00
Linus Torvalds	bf897d2626	virtio,vhost: fixes, cleanups Just fixes and cleanups this time around. The mapping cleanups are preparing the ground for new features, though. In order patches were almost there but I feel they didn't spend enough time in next yet. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmjdEQQPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpsDAH/2yWtj3WWBUmRo5oZ5Rkveebb0oMkm642zGB nmJ21UdIelvRM1sQoaV0+m6B8mMBDpmxrN3Mg7sSLFMN3xK5DF5QkWEwudJ1RJaq VVfPyv29tee5mtCve/aG/d+JWipYPdma96Gi8l3UaRXq7TTWBkVAFpxukpK5I1O5 NJJigcxxu6O/gbrR9JxW6HSX9BmV7hsFtsW2HR/C2hXWlTECaJeQJ/ZvooLFhzfZ pnwFtWjk3D6wYCWquvyE6OhrFDqLsLEW2GgYehL2BRYp/PcLizxewmZL0ghnP7mV bT5QRHGjxPZIHckBwvVjsIE0eN9at3nl5koBnWbxRYsJau3HgVA= =4Bdo -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Just fixes and cleanups this time around. The mapping cleanups are preparing the ground for new features, though" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-vdpa: Drop redundant conversion to bool vduse: Use fixed 4KB bounce pages for non-4KB page size vduse: switch to use virtio map API instead of DMA API vdpa: introduce map ops vdpa: support virtio_map virtio: introduce map ops in virtio core virtio_ring: rename dma_handle to map_handle virtio: introduce virtio_map container union virtio: rename dma helpers virtio_ring: switch to use dma_{map\|unmap}_page() virtio_ring: constify virtqueue pointer for DMA helpers virtio_balloon: Remove redundant __GFP_NOWARN vhost: vringh: Fix copy_to_iter return value check vhost: vringh: Modify the return value check	2025-10-04 08:48:16 -07:00
Jason Wang	b41cb3bcf6	virtio: rename dma helpers Following patch will introduce virtio mapping function to avoid abusing DMA API for device that doesn't do DMA. To ease the introduction, this patch rename "dma" to "map" for the current dma mapping helpers. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250821064641.5025-4-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>	2025-10-01 07:24:43 -04:00
Breno Leitao	483446690a	net: virtio_net: add get_rxrings ethtool callback for RX ring queries Replace the existing virtnet_get_rxnfc callback with a dedicated virtnet_get_rxrings implementation to provide the number of RX rings directly via the new ethtool_ops get_rx_ring_count pointer. This simplifies the RX ring count retrieval and aligns virtio_net with the new ethtool API for querying RX ring parameters. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250917-gxrings-v4-8-dae520e2e1cb@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-18 07:07:39 -07:00
Jakub Kicinski	1827f773e4	net: xdp: pass full flags to xdp_update_skb_shared_info() xdp_update_skb_shared_info() needs to update skb state which was maintained in xdp_buff / frame. Pass full flags into it, instead of breaking it out bit by bit. We will need to add a bit for unreadable frags (even tho XDP doesn't support those the driver paths may be common), at which point almost all call sites would become: xdp_update_skb_shared_info(skb, num_frags, sinfo->xdp_frags_size, MY_PAGE_SIZE * num_frags, xdp_buff_is_frag_pfmemalloc(xdp), xdp_buff_is_frag_unreadable(xdp)); Keep a helper for accessing the flags, in case we need to transform them somehow in the future (e.g. to cover up xdp_buff vs xdp_frame differences). While we are touching call callers - rename the helper to xdp_update_skb_frags_info(), previous name may have implied that it's shinfo that's updated. We are updating flags in struct sk_buff based on frags that got attched. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20250905221539.2930285-2-kuba@kernel.org Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 12:00:20 +02:00
Junnan Wu	45d8ef6322	virtio_net: adjust the execution order of function `virtnet_close` during freeze "Use after free" issue appears in suspend once race occurs when napi poll scheduls after `netif_device_detach` and before napi disables. For details, during suspend flow of virtio-net, the tx queue state is set to "__QUEUE_STATE_DRV_XOFF" by CPU-A. And at some coincidental times, if a TCP connection is still working, CPU-B does `virtnet_poll` before napi disable. In this flow, the state "__QUEUE_STATE_DRV_XOFF" of tx queue will be cleared. This is not the normal process it expects. After that, CPU-A continues to close driver then virtqueue is removed. Sequence likes below: -------------------------------------------------------------------------- CPU-A CPU-B ----- ----- suspend is called A TCP based on virtio-net still work virtnet_freeze \|- virtnet_freeze_down \| \|- netif_device_detach \| \| \|- netif_tx_stop_all_queues \| \| \|- netif_tx_stop_queue \| \| \|- set_bit \| \| (__QUEUE_STATE_DRV_XOFF,...) \| \| softirq rasied \| \| \|- net_rx_action \| \| \|- napi_poll \| \| \|- virtnet_poll \| \| \|- virtnet_poll_cleantx \| \| \|- netif_tx_wake_queue \| \| \|- test_and_clear_bit \| \| (__QUEUE_STATE_DRV_XOFF,...) \| \|- virtnet_close \| \|- virtnet_disable_queue_pair \| \|- virtnet_napi_tx_disable \|- remove_vq_common -------------------------------------------------------------------------- When TCP delayack timer is up, a cpu gets softirq and irq handler `tcp_delack_timer_handler` will be called, which will finally call `start_xmit` in virtio net driver. Then the access to tx virtq will cause panic. The root cause of this issue is that napi tx is not disable before `netif_tx_stop_queue`, once `virnet_poll` schedules in such coincidental time, the tx queue state will be cleared. To solve this issue, adjusts the order of function `virtnet_close` in `virtnet_freeze_down`. Co-developed-by: Ying Xu <ying123.xu@samsung.com> Signed-off-by: Ying Xu <ying123.xu@samsung.com> Signed-off-by: Junnan Wu <junnan01.wu@samsung.com> Message-Id: <20250812090817.3463403-1-junnan01.wu@samsung.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2025-08-26 03:38:20 -04:00
Jakub Kicinski	af2d6148d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml `880d43ca9a` ("netlink: specs: clean up spaces in brackets") `af52020fc5` ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c `a44312d58e` ("net: phy: Don't register LEDs for genphy") `f0f2b992d8` ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c `5fde0fcbd7` ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") `ea045a0de3` ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c `ae3264a25a` ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") `a8594c956c` ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-17 11:00:33 -07:00
Zigit Zo	be5dcaed69	virtio-net: fix recursived rtnl_lock() during probe() The deadlock appears in a stack trace like: virtnet_probe() rtnl_lock() virtio_config_changed_work() netdev_notify_peers() rtnl_lock() It happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while the virtio-net driver is still probing. The config_work in probe() will get scheduled until virtnet_open() enables the config change notification via virtio_config_driver_enable(). Fixes: `df28de7b00` ("virtio-net: synchronize operstate with admin state on up/down") Signed-off-by: Zigit Zo <zuozhijie@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250716115717.1472430-1-zuozhijie@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-17 07:37:59 -07:00
Liming Wu	2f82e99546	virtio_net: simplify tx queue wake condition check Consolidate the two nested if conditions for checking tx queue wake conditions into a single combined condition. This improves code readability without changing functionality. And move netif_tx_wake_queue into if condition to reduce unnecessary checks for queue stops. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250710023208.846-1-liming.wu@jaguarmicro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-11 15:59:54 -07:00
Jakub Kicinski	b430f6c38d	Merge branch 'virtio_udp_tunnel_08_07_2025' of https://github.com/pabeni/linux-devel Paolo Abeni says: ==================== virtio: introduce GSO over UDP tunnel Some virtualized deployments use UDP tunnel pervasively and are impacted negatively by the lack of GSO support for such kind of traffic in the virtual NIC driver. The virtio_net specification recently introduced support for GSO over UDP tunnel, this series updates the virtio implementation to support such a feature. Currently the kernel virtio support limits the feature space to 64, while the virtio specification allows for a larger number of features. Specifically the GSO-over-UDP-tunnel-related virtio features use bits 65-69. The first four patches in this series rework the virtio and vhost feature support to cope with up to 128 bits. The limit is set by a define and could be easily raised in future, as needed. This implementation choice is aimed at keeping the code churn as limited as possible. For the same reason, only the virtio_net driver is reworked to leverage the extended feature space; all other virtio/vhost drivers are unaffected, but could be upgraded to support the extended features space in a later time. The last four patches bring in the actual GSO over UDP tunnel support. As per specification, some additional fields are introduced into the virtio net header to support the new offload. The presence of such fields depends on the negotiated features. New helpers are introduced to convert the UDP-tunneled skb metadata to an extended virtio net header and vice versa. Such helpers are used by the tun and virtio_net driver to cope with the newly supported offloads. Tested with basic stream transfer with all the possible permutations of host kernel/qemu/guest kernel with/without GSO over UDP tunnel support. ==================== Link: https://patch.msgid.link/cover.1751874094.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 13:32:35 -07:00
Bui Quang Minh	f47e8f618c	virtio-net: xsk: rx: move the xdp->data adjustment to buf_to_xdp() This commit does not do any functional changes. It moves xdp->data adjustment for buffer other than first buffer to buf_to_xdp() helper so that the xdp_buff adjustment does not scatter over different functions. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250705075515.34260-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 18:41:35 -07:00
Paolo Abeni	56a06bd40f	virtio_net: enable gso over UDP tunnel support. If the related virtio feature is set, enable transmission and reception of gso over UDP tunnel packets. Most of the work is done by the previously introduced helper, just need to determine the UDP tunnel features inside the virtio_net_hdr and update accordingly the virtio net hdr size. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 18:05:59 +02:00
Paolo Abeni	3b17aa1301	virtio_net: add supports for extended offloads The virtio_net driver needs it to implement GSO over UDP tunnel offload. The only missing piece is mapping them to/from the extended features. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 18:05:33 +02:00
Paolo Abeni	6b9fd8857b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-04 08:03:18 +02:00
Laurent Vivier	24b2f5df86	virtio_net: Enforce minimum TX ring size for reliability The `tx_may_stop()` logic stops TX queues if free descriptors (`sq->vq->num_free`) fall below the threshold of (`MAX_SKB_FRAGS` + 2). If the total ring size (`ring_num`) is not strictly greater than this value, queues can become persistently stopped or stop after minimal use, severely degrading performance. A single sk_buff transmission typically requires descriptors for: - The virtio_net_hdr (1 descriptor) - The sk_buff's linear data (head) (1 descriptor) - Paged fragments (up to MAX_SKB_FRAGS descriptors) This patch enforces that the TX ring size ('ring_num') must be strictly greater than (MAX_SKB_FRAGS + 2). This ensures that the ring is always large enough to hold at least one maximally-fragmented packet plus at least one additional slot. Reported-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250521092236.661410-4-lvivier@redhat.com Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 11:40:02 +02:00
Laurent Vivier	bd2948d258	virtio_net: Cleanup '2+MAX_SKB_FRAGS' Improve consistency by using everywhere it is needed 'MAX_SKB_FRAGS + 2' rather than '2+MAX_SKB_FRAGS' or '2 + MAX_SKB_FRAGS'. No functional change. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250521092236.661410-3-lvivier@redhat.com Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 11:40:02 +02:00
Bui Quang Minh	5177373c31	virtio-net: xsk: rx: fix the frame's length check When calling buf_to_xdp, the len argument is the frame data's length without virtio header's length (vi->hdr_len). We check that len with xsk_pool_get_rx_frame_size() + vi->hdr_len to ensure the provided len does not larger than the allocated chunk size. The additional vi->hdr_len is because in virtnet_add_recvbuf_xsk, we use part of XDP_PACKET_HEADROOM for virtio header and ask the vhost to start placing data from hard_start + XDP_PACKET_HEADROOM - vi->hdr_len not hard_start + XDP_PACKET_HEADROOM But the first buffer has virtio_header, so the maximum frame's length in the first buffer can only be xsk_pool_get_rx_frame_size() not xsk_pool_get_rx_frame_size() + vi->hdr_len like in the current check. This commit adds an additional argument to buf_to_xdp differentiate between the first buffer and other ones to correctly calculate the maximum frame's length. Cc: stable@vger.kernel.org Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Fixes: `a4e7ba7027` ("virtio_net: xsk: rx: support recv small mode") Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250630151315.86722-2-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 11:23:03 +02:00
Bui Quang Minh	7d4a119e45	virtio-net: use the check_mergeable_len helper Replace the current repeated code to check received length in mergeable mode with the new check_mergeable_len helper. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250630144212.48471-4-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 10:56:55 +02:00
Bui Quang Minh	4be2193b33	virtio-net: remove redundant truesize check with PAGE_SIZE The truesize is guaranteed not to exceed PAGE_SIZE in get_mergeable_buf_len(). It is saved in mergeable context, which is not changeable by the host side, so the check in receive path is quite redundant. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250630144212.48471-3-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 10:56:55 +02:00
Bui Quang Minh	315dbdd7cd	virtio-net: ensure the received length does not exceed allocated size In xdp_linearize_page, when reading the following buffers from the ring, we forget to check the received length with the true allocate size. This can lead to an out-of-bound read. This commit adds that missing check. Cc: <stable@vger.kernel.org> Fixes: `4941d472bf` ("virtio-net: do not reset during XDP set") Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250630144212.48471-2-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 10:56:55 +02:00
Jakub Kicinski	63d474cfb5	net: drv: virtio: migrate to new RXFH callbacks Add support for the new rxfh_fields callbacks, instead of de-muxing the rxnfc calls. This driver does not support flow filtering so the set_rxnfc callback is completely removed. Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250611145949.2674086-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 17:46:58 -07:00
Jakub Kicinski	001160ec8c	virtio-net: fix total qstat values NIPA tests report that the interface statistics reported via qstat are lower than those reported via ip link. Looks like this is because some tests flip the queue count up and down, and we end up with some of the traffic accounted on disabled queues. Add up counters from disabled queues. Fixes: `d888f04c09` ("virtio-net: support queue stat") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250507003221.823267-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-08 11:56:12 +02:00
Jakub Kicinski	4397684a29	virtio-net: free xsk_buffs on error in virtnet_xsk_pool_enable() The selftests added to our CI by Bui Quang Minh recently reveals that there is a mem leak on the error path of virtnet_xsk_pool_enable(): unreferenced object 0xffff88800a68a000 (size 2048): comm "xdp_helper", pid 318, jiffies 4294692778 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): __kvmalloc_node_noprof+0x402/0x570 virtnet_xsk_pool_enable+0x293/0x6a0 (drivers/net/virtio_net.c:5882) xp_assign_dev+0x369/0x670 (net/xdp/xsk_buff_pool.c:226) xsk_bind+0x6a5/0x1ae0 __sys_bind+0x15e/0x230 __x64_sys_bind+0x72/0xb0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Acked-by: Jason Wang <jasowang@redhat.com> Fixes: `e9f3962441` ("virtio_net: xsk: rx: support fill with xsk buffer") Link: https://patch.msgid.link/20250430163836.3029761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 13:54:40 -07:00
Jakub Kicinski	1e20324b23	virtio-net: don't re-enable refill work too early when NAPI is disabled Commit `4bc12818b3` ("virtio-net: disable delayed refill when pausing rx") fixed a deadlock between reconfig paths and refill work trying to disable the same NAPI instance. The refill work can't run in parallel with reconfig because trying to double-disable a NAPI instance causes a stall under the instance lock, which the reconfig path needs to re-enable the NAPI and therefore unblock the stalled thread. There are two cases where we re-enable refill too early. One is in the virtnet_set_queues() handler. We call it when installing XDP: virtnet_rx_pause_all(vi); ... virtnet_napi_tx_disable(..); ... virtnet_set_queues(..); ... virtnet_rx_resume_all(..); We want the work to be disabled until we call virtnet_rx_resume_all(), but virtnet_set_queues() kicks it before NAPIs were re-enabled. The other case is a more trivial case of mis-ordering in __virtnet_rx_resume() found by code inspection. Taking the spin lock in virtnet_set_queues() (requested during review) may be unnecessary as we are under rtnl_lock and so are all paths writing to ->refill_enabled. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com> Fixes: `4bc12818b3` ("virtio-net: disable delayed refill when pausing rx") Fixes: `413f0271f3` ("net: protect NAPI enablement with netdev_lock()") Link: https://patch.msgid.link/20250430163758.3029367-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 13:53:53 -07:00

1 2 3 4 5 ...

836 Commits