linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-28 00:53:34 +02:00

Author	SHA1	Message	Date
Jakub Kicinski	35c2c39832	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Merge in late fixes in preparation for the net-next PR. Conflicts: include/net/sch_generic.h `a6bd339dbb` ("net_sched: fix skb memory leak in deferred qdisc drops") `ff2998f29f` ("net: sched: introduce qdisc-specific drop reason tracing") https://lore.kernel.org/adz0iX85FHMz0HdO@sirena.org.uk drivers/net/ethernet/airoha/airoha_eth.c `1acdfbdb51` ("net: airoha: Fix VIP configuration for AN7583 SoC") `bf3471e6e6` ("net: airoha: Make flow control source port mapping dependent on nbq parameter") Adjacent changes: drivers/net/ethernet/airoha/airoha_ppe.c `f44218cd5e` ("net: airoha: Reset PPE cpu port configuration in airoha_ppe_hw_init()") `7da62262ec` ("inet: add ip_local_port_step_width sysctl to improve port usage distribution") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-14 12:04:00 -07:00
Norbert Szetei	d114bfdc9b	vsock: fix buffer size clamping order In vsock_update_buffer_size(), the buffer size was being clamped to the maximum first, and then to the minimum. If a user sets a minimum buffer size larger than the maximum, the minimum check overrides the maximum check, inverting the constraint. This breaks the intended socket memory boundaries by allowing the vsk->buffer_size to grow beyond the configured vsk->buffer_max_size. Fix this by checking the minimum first, and then the maximum. This ensures the buffer size never exceeds the buffer_max_size. Fixes: `b9f2b0ffde` ("vsock: handle buffer_size sockopts in the core") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Norbert Szetei <norbert@doyensec.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/180118C5-8BCF-4A63-A305-4EE53A34AB9C@doyensec.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 14:31:50 -07:00
Laurence Rowe	98f28d8d6e	vsock: avoid timeout for non-blocking accept() with empty backlog A common pattern in epoll network servers is to eagerly accept all pending connections from the non-blocking listening socket after epoll_wait indicates the socket is ready by calling accept in a loop until EAGAIN is returned indicating that the backlog is empty. Scheduling a timeout for a non-blocking accept with an empty backlog meant AF_VSOCK sockets used by epoll network servers incurred hundreds of microseconds of additional latency per accept loop compared to AF_INET or AF_UNIX sockets. Signed-off-by: Laurence Rowe <laurencerowe@gmail.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260402204918.130395-1-laurencerowe@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:29:01 -07:00
Jakub Kicinski	8ffb33d770	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c `b18c833888` ("vsock: initialize child_ns_mode_locked in vsock_net_init()") `0de607dc4f` ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c `ceee35e567` ("bnxt_en: Refactor some basic ring setup and adjustment logic") `57cdfe0dc7` ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c `4d56037a02` ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") `687a95d204` ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h `b6045c899e` ("wifi: iwlwifi: mld: Refactor scan command handling") `ec66ec6a5a` ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c `078df640ef` ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") `323156c354` ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:03:13 -07:00
Stefano Garzarella	b18c833888	vsock: initialize child_ns_mode_locked in vsock_net_init() The `child_ns_mode_locked` field lives in `struct net`, which persists across vsock module reloads. When the module is unloaded and reloaded, `vsock_net_init()` resets `mode` and `child_ns_mode` back to their default values, but does not reset `child_ns_mode_locked`. The stale lock from the previous module load causes subsequent writes to `child_ns_mode` to silently fail: `vsock_net_set_child_mode()` sees the old lock, skips updating the actual value, and returns success when the requested mode matches the stale lock. The sysctl handler reports no error, but `child_ns_mode` remains unchanged. Steps to reproduce: $ modprobe vsock $ echo local > /proc/sys/net/vsock/child_ns_mode $ cat /proc/sys/net/vsock/child_ns_mode local $ modprobe -r vsock $ modprobe vsock $ echo local > /proc/sys/net/vsock/child_ns_mode $ cat /proc/sys/net/vsock/child_ns_mode global <--- expected "local" Fix this by initializing `child_ns_mode_locked` to 0 (unlocked) in `vsock_net_init()`, so the write-once mechanism works correctly after module reload. Fixes: `102eab95f0` ("vsock: lock down child_ns_mode as write-once") Reported-by: Jin Liu <jinl@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260401092153.28462-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:18:56 -07:00
Alexander Graf	0de607dc4f	vsock: add G2H fallback for CIDs not owned by H2G transport When no H2G transport is loaded, vsock currently routes all CIDs to the G2H transport (commit `65b422d9b6` ("vsock: forward all packets to the host when no H2G is registered"). Extend that existing behavior: when an H2G transport is loaded but does not claim a given CID, the connection falls back to G2H in the same way. This matters in environments like Nitro Enclaves, where an instance may run nested VMs via vhost-vsock (H2G) while also needing to reach sibling enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old code, any CID > 2 was unconditionally routed to H2G when vhost was loaded, making those enclaves unreachable without setting VMADDR_FLAG_TO_HOST explicitly on every connect. Requiring every application to set VMADDR_FLAG_TO_HOST creates friction: tools like socat, iperf, and others would all need to learn about it. The flag was introduced 6 years ago and I am still not aware of any tool that supports it. Even if there was support, it would be cumbersome to use. The most natural experience is a single CID address space where H2G only wins for CIDs it actually owns, and everything else falls through to G2H, extending the behavior that already exists when H2G is absent. To give user space at least a hint that the kernel applied this logic, automatically set the VMADDR_FLAG_TO_HOST on the remote address so it can determine the path taken via getpeername(). Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1). At 0 it forces strict routing: H2G always wins for CID > VMADDR_CID_HOST, or ENODEV if H2G is not loaded. Signed-off-by: Alexander Graf <graf@amazon.com> Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-12 10:59:36 +01:00
Eric Dumazet	8341c989ac	net: remove addr_len argument of recvmsg() handlers Use msg->msg_namelen as a place holder instead of a temporary variable, notably in inet[6]_recvmsg(). This removes stack canaries and allows tail-calls. $ scripts/bloat-o-meter -t vmlinux.old vmlinux add/remove: 0/0 grow/shrink: 2/19 up/down: 26/-532 (-506) Function old new delta rawv6_recvmsg 744 767 +23 vsock_dgram_recvmsg 55 58 +3 vsock_connectible_recvmsg 50 47 -3 unix_stream_recvmsg 161 158 -3 unix_seqpacket_recvmsg 62 59 -3 unix_dgram_recvmsg 42 39 -3 tcp_recvmsg 546 543 -3 mptcp_recvmsg 1568 1565 -3 ping_recvmsg 806 800 -6 tcp_bpf_recvmsg_parser 983 974 -9 ip_recv_error 588 576 -12 ipv6_recv_rxpmtu 442 428 -14 udp_recvmsg 1243 1224 -19 ipv6_recv_error 1046 1024 -22 udpv6_recvmsg 1487 1461 -26 raw_recvmsg 465 437 -28 udp_bpf_recvmsg 1027 984 -43 sock_common_recvmsg 103 27 -76 inet_recvmsg 257 175 -82 inet6_recvmsg 257 175 -82 tcp_bpf_recvmsg 663 568 -95 Total: Before=25143834, After=25143328, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260227151120.1346573-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-02 18:17:17 -08:00
Bobby Eshleman	102eab95f0	vsock: lock down child_ns_mode as write-once Two administrator processes may race when setting child_ns_mode as one process sets child_ns_mode to "local" and then creates a namespace, but another process changes child_ns_mode to "global" between the write and the namespace creation. The first process ends up with a namespace in "global" mode instead of "local". While this can be detected after the fact by reading ns_mode and retrying, it is fragile and error-prone. Make child_ns_mode write-once so that a namespace manager can set it once and be sure it won't change. Writing a different value after the first write returns -EBUSY. This applies to all namespaces, including init_net, where an init process can write "local" to lock all future namespaces into local mode. Fixes: `eafb64f40c` ("vsock: add netns to vsock core") Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-26 11:10:03 +01:00
Greg Kroah-Hartman	5cc619583c	vsock: Use container_of() to get net namespace in sysctl handlers current->nsproxy is should not be accessed directly as syzbot has found that it could be NULL at times, causing crashes. Fix up the af_vsock sysctl handlers to use container_of() to deal with the current net namespace instead of attempting to rely on current. This is the same type of change done in commit `7f5611cbc4` ("rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: `eafb64f40c` ("vsock: add netns to vsock core") Link: https://patch.msgid.link/2026022318-rearview-gallery-ae13@gregkh Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 18:59:18 -08:00
Stefano Garzarella	6a997f38bd	vsock: prevent child netns mode switch from local to global A "local" namespace can change its `child_ns_mode` sysctl to "global", allowing nested namespaces to access global CIDs. This can be exploited by an unprivileged user who gained CAP_NET_ADMIN through a user namespace. Prevent this by rejecting writes that attempt to set `child_ns_mode` to "global" when the current namespace's mode is "local". Fixes: `eafb64f40c` ("vsock: add netns to vsock core") Cc: bobbyeshleman@meta.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260212205916.97533-3-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-13 12:28:38 -08:00
Stefano Garzarella	9dd391493a	vsock: fix child netns mode initialization When a new network namespace is created, vsock_net_init() correctly initializes the namespace's mode by reading the parent's `child_ns_mode` via vsock_net_child_mode(). However, the `child_ns_mode` of the new namespace was always hardcoded to VSOCK_NET_MODE_GLOBAL, regardless of its own mode. This means that if a parent namespace has `child_ns_mode` set to "local", the child namespace correctly gets mode "local", but its `child_ns_mode` is reset to "global". As a result, further nested namespaces will incorrectly get mode "global" instead of inheriting "local", breaking the expected propagation of the mode through nested namespaces. Fix this by initializing `child_ns_mode` to the namespace's own mode, so the setting propagates correctly through all levels of nesting. Fixes: `eafb64f40c` ("vsock: add netns to vsock core") Cc: bobbyeshleman@meta.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260212205916.97533-2-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-13 12:28:38 -08:00
Bobby Eshleman	eafb64f40c	vsock: add netns to vsock core Add netns logic to vsock core. Additionally, modify transport hook prototypes to be used by later transport-specific patches (e.g., *_seqpacket_allow()). Namespaces are supported primarily by changing socket lookup functions (e.g., vsock_find_connected_socket()) to take into account the socket namespace and the namespace mode before considering a candidate socket a "match". This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode for new namespaces. Add netns functionality (initialization, passing to transports, procfs, etc...) to the af_vsock socket layer. Later patches that add netns support to transports depend on this patch. This patch changes the allocation of random ports for connectible vsocks in order to avoid leaking the random port range starting point to other namespaces. dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are modified to take a vsk in order to perform logic on namespace modes. In future patches, the net will also be used for socket lookups in these functions. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-1-2859a7512097@meta.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-27 10:45:38 +01:00
Michal Luczaj	ce5e612dd4	vsock: Make accept()ed sockets use custom setsockopt() SO_ZEROCOPY handling in vsock_connectible_setsockopt() does not get called on accept()ed sockets due to a missing flag. Flip it. Fixes: `e0718bd82e` ("vsock: enable setting SO_ZEROCOPY") Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-1-64778d6c4f88@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-05 16:14:50 -08:00
Jakub Kicinski	9e203721ec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile `e1bb28bf13` ("selftest: af_unix: Add test for SO_PEEK_OFF.") `45a1cd8346` ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 09:13:26 -08:00
Michal Luczaj	002541ef65	vsock: Ignore signal/timeout on connect() if already established During connect(), acting on a signal/timeout by disconnecting an already established socket leads to several issues: 1. connect() invoking vsock_transport_cancel_pkt() -> virtio_transport_purge_skbs() may race with sendmsg() invoking virtio_transport_get_credit(). This results in a permanently elevated `vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling. 2. connect() resetting a connected socket's state may race with socket being placed in a sockmap. A disconnected socket remaining in a sockmap breaks sockmap's assumptions. And gives rise to WARNs. 3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a transport change/drop after TCP_ESTABLISHED. Which poses a problem for any simultaneous sendmsg() or connect() and may result in a use-after-free/null-ptr-deref. Do not disconnect socket on signal/timeout. Keep the logic for unconnected sockets: they don't linger, can't be placed in a sockmap, are rejected by sendmsg(). [1]: https://lore.kernel.org/netdev/e07fd95c-9a38-4eea-9638-133e38c2ec9b@rbox.co/ [2]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/ [3]: https://lore.kernel.org/netdev/60f1b7db-3099-4f6a-875e-af9f6ef194f6@rbox.co/ Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251119-vsock-interrupted-connect-v2-1-70734cf1233f@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 07:40:06 -08:00
Kees Cook	85cb0757d7	net: Convert proto_ops connect() callbacks to use sockaddr_unsized Update all struct proto_ops connect() callback function prototypes from "struct sockaddr " to "struct sockaddr_unsized " to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 19:10:32 -08:00
Kees Cook	0e50474fa5	net: Convert proto_ops bind() callbacks to use sockaddr_unsized Update all struct proto_ops bind() callback function prototypes from "struct sockaddr " to "struct sockaddr_unsized " to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 19:10:32 -08:00
Stefano Garzarella	f7c877e753	vsock: fix lock inversion in vsock_assign_transport() Syzbot reported a potential lock inversion deadlock between vsock_register_mutex and sk_lock-AF_VSOCK when vsock_linger() is called. The issue was introduced by commit `687aa0c558` ("vsock: Fix transport_* TOCTOU") which added vsock_register_mutex locking in vsock_assign_transport() around the transport->release() call, that can call vsock_linger(). vsock_assign_transport() can be called with sk_lock held. vsock_linger() calls sk_wait_event() that temporarily releases and re-acquires sk_lock. During this window, if another thread hold vsock_register_mutex while trying to acquire sk_lock, a circular dependency is created. Fix this by releasing vsock_register_mutex before calling transport->release() and vsock_deassign_transport(). This is safe because we don't need to hold vsock_register_mutex while releasing the old transport, and we ensure the new transport won't disappear by obtaining a module reference first via try_module_get(). Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Fixes: `687aa0c558` ("vsock: Fix transport_* TOCTOU") Cc: mhal@rbox.co Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251021121718.137668-1-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-23 16:07:58 +02:00
Marco Crivellari	5fd8bb982e	net: replace use of system_wq with system_percpu_wq Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20250918142427.309519-3-marco.crivellari@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-22 17:40:30 -07:00
Jakub Kicinski	f24775c325	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc2). No conflicts. Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c `d7a276a576` ("net: stmmac: rk: convert to suspend()/resume() methods") `de1e963ad0` ("net: stmmac: rk: put the PHY clock on remove") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-14 12:13:00 -07:00
Wang Liang	4d18083d6b	vsock: use sizeof(struct sockaddr_storage) instead of magic value Previous commit `230b183921` ("net: Use standard structures for generic socket address structures.") use 'struct sockaddr_storage address;' to replace 'char address[MAX_SOCK_ADDR];'. The macro MAX_SOCK_ADDR is removed by commit `01893c82b4` ("net: Remove MAX_SOCK_ADDR constant"). The comment in vsock_getname() is outdated, use sizeof(struct sockaddr_storage) instead of magic value 128. Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20250812015929.1419896-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-13 17:05:21 -07:00
Budimir Markovic	aba0c94f61	vsock: Do not allow binding to VMADDR_PORT_ANY It is possible for a vsock to autobind to VMADDR_PORT_ANY. This can cause a use-after-free when a connection is made to the bound socket. The socket returned by accept() also has port VMADDR_PORT_ANY but is not on the list of unbound sockets. Binding it will result in an extra refcount decrement similar to the one fixed in `fcdd2242c0` (vsock: Keep the binding until socket destruction). Modify the check in __vsock_bind_connectible() to also prevent binding to VMADDR_PORT_ANY. Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Reported-by: Budimir Markovic <markovicbudimir@gmail.com> Signed-off-by: Budimir Markovic <markovicbudimir@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250807041811.678-1-markovicbudimir@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-08 12:55:00 -07:00
Wang Liang	002f79a5f0	vsock: remove unnecessary null check in vsock_getname() The local variable 'vm_addr' is always not NULL, no need to check it. Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250725013808.337924-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-26 11:26:20 -07:00
Jakub Kicinski	3321e97eab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml `0a12c435a1` ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") `b3603c0466` ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 10:10:49 -07:00
Xuewei Niu	f7c7226592	vsock: Add support for SIOCINQ ioctl Add support for SIOCINQ ioctl, indicating the length of bytes unread in the socket. The value is obtained from `vsock_stream_has_data()`. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20250708-siocinq-v6-2-3775f9a9e359@antgroup.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:29:52 -07:00
Michal Luczaj	1e7d9df379	vsock: Fix IOCTL_VM_SOCKETS_GET_LOCAL_CID to check also `transport_local` Support returning VMADDR_CID_LOCAL in case no other vsock transport is available. Fixes: `0e12190578` ("vsock: add local transport support in the vsock core") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-3-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:39:49 -07:00
Michal Luczaj	687aa0c558	vsock: Fix transport_* TOCTOU Transport assignment may race with module unload. Protect new_transport from becoming a stale pointer. This also takes care of an insecure call in vsock_use_local_transport(); add a lockdep assert. BUG: unable to handle page fault for address: fffffbfff8056000 Oops: Oops: 0000 [#1] SMP KASAN RIP: 0010:vsock_assign_transport+0x366/0x600 Call Trace: vsock_connect+0x59c/0xc40 __sys_connect+0xe8/0x100 __x64_sys_connect+0x6e/0xc0 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-2-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:39:49 -07:00
Michal Luczaj	209fd72083	vsock: Fix transport_{g2h,h2g} TOCTOU vsock_find_cid() and vsock_dev_do_ioctl() may race with module unload. transport_{g2h,h2g} may become NULL after the NULL check. Introduce vsock_transport_local_cid() to protect from a potential null-ptr-deref. KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] RIP: 0010:vsock_find_cid+0x47/0x90 Call Trace: __vsock_bind+0x4b2/0x720 vsock_bind+0x90/0xe0 __sys_bind+0x14d/0x1e0 __x64_sys_bind+0x6e/0xc0 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] RIP: 0010:vsock_dev_do_ioctl.isra.0+0x58/0xf0 Call Trace: __x64_sys_ioctl+0x12d/0x190 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-1-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:39:49 -07:00
Michal Luczaj	5ec40864aa	vsock: Move lingering logic to af_vsock core Lingering should be transport-independent in the long run. In preparation for supporting other transports, as well as the linger on shutdown(), move code to core. Generalize by querying vsock_transport::unsent_bytes(), guard against the callback being unimplemented. Do not pass sk_lingertime explicitly. Pull SOCK_LINGER check into vsock_linger(). Flatten the function. Remove the nested block by inverting the condition: return early on !timeout. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250522-vsock-linger-v6-2-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-27 11:05:21 +02:00
Stefano Garzarella	fccd2b711d	vsock: avoid timeout during connect() if the socket is closing When a peer attempts to establish a connection, vsock_connect() contains a loop that waits for the state to be TCP_ESTABLISHED. However, the other peer can be fast enough to accept the connection and close it immediately, thus moving the state to TCP_CLOSING. When this happens, the peer in the vsock_connect() is properly woken up, but since the state is not TCP_ESTABLISHED, it goes back to sleep until the timeout expires, returning -ETIMEDOUT. If the socket state is TCP_CLOSING, waiting for the timeout is pointless. vsock_connect() can return immediately without errors or delay since the connection actually happened. The socket will be in a closing state, but this is not an issue, and subsequent calls will fail as expected. We discovered this issue while developing a test that accepts and immediately closes connections to stress the transport switch between two connect() calls, where the first one was interrupted by a signal (see Closes link). Reported-by: Luigi Leonardi <leonardi@redhat.com> Closes: https://lore.kernel.org/virtualization/bq6hxrolno2vmtqwcvb5bljfpb7mvwb3kohrvaed6auz5vxrfv@ijmd2f3grobn/ Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20250328141528.420719-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-02 17:19:30 -07:00
Michal Luczaj	857ae05549	vsock/bpf: Warn on socket without transport In the spirit of commit `91751e2482` ("vsock: prevent null-ptr-deref in vsock_*[has_data\|has_space]"), armorize the "impossible" cases with a warning. Fixes: `634f1a7110` ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 12:00:01 +01:00
Michal Luczaj	78dafe1cf3	vsock: Orphan socket after transport release During socket release, sock_orphan() is called without considering that it sets sk->sk_wq to NULL. Later, if SO_LINGER is enabled, this leads to a null pointer dereferenced in virtio_transport_wait_close(). Orphan the socket only after transport release. Partially reverts the 'Fixes:' commit. KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] lock_acquire+0x19e/0x500 _raw_spin_lock_irqsave+0x47/0x70 add_wait_queue+0x46/0x230 virtio_transport_release+0x4e7/0x7f0 __vsock_release+0xfd/0x490 vsock_release+0x90/0x120 __sock_release+0xa3/0x250 sock_close+0x14/0x20 __fput+0x35e/0xa90 __x64_sys_close+0x78/0xd0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Reported-by: syzbot+9d55b199192a4be7d02c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c Fixes: `fcdd2242c0` ("vsock: Keep the binding until socket destruction") Tested-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250210-vsock-linger-nullderef-v3-1-ef6244d02b54@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-12 20:01:28 -08:00
Michal Luczaj	aa388c7211	vsock: Allow retrying on connect() failure sk_err is set when a (connectible) connect() fails. Effectively, this makes an otherwise still healthy SS_UNCONNECTED socket impossible to use for any subsequent connection attempts. Clear sk_err upon trying to establish a connection. Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-2-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-29 18:50:37 -08:00
Michal Luczaj	fcdd2242c0	vsock: Keep the binding until socket destruction Preserve sockets bindings; this includes both resulting from an explicit bind() and those implicitly bound through autobind during connect(). Prevents socket unbinding during a transport reassignment, which fixes a use-after-free: 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2) 2. transport->release() calls vsock_remove_bound() without checking if sk was bound and moved to bound list (refcnt=1) 3. vsock_bind() assumes sk is in unbound list and before __vsock_insert_bound(vsock_bound_sockets()) calls __vsock_remove_bound() which does: list_del_init(&vsk->bound_table); // nop sock_put(&vsk->sk); // refcnt=0 BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730 Read of size 4 at addr ffff88816b46a74c by task a.out/2057 dump_stack_lvl+0x68/0x90 print_report+0x174/0x4f6 kasan_report+0xb9/0x190 __vsock_bind+0x62e/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 2057: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x85/0x90 kmem_cache_alloc_noprof+0x131/0x450 sk_prot_alloc+0x5b/0x220 sk_alloc+0x2c/0x870 __vsock_create.constprop.0+0x2e/0xb60 vsock_create+0xe4/0x420 __sock_create+0x241/0x650 __sys_socket+0xf2/0x1a0 __x64_sys_socket+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 2057: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x60 __kasan_slab_free+0x4b/0x70 kmem_cache_free+0x1a1/0x590 __sk_destruct+0x388/0x5a0 __vsock_bind+0x5e1/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e refcount_t: addition on 0; use-after-free. WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150 RIP: 0010:refcount_warn_saturate+0xce/0x150 __vsock_bind+0x66d/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e refcount_t: underflow; use-after-free. WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150 RIP: 0010:refcount_warn_saturate+0xee/0x150 vsock_remove_bound+0x187/0x1e0 __vsock_release+0x383/0x4a0 vsock_release+0x90/0x120 __sock_release+0xa3/0x250 sock_close+0x14/0x20 __fput+0x359/0xa80 task_work_run+0x107/0x1d0 do_exit+0x847/0x2560 do_group_exit+0xb8/0x250 __x64_sys_exit_group+0x3a/0x50 x64_sys_call+0xfec/0x14f0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-1-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-29 18:50:36 -08:00
Stefano Garzarella	91751e2482	vsock: prevent null-ptr-deref in vsock_[has_data\|has_space] Recent reports have shown how we sometimes call vsock__has_data() when a vsock socket has been de-assigned from a transport (see attached links), but we shouldn't. Previous commits should have solved the real problems, but we may have more in the future, so to avoid null-ptr-deref, we can return 0 (no space, no data available) but with a warning. This way the code should continue to run in a nearly consistent state and have a warning that allows us to debug future problems. Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/ Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/ Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/ Co-developed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Co-developed-by: Wongi Lee <qwerty@theori.io> Signed-off-by: Wongi Lee <qwerty@theori.io> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-01-14 12:29:37 +01:00
Stefano Garzarella	a24009bc9b	vsock: reset socket state when de-assigning the transport Transport's release() and destruct() are called when de-assigning the vsock transport. These callbacks can touch some socket state like sock flags, sk_state, and peer_shutdown. Since we are reassigning the socket to a new transport during vsock_connect(), let's reset these fields to have a clean state with the new transport. Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-01-14 12:29:37 +01:00
Michal Luczaj	135ffc7bec	bpf, vsock: Invoke proto::close on close() vsock defines a BPF callback to be invoked when close() is called. However, this callback is never actually executed. As a result, a closed vsock socket is not automatically removed from the sockmap/sockhash. Introduce a dummy vsock_close() and make vsock_release() call proto::close. Note: changes in __vsock_release() look messy, but it's only due to indent level reduction and variables xmas tree reorder. Fixes: `634f1a7110` ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-3-f1b9669cacdc@rbox.co Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com>	2024-11-25 14:19:14 -08:00
Michal Luczaj	9f0fc98145	bpf, vsock: Fix poll() missing a queue When a verdict program simply passes a packet without redirection, sk_msg is enqueued on sk_psock::ingress_msg. Add a missing check to poll(). Fixes: `634f1a7110` ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-1-f1b9669cacdc@rbox.co Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com>	2024-11-25 14:19:14 -08:00
Jakub Kicinski	a79993b5fc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.12-rc8). Conflicts: tools/testing/selftests/net/.gitignore `252e01e682` ("selftests: net: add netlink-dumps to .gitignore") `be43a6b238` ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw") https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/ drivers/net/phy/phylink.c `671154f174` ("net: phylink: ensure PHY momentary link-fails are handled") `7530ea26c8` ("net: phylink: remove "using_mac_select_pcs"") Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c `5b366eae71` ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") `e96321fad3` ("net: ethernet: Switch back to struct platform_driver::remove()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-14 11:29:15 -08:00
Michal Luczaj	fbf7085b3a	vsock: Fix sk_error_queue memory leak Kernel queues MSG_ZEROCOPY completion notifications on the error queue. Where they remain, until explicitly recv()ed. To prevent memory leaks, clean up the queue when the socket is destroyed. unreferenced object 0xffff8881028beb00 (size 224): comm "vsock_test", pid 1218, jiffies 4294694897 hex dump (first 32 bytes): 90 b0 21 17 81 88 ff ff 90 b0 21 17 81 88 ff ff ..!.......!..... 00 00 00 00 00 00 00 00 00 b0 21 17 81 88 ff ff ..........!..... backtrace (crc 6c7031ca): [<ffffffff81418ef7>] kmem_cache_alloc_node_noprof+0x2f7/0x370 [<ffffffff81d35882>] __alloc_skb+0x132/0x180 [<ffffffff81d2d32b>] sock_omalloc+0x4b/0x80 [<ffffffff81d3a8ae>] msg_zerocopy_realloc+0x9e/0x240 [<ffffffff81fe5cb2>] virtio_transport_send_pkt_info+0x412/0x4c0 [<ffffffff81fe6183>] virtio_transport_stream_enqueue+0x43/0x50 [<ffffffff81fe0813>] vsock_connectible_sendmsg+0x373/0x450 [<ffffffff81d233d5>] ____sys_sendmsg+0x365/0x3a0 [<ffffffff81d246f4>] ___sys_sendmsg+0x84/0xd0 [<ffffffff81d26f47>] __sys_sendmsg+0x47/0x80 [<ffffffff820d3df3>] do_syscall_64+0x93/0x180 [<ffffffff8220012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `581512a6dc` ("vsock/virtio: MSG_ZEROCOPY flag support") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-12 12:16:51 +01:00
Eric Dumazet	ba4e469e42	vsock: do not leave dangling sk pointer in vsock_create() syzbot was able to trigger the following warning after recent core network cleanup. On error vsock_create() frees the allocated sk object, but sock_init_data() has already attached it to the provided sock object. We must clear sock->sk to avoid possible use-after-free later. WARNING: CPU: 0 PID: 5282 at net/socket.c:1581 __sock_create+0x897/0x950 net/socket.c:1581 Modules linked in: CPU: 0 UID: 0 PID: 5282 Comm: syz.2.43 Not tainted 6.12.0-rc2-syzkaller-00667-g53bac8330865 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__sock_create+0x897/0x950 net/socket.c:1581 Code: 7f 06 01 65 48 8b 34 25 00 d8 03 00 48 81 c6 b0 08 00 00 48 c7 c7 60 0b 0d 8d e8 d4 9a 3c 02 e9 11 f8 ff ff e8 0a ab 0d f8 90 <0f> 0b 90 e9 82 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c c7 f8 ff RSP: 0018:ffffc9000394fda8 EFLAGS: 00010293 RAX: ffffffff89873c46 RBX: ffff888079f3c818 RCX: ffff8880314b9e00 RDX: 0000000000000000 RSI: 00000000ffffffed RDI: 0000000000000000 RBP: ffffffff8d3337f0 R08: ffffffff8987384e R09: ffffffff8989473a R10: dffffc0000000000 R11: fffffbfff203a276 R12: 00000000ffffffed R13: ffff888079f3c8c0 R14: ffffffff898736e7 R15: dffffc0000000000 FS: 00005555680ab500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f22b11196d0 CR3: 00000000308c0000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sock_create net/socket.c:1632 [inline] __sys_socket_create net/socket.c:1669 [inline] __sys_socket+0x150/0x3c0 net/socket.c:1716 __do_sys_socket net/socket.c:1730 [inline] __se_sys_socket net/socket.c:1728 [inline] __x64_sys_socket+0x7a/0x90 net/socket.c:1728 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f22b117dff9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff56aec0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000029 RAX: ffffffffffffffda RBX: 00007f22b1335f80 RCX: 00007f22b117dff9 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000028 RBP: 00007f22b11f0296 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f22b1335f80 R14: 00007f22b1335f80 R15: 00000000000012dd Fixes: `48156296a0` ("net: warn, if pf->create does not clear sock->sk on error") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20241022134819.1085254-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-28 18:08:52 -07:00
Jakub Kicinski	4d3d3559fc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml `c25504a0ba` ("dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property phys") `be034ee6c3` ("dt-bindings: net: fsl,qoriq-mc-dpmac: using unevaluatedProperties") https://lore.kernel.org/20240815110934.56ae623a@canb.auug.org.au drivers/net/dsa/vitesse-vsc73xx-core.c `5b9eebc2c7` ("net: dsa: vsc73xx: pass value in phy_write operation") `fa63c6434b` ("net: dsa: vsc73xx: check busy flag in MDIO operations") `2524d6c28b` ("net: dsa: vsc73xx: use defined values in phy operations") https://lore.kernel.org/20240813104039.429b9fe6@canb.auug.org.au Resolve by using FIELD_PREP(), Stephen's resolution is simpler. Adjacent changes: net/vmw_vsock/af_vsock.c `69139d2919` ("vsock: fix recursive ->recvmsg calls") `744500d81f` ("vsock: add support for SIOCOUTQ ioctl") Link: https://patch.msgid.link/20240815141149.33862-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-15 17:18:52 -07:00
Cong Wang	69139d2919	vsock: fix recursive ->recvmsg calls After a vsock socket has been added to a BPF sockmap, its prot->recvmsg has been replaced with vsock_bpf_recvmsg(). Thus the following recursiion could happen: vsock_bpf_recvmsg() -> __vsock_recvmsg() -> vsock_connectible_recvmsg() -> prot->recvmsg() -> vsock_bpf_recvmsg() again We need to fix it by calling the original ->recvmsg() without any BPF sockmap logic in __vsock_recvmsg(). Fixes: `634f1a7110` ("vsock: support sockmap") Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Cc: Bobby Eshleman <bobby.eshleman@bytedance.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-15 12:07:04 +02:00
Luigi Leonardi	744500d81f	vsock: add support for SIOCOUTQ ioctl Add support for ioctl(s) in AF_VSOCK. The only ioctl available is SIOCOUTQ/TIOCOUTQ, which returns the number of unsent bytes in the socket. This information is transport-specific and is delegated to them using a callback. Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-02 09:20:28 +01:00
Jens Axboe	92ef0fd55a	net: change proto and proto_ops accept type Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information. No functional changes in this patch. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-05-13 18:19:09 -06:00
Arseniy Krasnov	0fe1798968	virtio/vsock: send credit update during setting SO_RCVLOWAT Send credit update message when SO_RCVLOWAT is updated and it is bigger than number of bytes in rx queue. It is needed, because 'poll()' will wait until number of bytes in rx queue will be not smaller than O_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup for tx/rx is possible: sender waits for free space and receiver is waiting data in 'poll()'. Rename 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set 'sk->sk_rcvlowat' only in one place (i.e. 'vsock_set_rcvlowat'), so the transport doesn't need to do it. Fixes: `b89d882dc9` ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-15 10:37:35 +00:00
Arseniy Krasnov	e0718bd82e	vsock: enable setting SO_ZEROCOPY For AF_VSOCK, zerocopy tx mode depends on transport, so this option must be set in AF_VSOCK implementation where transport is accessible (if transport is not set during setting SO_ZEROCOPY: for example socket is not connected, then SO_ZEROCOPY will be enabled, but once transport will be assigned, support of this type of transmission will be checked). To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT bit, thus handling SOL_SOCKET option operations, but all of them except SO_ZEROCOPY will be forwarded to the generic handler by calling 'sock_setsockopt()'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-15 13:19:42 +01:00
Arseniy Krasnov	dcc55d7bb2	vsock: enable SOCK_SUPPORT_ZC bit This bit is used by io_uring in case of zerocopy tx mode. io_uring code checks, that socket has this feature. This patch sets it in two places: 1) For socket in 'connect()' call. 2) For new socket which is returned by 'accept()' call. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-15 13:19:42 +01:00
Arseniy Krasnov	5fbfc7d243	vsock: check for MSG_ZEROCOPY support on send This feature totally depends on transport, so if transport doesn't support it, return error. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-15 13:19:42 +01:00
Arseniy Krasnov	49dbe25ada	vsock: read from socket's error queue This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-15 13:19:42 +01:00

1 2 3 4

170 Commits