mirror of
https://github.com/torvalds/linux.git
synced 2026-05-31 02:24:24 +02:00
master
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0c0dddc07d |
ovpn: disable BHs when updating device stats
ovpn updates dev->dstats from both process and softirq contexts. In
particular, TCP paths may run from socket callbacks, workqueues or
strparser work, while UDP receive and ovpn's ndo_start_xmit path may
update the same per-device dstats from BH context.
Add ovpn device drop-stat helpers that disable BHs around
dev_dstats_rx_dropped() and dev_dstats_tx_dropped(), and use them for
drop accounting.
The successful RX dev_dstats_rx_add() update is already covered by the
BH-disabled section around gro_cells_receive(). For the successful TCP
TX dev_dstats_tx_add() update, replace the existing preempt-disabled
section with a BH-disabled one.
Fixes:
|
||
|
|
775d8d7ad0 |
ovpn: tcp - use cached peer pointer in ovpn_tcp_close()
ovpn_tcp_close() loads the ovpn_socket via rcu_dereference_sk_user_data() under rcu_read_lock(), takes a reference on sock->peer, caches the peer pointer in a local, and drops the read lock. It then passes sock->peer (rather than the cached local) to ovpn_peer_del(), re-dereferencing the ovpn_socket after the RCU read section has ended. Unlike ovpn_tcp_sendmsg(), which uses the same "load under RCU, use after unlock" pattern but is protected by lock_sock() held across the function, ovpn_tcp_close() runs without the socket lock: inet_release() invokes sk_prot->close() without taking lock_sock first. ovpn_socket_release() can therefore complete its kref_put -> detach -> synchronize_rcu -> kfree(sock) sequence concurrently, in the window after ovpn_tcp_close() drops rcu_read_lock() but before it dereferences sock->peer. The synchronize_rcu() in ovpn_socket_release() protects readers that use the dereferenced pointer inside the RCU read section, not those that escape the pointer to a local and use it afterwards. A reproducer follows the pattern of commit |
||
|
|
8341c989ac |
net: remove addr_len argument of recvmsg() handlers
Use msg->msg_namelen as a place holder instead of a temporary variable, notably in inet[6]_recvmsg(). This removes stack canaries and allows tail-calls. $ scripts/bloat-o-meter -t vmlinux.old vmlinux add/remove: 0/0 grow/shrink: 2/19 up/down: 26/-532 (-506) Function old new delta rawv6_recvmsg 744 767 +23 vsock_dgram_recvmsg 55 58 +3 vsock_connectible_recvmsg 50 47 -3 unix_stream_recvmsg 161 158 -3 unix_seqpacket_recvmsg 62 59 -3 unix_dgram_recvmsg 42 39 -3 tcp_recvmsg 546 543 -3 mptcp_recvmsg 1568 1565 -3 ping_recvmsg 806 800 -6 tcp_bpf_recvmsg_parser 983 974 -9 ip_recv_error 588 576 -12 ipv6_recv_rxpmtu 442 428 -14 udp_recvmsg 1243 1224 -19 ipv6_recv_error 1046 1024 -22 udpv6_recvmsg 1487 1461 -26 raw_recvmsg 465 437 -28 udp_bpf_recvmsg 1027 984 -43 sock_common_recvmsg 103 27 -76 inet_recvmsg 257 175 -82 inet6_recvmsg 257 175 -82 tcp_bpf_recvmsg 663 568 -95 Total: Before=25143834, After=25143328, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260227151120.1346573-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
d4f687fbbc |
ovpn: tcp - fix packet extraction from stream
When processing TCP stream data in ovpn_tcp_recv, we receive large
cloned skbs from __strp_rcv that may contain multiple coalesced packets.
The current implementation has two bugs:
1. Header offset overflow: Using pskb_pull with large offsets on
coalesced skbs causes skb->data - skb->head to exceed the u16 storage
of skb->network_header. This causes skb_reset_network_header to fail
on the inner decapsulated packet, resulting in packet drops.
2. Unaligned protocol headers: Extracting packets from arbitrary
positions within the coalesced TCP stream provides no alignment
guarantees for the packet data causing performance penalties on
architectures without efficient unaligned access. Additionally,
openvpn's 2-byte length prefix on TCP packets causes the subsequent
4-byte opcode and packet ID fields to be inherently misaligned.
Fix both issues by allocating a new skb for each openvpn packet and
using skb_copy_bits to extract only the packet content into the new
buffer, skipping the 2-byte length prefix. Also, check the length before
invoking the function that performs the allocation to avoid creating an
invalid skb.
If the packet has to be forwarded to userspace the 2-byte prefix can be
pushed to the head safely, without misalignment.
As a side effect, this approach also avoids the expensive linearization
that pskb_pull triggers on cloned skbs with page fragments. In testing,
this resulted in TCP throughput improvements of up to 74%.
Fixes:
|
||
|
|
94560267d6 |
ovpn: tcp - don't deref NULL sk_socket member after tcp_close()
When deleting a peer in case of keepalive expiration, the peer is
removed from the OpenVPN hashtable and is temporary inserted in a
"release list" for further processing.
This happens in:
ovpn_peer_keepalive_work()
unlock_ovpn(release_list)
This processing includes detaching from the socket being used to
talk to this peer, by restoring its original proto and socket
ops/callbacks.
In case of TCP it may happen that, while the peer is sitting in
the release list, userspace decides to close the socket.
This will result in a concurrent execution of:
tcp_close(sk)
__tcp_close(sk)
sock_orphan(sk)
sk_set_socket(sk, NULL)
The last function call will set sk->sk_socket to NULL.
When the releasing routine is resumed, ovpn_tcp_socket_detach()
will attempt to dereference sk->sk_socket to restore its original
ops member. This operation will crash due to sk->sk_socket being NULL.
Fix this race condition by testing-and-accessing
sk->sk_socket atomically under sk->sk_callback_lock.
Link: https://lore.kernel.org/netdev/176996279620.3109699.15382994681575380467@eldamar.lan/
Link: https://github.com/OpenVPN/ovpn-net-next/issues/29
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Fixes:
|
||
|
|
93686c472e |
ovpn: set sk_user_data before overriding callbacks
During initialization, we override socket callbacks and set sk_user_data
to an ovpn_socket instance. Currently, these two operations are
decoupled: callbacks are overridden before sk_user_data is set. While
existing callbacks perform safety checks for NULL or non-ovpn
sk_user_data, this condition causes a "half-formed" state where valid
packets arriving during attachment trigger error logs (e.g., "invoked on
non ovpn socket").
Set sk_user_data before overriding the callbacks so that it can be
accessed safely from them. Since we already check that the socket has no
sk_user_data before setting it, this remains safe even if an interrupt
accesses the socket after sk_user_data is set but before the callbacks
are overridden.
This also requires initializing all protocol-specific fields (such as
tcp_tx_work and peer links) before calling ovpn_socket_attach, ensuring
the ovpn_socket is fully formed before it becomes visible to any
callback.
Fixes:
|
||
|
|
efd729408b |
ovpn: use datagram_poll_queue for socket readiness in TCP
openvpn TCP encapsulation uses a custom queue to deliver packets to
userspace. Currently it relies on datagram_poll, which checks
sk_receive_queue, leading to false readiness signals when that queue
contains non-userspace packets.
Switch ovpn_tcp_poll to use datagram_poll_queue with the peer's
user_queue, ensuring poll only signals readiness when userspace data is
actually available. Also refactor ovpn_tcp_poll in order to enforce the
assumption we can make on the lifetime of ovpn_sock and peer.
Fixes:
|
||
|
|
a6a5e87b3e |
ovpn: avoid sleep in atomic context in TCP RX error path
Upon error along the TCP data_ready event path, we have
the following chain of calls:
strp_data_ready()
ovpn_tcp_rcv()
ovpn_peer_del()
ovpn_socket_release()
Since strp_data_ready() may be invoked from softirq context, and
ovpn_socket_release() may sleep, the above sequence may cause a
sleep in atomic context like the following:
BUG: sleeping function called from invalid context at ./ovpn-backports-ovpn-net-next-main-6.15.0-rc5-20250522/drivers/net/ovpn/socket.c:71
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: ksoftirqd/3
5 locks held by ksoftirqd/3/25:
#0: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0xb8/0x5b0
OpenVPN/ovpn-backports#1: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0xb8/0x5b0
OpenVPN/ovpn-backports#2: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x66/0x1e0
OpenVPN/ovpn-backports#3: ffffffe003ce9818 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x156e/0x17a0
OpenVPN/ovpn-backports#4: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: ovpn_tcp_data_ready+0x0/0x1b0 [ovpn]
CPU: 3 PID: 25 Comm: ksoftirqd/3 Not tainted 5.10.104+ #0
Call Trace:
walk_stackframe+0x0/0x1d0
show_stack+0x2e/0x44
dump_stack+0xc2/0x102
___might_sleep+0x29c/0x2b0
__might_sleep+0x62/0xa0
ovpn_socket_release+0x24/0x2d0 [ovpn]
unlock_ovpn+0x6e/0x190 [ovpn]
ovpn_peer_del+0x13c/0x390 [ovpn]
ovpn_tcp_rcv+0x280/0x560 [ovpn]
__strp_recv+0x262/0x940
strp_recv+0x66/0x80
tcp_read_sock+0x122/0x410
strp_data_ready+0x156/0x1f0
ovpn_tcp_data_ready+0x92/0x1b0 [ovpn]
tcp_data_ready+0x6c/0x150
tcp_rcv_established+0xb36/0xc50
tcp_v4_do_rcv+0x25e/0x380
tcp_v4_rcv+0x166a/0x17a0
ip_protocol_deliver_rcu+0x8c/0x250
ip_local_deliver_finish+0xf8/0x1e0
ip_local_deliver+0xc2/0x2d0
ip_rcv+0x1f2/0x330
__netif_receive_skb+0xfc/0x290
netif_receive_skb+0x104/0x5b0
br_pass_frame_up+0x190/0x3f0
br_handle_frame_finish+0x3e2/0x7a0
br_handle_frame+0x750/0xab0
__netif_receive_skb_core.constprop.0+0x4c0/0x17f0
__netif_receive_skb+0xc6/0x290
netif_receive_skb+0x104/0x5b0
xgmac_dma_rx+0x962/0xb40
__napi_poll.constprop.0+0x5a/0x350
net_rx_action+0x1fe/0x4b0
__do_softirq+0x1f8/0x85c
run_ksoftirqd+0x80/0xd0
smpboot_thread_fn+0x1f0/0x3e0
kthread+0x1e6/0x210
ret_from_kernel_thread+0x8/0xc
Fix this issue by postponing the ovpn_peer_del() call to
a scheduled worker, as we already do in ovpn_tcp_send_sock()
for the very same reason.
Fixes:
|
||
|
|
ba499a07ce |
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes:
|
||
|
|
36bb1d713a |
ovpn: add support for MSG_NOSIGNAL in tcp_sendmsg
Userspace may want to pass the MSG_NOSIGNAL flag to tcp_sendmsg() in order to avoid generating a SIGPIPE. To pass this flag down the TCP stack a new skb sending API accepting a flags argument is introduced. Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20250415-b4-ovpn-v26-13-577f6097b964@openvpn.net Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
11851cbd60 |
ovpn: implement TCP transport
With this change ovpn is allowed to communicate to peers also via TCP. Parsing of incoming messages is implemented through the strparser API. Note that ovpn redefines sk_prot and sk_socket->ops for the TCP socket used to communicate with the peer. For this reason it needs to access inet6_stream_ops, which is declared as extern in the IPv6 module, but it is not fully exported. Therefore this patch is also adding EXPORT_SYMBOL_GPL(inet6_stream_ops) to net/ipv6/af_inet6.c. Cc: David Ahern <dsahern@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20250415-b4-ovpn-v26-11-577f6097b964@openvpn.net Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |