linux/net
Salvatore Dipietro 74b85edb20 tcp: Add memory barrier to tcp_push()
[ Upstream commit 7267e8dcad ]

On CPUs with weak memory models, reads and updates performed by tcp_push
to the sk variables can get reordered leaving the socket throttled when
it should not. The tasklet running tcp_wfree() may also not observe the
memory updates in time and will skip flushing any packets throttled by
tcp_push(), delaying the sending. This can pathologically cause 40ms
extra latency due to bad interactions with delayed acks.

Adding a memory barrier in tcp_push removes the bug, similarly to the
previous commit bf06200e73 ("tcp: tsq: fix nonagle handling").
smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.

Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
22.04 and Apache Tomcat 9.0.83 running the basic servlet below:

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloWorldServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8");
        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
        String s = "a".repeat(3096);
        osw.write(s,0,s.length());
        osw.flush();
    }
}

Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.

No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.91ms
 75.000%    1.13ms
 90.000%    1.46ms
 99.000%    1.74ms
 99.900%    1.89ms
 99.990%   41.95ms  <<< 40+ ms extra latency
 99.999%   48.32ms
100.000%   48.96ms

With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.90ms
 75.000%    1.13ms
 90.000%    1.45ms
 99.000%    1.72ms
 99.900%    1.83ms
 99.990%    2.11ms  <<< no 40+ ms extra latency
 99.999%    2.53ms
100.000%    2.62ms

Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.

Fixes: 7aa5470c2c ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:19:02 -08:00
..
6lowpan
9p net: 9p: avoid freeing uninit memory in p9pdu_vreadf 2024-01-01 12:42:41 +00:00
802
8021q vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING 2024-01-31 16:19:01 -08:00
appletalk appletalk: Fix Use-After-Free in atalk_ioctl 2023-12-20 17:01:50 +01:00
atm atm: Fix Use-After-Free in do_vcc_ioctl 2023-12-20 17:01:48 +01:00
ax25 ax25: Kconfig: Update link for linux-ax25.org 2023-09-18 12:56:58 +01:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
bluetooth Bluetooth: Fix atomicity violation in {min,max}_key_size_set 2024-01-25 15:35:46 -08:00
bpf bpf: Prevent inlining of bpf_fentry_test7() 2023-08-30 08:36:17 +02:00
bpfilter
bridge netfilter: bridge: replace physindev with physinif in nf_bridge_info 2024-01-25 15:35:59 -08:00
caif
can can: isotp: isotp_sendmsg(): fix TX state detection and wait behavior 2023-10-06 12:54:33 +02:00
ceph libceph: use kernel_connect() 2023-10-09 13:35:24 +02:00
core net: fix removing a namespace with conflicting altnames 2024-01-31 16:19:01 -08:00
dcb
dccp dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses. 2023-11-20 11:59:35 +01:00
devlink devlink: Hold devlink lock on health reporter dump get 2023-10-06 15:56:46 -07:00
dns_resolver keys, dns: Fix size check of V1 server-list header 2024-01-25 15:35:41 -08:00
dsa
ethernet
ethtool ethtool: netlink: Add missing ethnl_ops_begin/complete 2024-01-25 15:36:00 -08:00
handshake net/handshake: fix file ref count in handshake_nl_accept_doit() 2023-10-23 10:19:33 -07:00
hsr hsr: Prevent use after free in prp_create_tagged_frame() 2023-11-20 11:59:34 +01:00
ieee802154 sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
ife net: sched: ife: fix potential use-after-free 2024-01-01 12:42:30 +00:00
ipv4 tcp: Add memory barrier to tcp_push() 2024-01-31 16:19:02 -08:00
ipv6 ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work 2024-01-25 15:36:00 -08:00
iucv
kcm kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg(). 2023-09-14 10:43:51 +02:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
l2tp udp: annotate data-races around udp->encap_type 2023-11-20 11:58:56 +01:00
l3mdev
lapb
llc llc: Drop support for ETH_P_TR_802_2. 2024-01-31 16:19:01 -08:00
mac80211 wifi: mac80211: fix potential sta-link leak 2024-01-31 16:19:00 -08:00
mac802154
mctp mctp: perform route lookups under a RCU read-side lock 2023-10-10 19:43:22 -07:00
mpls
mptcp mptcp: relax check on MPC passive fallback 2024-01-25 15:35:59 -08:00
ncsi net/ncsi: Fix netlink major/minor version numbers 2024-01-25 15:35:20 -08:00
netfilter ipvs: avoid stat macros calls from preemptible context 2024-01-25 15:35:59 -08:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2024-01-25 15:35:14 -08:00
netlink drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group 2023-12-13 18:45:10 +01:00
netrom netrom: Deny concurrent connect(). 2023-08-28 06:58:46 +01:00
nfc nfc: Do not send datagram if socket state isn't LLCP_BOUND 2024-01-20 11:51:46 +01:00
nsh
openvswitch net/sched: act_ct: Always fill offloading tuple iifidx 2023-11-20 11:59:37 +01:00
packet packet: Move reference count in packet_sock to atomic_long_t 2023-12-13 18:45:23 +01:00
phonet
psample psample: Require 'CAP_NET_ADMIN' when joining "packets" group 2023-12-13 18:45:10 +01:00
qrtr net: qrtr: ns: Return 0 if server port is not present 2024-01-20 11:51:47 +01:00
rds net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv 2024-01-31 16:19:01 -08:00
rfkill net: rfkill: gpio: set GPIO direction 2024-01-01 12:42:41 +00:00
rose net/rose: fix races in rose_kill_by_device() 2024-01-01 12:42:31 +00:00
rxrpc rxrpc: Fix use of Don't Fragment flag 2024-01-25 15:35:56 -08:00
sched net/sched: act_ct: fix skb leak and crash on ooo frags 2024-01-25 15:35:30 -08:00
sctp sctp: fix busy polling 2024-01-25 15:35:30 -08:00
smc net/smc: fix illegal rmb_desc access in SMC-D connection dump 2024-01-31 16:19:00 -08:00
strparser
sunrpc SUNRPC: use request size to initialize bio_vec in svc_udp_sendto() 2024-01-31 16:19:00 -08:00
switchdev
tipc tipc: Fix kernel-infoleak due to uninitialized TLV value 2023-11-28 17:19:51 +00:00
tls net: tls, fix WARNIING in __sk_msg_free 2024-01-25 15:35:58 -08:00
unix bpf: sockmap, fix proto update hook to avoid dup calls 2024-01-25 15:35:30 -08:00
vmw_vsock virtio/vsock: send credit update during setting SO_RCVLOWAT 2024-01-25 15:35:26 -08:00
wireless wifi: cfg80211: parse all ML elements in an ML probe response 2024-01-25 15:35:30 -08:00
x25
xdp xsk: add multi-buffer support for sockets sharing umem 2024-01-10 17:16:54 +01:00
xfrm ipsec-2023-10-17 2023-10-17 18:21:13 -07:00
compat.c
devres.c
Kconfig
Kconfig.debug
Makefile
socket.c net: Save and restore msg_namelen in sock_sendmsg 2024-01-10 17:16:51 +01:00
sysctl_net.c