Linux kernel source tree
Go to file
Eric Dumazet 6c4d4334e5 tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
[ Upstream commit 4bfe744ff1 ]

I had this bug sitting for too long in my pile, it is time to fix it.

Thanks to Doug Porter for reminding me of it!

We had various attempts in the past, including commit
0cbe6a8f08 ("tcp: remove SOCK_QUEUE_SHRUNK"),
but the issue is that TCP stack currently only generates
EPOLLOUT from input path, when tp->snd_una has advanced
and skb(s) cleaned from rtx queue.

If a flow has a big RTT, and/or receives SACKs, it is possible
that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
and no more data can be sent until tp->snd_una finally advances.

What is needed is to also check if POLLOUT needs to be generated
whenever tp->snd_nxt is advanced, from output path.

This bug triggers more often after an idle period, as
we do not receive ACK for at least one RTT. tcp_notsent_lowat
could be a fraction of what CWND and pacing rate would allow to
send during this RTT.

In a followup patch, I will remove the bogus call
to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
from tcp_check_space(). Fact that we have decided to generate
an EPOLLOUT does not mean the application has immediately
refilled the transmit queue. This optimistic call
might have been the reason the bug seemed not too serious.

Tested:

200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]

$ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
$ cat bench_rr.sh
SUM=0
for i in {1..10}
do
 V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
 echo $V
 SUM=$(($SUM + $V))
done
echo SUM=$SUM

Before patch:
$ bench_rr.sh
130000000
80000000
140000000
140000000
140000000
140000000
130000000
40000000
90000000
110000000
SUM=1140000000

After patch:
$ bench_rr.sh
430000000
590000000
530000000
450000000
450000000
350000000
450000000
490000000
480000000
460000000
SUM=4680000000  # This is 410 % of the value before patch.

Fixes: c9bee3b7fd ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Doug Porter <dsp@fb.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:37 +02:00
arch arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock 2022-05-09 09:14:36 +02:00
block iocost: don't reset the inuse weight of under-weighted debtors 2022-05-09 09:14:31 +02:00
certs certs: Add support for using elliptic curve keys for signing modules 2021-08-23 19:55:42 +03:00
crypto crypto: xts - Add softdep on ecb 2022-04-08 14:23:55 +02:00
Documentation ext4, doc: fix incorrect h_reserved size 2022-04-27 14:39:01 +02:00
drivers net: hns3: add return value for mailbox handling in PF 2022-05-09 09:14:36 +02:00
fs ceph: fix possible NULL pointer dereference for req->r_session 2022-05-09 09:14:30 +02:00
include tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT 2022-05-09 09:14:37 +02:00
init init/main.c: return 1 from handled __setup() functions 2022-04-13 20:59:10 +02:00
ipc ipc/sem: do not sleep with a spin lock held 2022-02-08 18:34:03 +01:00
kernel bpf: Fix crash due to out of bounds access into reg2btf_ids. 2022-05-01 17:22:26 +02:00
lib hex2bin: fix access beyond string end 2022-05-09 09:14:30 +02:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm mm: gup: make fault_in_safe_writeable() use fixup_user_fault() 2022-05-01 17:22:34 +02:00
net tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT 2022-05-09 09:14:37 +02:00
samples samples/bpf, xdpsock: Fix race when running for fix duration of time 2022-04-08 14:23:40 +02:00
scripts gcc-plugins: latent_entropy: use /dev/urandom 2022-04-20 09:34:18 +02:00
security Fix incorrect type in assignment of ipv6 port for audit 2022-04-08 14:23:55 +02:00
sound ASoC: soc-dapm: fix two incorrect uses of list iterator 2022-04-27 14:39:00 +02:00
tools selftests/bpf: Add test for reg2btf_ids out of bounds access 2022-05-01 17:22:34 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:27:15 +01:00
virt KVM: avoid NULL pointer dereference in kvm_dirty_ring_push 2022-04-13 20:59:26 +02:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: add Andrej Shadura 2021-10-18 20:22:03 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: adjust file entry for of_net.c after movement 2022-03-08 19:12:53 +01:00
Makefile Linux 5.15.37 2022-05-01 17:22:35 +02:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.