Linux kernel source tree
Go to file
Tuong Lien ed9420ddce tipc: fix unlimited bundling of small messages
[ Upstream commit e95584a889 ]

We have identified a problem with the "oversubscription" policy in the
link transmission code.

When small messages are transmitted, and the sending link has reached
the transmit window limit, those messages will be bundled and put into
the link backlog queue. However, bundles of data messages are counted
at the 'CRITICAL' level, so that the counter for that level, instead of
the counter for the real, bundled message's level is the one being
increased.
Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
to be tested against the unchanged counter for their own level, while
contributing to an unrestrained increase at the CRITICAL backlog level.

This leaves a gap in congestion control algorithm for small messages
that can result in starvation for other users or a "real" CRITICAL
user. Even that eventually can lead to buffer exhaustion & link reset.

We fix this by keeping a 'target_bskb' buffer pointer at each levels,
then when bundling, we only bundle messages at the same importance
level only. This way, we know exactly how many slots a certain level
have occupied in the queue, so can manage level congestion accurately.

By bundling messages at the same level, we even have more benefits. Let
consider this:
- One socket sends 64-byte messages at the 'CRITICAL' level;
- Another sends 4096-byte messages at the 'LOW' level;

When a 64-byte message comes and is bundled the first time, we put the
overhead of message bundle to it (+ 40-byte header, data copy, etc.)
for later use, but the next message can be a 4096-byte one that cannot
be bundled to the previous one. This means the last bundle carries only
one payload message which is totally inefficient, as for the receiver
also! Later on, another 64-byte message comes, now we make a new bundle
and the same story repeats...

With the new bundling algorithm, this will not happen, the 64-byte
messages will be bundled together even when the 4096-byte message(s)
comes in between. However, if the 4096-byte messages are sent at the
same level i.e. 'CRITICAL', the bundling algorithm will again cause the
same overhead.

Also, the same will happen even with only one socket sending small
messages at a rate close to the link transmit's one, so that, when one
message is bundled, it's transmitted shortly. Then, another message
comes, a new bundle is created and so on...

We will solve this issue radically by another patch.

Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07 18:57:25 +02:00
arch arm: use STACK_TOP when computing mmap base address 2019-10-07 18:57:18 +02:00
block block: mq-deadline: Fix queue restart handling 2019-10-07 18:57:19 +02:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto crypto: chacha20poly1305 - fix atomic sleep when using async algorithm 2019-07-26 09:14:19 +02:00
Documentation ovl: fix regression caused by overlapping layers detection 2019-09-21 07:17:14 +02:00
drivers xen-netfront: do not use ~0U as error return value for xennet_fill_frags() 2019-10-07 18:57:25 +02:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs ocfs2: wait for recovering done after direct unlock request 2019-10-07 18:57:16 +02:00
include rxrpc: Fix rxrpc_recvmsg tracepoint 2019-10-07 18:57:22 +02:00
init initramfs: don't free a non-existent initrd 2019-10-01 08:26:09 +02:00
ipc ipc/mqueue.c: only perform resource calculation if user valid 2019-08-06 19:06:52 +02:00
kernel bpf: fix use after free in prog symbol exposure 2019-10-07 18:57:19 +02:00
lib kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K 2019-10-07 18:57:17 +02:00
LICENSES LICENSES: Remove CC-BY-SA-4.0 license text 2018-10-18 11:28:50 +02:00
mm mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone 2019-10-05 13:10:13 +02:00
net tipc: fix unlimited bundling of small messages 2019-10-07 18:57:25 +02:00
samples samples, bpf: suppress compiler warning 2019-07-14 08:11:04 +02:00
scripts randstruct: Check member structs in is_pure_ops_struct() 2019-10-05 13:10:02 +02:00
security security: smack: Fix possible null-pointer dereferences in smack_socket_sock_rcv_skb() 2019-10-07 18:57:14 +02:00
sound ASoC: Intel: Fix use of potentially uninitialized variable 2019-10-05 13:10:06 +02:00
tools udp: only do GSO if # of segs > 1 2019-10-07 18:57:24 +02:00
usr kbuild: clean compressed initramfs image 2019-10-07 18:57:16 +02:00
virt KVM: coalesced_mmio: add bounds checking 2019-09-21 07:16:44 +02:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS platform/x86: Add Intel AtomISP2 dummy / power-management driver 2019-04-20 09:16:02 +02:00
Makefile Linux 4.19.77 2019-10-05 13:10:13 +02:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.